Proteins are macromolecules made up of long chains of amino acid residues. The chains vary in length, and their sequences are inferred from the nucleotide sequences of the corresponding genes. Proteins are building blocks of the body and are involved in a wide range of biological functions within organisms, including DNA replication, catalysis of metabolic reactions, response to stimuli, and interaction with other biomolecules for pathway regulation, stability, transport, localization, or degradation.
## Protein databases
A biological database is an organized collection of a particular type of data compiled from a large number of scientific publications and discoveries, for example biological sequences, -omics data (transcriptomics, proteomics, metagenomics), specific types of annotation, structural data, chemical compounds, or biological pathways.
Protein databases contain an entry for each protein sequence in the known proteome sets. A few well-known protein databases are the National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) project, UniProtKB/Swiss-Prot, and the DNA Data Bank of Japan (DDBJ) Amino Acid Sequence Database.
Protein records are available mainly in text formats: sequence entries as FASTA and their corresponding annotations as XML. Protein entries are generally linked to external resources, allowing users to find relevant data such as literature (PubMed), genes (NCBI GenBank database), biological pathways (KEGG database), structures (PDB database), corresponding DNA/RNA sequences, sequence homologs, and expression and variation data.
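To make the FASTA format concrete, here is a minimal Python sketch that reads FASTA text into (header, sequence) pairs. The function name `parse_fasta` and the two records are hypothetical and made up for illustration; they are not real database entries. In practice, an established library such as Biopython would usually be a better choice.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example records (illustrative only, not real entries).
example = """>demo_protein_1 hypothetical example record
MKTAYIAKQR
QISFVKSHFS
>demo_protein_2 another hypothetical record
ACDEFGHIKL
"""

records = parse_fasta(example)
for header, sequence in records:
    identifier = header.split()[0]
    print(identifier, len(sequence))
```

Each record starts with a `>` header line, followed by one or more lines of sequence, which is why the parser concatenates the wrapped lines before storing a record.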
## Hands-on sessions on protein databases
#### 1. [National Center for Biotechnology Information - NCBI](https://www.ncbi.nlm.nih.gov/)
The NCBI interface provides access to several journals and bioinformatics resources.
In this course, we will use several protein-related resources at NCBI.
###### Example proteins:
* **Tumor protein P53**: a tumor suppressor protein in humans, the absence of which allows many cancers to proliferate.
###### Search method:
* Text/term search in [All fields] (simply type in your query)
* Limiting the search using [filters]
    - Organism [ORGN]
    - Source database
    - Genetic component
    - Bio-chemical/physical properties etc.
* Combining multiple search criteria with the Boolean operators AND, OR, NOT
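The search methods above can also be expressed as a query string. The following is a minimal sketch, assuming the query is sent to the NCBI E-utilities `esearch` endpoint with `db=protein`; the helper names `field` and `combine` are hypothetical, and the search terms are only an illustration. The code builds the URL but does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities esearch endpoint is used for the search.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(term, tag=None):
    """Return a search term, optionally limited to a field such as ORGN."""
    if tag is None:
        return term  # plain text search in all fields
    return f'"{term}"[{tag}]'


def combine(operator, *terms):
    """Join terms with a Boolean operator (AND, OR, NOT) inside parentheses."""
    return "(" + f" {operator} ".join(terms) + ")"


# Free-text term combined with an organism filter.
query = combine("AND", field("p53"), field("Homo sapiens", "ORGN"))

# Build the request URL; urlencode escapes spaces, quotes, and brackets.
url = ESEARCH_URL + "?" + urlencode({"db": "protein", "term": query})

print(query)
print(url)
```

The same query string can be pasted directly into the NCBI search box, which is a quick way to check that a combination of filters behaves as intended before scripting it.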