# Multiple sequence alignment
A multiple sequence alignment (MSA) is a method for the comparison of three or more biological sequences (protein, DNA, or RNA) by aligning them against each other. In practice, these query sequences would share an evolutionary relationship (common ancestor). With MSA the distances and similarities between the sequences can be inferred, which facilitates the analysis of phylogenetic association such as evolutionary origins.
An MSA makes it possible to visualize the conserved positions in the sequences, which often carry functional relevance across species, as well as mutation events such as substitutions, insertions and deletions (insertions and deletions appear as hyphens in one or more of the sequences in the alignment). These differences can be used to estimate the rate of evolution.
MSA is used to define a protein family by assessing sequence conservation of protein domains, tertiary and secondary structures.
[PDF slides](https://git.embl.de/sharan/protein-bioinformatics-nov-2016/blob/master/TeachingMaterials/Multiple_Sequence_Alignment_slides.pdf)
[external slide with comprehensive details on algorithm](http://player.slideplayer.com/17/5286187/#)
## Hands-on session on [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/) for multiple sequence alignment
Clustal Omega is the current version of the MSA tools from the Clustal series. It uses a progressive alignment heuristic to build the final MSA, beginning with the most similar pair and progressing to the most distantly related sequences.
The progressive alignment combines all the pairwise alignments in two stages: a first stage in which the relationships between the sequences are represented as a tree (clustering), called a guide tree, and a second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree.
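The first stage can be illustrated with a small Python sketch. This is a toy example, not Clustal Omega's actual implementation: the distance measure (shared 2-mers), the average-linkage clustering, the function names and the toy sequences are all assumptions made for illustration. It only shows how a merge order, from the most similar pair to the most distant sequence, can be derived from pairwise distances.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Distance = 1 - Jaccard similarity of the two k-mer sets (an assumed measure)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def cluster_distance(members_a, members_b, seqs, k=2):
    """Average pairwise distance between the members of two clusters."""
    total = sum(kmer_distance(seqs[x], seqs[y], k)
                for x in members_a for y in members_b)
    return total / (len(members_a) * len(members_b))


def guide_tree_order(seqs, k=2):
    """Greedily merge the two closest clusters; return the merges in order."""
    clusters = {name: [name] for name in seqs}
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: cluster_distance(
                clusters[pair[0]], clusters[pair[1]], seqs, k),
        )
        merges.append((left, right))
        clusters[f"({left},{right})"] = clusters.pop(left) + clusters.pop(right)
    return merges


# Hypothetical toy sequences: seqA and seqB are near-identical, seqC is unrelated.
toy_sequences = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",
    "seqC": "GGSPLWWNDE",
}
print(guide_tree_order(toy_sequences))
```

In the second stage, a progressive aligner would align the sequences in this merge order, so the closest pair is aligned first and the most divergent sequence is added last.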
**Availability:**
- Clustal Omega can be used via the web interface available at http://www.ebi.ac.uk/Tools/msa/clustalo/.
**Input:**
- It requires protein accession IDs or protein sequences in FASTA format.
[Frequently asked questions](http://www.ebi.ac.uk/Tools/msa/clustalo/help/faq.html#1)
`What substitution matrix/default parameters are used by Clustal Omega?
Clustal Omega uses the HHalign algorithm and its default settings as its core alignment engine. The algorithm is described in Söding, J. (2005) 'Protein homology detection by HMM–HMM comparison'. Bioinformatics 21, 951-960.
The default transition matrix is Gonnet, gap opening penalty is 6 bits, gap extension is 1 bit.`
HHalign:
HHalign compares two alignments with each other by pairwise alignment of HMMs. It shows the optimal alignment and all significant non-overlapping suboptimal alignments. It also generates a dotplot for which the profile-profile column score is averaged over a window of variable size. If only one alignment is entered, this is compared to itself. Used in this way, HHalign is a very sensitive repeat-identification tool.
### Examples:
To collect example sequences, we will revisit our first NCBI session using the following instructions:
1. Search for P53 proteins in NCBI
2. Select P53 protein from *Mus musculus*
3. Run BLAST on this sequence to identify its homologs
4. Randomly select 10 hits (avoid multiple sequences from same species)
5. View GenPept report, and view the summary (top left) as FASTA (text)
These sequences will be the set of queries for your MSA
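When preparing the query set, it can help to shorten each FASTA header to the species name so that the alignment and tree are easier to read. The sketch below assumes NCBI-style headers that end with the organism in square brackets; the function name and the example record (including its accession) are hypothetical.

```python
import re


def species_only_headers(fasta_text):
    """Rewrite each FASTA header to the species name found in trailing [brackets].

    Headers without a trailing [Organism name] are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = re.search(r"\[([^\[\]]+)\]\s*$", line)
            if match:
                # Replace spaces so the name survives tools that cut at whitespace.
                line = ">" + match.group(1).replace(" ", "_")
        out.append(line)
    return "\n".join(out)


# Hypothetical record for illustration only (made-up accession and sequence).
example = ">XX_000000.1 example p53-like protein [Mus musculus]\nMEEPQSDLSL"
print(species_only_headers(example))
```

Sequence lines are passed through untouched, so the output can be pasted directly into an alignment web form.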
### Using Clustal Omega
1. Select all the query sequences (optionally, you can edit the FASTA headers to keep only the species name)
2. Go to the Clustal Omega web form, and paste your query sequences
3. Choose output format as 'clustal w/ numbers'
4. Submit your query
5. Browse your output result
* Show colors
* Phylogenetic tree
* Summary: Percent Identity Matrix
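To make the idea of a percent identity matrix concrete, here is a minimal Python sketch that computes pairwise identity from already aligned sequences. It assumes one common convention (identical residues divided by the number of columns where neither sequence has a gap); the convention used by the Clustal Omega summary may differ, and the aligned toy sequences are invented.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def identity_matrix(aligned):
    """Return {(name1, name2): percent identity} for every pair of sequences."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in names for q in names}


# Invented aligned sequences; '-' marks a gap.
toy_alignment = {
    "s1": "MK-TAY",
    "s2": "MKQTAF",
    "s3": "MR-TAY",
}
for pair, value in identity_matrix(toy_alignment).items():
    print(pair, round(value, 1))
```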
## Optional exercise: COBALT (NCBI)
COBALT is a tool for multiple sequence alignment, integrated in the NCBI resources for sequence analysis. It aligns sequences using conserved protein domains and local sequence similarities.
1. Go back to your NCBI page of P53 BLAST result
* Click on multiple alignment
* Browse the result: phylogenetic tree
2. Randomly select a few sequences, go to the GenPept page
* In the 'Analyse these sequences', select the option 'Align sequences with COBALT'
* Browse your output result: Phylogenetic tree
## A few other tools for MSA
1. [T-Coffee](http://www.tcoffee.org/)
2. [UGENE](http://ugene.net/)
3. [Phylo: interactive video game](http://phylo.cs.mcgill.ca/)
4. [MUSCLE](http://www.drive5.com/muscle/)
5. [MAFFT](http://mafft.cbrc.jp/alignment/software/)
6. [MAVID](http://baboon.math.berkeley.edu/mavid/)
## MSA and MSA related tools on EBI-EMBL
Link: http://www.ebi.ac.uk/Tools/msa/
Link to the original file: https://piratenpad.de/p/pdz_plan
## PDZ meeting
#### Trainers: Toby Gibson, Matt Rogon (rogon@embl.de), Marc Gouw, Jelena Calyseva, Malvika Sharan
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
### Add your comments here:
- Comments by the Participants:
- I really like the cytoscape course but due to time constrains I couldn`t able to take full advantage of the course. I am quite sure that I will be using it in near future. So it would be very nice of you if you can share some basic as well as advance course material on cytoscape.
- Response by the tutors:
- Please see the files here for CytoScape tutorials: https://git.embl.de/rogon/introduction_to_cytoscape
- Comments by the Participants:
- If you have some courses in Kegg and Reactome patheway analysis, then that would be very useful to me.
- Response by the tutors:
- Please see the docs here for Reactome and other related materials: https://git.embl.de/rogon/Monterotondo_Module_2/tree/master/Practical/3.%20ReactomeFI
#### Advanced bioinformatic courses: I can direct you to different learning resources for this
- Comments by the Participants:
- I wish to learn R course in details, so if you have some basic and advance course material then please share it.
- Response by the tutors:
- http://rstatisticsguide.com/the-best-free-courses-to-learn-r/
- https://www.coursera.org/learn/r-programming
- Comments by the Participants:
- Analysis of High-Throughput Sequencing Data and RNA seq analysis
- Response by the tutors:
- https://www.ebi.ac.uk/training/online/course/embo-practical-course-analysis-high-throughput-seq
- https://www.ebi.ac.uk/training/events/2017/analysis-high-throughput-sequencing-data-0
#### This is not in the scope of my expertise but I can look around for some materials
- Comments by the Participants:
- can you post some reading material for phosphositeplus?
- Response by the tutors:
- We do not have a self developed tutorial for this.
- https://www.youtube.com/watch?v=lJ1BxYTAqzQ
- https://www.phosphosite.org/homeAction.action
- Comments by the Participants:
- Some course materials for Drug repurposing
- Response by the tutors:
- We do not have a self developed tutorial for this.
#### Online materials for different biological themes:
- https://fairsharing.org/
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Tools for 3D structure prediction:
-- Phyre2 (personal preference)
-- other tools: https://molbiol-tools.ca/Protein_tertiary_structure.htm
- Phospho site prediction:
-- NetPhos: http://www.cbs.dtu.dk/services/NetPhos/
### Notes during the course:
This might lose its format, please see the original piratenpad
#### Unix Materials
Material: https://github.com/malvikasharan/SWC_reference_material/blob/master/Unix_Shell/Unix_Shell.md
Origin of Species (example document): https://github.com/malvikasharan/SWC_reference_material/blob/master/Unix_Shell/origin_of_species.txt
#### Shared by Malvika
Blast result: http://www.uniprot.org/blast/uniprot/B20170926A7434721E10EE6586998A056CCD0537E0410ABM
clustal omega result: http://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=clustalo-I20170926-155909-0079-57062684-pg
emboss server: http://emboss.bioinformatics.nl/
SOS_human Interpro result:
https://www.ebi.ac.uk/interpro/sequencesearch/iprscan5-S20170927-140201-0896-11166200-p2m
Domains and disorders:
IUPred guide: http://iupred.enzim.hu/Help.php
Anchor guide: http://anchor.enzim.hu/Help.php
TMHMM guide: http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php
Exercises:
https://docs.google.com/document/d/1S_gMSr3P6C_qk9lQ7IH0F04E4n7iFLXsoOmpzJ45Xpk/edit#heading=h.1ponmc33dg79
#### Shared by Marc
ELM practical material: https://docs.google.com/document/d/11T4wjfB0mOA6JTJSKKCIvSqmXLWhfjeXvLWnGV881gU/edit?usp=sharing
#### From Lena
- http://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=clustalo-I20170927-134403-0253-4849516-pg
>DLG4
RIVIHRGSTGLGFNIVGGEDGEGIFISFILAGGPADLSGELRKGDQILSVNGVDLRNASH
EQAAIALKNAGQTVTIIAQYK
>SHANK1
TVLLQKKDSEGFGFVLRGAKAQTPIEEFTPTPAFPALQYLESVDEGGVAWRAGLRMGDFL
IEVNGQNVVKVGHRQVVNMIRQGGNTLMVKVVMVT
- Jalview tutorials:
- https://docs.google.com/document/d/1ceyNSXCpytsG0Bih-sRIOKnf2ZRNxJTOp9jCzKn6lOY/edit#heading=h.p0bkrpqyf7n3
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#### Computer requirements:
Marc Gouw, Malvika Sharan, Jelena Calyseva (unix and basic protein bioinformatics):
Git bash: https://git-for-windows.github.io/
Toby Gibson (Jalview and Chimera):
Java-8: http://www.oracle.com/technetwork/java/javase/8-whats-new-2157071.html
Jalview: http://www.jalview.org/download
Day - 3
Matt Rogon (Cytoscape)
Java-8: http://www.oracle.com/technetwork/java/javase/8-whats-new-2157071.html
Cytoscape3.5.x (not 3.6): http://www.cytoscape.org/download.php
Toby
Chimera-2 (not x): https://www.cgl.ucsf.edu/chimera/download.html
Jelena Calyseva:
No specific
Michael Kuhn (String)
No specific requirement
Other important links:
Program: https://docs.google.com/document/d/1pfJiuC3m3WYMshiX0m6vqvVJwxhMqIO_3Ny9iK6pvew/edit?usp=sharing
Please add your preferences for the core facility tour (October 2nd):
https://docs.google.com/spreadsheets/d/1LEog9biDugmybsvNoNXwwQ8-7ndXPl1n34U2hV_PhwM/edit?usp=drive_web
Dinner preferences: https://docs.google.com/spreadsheets/d/1zpV-mU59fQ-kNpreph48ogQdOm29MyrgGQan_g3N0B0/edit
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Prerequisites:
Materials: The teaching materials will be shared on the day of workshops and will be made available online.
Reading recommendations for pre-workshop preparation:
Multiple sequence alignment:
Jalview 2:
https://www.ncbi.nlm.nih.gov/pubmed/19151095
MSA theory
http://www.sciencedirect.com/science/article/pii/0022283686902524?via%3Dihub
Protein-protein interaction:
String database:
https://www.ncbi.nlm.nih.gov/pubmed/27924014
Short linear motifs:
Eukaryotic Linear Motif database
https://www.ncbi.nlm.nih.gov/pubmed/26615199
Optional materials:
Reviews on PDZ:
PDZ domains and their binding partners: structure, specificity, and modification
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2891790/
A structural portrait of the PDZ domain family.
https://www.ncbi.nlm.nih.gov/pubmed/25158098 (abstract)
http://www.sciencedirect.com/science/article/pii/S0022283614004318?via%3Dihub (full)
Protein related papers:
sequence to structure and function--current status.
https://www.ncbi.nlm.nih.gov/pubmed/20887265 (Abstract)
http://arep.med.harvard.edu/johnson/predict/protein_main.html
http://www.biochem.ucl.ac.uk/bsm/dbbrowser/jj/
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# Proteins
## Introduction
Proteins are macromolecules, constituted of long chains of amino acid residues of varying lengths inferred from the corresponding nucleotide sequences of their genes. Proteins are the building block of our body and they are involved in a wide range of biological functions within organisms, that include DNA replication, catalysis of metabolic reactions, response to stimuli, interaction with other biomolecules for pathway regulation, stability, transport, localization or degradation.
## Protein databases
A biological database is an organized collection of a particular type of dataset compiled from a large number of scientific publications and discoveries, for example, biological sequences or different -omics (transcriptomics, proteomics, metagenomics) data, specific types of annotations, structural data, chemical compounds, biological pathways etc.
The protein databases contain entries for protein sequences from all the known proteome sets. Well-known protein databases include the National Center for Biotechnology Information Reference Sequence project (RefSeq), UniProtKB/Swiss-Prot and the DNA Data Bank of Japan Amino Acid Sequence Database.
Protein records are available mainly in text formats that include sequence entries as FASTA and their corresponding annotations in XML formats. The protein entries are generally linked to external resources, allowing users to find relevant data such as literature (Pubmed), genes (NCBI, GenBank database), biological pathways (KEGG database), structures (PDB database), corresponding DNA/RNA sequences, sequence homologs, and expression and variation data.
## Hands-on sessions on protein databases
#### 1. [National Center for Biotechnology Information - NCBI](https://www.ncbi.nlm.nih.gov/)
The NCBI interface provides access to several journals and bioinformatics resources.
In this course, we will use several protein related resources of NCBI.
###### Example proteins:
* **Tumor protein P53**: a tumor suppressor protein in human, the absence of which allows many cancers to proliferate.
###### Search method:
* Text/term search in [All fields] (simply type in your query)
* Limiting the search using [filters]
- Organism [ORGN]
- Source database
- Genetic component
- Bio-chemical/physical properties etc.
* Combining multiple search criteria by boolean AND, OR, NOT
* Browsing by taxonomy (right side of the screen)
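The search options above can be combined into a single query string. The helper below is a hypothetical sketch (the function and its parameters are not part of any NCBI library); it only assembles text using the `[All Fields]` and `[ORGN]` field tags and the Boolean operators mentioned above, which can then be pasted into the NCBI search box.

```python
def build_query(term, organism=None, exclude=None):
    """Assemble an NCBI-style search string from a term and optional filters.

    term      free-text term searched in all fields
    organism  optional organism name, restricted with the [ORGN] tag
    exclude   optional term to remove from the results with NOT
    """
    parts = [f"{term}[All Fields]"]
    if organism:
        parts.append(f'"{organism}"[ORGN]')
    query = " AND ".join(parts)
    if exclude:
        query += f" NOT {exclude}[All Fields]"
    return query


print(build_query("p53", organism="Mus musculus"))
print(build_query("p53", organism="Mus musculus", exclude="partial"))
```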
###### Select one record of your choice
* Browse the GenPept entry
- Identical proteins
- FASTA entry
- Graphical representation of the features
- Other linked data
- Articles
- Pathways
- Reference sequences
- Homologs
- Related information
- Link-outs
- Analysis options (we will explore these later)
- BLAST
- Domains
- Sequence features
- Regular expression
- Tertiary structure
- Multiple alignment by COBALT
#### 2. [UniProt Knowledgebase](https://www.ebi.ac.uk/uniprot)
- Swissprot and Trembl
- Cross-reference
- Other resources for proteins
## Computational structure prediction of proteins
### Secondary and tertiary structure prediction
**Tool used in this course**: [Phyre2](http://www.sbg.bio.ic.ac.uk/phyre2/index.cgi)
- Phyre2 is a collection of tools to predict and analyze protein structure, function and mutations.
- It uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyse the effect of amino acid variants for a user's protein sequence.
![How does phyre work?](http://www.nature.com/nprot/journal/v10/n6/images_article/nprot.2015.053-F1.jpg)
1. A database of all the known structures is created
1. For each sequence in this database, PSI-BLAST is carried out to identify homologs
1. An HMM is created using all the homologs
1. The above steps are followed for all the sequences of the known structures, which creates a structure database
1. The query protein is subjected to PSI-BLAST and an HMM is created
1. The HMM of the query is compared to the HMMs in the database
1. The fragments of matching structures are assembled into a predicted structure
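The idea of comparing a query profile with template profiles can be illustrated with a heavily simplified Python sketch. This is not Phyre2's method: a real profile HMM also models insertions, deletions and transition probabilities, and the HMM-HMM comparison aligns the two models rather than assuming equal length. Here a "profile" is just the per-column residue frequencies of a set of aligned homologs, and all names and toy alignments are assumptions.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies of an alignment, ignoring gaps."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append(
            {res: n / total for res, n in counts.items()} if total else {})
    return profile


def column_score(col_a, col_b):
    """Probability that residues drawn from the two columns are identical."""
    return sum(freq * col_b.get(res, 0.0) for res, freq in col_a.items())


def profile_similarity(profile_a, profile_b):
    """Mean column score of two equal-length profiles (no alignment step)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("this toy comparison needs profiles of equal length")
    if not profile_a:
        return 0.0
    scores = [column_score(a, b) for a, b in zip(profile_a, profile_b)]
    return sum(scores) / len(scores)


# Invented toy alignments standing in for PSI-BLAST homolog sets.
query_profile = column_profile(["MKV", "MKI", "MRV"])
template_profiles = {
    "related": column_profile(["MKV", "MKV"]),
    "unrelated": column_profile(["GPW", "GPW"]),
}
for name, profile in template_profiles.items():
    print(name, round(profile_similarity(query_profile, profile), 3))
```

The template whose profile scores highest against the query profile would be the preferred candidate for modelling in this toy setting.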
- It analyses the query sequence and generates a result page; the comprehensive analysis view is called Phyre Investigator.
* We will look at the different fields of the result using working examples listed below.
**Caution**
- Any structure prediction takes a long time, so do not re-analyze any of the examples given below (use the provided results to save time)
- You can submit your own proteins as queries and you should have the prediction results by the end of the day (or tomorrow)
- You should always save the results (because who would want to wait another day to analyze the same proteins)
#### [Exercises taken from Phyre2 server](http://www.sbg.bio.ic.ac.uk/phyre2/workshops/2016/EBI/worked_examples.html)
##### Working with example to understand Phyre Investigator
For help viewing your PDB file off-line, please see the FAQ here:
http://www.sbg.bio.ic.ac.uk/phyre2/html/help.cgi?id=help/faq
###### Exercise-1
**Example using Human Globin**
````
>Globin_example
SVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETIGYFKRLGNVSQGMANDKLRGHSITLMYALQNFIDQLDNPDDLVCVVEKFAVNHITRKISAAEFGKINGPIKKVLASKNFGDKYANAWAKLVAVVQAAL
````
[Phyre investigator](http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/ca8acc6688d7f918/summary.html)
**Walkthrough/interpretation**
Here is a simple example for Phyre investigator to get you used to the interface.
1. Scroll down to the Detailed template information section. You can see the investigator buttons on the right hand side except for rank 3 which says "view investigator results". This is because that analysis is already provided. Please don't press the other investigator buttons on the tutorial examples as multiple people running the analysis is likely to cause a mess.
2. Click on the View investigator results for the rank 3 hit c3pt8B_. This will take you to the Investigator interface. The screen is divided into 3 main horizontal sections: the Info box, The 3D structure and Analyses section, and at the bottom, the Sequence view
3. In the Analyses section, click the Quality tab and below that click the 'ProQ2 quality assessment' button. The structure will be coloured mainly orange and yellow. Look at the key to the left. This indicates most of the structure is towards the 'Good' end of the spectrum. Look at the text box near the top of the page. It gives a brief summary of what this analysis (ProQ2) does.
4. Move your mouse down towards the Sequence view area. Note how as you hover over residues in the sequence view, the corresponding residue in the 3D structure is highlighted. Clicking on a residue causes that position to 'spacefill'. You can clear that by clicking the 'clear selection' button just above the sequence view.
5. Also, hovering over a position in the sequence view displays two bar graphs on the right portion of the middle section. These graphs display the preference of a residue type in the sequence profile ('Sequence Profile' graph) and the likelihood a mutation to one of the 20 amino acids will have a phenotypic effect ('Mutations' graph).
6. In the 'Analyses' section are 3 tabs: Quality, Function, and CDD. Under the 'Quality' tab you can investigate a number of features. Try clicking the 'Ramachandran Analysis' button. A few residues will be colored green and red in the 3D structure. Also a new row will appear for the sequence analysis section. Corresponding residues will appear to those highlighted in the structure. The 'Bad' and 'Allowed' residues only appear in the loop regions. So probably not much to worry about
7. Clicking the 'Disorder' button shows similarly that loop regions and the termini are the only regions with any significant disorder
8. Let's look at the CDD tab. This tab only appears if information from the Conserved Domain Database is available for your sequence. In this case it has detected a Heme-binding site. First click 'clear selection'. Now for each residue colored red in the sequence view, click to spacefill. You should have about 11 residues in spacefill mode, coloured red. As you click on each residue, have a look at the 'Mutations' graph. In almost all cases you can see that mutating the residue to anything other than that in the query sequence is likely to have a phenotypic effect
9. Go to the 'Function' tab. Click through 'conservation', 'pocket detection' and 'mutational sensitivity', reading the text in the Info box for each analysis. Notice how the heme-binding site residues correlate well with these features.
10. Finally, click the protindb interface button to see those residues known to form an interface in the template structure.
###### Exercise-2: [A bad example](http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/d4c0b42a3223e636/summary.html)
````
>CD630_32760_[_protein=PTS_system,_mannose/fructose/sorbose_IID_component]_[protein_id=CAJ70173.1]
MTLNKLTKKELRSMFWRSFALQGAFNYERMQNLGYCYSMLPAIKKLYNKKENQAKAIERHLEIFNTTTVVVPAILGITAAMEEENANNPEFDESSISAVKTALMGPLAGIGDSLFWGTFRIIAAGVGVSLAKEGNIFGPLLFLLLYNIPAFALRIFGLKYGYQVGVNSLERIQREGLMEKIMSMATTVGLFVVGGMVATMLSITTPLKFNLNGAEVILQDILDKIIPNMLPLTFAFVIYYMLKRKVSVTKLTIGTIVTGIALHAIGLL
````
1. First note the low confidence (41%) and low coverage (16%). Immediately you know you aren't going to learn too much from this.
Note the PhyreAlarm icon. This pops up in such cases of low confidence and coverage.
2. Go down to the Sequence Analysis section. Click the button for PSI-Blast Pseudo-multiple sequence alignment. That opens a new window. One can see plenty of homologous sequences, which is good. It means the secondary structure prediction will be pretty accurate and the hidden Markov model for the sequence should be quite powerful. But the lack of any confident hits suggests maybe this is a new fold or just really remote from anything we have a structure for.
3. Look at the Secondary structure and disorder prediction. Click 'Show'. No significant disorder, confidently all alpha helices (SS confidence mainly red). Notice the gold helices? That indicates transmembrane helices. Click 'Hide' to close the secondary structure prediction panel.
4. Scroll down to Domain analysis and click Show. Only short blue and green matches, all well below any useful confidence threshold.
5. Click Hide to hide the domain analysis. Scroll down to the Detailed template information. One can see that the rank 3 and 4 hits have red boxes highlighting the 40% identity between the query protein and the template. But then look how short they are. One can often get high sequence identities purely by chance from short alignments.
6. Scroll to the very bottom of the web page. You can choose to Hide the Detailed template information if you like to make this easier. You'll see the Transmembrane helix prediction section.
7. It looks like all we can get from this run is possibly a useful TM topology prediction. The image indicates the extracellular and cytoplasmic sides of the helices and their start and stop positions. This is probably a good candidate for PhyreAlarm. Maybe a new structure will come out in the weeks ahead that we can build a model on.
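The remark in step 5, that high sequence identity can arise purely by chance in short alignments, can be checked with a small calculation. The sketch below uses a simple binomial model and rests on loud assumptions: ungapped alignments, independent positions, and a per-position chance match probability of 1/20 (uniform amino acid composition, which real proteins do not have). It also ignores that a database search reports the best of many candidate alignments, which makes chance hits more likely than this model suggests.

```python
from math import ceil, comb


def chance_identity_tail(length, min_identity=0.40, p_match=0.05):
    """P(at least min_identity of `length` positions match purely by chance).

    Binomial tail under the assumptions stated above; p_match = 1/20.
    """
    needed = ceil(min_identity * length - 1e-9)  # matches required
    return sum(
        comb(length, k) * p_match ** k * (1 - p_match) ** (length - k)
        for k in range(needed, length + 1)
    )


for n in (10, 20, 50, 100):
    print(n, chance_identity_tail(n))
```

Under these assumptions the probability of reaching 40% identity drops steeply as the alignment gets longer, which is why the same percentage is far more convincing for a long alignment than for a short one.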
###### Exercise-3: investigate a few more examples
Example using Human [Prion](http://www.uniprot.org/uniprot/P04156)
````
>sp|P04156|PRIO_HUMAN Major prion protein OS=Homo sapiens GN=PRNP PE=1 SV=1
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG
````
Full html results of all homologues, models, secondary structure etc. available at:
http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/ecba3197e3730bcd/summary.html
Example using [Human Toll like receptor1](http://www.uniprot.org/uniprot/Q15399)
````
>sp|Q15399|TLR1_HUMAN Toll-like receptor 1 OS=Homo sapiens GN=TLR1 PE=1 SV=3
MTSIFHFAIIFMLILQIRIQLSEESEFLVDRSKNGLIHVPKDLSQKTTILNISQNYISELWTSDILSLSKLRILIISHNRIQYLDISVFKFNQELEYLDLSHNKLVKISCHPTVNLKHLDLSFNAFDALPICKEFGNMSQLKFLGLSTTHLEKSSVLPIAHLNISKVLLVLGETYGEKEDPEGLQDFNTESLHIVFPTNKEFHFILDVSVKTVANLELSNIKCVLEDNKCSYFLSILAKLQTNPKLSNLTLNNIETTWNSFIRILQLVWHTTVWYFSISNVKLQGQLDFRDFDYSGTSLKALSIHQVVSDVFGFPQSYIYEIFSNMNIKNFTVSGTRMVHMLCPSKISPFLHLDFSNNLLTDTVFENCGHLTELETLILQMNQLKELSKIAEMTTQMKSLQQLDISQNSVSYDEKKGDCSWTKSLLSLNMSSNILTDTIFRCLPPRIKVLDLHSNKIKSIPKQVVKLEALQELNVAFNSLTDLPGCGSFSSLSVLIIDHNSVSHPSADFFQSCQKMRSIKAGDNPFQCTCELGEFVKNIDQVSSEVLEGWPDSYKCDYPESYRGTLLKDFHMSELSCNITLLIVTIVATMLVLAVTVTSLCSYLDLPWYLRMVCQWTQTRRRARNIPLEELQRNLQFHAFISYSGHDSFWVKNELLPNLEKEGMQICLHERNFVPGKSIVENIITCIEKSYKSIFVLSPNFVQSEWCHYELYFAHHNLFHEGSNSLILILLEPIPQYSIPSSYHKLKSLMARRTYLEWPKEKSKRGLFWANLRAAINIKLTEQAKK
````
Full html results of all homologues, models, secondary structure etc. available at:
http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/6f4f568f92839199/summary.html
###### Example of intensive analysis: http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/d4a1f7b1ec99b495/summary.html
1. Click the Interactive 3D view in JSmol link. Maybe that N-terminal blue alpha-helix (which was built ab initio) probably shouldn't be where it is. It should probably pack better - but ab initio is tricky! Also there appears to be some tangling in the red C-terminus. This is usually caused by disagreements between the input templates in that region.
2. In the summary section click on the link called 'Details' below the confidence key. This takes you to the bottom of the page of results to the Multi-template and ab initio information table. This table shows you which templates were used, what regions of your sequence they covered, and their confidence.
3. In particular, note that template d1svma_ (bottom of the list) covers a significant extra region of the query protein at the N-terminus, but is missing a sizeable segment at the C-terminus. Luckily the other templates cover this region well already. This is where using multiple-templates as a 'patchwork' can improve model coverage.
In this case, the use of intensive mode has managed to model an extra 60+ residues. It has also had a fair go at the missing first 20 residues that have no template. The secondary structure prediction says this should be helix and intensive (or rather the Poing system) has attempted to build a helix for these residues and pack them against the rest of the structure. However, whenever ab initio modelling is concerned, please take results with a large pinch of salt.
We are working very hard on methods to avoid 'spaghettification', tangling from inconsistent templates, and better methods of template selection, including user-defined selections. These new approaches should be incorporated into Phyre2 by the end of 2016.
Intensive mode often creates excellent full length models that cannot be achieved by normal mode. The examples presented here are designed to illustrate how the process can occasionally go wrong, how to detect the problem and diagnose the cause.
**Other tools**:
Example: Human Toll like receptor 4
````
>sp|O00206|TLR4_HUMAN Toll-like receptor 4 OS=Homo sapiens GN=TLR4 PE=1 SV=2
MMSASRLAGTLIPAMAFLSCVRPESWEPCVEVVPNITYQCMELNFYKIPDNLPFSTKNLD
LSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIEDGAYQSLSHLSTLILTGNPIQSLALG
AFSGLSSLQKLVAVETNLASLENFPIGHLKTLKELNVAHNLIQSFKLPEYFSNLTNLEHL
DLSSNKIQSIYCTDLRVLHQMPLLNLSLDLSLNPMNFIQPGAFKEIRLHKLTLRNNFDSL
NVMKTCIQGLAGLEVHRLVLGEFRNEGNLEKFDKSALEGLCNLTIEEFRLAYLDYYLDDI
IDLFNCLTNVSSFSLVSVTIERVKDFSYNFGWQHLELVNCKFGQFPTLKLKSLKRLTFTS
NKGGNAFSEVDLPSLEFLDLSRNGLSFKGCCSQSDFGTTSLKYLDLSFNGVITMSSNFLG
LEQLEHLDFQHSNLKQMSEFSVFLSLRNLIYLDISHTHTRVAFNGIFNGLSSLEVLKMAG
NSFQENFLPDIFTELRNLTFLDLSQCQLEQLSPTAFNSLSSLQVLNMSHNNFFSLDTFPY
KCLNSLQVLDYSLNHIMTSKKQELQHFPSSLAFLNLTQNDFACTCEHQSFLQWIKDQRQL
LVEVERMECATPSDKQGMPVLSLNITCQMNKTIIGVSVLSVLVVSVVAVLVYKFYFHLML
LAGCIKYGRGENIYDAFVIYSSQDEDWVRNELVKNLEEGVPPFQLCLHYRDFIPGVAIAA
NIIHEGFHKSRKVIVVVSQHFIQSRWCIFEYEIAQTWQFLSSRAGIIFIVLQKVEKTLLR
QQVELYRLLSRNTYLEWEDSVLGRHIFWRRLRKALLDGKSWNPEGTVGTGCNWQEATSI
````
### Secondary structure prediction
[JPred](http://www.compbio.dundee.ac.uk/jpred/)
JPred is a protein secondary structure prediction tool. It also predicts solvent accessibility and coiled-coil regions. It first searches the PDB for structures homologous to the query sequence; if none is available, it predicts the secondary structure with the Jnet algorithm, a neural network method that uses different types of multiple sequence alignment profiles derived from the same sequences.
###### Example using their default example protein
- Sequence length is limited to 800 aa; longer sequences can be split and submitted in batch mode
- Run JPred (click `Make Prediction`)
- It immediately opens a list of PDB entries that match the query
- One can either explore those proteins individually using the PDB link-out
- Or, submit the job to identify a more accurate secondary structure assignment.
- The result page shows the predicted secondary structure elements
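Splitting a long sequence into fragments for batch mode can be scripted. The following is a minimal sketch: the function names, the fragment length and the optional overlap are assumptions chosen for illustration, and it is not stated here how the TLR4 sequence was actually split for the course. Cutting at arbitrary positions can break secondary structure elements at the fragment boundaries, so predictions near the cut points should be read with care.

```python
def split_sequence(seq, max_len=800, overlap=0):
    """Split a sequence into consecutive fragments of at most max_len residues.

    overlap > 0 repeats that many residues at the start of each next fragment.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    fragments = []
    start = 0
    while start < len(seq):
        fragments.append(seq[start:start + max_len])
        if start + max_len >= len(seq):
            break  # the last fragment reached the end of the sequence
        start += max_len - overlap
    return fragments


def to_fasta(name, fragments):
    """Format fragments as FASTA records named name1, name2, ..."""
    return "\n".join(
        f">{name}{i}\n{frag}" for i, frag in enumerate(fragments, start=1))


# Placeholder sequence of 839 residues; 300 is an arbitrary fragment length.
placeholder = "A" * 839
parts = split_sequence(placeholder, max_len=300)
print([len(p) for p in parts])
print(to_fasta("TLR4_HUMAN", parts)[:12])
```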
Default query using default parameters:
````
MQVWPIEGIKKFETLSYLPPLTVEDLLKQIEYLLRSKWVPCLEFSKVGFVYRENHRSPGYYDGRYWTMWKLPMFGCTDATQVLKELEEAKKAYPDAFVRIIGFDNVRQVQLISFIAYKPPGC
````
[result page](http://www.compbio.dundee.ac.uk/jpred4/results/jp_1Xqh2hJ/jp_1Xqh2hJ.results.html)
###### Long sequence of Toll-like receptor 4 split into multiple fragments (advanced search):
**Result:**
An archive with all the results can be downloaded from the following link:
http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__YtqOUQg/jp_batch_1478672019__ALL_JOBS_ARCHIVE.tar.gz
**Results for individual queries are available from the links below:**
- TLR4_HUMAN1 [Link to results](http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__6tFSD1_)
- TLR4_HUMAN2 [Link to results](http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__YtqOUQg)
- TLR4_HUMAN3 [Link to results](http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__mUJGlod)
### Tertiary structure prediction
1. [I-TASSER](http://zhanglab.ccmb.med.umich.edu/I-TASSER/)
Requires registration and submission permission
Check an already analysed result page for [Toll like receptor 4](http://www.uniprot.org/uniprot/O00206) using I-TASSER
Results available at: http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S298631/
2. (PS)2-v2: [Protein Structure Prediction Server](http://ps2.life.nctu.edu.tw/docs.php)
- A much faster tool compared to Phyre2, unfortunately broken at the moment.
- combines both sequence and secondary structure information for the detection of homologous proteins with remote similarity and the target-template alignment.
- Check an already analysed result page for [Toll like receptor 4](http://www.uniprot.org/uniprot/O00206) using (PS)2-v2
Results available at: http://140.113.239.111/~ps2v2/display_multi.php?folder=408053421
- Pettersen, E.F., et al., UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. Oct;25(13):1605-12. [PMID: 15264254]
![Exercises](https://docs.google.com/document/d/1S_gMSr3P6C_qk9lQ7IH0F04E4n7iFLXsoOmpzJ45Xpk/edit?usp=sharing)