Add new file

66d97b36 · Malvika Sharan · 8ca8948d · 66d97b36
Commit 66d97b36 authored 8 years ago by Malvika Sharan
--- a/TeachingMaterials/tertiary_structure_pred.md
+++ b/TeachingMaterials/tertiary_structure_pred.md
+## Computational structure prediction of proteins
+
+### Secondary structure prediction
+
+### Tertiary structure prediction
+
+**Tool name**: [Phyre2](http://www.sbg.bio.ic.ac.uk/phyre2/index.cgi)
+
+- Phyre2 is a collection of tools to predict and analyse protein structure, function and mutations.
+- It uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyse the effect of amino acid variants for a user's protein sequence.
+    
+![How does phyre work?](http://www.nature.com/nprot/journal/v10/n6/images_article/nprot.2015.053-F1.jpg)
+
+1. A database of all known protein structures is created
+1. For each sequence in this database, PSI-BLAST is carried out to identify homologs
+1. An HMM (hidden Markov model) is created from all the homologs
+1. The above steps are repeated for the sequences of all known structures, which creates a database of HMMs for known structures
+1. The query protein is subjected to PSI-BLAST and an HMM is created for it
+1. The HMM of the query is compared to the HMMs in the database
+1. Fragments of the matching structures are assembled into a predicted structure
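+
+The idea behind steps 3 to 6 can be illustrated with a toy Python sketch. It is a simplification under stated assumptions, not Phyre2's implementation: a per-column residue frequency profile stands in for a real HMM, the sequences are assumed to be already aligned and of equal length, and the function names (`build_profile`, `profile_similarity`, `best_template`) and the example sequences are invented for illustration.
+
```python
from collections import Counter


def build_profile(aligned_homologs):
    """Per-column residue frequencies from equal-length aligned sequences.

    A crude stand-in for an HMM: no insert/delete states, no pseudocounts.
    """
    length = len(aligned_homologs[0])
    if any(len(seq) != length for seq in aligned_homologs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in aligned_homologs if seq[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


def profile_similarity(profile_a, profile_b):
    """Mean per-column dot product of two equal-length profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of columns")
    column_scores = [
        sum(freq * col_b.get(aa, 0.0) for aa, freq in col_a.items())
        for col_a, col_b in zip(profile_a, profile_b)
    ]
    return sum(column_scores) / len(column_scores)


def best_template(query_profile, template_profiles):
    """Name of the template profile most similar to the query profile."""
    return max(
        template_profiles,
        key=lambda name: profile_similarity(query_profile, template_profiles[name]),
    )


# Hypothetical toy data: aligned homologs of the query and of two templates.
query_profile = build_profile(["ACDE", "ACDE", "ACDF"])
template_profiles = {
    "template_1": build_profile(["ACDE", "ACDD"]),
    "template_2": build_profile(["WWYY", "WWYF"]),
}
print(best_template(query_profile, template_profiles))
```
+
+Real profile HMMs additionally model insertions and deletions, and the query and template HMMs are aligned to each other rather than compared column by column.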
+
+- It analyses the query sequence and generates a results page with a comprehensive analysis tool called Phyre Investigator.
+    * We will look at the different fields of the result using working examples listed below.
+    
+#### [Exercises taken from Phyre2 server](http://www.sbg.bio.ic.ac.uk/phyre2/workshops/2016/EBI/worked_examples.html)
+
+##### Working with an example to understand Phyre Investigator
+
+For help viewing your PDB file off-line, please see the FAQ here:
+http://www.sbg.bio.ic.ac.uk/phyre2/html/help.cgi?id=help/faq
+
+###### Exercise-1
+
+**Example using Human Globin**
+
+````
+>Globin_example
+SVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETIGYFKRLGNVSQGMANDKLRGHSITLMYALQNFIDQLDNPDDLVCVVEKFAVNHITRKISAAEFGKINGPIKKVLASKNFGDKYANAWAKLVAVVQAAL
+````
+
+[Phyre investigator](http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/ca8acc6688d7f918/summary.html)
+
+**Walkthrough/interpretation**
+
+Here is a simple example for Phyre investigator to get you used to the interface.
+
+1. Scroll down to the Detailed template information section. You can see the investigator buttons on the right-hand side, except for rank 3, which says "View investigator results". This is because that analysis has already been run. Please don't press the other investigator buttons on the tutorial examples, as multiple people running the analysis is likely to cause a mess.
+
+2. Click on 'View investigator results' for the rank 3 hit c3pt8B_. This will take you to the Investigator interface. The screen is divided into 3 main horizontal sections: the Info box, the 3D structure and Analyses section, and, at the bottom, the Sequence view.
+
+3. In the Analyses section, click the Quality tab and below that click the 'ProQ2 quality assessment' button. The structure will be coloured mainly orange and yellow. Look at the key to the left. This indicates most of the structure is towards the 'Good' end of the spectrum. Look at the text box near the top of the page. It gives a brief summary of what this analysis (ProQ2) does.
+
+4. Move your mouse down towards the Sequence view area. Note how as you hover over residues in the sequence view, the corresponding residue in the 3D structure is highlighted. Clicking on a residue causes that position to 'spacefill'. You can clear that by clicking the 'clear selection' button just above the sequence view.
+
+5. Also, hovering over a position in the sequence view displays two bar graphs on the right portion of the middle section. These graphs display the preference of a residue type in the sequence profile ('Sequence Profile' graph) and the likelihood a mutation to one of the 20 amino acids will have a phenotypic effect ('Mutations' graph).
+
+6. In the 'Analyses' section there are 3 tabs: Quality, Function, and CDD. Under the 'Quality' tab you can investigate a number of features. Try clicking the 'Ramachandran Analysis' button. A few residues will be coloured green and red in the 3D structure. A new row will also appear in the sequence view, marking the residues that correspond to those highlighted in the structure. The 'Bad' and 'Allowed' residues only appear in the loop regions, so there is probably not much to worry about.
+
+7. Clicking the 'Disorder' button shows similarly that loop regions and the termini are the only regions with any significant disorder.
+
+8. Let's look at the CDD tab. This tab only appears if information from the Conserved Domain Database is available for your sequence. In this case it has detected a heme-binding site. First click 'clear selection'. Now for each residue coloured red in the sequence view, click to spacefill. You should have about 11 residues in spacefill mode, coloured red. As you click on each residue, have a look at the 'Mutations' graph. In almost all cases you can see that mutating the residue to anything other than that in the query sequence is likely to have a phenotypic effect.
+
+9. Go to the 'Function' tab. Click through 'conservation', 'pocket detection' and 'mutational sensitivity', reading the text in the Info box for each analysis. Notice how the heme-binding site residues correlate well with these features.
+
+10. Finally, click the 'protindb interface' button to see those residues known to form an interface in the template structure.
+
+
+###### Exercise-2: [A bad example](http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/d4c0b42a3223e636/summary.html)
+
+````
+>CD630_32760_[_protein=PTS_system,_mannose/fructose/sorbose_IID_component]_[protein_id=CAJ70173.1]
+MTLNKLTKKELRSMFWRSFALQGAFNYERMQNLGYCYSMLPAIKKLYNKKENQAKAIERHLEIFNTTTVVVPAILGITAAMEEENANNPEFDESSISAVKTALMGPLAGIGDSLFWGTFRIIAAGVGVSLAKEGNIFGPLLFLLLYNIPAFALRIFGLKYGYQVGVNSLERIQREGLMEKIMSMATTVGLFVVGGMVATMLSITTPLKFNLNGAEVILQDILDKIIPNMLPLTFAFVIYYMLKRKVSVTKLTIGTIVTGIALHAIGLL
+````
+
+1. First note the low confidence (41%) and low coverage (16%). Immediately you know you aren't going to learn too much from this.
+
+Note the PhyreAlarm icon. This pops up in such cases of low confidence and coverage.
+
+2. Go down to the Sequence Analysis section. Click the button for the PSI-BLAST pseudo-multiple sequence alignment, which opens in a new window. One can see plenty of homologous sequences, which is good: it means the secondary structure prediction will be pretty accurate and the hidden Markov model for the sequence should be quite powerful. But the lack of any confident hits suggests that this is either a new fold or just really remote from anything we have a structure for.
+
+3. Look at the Secondary structure and disorder prediction. Click 'Show'. No significant disorder, confidently all alpha helices (SS confidence mainly red). Notice the gold helices? They indicate transmembrane helices. Click 'Hide' to close the secondary structure prediction panel.
+
+4. Scroll down to Domain analysis and click 'Show'. There are only short blue and green matches, all well below any useful confidence threshold.
+
+5. Click 'Hide' to hide the domain analysis. Scroll down to the Detailed template information. One can see that the rank 3 and 4 hits have red boxes highlighting the 40% identity between the query protein and the template. But then look how short they are. One can often get high sequence identities purely by chance from short alignments.
+
+6. Scroll to the very bottom of the web page. You can choose to hide the Detailed template information if you like, to make this easier. You'll see the Transmembrane helix prediction section.
+
+7. It looks like all we can get from this run is possibly a useful TM topology prediction. The image indicates the extracellular and cytoplasmic sides of the helices and their start and stop positions. This is probably a good candidate for PhyreAlarm. Maybe a new structure will come out in the weeks ahead that we can build a model on.
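+
+The warning in step 5 about short alignments can be made concrete with a rough calculation. The Python sketch below is only an illustration under stated assumptions: alignments are ungapped, each aligned position matches independently with probability 0.05 (uniform usage of the 20 amino acids), and the function names are hypothetical. Real searches pick the best-scoring match out of many candidates, so chance identities are more common in practice than this simple model suggests.
+
```python
from math import comb


def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned (non-gap) columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def chance_of_identity(length, min_identity_pct=40, p_match=0.05):
    """Probability that a random ungapped alignment of the given length
    reaches at least min_identity_pct identity (binomial tail).

    p_match = 0.05 assumes all 20 amino acids are equally frequent.
    """
    min_matches = -(-min_identity_pct * length // 100)  # ceiling division
    return sum(
        comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(min_matches, length + 1)
    )


# Compare a short alignment with progressively longer ones.
for length in (10, 25, 100):
    print(length, chance_of_identity(length))
```
+
+Under these assumptions, reaching 40% identity by chance is far more likely over 10 aligned residues than over 100, which is why the short rank 3 and 4 hits deserve caution.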
+
+###### Exercise-3: investigate a few more examples
+
+***Example using Human [Prion](http://www.uniprot.org/uniprot/P04156)***
+
+````
+>sp|P04156|PRIO_HUMAN Major prion protein OS=Homo sapiens GN=PRNP PE=1 SV=1
+MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG
+````
+
+Full HTML results of all homologues, models, secondary structure and more are available at:
+http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/ecba3197e3730bcd/summary.html
+
+***Example using Human Toll-like receptor***
+
+###### Example of intensive analysis: http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/d4a1f7b1ec99b495/summary.html
+
+1. Click the 'Interactive 3D view in JSmol' link. The N-terminal blue alpha-helix (which was built ab initio) probably shouldn't be where it is. It should probably pack better - but ab initio is tricky! Also there appears to be some tangling in the red C-terminus. This is usually caused by disagreements between the input templates in that region.
+
+2. In the summary section click on the link called 'Details' below the confidence key. This takes you to the bottom of the results page, to the Multi-template and ab initio information table. This table shows you which templates were used, what regions of your sequence they covered, and their confidence.
+
+3. In particular, note that template d1svma_ (bottom of the list) covers a significant extra region of the query protein at the N-terminus, but is missing a sizeable segment at the C-terminus. Luckily the other templates cover this region well already. This is where using multiple templates as a 'patchwork' can improve model coverage.
+
+In this case, the use of intensive mode has managed to model an extra 60+ residues. It has also had a fair go at the missing first 20 residues that have no template. The secondary structure prediction says this should be a helix, and intensive mode (or rather the Poing system) has attempted to build a helix for these residues and pack them against the rest of the structure. However, whenever ab initio modelling is concerned, please take results with a large pinch of salt.
+
+We are working very hard on methods to avoid 'spaghettification', tangling from inconsistent templates, and better methods of template selection, including user-defined selections. These new approaches should be incorporated into Phyre2 by the end of 2016.
+
+Intensive mode often creates excellent full-length models that cannot be achieved by normal mode. The examples presented here are designed to illustrate how the process can occasionally go wrong, how to detect the problem and how to diagnose the cause.
\ No newline at end of file