Commit 628077a8 authored by Hugo Carlos

Upload New File

parent 247d4d56
## Computational structure prediction of proteins
### Secondary and tertiary structure prediction
**Tool used in this course**: [Phyre2](http://www.sbg.bio.ic.ac.uk/phyre2/index.cgi)
- Phyre2 is a collection of tools to predict and analyze protein structure, function and mutations.
- It uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyse the effect of amino acid variants for a user's protein sequence.
![How does phyre work?](http://www.nature.com/nprot/journal/v10/n6/images_article/nprot.2015.053-F1.jpg)
1. A database of all the known structures is created
1. For each sequence in this database, PSI-BLAST is carried out to identify homologs
1. An HMM is created using all the homologs
1. The above steps are followed for all the sequences of the known structures, which creates the structure database
1. The query protein is subjected to PSI-BLAST and an HMM is created
1. The HMM of the query is compared to the HMMs in the database
1. The fragments of matching structures are assembled into a predicted structure
- It analyses the query sequence and generates a results page with a comprehensive analysis view called Phyre Investigator.
* We will look at the different fields of the result using working examples listed below.
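The profile comparison idea in the steps above can be illustrated with a very reduced sketch. It is not Phyre2's method: instead of a true HMM it builds a plain column-wise residue frequency profile from gap-free aligned homologs and scores two profiles by their mean column overlap. The function names and the toy sequences are hypothetical and only serve the illustration.

```python
from collections import Counter


def build_profile(aligned_seqs):
    """Column-wise residue frequencies for equal-length, gap-free aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def column_similarity(col_a, col_b):
    """Dot product of two frequency columns (0.0 when no residue is shared)."""
    return sum(freq * col_b.get(res, 0.0) for res, freq in col_a.items())


def profile_similarity(profile_a, profile_b):
    """Mean column similarity of two ungapped profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    scores = [column_similarity(a, b) for a, b in zip(profile_a, profile_b)]
    return sum(scores) / len(scores)


# Toy data (hypothetical): homologs of a query, of a related template,
# and of an unrelated template.
query_profile = build_profile(["MKVLA", "MKILA", "MRVLA"])
related_profile = build_profile(["MKVLG", "MKVLA", "MKILA"])
unrelated_profile = build_profile(["GGSPW", "GGSPW", "GGTPW"])

print(profile_similarity(query_profile, related_profile))
print(profile_similarity(query_profile, unrelated_profile))
```

Comparing profiles rather than single sequences lets substitutions that are tolerated among homologs (such as V/I in the third column of the toy data) still contribute to the score, which is the intuition behind remote homology detection.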
**Caution**
- Any structure prediction takes a long time, so do not re-analyze any of the examples given below (use the provided results to save time).
- You can submit your own proteins as queries; you should have the prediction results by the end of the day (or tomorrow).
- Always save the results, because nobody wants to wait another day to analyze the same proteins.
#### [Exercises taken from Phyre2 server](http://www.sbg.bio.ic.ac.uk/phyre2/workshops/2016/EBI/worked_examples.html)
##### Working with example to understand Phyre Investigator
For help viewing your PDB file off-line, please see the FAQ here:
http://www.sbg.bio.ic.ac.uk/phyre2/html/help.cgi?id=help/faq
###### Exercise-1
**Example using Human Globin**
````
>Globin_example
SVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETIGYFKRLGNVSQGMANDKLRGHSITLMYALQNFIDQLDNPDDLVCVVEKFAVNHITRKISAAEFGKINGPIKKVLASKNFGDKYANAWAKLVAVVQAAL
````
[Phyre investigator](http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/ca8acc6688d7f918/summary.html)
**Walkthrough/interpretation**
Here is a simple example for Phyre investigator to get you used to the interface.
1. Scroll down to the Detailed template information section. You can see the investigator buttons on the right hand side except for rank 3 which says "view investigator results". This is because that analysis is already provided. Please don't press the other investigator buttons on the tutorial examples as multiple people running the analysis is likely to cause a mess.
2. Click on the View investigator results for the rank 3 hit c3pt8B_. This will take you to the Investigator interface. The screen is divided into 3 main horizontal sections: the Info box, the 3D structure and Analyses section, and at the bottom, the Sequence view.
3. In the Analyses section, click the Quality tab and below that click the 'ProQ2 quality assessment' button. The structure will be coloured mainly orange and yellow. Look at the key to the left. This indicates most of the structure is towards the 'Good' end of the spectrum. Look at the text box near the top of the page. It gives a brief summary of what this analysis (ProQ2) does.
4. Move your mouse down towards the Sequence view area. Note how as you hover over residues in the sequence view, the corresponding residue in the 3D structure is highlighted. Clicking on a residue causes that position to 'spacefill'. You can clear that by clicking the 'clear selection' button just above the sequence view.
5. Also, hovering over a position in the sequence view displays two bar graphs on the right portion of the middle section. These graphs display the preference of a residue type in the sequence profile ('Sequence Profile' graph) and the likelihood a mutation to one of the 20 amino acids will have a phenotypic effect ('Mutations' graph).
6. In the 'Analyses' section are 3 tabs: Quality, Function, and CDD. Under the 'Quality' tab you can investigate a number of features. Try clicking the 'Ramachandran Analysis' button. A few residues will be colored green and red in the 3D structure. A new row will also appear in the sequence analysis section, marking the residues that correspond to those highlighted in the structure. The 'Bad' and 'Allowed' residues only appear in the loop regions, so there is probably not much to worry about.
7. Clicking the 'Disorder' button shows similarly that loop regions and the termini are the only regions with any significant disorder.
8. Let's look at the CDD tab. This tab only appears if information from the Conserved Domain Database is available for your sequence. In this case it has detected a Heme-binding site. First click 'clear selection'. Now for each residue colored red in the sequence view, click to spacefill. You should have about 11 residues in spacefill mode, coloured red. As you click on each residue, have a look at the 'Mutations' graph. In almost all cases you can see that mutating the residue to anything other than that in the query sequence is likely to have a phenotypic effect.
9. Go to the 'Function' tab. Click through 'conservation', 'pocket detection' and 'mutational sensitivity', reading the text in the Info box for each analysis. Notice how the heme-binding site residues correlate well with these features.
10. Finally, click the protindb interface button to see those residues known to form an interface in the template structure.
###### Exercise-2: [A bad example](http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/d4c0b42a3223e636/summary.html)
````
>CD630_32760_[_protein=PTS_system,_mannose/fructose/sorbose_IID_component]_[protein_id=CAJ70173.1]
MTLNKLTKKELRSMFWRSFALQGAFNYERMQNLGYCYSMLPAIKKLYNKKENQAKAIERHLEIFNTTTVVVPAILGITAAMEEENANNPEFDESSISAVKTALMGPLAGIGDSLFWGTFRIIAAGVGVSLAKEGNIFGPLLFLLLYNIPAFALRIFGLKYGYQVGVNSLERIQREGLMEKIMSMATTVGLFVVGGMVATMLSITTPLKFNLNGAEVILQDILDKIIPNMLPLTFAFVIYYMLKRKVSVTKLTIGTIVTGIALHAIGLL
````
1. First note the low confidence (41%) and low coverage (16%). Immediately you know you aren't going to learn too much from this.
Note the PhyreAlarm icon. This pops up in such cases of low confidence and coverage.
2. Go down to the Sequence Analysis section. Click the button for PSI-Blast Pseudo-multiple sequence alignment. That opens a new window. One can see plenty of homologous sequences, which is good. It means the secondary structure prediction will be pretty accurate and the hidden Markov model for the sequence should be quite powerful. But the lack of any confident hits suggests maybe this is a new fold or just really remote from anything we have a structure for.
3. Look at the Secondary structure and disorder prediction. Click 'Show'. No significant disorder, confidently all alpha helices (SS confidence mainly red). Notice the gold helices? That indicates transmembrane helices. Click 'Hide' to close the secondary structure prediction panel.
4. Scroll down to Domain analysis and click Show. Only short blue and green matches, all well below any useful confidence threshold.
5. Click Hide to hide the domain analysis. Scroll down to the Detailed template information. One can see that the rank 3 and 4 hits have red boxes highlighting the 40% identity between the query protein and the template. But then look how short they are. One can often get high sequence identities purely by chance from short alignments.
6. Scroll to the very bottom of the web page. You can choose to Hide the Detailed template information if you like to make this easier. You'll see the Transmembrane helix prediction section.
7. It looks like all we can get from this run is possibly a useful TM topology prediction. The image indicates the extracellular and cytoplasmic sides of the helices and their start and stop positions. This is probably a good candidate for PhyreAlarm. Maybe a new structure will come out in the weeks ahead that we can build a model on.
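Step 5 above notes that short alignments can reach high sequence identity purely by chance. A rough way to see this is a binomial calculation, sketched below under a strong simplifying assumption: every aligned position matches independently with probability 0.05 (20 equally frequent amino acids). The function name and the `match_prob` parameter are hypothetical, and real searches pick the best of many candidate alignments, so the true chance rates are higher than these figures.

```python
import math


def chance_identity_probability(length, min_identity, match_prob=0.05):
    """Probability that a random ungapped alignment of `length` positions
    reaches at least `min_identity` (a fraction between 0 and 1).

    Assumes independent positions that match with probability `match_prob`.
    """
    # Smallest number of matching positions that satisfies the threshold;
    # the small epsilon guards against floating-point round-up.
    min_matches = max(0, math.ceil(min_identity * length - 1e-9))
    return sum(
        math.comb(length, k) * match_prob**k * (1 - match_prob) ** (length - k)
        for k in range(min_matches, length + 1)
    )


# Compare a short and a long alignment at the same 40% identity threshold.
for aligned_length in (10, 25, 100):
    probability = chance_identity_probability(aligned_length, 0.40)
    print(f"length {aligned_length:>3}: P(identity >= 40%) = {probability:.3g}")
```

The probability falls off steeply as the aligned length grows, which is why a 40% identity over a handful of residues carries far less weight than the same identity over a whole domain.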
###### Exercise-3: investigate a few more examples
Example using Human [Prion](http://www.uniprot.org/uniprot/P04156)
````
>sp|P04156|PRIO_HUMAN Major prion protein OS=Homo sapiens GN=PRNP PE=1 SV=1
MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPVILLISFLIFLIVG
````
Full html results of all homologues, models, secondary structure etc. available at:
http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/ecba3197e3730bcd/summary.html
Example using [Human Toll-like receptor 1](http://www.uniprot.org/uniprot/Q15399)
````
>sp|Q15399|TLR1_HUMAN Toll-like receptor 1 OS=Homo sapiens GN=TLR1 PE=1 SV=3
MTSIFHFAIIFMLILQIRIQLSEESEFLVDRSKNGLIHVPKDLSQKTTILNISQNYISELWTSDILSLSKLRILIISHNRIQYLDISVFKFNQELEYLDLSHNKLVKISCHPTVNLKHLDLSFNAFDALPICKEFGNMSQLKFLGLSTTHLEKSSVLPIAHLNISKVLLVLGETYGEKEDPEGLQDFNTESLHIVFPTNKEFHFILDVSVKTVANLELSNIKCVLEDNKCSYFLSILAKLQTNPKLSNLTLNNIETTWNSFIRILQLVWHTTVWYFSISNVKLQGQLDFRDFDYSGTSLKALSIHQVVSDVFGFPQSYIYEIFSNMNIKNFTVSGTRMVHMLCPSKISPFLHLDFSNNLLTDTVFENCGHLTELETLILQMNQLKELSKIAEMTTQMKSLQQLDISQNSVSYDEKKGDCSWTKSLLSLNMSSNILTDTIFRCLPPRIKVLDLHSNKIKSIPKQVVKLEALQELNVAFNSLTDLPGCGSFSSLSVLIIDHNSVSHPSADFFQSCQKMRSIKAGDNPFQCTCELGEFVKNIDQVSSEVLEGWPDSYKCDYPESYRGTLLKDFHMSELSCNITLLIVTIVATMLVLAVTVTSLCSYLDLPWYLRMVCQWTQTRRRARNIPLEELQRNLQFHAFISYSGHDSFWVKNELLPNLEKEGMQICLHERNFVPGKSIVENIITCIEKSYKSIFVLSPNFVQSEWCHYELYFAHHNLFHEGSNSLILILLEPIPQYSIPSSYHKLKSLMARRTYLEWPKEKSKRGLFWANLRAAINIKLTEQAKK
````
Full html results of all homologues, models, secondary structure etc. available at:
http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/6f4f568f92839199/summary.html
###### Example of intensive analysis: http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/d4a1f7b1ec99b495/summary.html
1. Click the Interactive 3D view in JSmol link. The N-terminal blue alpha-helix (which was built ab initio) probably shouldn't be where it is. It should probably pack better - but ab initio is tricky! Also there appears to be some tangling in the red C-terminus. This is usually caused by disagreements between the input templates in that region.
2. In the summary section click on the link called 'Details' below the confidence key. This takes you to the bottom of the page of results to the Multi-template and ab initio information table. This table shows you which templates were used, what regions of your sequence they covered, and their confidence.
3. In particular, note that template d1svma_ (bottom of the list) covers a significant extra region of the query protein at the N-terminus, but is missing a sizeable segment at the C-terminus. Luckily the other templates cover this region well already. This is where using multiple-templates as a 'patchwork' can improve model coverage.
In this case, use of intensive mode has managed to model an extra 60+ residues. It has also had a fair go at the missing first 20 residues that have no template. The secondary structure prediction says this should be helix and intensive (or rather the Poing system) has attempted to build a helix for these residues and pack them against the rest of the structure. However, whenever ab initio modelling is concerned, please take results with a large pinch of salt.
We are working very hard on methods to avoid 'spaghettification', tangling from inconsistent templates, and better methods of template selection, including user-defined selections. These new approaches should be incorporated into Phyre2 by the end of 2016.
Intensive mode often creates excellent full length models that cannot be achieved by normal mode. The examples presented here are designed to illustrate how the process can occasionally go wrong, how to detect the problem and diagnose the cause.
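The 'patchwork' effect of multiple templates described above comes down to taking the union of the query regions each template covers. The sketch below computes that combined coverage. The function name, the query length and the template ranges are hypothetical illustrations, not values read from the Phyre2 results table.

```python
def merged_coverage(query_length, template_ranges):
    """Fraction of query residues covered by at least one template.

    `template_ranges` holds (start, end) pairs, 1-based and inclusive,
    as residue ranges are usually reported.
    """
    covered = [False] * query_length
    for start, end in template_ranges:
        # Clip each range to the query so out-of-range values are ignored.
        for pos in range(max(start, 1), min(end, query_length) + 1):
            covered[pos - 1] = True
    return sum(covered) / query_length


# Hypothetical example: a 100-residue query and three templates.
single_template = [(21, 80)]
several_templates = [(21, 80), (10, 50), (70, 100)]

print(merged_coverage(100, single_template))
print(merged_coverage(100, several_templates))
```

Residues that no template covers (positions 1 to 9 in the hypothetical example) are the ones left to ab initio modelling, so they deserve the most caution when reading the model.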
**Other tools**:
Example: Human Toll like receptor 4
````
>sp|O00206|TLR4_HUMAN Toll-like receptor 4 OS=Homo sapiens GN=TLR4 PE=1 SV=2
MMSASRLAGTLIPAMAFLSCVRPESWEPCVEVVPNITYQCMELNFYKIPDNLPFSTKNLD
LSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIEDGAYQSLSHLSTLILTGNPIQSLALG
AFSGLSSLQKLVAVETNLASLENFPIGHLKTLKELNVAHNLIQSFKLPEYFSNLTNLEHL
DLSSNKIQSIYCTDLRVLHQMPLLNLSLDLSLNPMNFIQPGAFKEIRLHKLTLRNNFDSL
NVMKTCIQGLAGLEVHRLVLGEFRNEGNLEKFDKSALEGLCNLTIEEFRLAYLDYYLDDI
IDLFNCLTNVSSFSLVSVTIERVKDFSYNFGWQHLELVNCKFGQFPTLKLKSLKRLTFTS
NKGGNAFSEVDLPSLEFLDLSRNGLSFKGCCSQSDFGTTSLKYLDLSFNGVITMSSNFLG
LEQLEHLDFQHSNLKQMSEFSVFLSLRNLIYLDISHTHTRVAFNGIFNGLSSLEVLKMAG
NSFQENFLPDIFTELRNLTFLDLSQCQLEQLSPTAFNSLSSLQVLNMSHNNFFSLDTFPY
KCLNSLQVLDYSLNHIMTSKKQELQHFPSSLAFLNLTQNDFACTCEHQSFLQWIKDQRQL
LVEVERMECATPSDKQGMPVLSLNITCQMNKTIIGVSVLSVLVVSVVAVLVYKFYFHLML
LAGCIKYGRGENIYDAFVIYSSQDEDWVRNELVKNLEEGVPPFQLCLHYRDFIPGVAIAA
NIIHEGFHKSRKVIVVVSQHFIQSRWCIFEYEIAQTWQFLSSRAGIIFIVLQKVEKTLLR
QQVELYRLLSRNTYLEWEDSVLGRHIFWRRLRKALLDGKSWNPEGTVGTGCNWQEATSI
````
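The record above is wrapped over several lines. Before pasting a sequence into a web form it can help to join the lines, check the length and look for unexpected characters. The following is a minimal sketch for a single-record FASTA string; the helper names are hypothetical, and only the 20 standard amino acid letters are treated as valid.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta_record(text):
    """Return (header, sequence) from a single-record FASTA string.

    Wrapped sequence lines are joined and converted to upper case.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    header = lines[0][1:]
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def invalid_residues(sequence):
    """Sorted list of characters that are not standard amino acid letters."""
    return sorted(set(sequence) - STANDARD_AMINO_ACIDS)


# Usage with the first two wrapped lines of the TLR4 record shown above.
example = """>sp|O00206|TLR4_HUMAN Toll-like receptor 4 OS=Homo sapiens GN=TLR4 PE=1 SV=2
MMSASRLAGTLIPAMAFLSCVRPESWEPCVEVVPNITYQCMELNFYKIPDNLPFSTKNLD
LSFNPLRHLGSYSFFSFPELQVLDLSRCEIQTIEDGAYQSLSHLSTLILTGNPIQSLALG
"""
header, sequence = read_fasta_record(example)
print(header.split()[0], len(sequence), invalid_residues(sequence))
```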
### Secondary structure prediction
[JPred](http://www.compbio.dundee.ac.uk/jpred/)
JPred is a protein secondary structure prediction tool. It also predicts solvent accessibility and coiled-coil regions. It first searches the PDB with the query sequence to identify homologous structures. If none is available, it predicts the structure with the Jnet algorithm, a neural-network secondary structure predictor that uses different types of multiple sequence alignment profiles derived from the same sequences.
###### Example using their default example protein
- Sequence length is limited to 800 aa; longer sequences can be split and submitted in batch mode
- Run JPred (click `Make Prediction`)
- It immediately opens a list of PDB entries that match the query
- One can either explore those proteins individually using the PDB linkout
- Or submit the job to obtain a more accurate secondary structure assignment
- The result page shows the predicted secondary structure components
Default query using default parameters:
````
MQVWPIEGIKKFETLSYLPPLTVEDLLKQIEYLLRSKWVPCLEFSKVGFVYRENHRSPGYYDGRYWTMWKLPMFGCTDATQVLKELEEAKKAYPDAFVRIIGFDNVRQVQLISFIAYKPPGC
````
[result page](http://www.compbio.dundee.ac.uk/jpred4/results/jp_1Xqh2hJ/jp_1Xqh2hJ.results.html)
###### Long sequence of Toll-like receptor 4 split into multiple fragments (advanced search):
**Result:**
An archive with all the results can be downloaded from the following link:
http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__YtqOUQg/jp_batch_1478672019__ALL_JOBS_ARCHIVE.tar.gz
**Results for individual queries are available from the links below:**
- TLR4_HUMAN1 [Link to results](http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__6tFSD1_)
- TLR4_HUMAN2 [Link to results](http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__YtqOUQg)
- TLR4_HUMAN3 [Link to results](http://www.compbio.dundee.ac.uk/jpred4/results/jp_batch_1478672019__mUJGlod)
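A sequence longer than the 800 aa limit has to be cut into fragments before batch submission. The sketch below shows one simple way to do that and to write the pieces as FASTA records named like the results above. How the three TLR4 fragments listed here were actually produced is not documented, so the fragment length, the optional overlap and the helper names are all assumptions.

```python
def split_sequence(sequence, max_len=800, overlap=0):
    """Cut `sequence` into consecutive fragments of at most `max_len` residues.

    Neighbouring fragments share `overlap` residues, which can soften
    edge effects at the cut points.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    fragments = []
    start = 0
    while start < len(sequence):
        fragments.append(sequence[start:start + max_len])
        if start + max_len >= len(sequence):
            break  # the last fragment reached the end of the sequence
        start += max_len - overlap
    return fragments


def fragments_to_fasta(name, fragments):
    """Format fragments as FASTA records named name1, name2, ..."""
    records = [f">{name}{i}\n{frag}" for i, frag in enumerate(fragments, start=1)]
    return "\n".join(records) + "\n"


# Usage with a hypothetical 839-residue placeholder sequence.
placeholder = "A" * 839
pieces = split_sequence(placeholder, max_len=300, overlap=20)
print([len(piece) for piece in pieces])
print(fragments_to_fasta("TLR4_HUMAN", pieces)[:40])
```

Predictions near a cut point lack sequence context on one side, so an overlap makes it possible to compare the two predictions for the shared residues.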
### Tertiary structure prediction
1. [I-TASSER](http://zhanglab.ccmb.med.umich.edu/I-TASSER/)
Requires registration and submission permission
Check an already analysed result page for [Toll like receptor 4](http://www.uniprot.org/uniprot/O00206) using I-TASSER
Results available at: http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S298631/
2. (PS)2-v2: [Protein Structure Prediction Server](http://ps2.life.nctu.edu.tw/docs.php)
- A much faster tool compared to Phyre2; unfortunately, it is broken at the moment.
- combines both sequence and secondary structure information for the detection of homologous proteins with remote similarity and the target-template alignment.
- Check an already analysed result page for [Toll like receptor 4](http://www.uniprot.org/uniprot/O00206) using (PS)2-v2
Results available at: http://140.113.239.111/~ps2v2/display_multi.php?folder=408053421