Compare revisions

bd8c7d86 · 97838cc2 · 3950a447 · ab32fe13 · abf69147 · 76458d8a
--- a/TeachingMaterials/Antibodypedia.md
+++ b/TeachingMaterials/Antibodypedia.md
@@ -12,18 +12,15 @@ Why does Santa Cruz torture goats?
 ### Examples

 1. FGF13
-
-	* Click through to the NBP2-45642 and see the validation images
+    * Click through to the NBP2-45642 and see the validation images

 2. Beta-Catenin
-
-	* Click on the buttons – What do you get? 
+    * Click on the buttons – What do you get? 
 	* Click on the image – What do you get? 
 	* Do all the antibodies give similar ICC images?
 	* Do most antibodies work for multiple methods? 

 3. Look up antibodies for your favourite proteins
-
 	* Is an antibody you use in the list? 
 	* Do you think you have used the best antibody available for your purposes? 


--- a/TeachingMaterials/2016/EMBOSS_EBI.md
+++ b/TeachingMaterials/2016/EMBOSS_EBI.md
+# [EMBOSS tools for sequence analysis](http://www.ebi.ac.uk/Tools/emboss/)
+
+## [EMBOSS explorer](http://emboss.bioinformatics.nl/cgi-bin/emboss/)
+        
+###### Official [EMBOSS tutorial](http://emboss.sourceforge.net/docs/emboss_tutorial/emboss_tutorial.html) written by [Gary Williams](http://emboss.sourceforge.net/docs/emboss_tutorial/node8.html)
+
+**Why EMBOSS?**
+- Open source
+- Wide range of tools for sequence analysis
+- Ideal for building workflows (command-line tools)
+- Accesses remote databases conveniently
+
+The official EMBOSS suite comprises over 150 programs that are available as command-line tools; only a few of those are offered as web-based applications.
+
+The Wageningen Bioinformatics Webportal (Netherlands) offers [a graphical user interface to the EMBOSS suite](http://emboss.bioinformatics.nl/cgi-bin/emboss/), which we will use today for the hands-on session (more of a demo!).
+
+
+## Quick Demo on EMBOSS tools
+
+...but before that, re-use or redo the Clustal Omega analysis on your set of 10 p53 sequences (or scroll down this document to use my set of sequences ;) ).
+
+- [extractalign](http://emboss.bioinformatics.nl/cgi-bin/emboss/extractalign)
+    - Switch to [Mview](http://www.ebi.ac.uk/Tools/msa/mview/) to visualize the consensus
+    - Also check [alnviz](https://toolkit.tuebingen.mpg.de/alnviz), but don't dive into it today. We will cover such visualizations tomorrow.
+- Create a consensus with [cons](http://emboss.bioinformatics.nl/cgi-bin/emboss/cons), which calculates a consensus sequence from a multiple sequence alignment.
+    - Also check [consambig](http://emboss.bioinformatics.nl/cgi-bin/emboss/consambig), which builds an ambiguous consensus: the amino acid residue or nucleotide at each position is compared to the possible ambiguity codes, and the consensus sequence uses the minimum ambiguity code that matches. The ambiguity characters were designed to encode positional variations found among families of related genes. Useful for DNA sequences.
+- Use [merger](http://emboss.bioinformatics.nl/cgi-bin/emboss/merger) to merge two overlapping sequences. It uses a global alignment algorithm (Needleman & Wunsch) to optimally align the sequences. A merged sequence is generated from the alignment and written to the output file. Also useful for DNA.
+- [Dotmatcher](http://emboss.bioinformatics.nl/cgi-bin/emboss/dotmatcher) generates a dotplot from two input sequences. The dotplot is an intuitive graphical representation of the regions of similarity between two sequences. All positions from the first input sequence are compared with all positions from the second input sequence using a specified substitution matrix. 
+- [plotcon](http://emboss.bioinformatics.nl/cgi-bin/emboss/plotcon)
+    - [prettyplot](http://emboss.bioinformatics.nl/cgi-bin/emboss/prettyplot): claims to present alignment with pretty formatting (?)
+
+## Example proteins
+
+For pairwise alignment tools, we can use human p53 and zebrafish p53:
+- Human p53: [P04637](http://www.uniprot.org/uniprot/P04637.fasta)
+- Zebrafish tp53: [P79734](http://www.uniprot.org/uniprot/P79734.fasta)
+
+````
+>P53_HUMAN|P04637| Cellular tumor antigen p53 OS=Homo sapiens GN=TP53 PE=1 SV=4
+MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
+DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
+SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
+RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
+SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
+PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
+GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
+
+>P53_DANRE|P79734| Cellular tumor antigen p53 OS=Danio rerio GN=tp53 PE=1 SV=1
+MAQNDSQEFAELWEKNLIIQPPGGGSCWDIINDEEYLPGSFDPNFFENVLEEQPQPSTLP
+PTSTVPETSDYPGDHGFRLRFPQSGTAKSVTCTYSPDLNKLFCQLAKTCPVQMVVDVAPP
+QGSVVRATAIYKKSEHVAEVVRRCPHHERTPDGDNLAPAGHLIRVEGNQRANYREDNITL
+RHSVFVPYEAPQLGAEWTTVLLNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEV
+RVCACPGRDRKTEESNFKKDQETKTMAKTTTGTKRSLVKESSSATLRPEGSKKAKGSSSD
+EEIFTLQVRGRERYEILKKLNDSLELSDVVPASDAEKYRQKFMTKNKKENRESSEPKQGK
+KLMVKDEGRSDSD
+
+````
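To get a feel for what a global (Needleman & Wunsch) alignment does with a pair like this, here is a minimal Python sketch. It is not the EMBOSS implementation: the function name `needleman_wunsch` and the scoring scheme (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for readability, whereas the EMBOSS tools work with a substitution matrix and separate gap opening and extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not EMBOSS defaults.
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


# First 30 residues of each sequence listed above.
human = "MEEPQSDPSVEPPLSQETFSDLWKLLPENN"
zebrafish = "MAQNDSQEFAELWEKNLIIQPPGGGSCWDI"
score, top, bottom = needleman_wunsch(human, zebrafish)
print(score)
print(top)
print(bottom)
```

With identity scoring the result will differ from what the web tools report, so treat it only as an illustration of how the alignment matrix is filled and traced back.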
+
+For dotmatcher we can use these sequences:
+- ZKSC7_HUMAN: [Q9P0L1](http://www.uniprot.org/uniprot/Q9P0L1.fasta)
+- MPDZ_HUMAN: [O75970](http://www.uniprot.org/uniprot/O75970.fasta)
+
+````
+>ZKSC7_HUMAN|Q9P0L1| Zinc finger protein with KRAB and SCAN domains 7 OS=Homo sapiens GN=ZKSCAN7 PE=1 SV=2
+MTTAGRGNLGLIPRSTAFQKQEGRLTVKQEPANQTWGQGSSLQKNYPPVCEIFRLHFRQL
+CYHEMSGPQEALSRLRELCRWWLMPEVHTKEQILELLVLEQFLSILPGELRTWVQLHHPE
+SGEEAVAVVEDFQRHLSGSEEVSAPAQKQEMHFEETTALGTTKESPPTSPLSGGSAPGAH
+LEPPYDPGTHHLPSGDFAQCTSPVPTLPQVGNSGDQAGATVLRMVRPQDTVAYEDLSVDY
+TQKKWKSLTLSQRALQWNMMPENHHSMASLAGENMMKGSELTPKQEFFKGSESSNRTSGG
+LFGVVPGAAETGDVCEDTFKELEGQTSDEEGSRLENDFLEITDEDKKKSTKDRYDKYKEV
+GEHPPLSSSPVEHEGVLKGQKSYRCDECGKAFNRSSHLIGHQRIHTGEKPYECNECGKTF
+RQTSQLIVHLRTHTGEKPYECSECGKAYRHSSHLIQHQRLHNGEKPYKCNECAKAFTQSS
+RLTDHQRTHTGEKPYECNECGEAFIRSKSLARHQVLHTGKKPYKCNECGRAFCSNRNLID
+HQRIHTGEKPYECSECGKAFSRSKCLIRHQSLHTGEKPYKCSECGKAFNQNSQLIEHERI
+HTGEKPFECSECGKAFGLSKCLIRHQRLHTGEKPYKCNECGKSFNQNSHLIIHQRIHTGE
+KPYECNECGKVFSYSSSLMVHQRTHTGEKPYKCNDCGKAFSDSSQLIVHQRVHTGEKPYE
+CSECGKAFSQRSTFNHHQRTHTGEKSSGLAWSVS
+
+>MPDZ_HUMAN|O75970| Multiple PDZ domain protein OS=Homo sapiens GN=MPDZ PE=1 SV=2
+MLEAIDKNRALHAAERLQTKLRERGDVANEDKLSLLKSVLQSPLFSQILSLQTSVQQLKD
+QVNIATSATSNIEYAHVPHLSPAVIPTLQNESFLLSPNNGNLEALTGPGIPHINGKPACD
+EFDQLIKNMAQGRHVEVFELLKPPSGGLGFSVVGLRSENRGELGIFVQEIQEGSVAHRDG
+RLKETDQILAINGQALDQTITHQQAISILQKAKDTVQLVIARGSLPQLVSPIVSRSPSAA
+STISAHSNPVHWQHMETIELVNDGSGLGFGIIGGKATGVIVKTILPGGVADQHGRLCSGD
+HILKIGDTDLAGMSSEQVAQVLRQCGNRVKLMIARGAIEERTAPTALGITLSSSPTSTPE
+LRVDASTQKGEESETFDVELTKNVQGLGITIAGYIGDKKLEPSGIFVKSITKSSAVEHDG
+RIQIGDQIIAVDGTNLQGFTNQQAVEVLRHTGQTVLLTLMRRGMKQEAELMSREDVTKDA
+DLSPVNASIIKENYEKDEDFLSSTRNTNILPTEEEGYPLLSAEIEEIEDAQKQEAALLTK
+WQRIMGINYEIVVAHVSKFSENSGLGISLEATVGHHFIRSVLPEGPVGHSGKLFSGDELL
+EVNGITLLGENHQDVVNILKELPIEVTMVCCRRTVPPTTQSELDSLDLCDIELTEKPHVD
+LGEFIGSSETEDPVLAMTDAGQSTEEVQAPLAMWEAGIQHIELEKGSKGLGFSILDYQDP
+IDPASTVIIIRSLVPGGIAEKDGRLLPGDRLMFVNDVNLENSSLEEAVEALKGAPSGTVR
+IGVAKPLPLSPEEGYVSAKEDSFLYPPHSCEEAGLADKPLFRADLALVGTNDADLVDEST
+FESPYSPENDSIYSTQASILSLHGSSCGDGLNYGSSLPSSPPKDVIENSCDPVLDLHMSL
+EELYTQNLLQRQDENTPSVDISMGPASGFTINDYTPANAIEQQYECENTIVWTESHLPSE
+VISSAELPSVLPDSAGKGSEYLLEQSSLACNAECVMLQNVSKESFERTINIAKGNSSLGM
+TVSANKDGLGMIVRSIIHGGAISRDGRIAIGDCILSINEESTISVTNAQARAMLRRHSLI
+GPDIKITYVPAEHLEEFKISLGQQSGRVMALDIFSSYTGRDIPELPEREEGEGEESELQN
+TAYSNWNQPRRVELWREPSKSLGISIVGGRGMGSRLSNGEVMRGIFIKHVLEDSPAGKNG
+TLKPGDRIVEVDGMDLRDASHEQAVEAIRKAGNPVVFMVQSIINRPRKSPLPSLLHNLYP
+KYNFSSTNPFADSLQINADKAPSQSESEPEKAPLCSVPPPPPSAFAEMGSDHTQSSASKI
+SQDVDKEDEFGYSWKNIRERYGTLTGELHMIELEKGHSGLGLSLAGNKDRSRMSVFIVGI
+DPNGAAGKDGRLQIADELLEINGQILYGRSHQNASSIIKCAPSKVKIIFIRNKDAVNQMA
+VCPGNAVEPLPSNSENLQNKETEPTVTTSDAAVDLSSFKNVQHLELPKDQGGLGIAISEE
+DTLSGVIIKSLTEHGVAATDGRLKVGDQILAVDDEIVVGYPIEKFISLLKTAKMTVKLTI
+HAENPDSQAVPSAAGAASGEKKNSSQSLMVPQSGSPEPESIRNTSRSSTPAIFASDPATC
+PIIPGCETTIEISKGRTGLGLSIVGGSDTLLGAIIIHEVYEEGAACKDGRLWAGDQILEV
+NGIDLRKATHDEAINVLRQTPQRVRLTLYRDEAPYKEEEVCDTLTIELQKKPGKGLGLSI
+VGKRNDTGVFVSDIVKGGIADADGRLMQGDQILMVNGEDVRNATQEAVAALLKCSLGTVT
+LEVGRIKAGPFHSERRPSQSSQVSEGSLSSFTFPLSGSSTSESLESSSKKNALASEIQGL
+RTVEMKKGPTDSLGISIAGGVGSPLGDVPIFIAMMHPTGVAAQTQKLRVGDRIVTICGTS
+TEGMTHTQAVNLLKNASGSIEMQVVAGGDVSVVTGHQQEPASSSLSFTGLTSSSIFQDDL
+GPPQCKSITLERGPDGLGFSIVGGYGSPHGDLPIYVKTVFAKGAASEDGRLKRGDQIIAV
+NGQSLEGVTHEEAVAILKRTKGTVTLMVLS
+
+````
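The idea behind a dotplot can be sketched in a few lines of Python. This is a simplified illustration, not dotmatcher itself: it counts identical residues in a sliding window instead of scoring with a substitution matrix, and the names `dotplot`, `render`, `window` and `threshold` are hypothetical.

```python
def dotplot(seq1, seq2, window=3, threshold=3):
    """Return the set of (i, j) window start positions that count as a dot.

    A dot is placed when the windows starting at seq1[i] and seq2[j]
    share at least `threshold` identical residues (identity scoring is
    an assumption; dotmatcher scores with a substitution matrix).
    """
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots):
    """Draw the dots as text: seq2 along the top, seq1 down the side."""
    lines = ["  " + seq2]
    for i, residue in enumerate(seq1):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq2)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


# Two zinc-finger fragments taken from the ZKSC7_HUMAN sequence above.
frag1 = "IHTGEKPYECNECGKTF"
frag2 = "THTGEKPYECSECGKAY"
print(render(frag1, frag2, dotplot(frag1, frag2, window=5, threshold=5)))
```

Runs of dots along a diagonal correspond to stretches of similarity; repeated domains show up as several parallel diagonals.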
+
+### Set of P53 proteins:
+
+**Raw sequences**
+
+```
+>Mus musculus
+MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEALRVSGAPAAQDPVTETPGPV
+APAPATPWPLSSFVPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSPPLNKLFFQLAKTCPVQLWVSATPPAGSRVRAMAIY
+KKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLYPEYLEDRQTFRHSVVVPYEPPEAGSEYTTIHYKYMCNSSCM
+GGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEVLCPELPPGSAKRALPTCTSASPPQKKKPL
+DGEYFTLKIRGRKRFEMFRELNEALELKDAHATEESGDSRAHSSLQPRAFQALIKEESPNC
+>Rattus norvegicus
+MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNSMEDLFLPQDVAELLEGPEEALQVSAPAAQEPGTEAPAP
+VAPASATPWPLSSSVPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAI
+YKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNPYAEYLDDRQTFRHSVVVPYEPPEVGSDYTTIHYKYMCNSSC
+MGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEEHCPELPPGSAKRALPTSTSSSPQQKKKP
+LDGEYFTLKIRGRERFEMFRELNEALELKDARAAEESGDSRAHSSLQPRTFQALIKKESPNC
+>Mastomys natalensis
+LPLSQETFQRLWKLLPPEAVLSEASPNSMDNMFLSPDVVNLLEGPEEALQVSAAPAAQDPVTETPAPAAPAPATPWPLSS
+FVPSQKTYQGSYGFHLGFLQSGTAKSVMCTYSPSLNKLFCQLAKTCPVQLWVSDTPPAGSRVRAMAIYKKSQHMTEVVRR
+CPHHERCTDGDGLAPPQHLIRVEGNLNAEYLDDKQTFRHSVVVPYEPPEVGSDYTTIHYKYMCNSSCMGGMNRRPILTII
+TLEDSSGNLLGRDSFEVRICACPGRDRRTEEENFRKKEEPCPELPLGSAKRALPTGTSASPQQKKKRLDGEYFTLKIRGR
+ERFEMFRELNEALELKDARAAEELGDSRAHSSYLKTKRGQSSSHHKKPMVKKVGPDSD
+>Microtus ochrogaster
+MEEPQSDLSIEPPLSQETFSDLWNLLPPNNVLSTSLSVDAMEDLFLSQDVANWLEEPNEGPQMSAAASTAEDPVTEAPAP
+VTPAPVTSWPLSSSVPSQKTYQGEYGFRLGFLHSGTAKSVTCTYSPSLNKLFCQLAKTCPVQLWVSSTPPPGTRVRAMAI
+YKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLRAEYLDDRQTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSC
+MGGMNRRPILTIITLEDPSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEPRPELPVGSTKRVLPTNTSSPQPKKKPL
+DGEYFTLKIRGRERFKMFSELNEALELKDAQDANGSGDSRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+>Nannospalax galili
+MEEQQSDLSIEPPLSQETFSDLWKLLPQNNVLSTPLSPNSMEDLLLSPEDVANWLDDPDEALQVPAAAITGDPVTETSAP
+VAPPPATPWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPPLNKLFCQLAKTCPVQLWVDSTPPPGTRVRAMAI
+YKKSQHMTEVVKRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDKHTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSC
+MGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGELCPELPPGSTKRALPTGTSSSPQPKKKP
+LDGEYFTLKIRGRERFEMFRELNEALELKDTQAEKDSGESRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+>Eospalaxbaileyi
+MEEPQSDLSIEPPLSQETFSDLWKLLPQNNVLSTSLSPNSMEDLLLSAEDVANWLDDPDDALRMPAAPVTEDPATEASAP
+VAPPPATPWPLSSSVPSQKTYQGNYGFRLGFLHSGTAKSVTCTYSPCLNKLFCQLAKTCPVQLWVDSTPPPGTRVRAMAI
+YKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDKHTFRHSVIVPYEPPEVGSDCTTIHYNYMCNSSC
+MGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGESCPELPPGSTKRALPTDTSSSPQPKKKP
+LLDGEYFTLKIRGRERFEMFRELNEALELKDAQAEKESGESRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+>Eospalaxcansus
+MEEPQSDLSIEPPLSQETFSDLWKLLPQNNVLSTSLSPNSMEDLLLSAEDVANWLDDPDDALRMPAAPVTEDPTTEASAP
+VAPPPATPWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVACTYSPCLNKLFCQLAKTCPVQLWVDSTPPPGTRVRAMAI
+YKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDKHTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSC
+MGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGESCPELPPGSTKRALPTGTSSSPQPKKKP
+LLDGEYFTLKIRGRERFEMFRELNEALELKDAQAEKESGESRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+>Cricetulus griseus
+MEEPQSDLSIELPLSQETFSDLWKLLPPNNVLSTLPSSDSIEELFLSENVTGWLEDSGGALQGVAAAAASTAEDPVTETP
+APVASAPATPWPLSSSVPSYKTFQGDYGFRLGFLHSGTAKSVTCTYSPSLNKLFCQLAKTCPVQLWVNSTPPPGTRVRAM
+AIYKKLQYMTEVVRRCPHHERSSEGDSLAPPQHLIRVEGNLHAEYLDDKQTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
+SCMGGMNRRPILTIITLEDPSGNLLGRNSFEVRICACPGRDRRTEEKNFQKKGEPCPELPPKSAKRALPTNTSSSPPPKK
+KTLDGEYFTLKIRGHERFKMFQELNEALELKDAQASKGSEDNGAHSSYLKSKKGQSASRLKKLMIKREGPDSD
+>Oryctolagus cuniculus
+MSATAQAGPGGSQEASDPAAAMEESQSDLSLEPPLSQETFSDLWKLLPENNLLTTSLNPPVDDLLSAEDVANWLNEDPEE
+GLRVPAAPAPEAPAPAAPALAAPAPATSWPLSSSVPSQKTYHGNYGFRLGFLHSGTAKSVTCTYSPCLNKLFCQLAKTCP
+VQLWVDSTPPPGSRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDRNTFRHSVVVPYEP
+PEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEPCPELPPG
+SSKRALPTTTTDSSPQTKKKPLDGEYFILKIRGRERFEMFRELNEALELKDAQAEKEPGGSRAHSSYLKAKKGQSTSRHK
+KPMFKREGPDSD
+>Carlito syrichta
+MEEPQSDLSIEPLSQETFSDLWKLLPENNVLSPSLSPPVDDLILSTEDIANWFSEGPDEALRTAPAPVAPTPAASTQAAP
+APGTPWPLSSSVPSQKTYHGNYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQ
+SQYMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDKTTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGG
+MNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEPCSELPPGSTKRALPTSTSSPSQPKKKPLDG
+EYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHTSHLKSKKGQSTSRHKKLMFKREGPDSD
+```
+
+**Aligned by Clustal Omega**
+
+```
+CLUSTAL O(1.2.3) multiple sequence alignment
+
+
+Cricetulus           ---------------------MEEPQSDLSIELPLSQETFSDLWKLLPPNNVLSTL--PS
+Carlito              ---------------------MEEPQSDLSIE-PLSQETFSDLWKLLPENNVLSPS--LS
+Microtus             ---------------------MEEPQSDLSIEPPLSQETFSDLWNLLPPNNVLSTS--LS
+Oryctolagus          MSATAQAGPGGSQEASDPAAAMEESQSDLSLEPPLSQETFSDLWKLLPENNLLTTS--LN
+Nannospalax          ---------------------MEEQQSDLSIEPPLSQETFSDLWKLLPQNNVLSTP--LS
+Eospalaxbaileyi      ---------------------MEEPQSDLSIEPPLSQETFSDLWKLLPQNNVLSTS--LS
+Eospalaxcansus       ---------------------MEEPQSDLSIEPPLSQETFSDLWKLLPQNNVLSTS--LS
+Mastomys             --------------------------------LPLSQETFQRLWKLLPPEAVLSE---AS
+Mus                  ------------------MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPS-----
+Rattus               ---------------------MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGS
+                                                      *******. **:*** : :*       
+
+Cricetulus           SDSIEELFL-SENVTGWLEDSGGALQGVAAAAASTAEDPVTETPAPVASAPATPWPLSSS
+Carlito              PP-VDDLILSTEDIANWFSEGPDE--ALRTAPAPV--APTPAASTQAAPAPGTPWPLSSS
+Microtus             VDAMEDLFL-SQDVANWLEEPNEG--PQMSAAASTAEDPVTEAPAPVTPAPVTSWPLSSS
+Oryctolagus          PP--VDDLLSAEDVANWLNEDPEE--GLRVPAAPAPEAPAPAAPALAAPAPATSWPLSSS
+Nannospalax          PNSMEDLLLSPEDVANWLD-DPDE--ALQVPAAAITGDPVTETSAPVAPPPATPWPLSSS
+Eospalaxbaileyi      PNSMEDLLLSAEDVANWLD-DPDD--ALRMPAAPVTEDPATEASAPVAPPPATPWPLSSS
+Eospalaxcansus       PNSMEDLLLSAEDVANWLD-DPDD--ALRMPAAPVTEDPTTEASAPVAPPPATPWPLSSS
+Mastomys             PNSMDNMFL-SPDVVNLLEGPEE---ALQVSAAPAAQDPVTETPAPAAPAPATPWPLSSF
+Mus                  PHCMDDLLL-PQDVEEFFEGPSE---ALRVSGAPAAQDPVTETPGPVAPAPATPWPLSSF
+Rattus               PNSMEDLFL-PQDVAELLEGPEE---ALQVS-APAAQEPGTEAPAPVAPASATPWPLSSS
+                          : :*   ::   :.             *     *   :   .:    * ***** 
+
+Cricetulus           VPSYKTFQGDYGFRLGFLHSGTAKSVTCTYSPSLNKLFCQLAKTCPVQLWVNSTPPPGTR
+Carlito              VPSQKTYHGNYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTR
+Microtus             VPSQKTYQGEYGFRLGFLHSGTAKSVTCTYSPSLNKLFCQLAKTCPVQLWVSSTPPPGTR
+Oryctolagus          VPSQKTYHGNYGFRLGFLHSGTAKSVTCTYSPCLNKLFCQLAKTCPVQLWVDSTPPPGSR
+Nannospalax          VPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPPLNKLFCQLAKTCPVQLWVDSTPPPGTR
+Eospalaxbaileyi      VPSQKTYQGNYGFRLGFLHSGTAKSVTCTYSPCLNKLFCQLAKTCPVQLWVDSTPPPGTR
+Eospalaxcansus       VPSQKTYQGSYGFRLGFLHSGTAKSVACTYSPCLNKLFCQLAKTCPVQLWVDSTPPPGTR
+Mastomys             VPSQKTYQGSYGFHLGFLQSGTAKSVMCTYSPSLNKLFCQLAKTCPVQLWVSDTPPAGSR
+Mus                  VPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSPPLNKLFFQLAKTCPVQLWVSATPPAGSR
+Rattus               VPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTR
+                     *** **::*.***:****:******* ****  ***:* ************  *** *:*
+
+Cricetulus           VRAMAIYKKLQYMTEVVRRCPHHERSSEGDSLAPPQHLIRVEGNLHAEYLDDKQTFRHSV
+Carlito              VRAMAIYKQSQYMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDKTTFRHSV
+Microtus             VRAMAIYKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLRAEYLDDRQTFRHSV
+Oryctolagus          VRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDRNTFRHSV
+Nannospalax          VRAMAIYKKSQHMTEVVKRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDKHTFRHSV
+Eospalaxbaileyi      VRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDKHTFRHSV
+Eospalaxcansus       VRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRAEYLDDKHTFRHSV
+Mastomys             VRAMAIYKKSQHMTEVVRRCPHHERCTDGDGLAPPQHLIRVEGNLNAEYLDDKQTFRHSV
+Mus                  VRAMAIYKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLYPEYLEDRQTFRHSV
+Rattus               VRAMAIYKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNPYAEYLDDRQTFRHSV
+                     ********: *:*****:*******.::.*.*************   ***:*: ******
+
+Cricetulus           VVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDPSGNLLGRNSFEVRICA
+Carlito              VVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCA
+Microtus             VVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDPSGNLLGRNSFEVRVCA
+Oryctolagus          VVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCA
+Nannospalax          VVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCA
+Eospalaxbaileyi      IVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCA
+Eospalaxcansus       VVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCA
+Mastomys             VVPYEPPEVGSDYTTIHYKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRICA
+Mus                  VVPYEPPEAGSEYTTIHYKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCA
+Rattus               VVPYEPPEVGSDYTTIHYKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCA
+                     :*******.**: *****:************************ *******:*****:**
+
+Cricetulus           CPGRDRRTEEKNFQKKGEPCPELPPKSAKRALPTNTSSS-PPPKKKTLDGEYFTLKIRGH
+Carlito              CPGRDRRTEEENFRKKGEPCSELPPGSTKRALPTSTSS-PSQPKKKPLDGEYFTLQIRGR
+Microtus             CPGRDRRTEEENFRKKGEPRPELPVGSTKRVLPTNTS--SPQPKKKPLDGEYFTLKIRGR
+Oryctolagus          CPGRDRRTEEENFRKKGEPCPELPPGSSKRALPTTTTDSSPQTKKKPLDGEYFILKIRGR
+Nannospalax          CPGRDRRTEEENFRKKGELCPELPPGSTKRALPTGTSSSPQPKKKP-LDGEYFTLKIRGR
+Eospalaxbaileyi      CPGRDRRTEEENFRKKGESCPELPPGSTKRALPTDTSSSPQPKKKPLLDGEYFTLKIRGR
+Eospalaxcansus       CPGRDRRTEEENFRKKGESCPELPPGSTKRALPTGTSSSPQPKKKPLLDGEYFTLKIRGR
+Mastomys             CPGRDRRTEEENFRKKEEPCPELPLGSAKRALPTGTSAS-PQQKKKRLDGEYFTLKIRGR
+Mus                  CPGRDRRTEEENFRKKEVLCPELPPGSAKRALPTCTSAS-PPQKKKPLDGEYFTLKIRGR
+Rattus               CPGRDRRTEEENFRKKEEHCPELPPGSAKRALPTSTSSS-PQQKKKPLDGEYFTLKIRGR
+                     **********:**:**     ***  *:**.*** *:      **  ****** *:***:
+
+Cricetulus           ERFKMFQELNEALELKDAQASKGSEDNGAHSSYLKSKKGQSASRLKKLMIKREGPDSD
+Carlito              ERFEMFRELNEALELKDAQAGKEPGGSRAHTSHLKSKKGQSTSRHKKLMFKREGPDSD
+Microtus             ERFKMFSELNEALELKDAQDANGSGDSRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+Oryctolagus          ERFEMFRELNEALELKDAQAEKEPGGSRAHSSYLKAKKGQSTSRHKKPMFKREGPDSD
+Nannospalax          ERFEMFRELNEALELKDTQAEKDSGESRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+Eospalaxbaileyi      ERFEMFRELNEALELKDAQAEKESGESRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+Eospalaxcansus       ERFEMFRELNEALELKDAQAEKESGESRAHSSYLKSKKGQSTSRHKKLMIKREGPDSD
+Mastomys             ERFEMFRELNEALELKDARAAEELGDSRAHSSYLKTKRGQSSSHHKKPMVKKVGPDSD
+Mus                  KRFEMFRELNEALELKDAHATEESGDSRAHSSLQPRAFQ--------ALIKEESPNC-
+Rattus               ERFEMFRELNEALELKDARAAEESGDSRAHSSLQPRTFQ--------ALIKKESPNC-
+                     :**:** **********::  :    . **:*                :.*. .*:. 
+```
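The `*` characters under each Clustal block mark columns where all ten sequences agree. A simple majority consensus, in the spirit of what cons produces, can be sketched in Python as follows. This is a simplification under stated assumptions: the function name `consensus` and the `min_fraction` parameter are hypothetical, gaps are ignored when counting, and `x` is written where no residue is frequent enough. The real cons program uses a scoring matrix and its own plurality setting, which this sketch does not reproduce.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.5):
    """Majority consensus of equally long, gap-padded sequences.

    For each column the most common residue is kept if it occurs in at
    least `min_fraction` of the sequences; otherwise 'x' is written.
    Columns that contain only gaps yield '-'.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            out.append("-")
            continue
        residue, n = counts.most_common(1)[0]
        out.append(residue if n / len(column) >= min_fraction else "x")
    return "".join(out)


# The same 16 columns from the first alignment block (Cricetulus, Carlito, Microtus).
block = [
    "MEEPQSDLSIELPLSQ",
    "MEEPQSDLSIE-PLSQ",
    "MEEPQSDLSIEPPLSQ",
]
print(consensus(block))  # MEEPQSDLSIExPLSQ
```

The single `x` is the column where the three sequences show `L`, a gap and `P`, so no residue reaches the 50% cut-off.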
--- a/TeachingMaterials/2016/HPRexercise.md
+++ b/TeachingMaterials/2016/HPRexercise.md
+# Exploring the Human Protein Atlas
+
+http://www.proteinatlas.org
+
+Antibodies raised against most human proteins.
+Rigorous purification protocol.
+Used to stain human tissues and cells. 
+
+Key fact: Provides independent validation of cellular location and tissue distribution using commercial (or home produced) antibodies. 
+
+## Example proteins to explore the Atlas
+
+1. A-kinase anchoring proteins are scaffolds for the PKA kinase.
+	- AKAP1
+	- AKAP4
+	- AKAP8
+	- AKAP12
+
+	### Questions
+	- Are these AKAPs all found in the same cell compartments and subcellular locations? 
+	- What happens when you toggle the channels? 
+	- Are these AKAPs found in all tissues? 
+	- Are they highly expressed in cancer cells?
+	- Is there a PKA kinase cascade?
+
+2. c-Myc transcription factor
+
+	### Questions
+	- Do all antibodies perform similarly? (Click for primary data for a summary).
+	- In the cerebral cortex are all cell types stained?
+	- Would we expect that for Myc? 
+	- Which cells are stained in the placenta?
+	- Which cell compartments have Myc? 
+
+3. CTNNB1 – the key Wnt signalling transcription factor beta catenin
+
+	### Questions
+	- Which tissues don’t express any beta catenin? 
+	- Which cancers don’t express any beta catenin? 
+	- Is most of the beta catenin staining in the nucleus? 
+
+4. FGF13 - fibroblast growth factor 13
+
+	### Questions
+	- Is FGF13 strongly expressed in most cancer cells? 
+	- Are there any tissues that don’t stain for FGF13? 
+	- Growth factors are secreted and their receptors are on the cell surface: Which cellular compartments contain FGF13? 
+	- Which cellular compartments in bronchial tissue contain FGF13? 
+
+5. Try your own favourite proteins! 
+
+Let us know if the images make sense to you…
+
+
+
--- a/TeachingMaterials/2016/ProteinBioinfo-MalvikaSharan.pdf
+++ b/TeachingMaterials/2016/ProteinBioinfo-MalvikaSharan.pdf
--- a/TeachingMaterials/2016/ProteinBioinfo-MalvikaSharan.pptx
+++ b/TeachingMaterials/2016/ProteinBioinfo-MalvikaSharan.pptx
--- a/TeachingMaterials/2016/UniProt.md
+++ b/TeachingMaterials/2016/UniProt.md
+## UniProt
+
+1. Introduction
+2. Swiss-Prot (curated) vs. TrEMBL (automated)
+3. Cross-references and link-outs
+	* OMIM, Domains, GO, ...
+
+
+### Introduction
+
+[**UniProt**](http://www.uniprot.org) is a protein database. Its focus is on proteins that have been observed experimentally, particularly by mass spectrometry. It has a dedicated team of curators working on high-quality annotation of these proteins and makes a great **central hub** for all protein-related information. It's my first stop for any protein or gene name.
+
+Its protein focus sets it apart from genomic databases like [**Ensembl**](http://www.ensembl.org) and the [**UCSC Genome Browser**](http://genome.ucsc.edu), which focus on gene loci, transcripts, splicing, and predicting protein sequences from nucleic acids. Ensembl is more inclusive when it comes to splice variants, but its sequences are less rigorously validated than UniProt's and it contains almost no annotation for genes and proteins.
+
+### Swiss-Prot (curated) vs. TrEMBL (automated)
+
+UniProtKB consists of two subsets: Swiss-Prot (the hand-curated part) and TrEMBL, which contains automatically annotated sequences yet to be looked at by the curation team. There are currently around 550,000 annotated proteins in Swiss-Prot and around 70,000,000 in TrEMBL.
+
+Historically, there was only Swiss-Prot, but as more sequencing data flooded in, a fast way of providing at least some annotation (e.g. by transferring annotation over from homologs) was needed, and so TrEMBL was added.
+
+For the model organisms, especially human, mouse and yeast, the manual Swiss-Prot annotation is excellent and frequently updated by the curators as well as through automated pipelines. In these organisms, all proteins are now covered.
+
+### Cross-references and link-outs
+
+UniProt incorporates and links out to a huge number of protein-related databases. It can be considered an authoritative resource that covers nearly all major information sources. The information includes:
+
+- [**Function:**](http://www.uniprot.org/uniprot/P04637#function)
+	- A curated paragraph or two describing the biology of the protein. Individual statements are backed up by literature references. Also includes Gene Ontology (GO) annotation.
+- [**Names & Taxonomy:**](http://www.uniprot.org/uniprot/P04637#names_and_taxonomy)
+	- Alternative gene and protein names that might be in use, as well as information on the organism.
+- [**Subcellular location:**](http://www.uniprot.org/uniprot/P04637#subcellular_location)
+	- Nuclear/cytoplasmic etc., also broken down into isoforms since they might differ in their localisation.
+- [**Pathology & Biotech:**](http://www.uniprot.org/uniprot/P04637#pathology_and_biotech)
+	- Disease-associated variants and somatic mutations from [OMIM](http://omim.org) etc.
+- [**PTM / Processing:**](http://www.uniprot.org/uniprot/P04637#ptm_processing)
+	- Phosphorylation sites etc. largely from mass spectrometry, as well as annotation on signal peptides and other segments that are cleaved off during protein maturation.
+- [**Expression:**](http://www.uniprot.org/uniprot/P04637#expression)
+	- Organismal tissues and cell types where the protein is expressed. Links out to the [Human Protein Atlas](http://www.proteinatlas.org) (HPA) which assays this using antibodies and transcriptomics, as well as other resources.
+- [**Interaction:**](http://www.uniprot.org/uniprot/P04637#interaction)
+	- Known protein-protein or protein-DNA/RNA interactions, sometimes including information on the protein regions involved.
+- [**Structure:**](http://www.uniprot.org/uniprot/P04637#structure)
+	- Information on 3D structures from [PDB](http://www.rcsb.org/pdb/) obtained by X-ray crystallography, NMR or cryo-electron microscopy, as well as the regions they cover.
+- [**Family & Domains:**](http://www.uniprot.org/uniprot/P04637#family_and_domains)
+	- Protein domains that might be catalytic or mediate things like protein-protein interactions from Pfam, InterPro, SMART etc.
+- [**Sequences:**](http://www.uniprot.org/uniprot/P04637#sequences)
+	- The FASTA sequence for the protein's canonical isoform as well as a selection of variants from alternative splicing, alternative promoter usage etc. Note that Ensembl is often more comprehensive when it comes to isoforms, but it doesn't cover proteolytic processing.
+- [**Cross-references:**](http://www.uniprot.org/uniprot/P04637#cross_references)
+	- A comprehensive list of all major external databases that provide additional information aspects on the protein. Also repeats all resources that have been referenced in the previous sections.
+- [**Entry information:**](http://www.uniprot.org/uniprot/P04637#entry_information)
+	- Gives a brief history of the protein entry within UniProt: when it was created, last updated etc.
+- [**Miscellaneous:**](http://www.uniprot.org/uniprot/P04637#miscellaneous)
+	- Some additional terms such as which "[proteome](http://www.uniprot.org/help/reference_proteome)" release(s) this protein is part of. These are based on mass spectrometry and are available for more than just the curated model organisms. They can provide additional confidence e.g. for TrEMBL proteins predicted from nucleic acid sequences only.
+- [**Similar proteins:**](http://www.uniprot.org/uniprot/P04637#similar_proteins)
+	- Links to UniProt's "UniRef" clusters at 100%, 90% and 50% sequence identity. These clusters are useful mainly to reduce bias for protein groups with many members (paralogs) in bioinformatics studies by collapsing them. If you are looking for homologs, a better place is either the phylogenomic databases under "Family & Domains" such as eggNOG or, better yet, look up the gene on [Ensembl](http://www.ensembl.org), e.g. [here](http://www.ensembl.org/Homo_sapiens/Gene/Compara_Tree?db=core;g=ENSG00000141510;r=17:7661779-7687550) for p53.
+
+What's also fantastic is that most pieces of information UniProt displays link directly to the original PubMed article. Alternatively, the source information might show that something was transferred over from another organism (e.g. mouse to human).
+
+### Nice to know
+
+- The "feature viewer" is a little bit hidden: it's at the top of the sidebar on the left. It provides a clear overview of all features along a protein's sequence at a glance (and allows you to expand feature categories that interest you).
+- Each protein has a readable "ID" (e.g. P53_HUMAN) and a more cryptic, but stable "accession" (e.g. P04637). The ID can change as more becomes known about a protein, similar to a gene name, while the accession is always kept the same. Therefore, if you are making e.g. a table for a paper, be sure to include the accessions.
+- In Swiss-Prot, there is a [star rating](http://www.uniprot.org/help/annotation_score) for each protein which gives an indication of how much evidence there is that it exists in the form described.
+- Important features that UniProt doesn't include yet:
+	- Intrinsically disordered regions (can be obtained from [D2P2](http://d2p2.pro))
+	- Many short linear motifs (can be obtained from [ELM](http://elm.eu.org))
+- For natural variation and disease-causing variants:
+	- Check out the 2-minute variation video on the UniProt YouTube channel (linked below)
+	- Another fantastic new resource is [ExAC](http://exac.broadinstitute.org), the Exome Aggregation Consortium. They combined exome (protein-coding region) sequences from about 60,000 humans. It's by far the biggest resource on human sequence variants to date.
+- There are some really nice short tutorial videos here: [UniProt YouTube channel](https://www.youtube.com/channel/UCkCR5RJZCZZoVTQzTYY92aw)
+- If you are ever in doubt about a particular term or feature (like "accession"), the [Help section](http://www.uniprot.org/help/) is really concise and excellent.
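Because accessions are the identifiers worth storing, it can be handy to tell them apart from IDs programmatically. The Python sketch below is an illustration under assumptions: the accession pattern follows the regular expression given in UniProt's help pages, the entry-name pattern is a loose guess (a mnemonic, an underscore, a species code), and the function names `classify` and `fasta_url` are hypothetical. The URL layout simply mirrors the `.fasta` links used elsewhere in this material and may not match UniProt's current service.

```python
import re

# Accession pattern as documented in UniProt's help pages.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Loose assumption for readable IDs such as P53_HUMAN.
ENTRY_NAME = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify(identifier):
    """Return 'accession', 'entry name' or 'unknown' for a UniProt identifier."""
    if ACCESSION.fullmatch(identifier):
        return "accession"
    if ENTRY_NAME.fullmatch(identifier):
        return "entry name"
    return "unknown"


def fasta_url(accession):
    """Build a FASTA link in the style used in this document (assumed layout)."""
    if classify(accession) != "accession":
        raise ValueError(f"{accession!r} does not look like a UniProt accession")
    return f"http://www.uniprot.org/uniprot/{accession}.fasta"


for identifier in ["P04637", "P53_HUMAN", "Q9P0L1"]:
    print(identifier, classify(identifier))
print(fasta_url("P04637"))
```

A check like this is useful when cleaning up a table of proteins for a paper: anything classified as an entry name should be looked up and replaced by, or accompanied by, its accession.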
+
+### Individual examples
+
+- A protein you're working on?
+- Annotation transferred from one species to another, "By similarity":
+	- KDM3B's function from [human](http://www.uniprot.org/uniprot/Q7LBC6#function) to [mouse](http://www.uniprot.org/uniprot/Q6ZPY7#function) (note the yellow annotation source tags)
+- p53 in the feature viewer:
+	- The normal UniProt view is one long page: [p53 (normal view)](http://www.uniprot.org/uniprot/P04637).
+	- Alternatively, you can use the "feature viewer" to get a quick overview of what's going on in the protein: [p53 (feature viewer)](http://www.uniprot.org/uniprot/P04637#showFeaturesViewer).
+		- Where does [P53_HUMAN](http://www.uniprot.org/uniprot/P04637#showFeaturesViewer) get post-translationally modified?
+		- Where do its disease mutations happen?
+- Two proteins from one precursor protein:
+	- See if you can find the ghrelin and obestatin peptides within the [precursor protein](http://www.uniprot.org/uniprot/Q9UBU3#showFeaturesViewer)!
+	- Hint hint: Click "Molecule Processing", or [here](http://www.uniprot.org/uniprot/Q9UBU3#ptm_processing)!
+- Isoforms:
+	- Lamin A:
+		- Has a progeria-causing [pathogenic isoform](http://www.uniprot.org/uniprot/P02545#sequences) (number 6, see note), which is produced by unusual splicing if a disease-associated missense SNP is present.
+		- See also the "natural" variant at residue 608. Under references, it says: "[Recurrent de novo point mutations in lamin A cause Hutchinson-Gilford progeria syndrome.](https://www.ncbi.nlm.nih.gov/pubmed/12714972)"
+	- Interleukin-33:
+		- Has a constitutively active isoform (number 3): [IL33_HUMAN](http://www.uniprot.org/uniprot/O95760#sequences)
+	- Ankyrin-1:
+		- Has a muscle-specific isoform (Mu17): [ANK1_HUMAN](http://www.uniprot.org/uniprot/P16157#function)
+- A protein with too many names:
+	- [FOLH1_HUMAN](http://www.uniprot.org/uniprot/Q04609#names_and_taxonomy) (Glutamate carboxypeptidase 2, or N-acetylated-alpha-linked acidic dipeptidase I, or Prostate-specific membrane antigen, or Folate hydrolase 1, or Cell growth-inhibiting gene 27 protein). Good that they're all in here, no?
+- Nice examples of comprehensive subcellular localisation annotation:
+	- [AKP8L_HUMAN](http://www.uniprot.org/uniprot/Q9ULX6#subcellular_location) lists four papers that describe its localisation, colocalisation with other proteins, and potential shuttling in and out of the nucleus.
+	- [AKA7A_HUMAN](http://www.uniprot.org/uniprot/O43687#subcellular_location) has two isoforms with different localisations.
+- Trypsin-2 expression:
+	- [TRY2_HUMAN](http://www.uniprot.org/uniprot/P07478#expression) is very tissue-specific.
+	- Also check out its [entry](http://www.proteinatlas.org/ENSG00000275896-PRSS3P2/tissue) in the Human Protein Atlas (linked in UniProt) for some microscopy images as well as RNA sequencing data.
+- Protein-protein interactions:
+	- [HAND1_MOUSE](http://www.uniprot.org/uniprot/Q64279#interaction) is a transcription factor that needs to form a homodimer to work.
+- Protein domains:
+	- [KDM5C_HUMAN](http://www.uniprot.org/uniprot/P41229#family_and_domains) has at least 3 domains (JmjN, ARID and then JmjC).
+	- We know that it's a [histone demethylase](http://www.uniprot.org/uniprot/P41229#function) acting on H3K4me2/3.
+	- To find out what the domains do, let's follow the link out to Pfam:
+
+		- Scroll down a bit to "Family and domain databases".
+		- Click "[graphical view](http://pfam.xfam.org/protein/P41229)" in the Pfam section.
+		
+		- From there, we can check out the 3 domains UniProt mentioned:
+			- [JmjN](http://pfam.xfam.org/family/JmjN): Nothing much seems to be known about this one except that it occurs N-terminally of JmjC.
+			- [ARID](http://pfam.xfam.org/family/ARID): A DNA-binding domain.
+			- [JmjC](http://pfam.xfam.org/family/JmjC): Looks like it might be a catalytic domain.
+		- Pfam lists a few more domains than UniProt did, actually:
+			- [zf-C5HC2](http://pfam.xfam.org/family/zf-C5HC2): A small "zinc-finger" domain that is thought to bind DNA as well.
+			- [PLU-1](http://pfam.xfam.org/family/PLU-1): A larger domain that may also play a role in DNA binding, but not much is known about it.
+			- [PHD](http://pfam.xfam.org/family/PHD): This one is incredibly well annotated on Pfam compared to the others. It is a very important epigenetic "reader" domain that is thought to specifically bind trimethylated lysines in many cases. Pfam mentions it occurs in over 100 human proteins, and that it might play a role in epigenetic cross-talk with H3K9 trimethylation.
+		- We know that the [catalytic residues](http://www.uniprot.org/uniprot/P41229#function) are 514 (H), 517 (D) and 602 (H). Histidine and aspartate side chains commonly coordinate metal ions, and here they apparently chelate an iron ion (Fe2+).
+			- Looking at the [feature viewer](http://www.uniprot.org/uniprot/P41229#showFeaturesViewer), we can clearly see that these are indeed in the JmjC domain, making it the catalytic lysine demethylase domain.
+
+If you have any questions, ideas or comments, just let me know: I'm Ben Lang from the Gibson Team ([lang@embl.de](mailto:lang@embl.de))! :)
\ No newline at end of file
--- a/TeachingMaterials/2016/course-2016.md
+++ b/TeachingMaterials/2016/course-2016.md
+|**Workshop**|**Protein bioinformatics for beginners**|
+|----------|:-------------:|
+|**Dates**|8 - 9 November|
+|**Time**|09:30 - 17:00 hrs|
+|**Venue**|ATC Computer lab, EMBL Heidelberg|
+|**Trainers**|Toby Gibson, Marc Gouw, Michael Kuhn, Manjeet Kumar, Benjamin Lang, Malvika Sharan|
+
+## List of resources that will be covered in this workshop
+
+**Part-1 Protein databases and sequence analysis**
+
+1. Protein databases:
+    - [Introduction to protein databases](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/ProteinBioinfo-MalvikaSharan.pdf): Malvika
+    - [Quick overview of NCBI](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/protein_database.md): Malvika
+    - [UniProt](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/UniProt.md): Ben
+        - Swissprot and Trembl
+        - Cross-references and link-outs
+            - OMIM, Domains, GO, ...
+
+2. [Study of similar sequences](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/sequence_similarity/tutorial_text.md): Marc
+    - BLAST
+        - BLASTP, BLASTN & PSI-BLAST
+    - HMMER
+    - HHPred
+3. [Multiple sequence alignments](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/multiple_sequence_alignment.md): Malvika
+    - Clustal omega (EMBL-EBI)
+    - COBALT (NCBI)
+4. Other resources
+    - [Human Protein Atlas](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/HPRexercise.md): Toby
+    - [Antibodypedia](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/Antibodypedia.md): Toby
+    - [EMBOSS toolkits](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/EMBOSS_EBI.md): Malvika
+        - [EMBOSS explorer](http://emboss.bioinformatics.nl/cgi-bin/emboss/)
+
+**Part-2 Protein structure analysis**
+
+*Lecture (Toby):* Secondary vs tertiary structure vs protein complexes
+
+1. Protein Structures - Toby
+    - Structure database: PDB at [RCSB](http://www.rcsb.org/pdb/home/home.do)
+    - [Structure visualization](https://docs.google.com/document/d/19gtIv5fqqkEP1sJyIaCzJzMrIPKTSnT0owmk093w2C8/pub)
+        - Chimera
+2. Structure prediction - Malvika
+    - [Secondary and Tertiary structure prediction](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/tertiary_structure_pred.md)
+3. Protein-protein interaction - Michael
+    - STRING and STITCH
+    - Intact
+    - MINT
+4. [Domain databases](https://docs.google.com/document/d/1v7JM9i7yANHasTdpLFZIx_oKIx5K-t0S2o5GnDbXKD0/edit): Manjeet
+    - [SMART](http://smart.embl-heidelberg.de/)
+    - [Pfam](http://pfam.xfam.org/)
+5. [Prediction of transmembrane helices in proteins](https://docs.google.com/document/d/1v7JM9i7yANHasTdpLFZIx_oKIx5K-t0S2o5GnDbXKD0/edit): Manjeet
+    - [TMHMM](http://www.cbs.dtu.dk/services/TMHMM/)
+    - [IUPRED](http://iupred.enzim.hu/) and [Anchor](http://anchor.enzim.hu/)
+6. Intrinsically disordered region: Marc
+    - [ELM](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/elm.md)
+    - [DisProt](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/disprot.md)
+7. Motif visualization:
+     - [Weblogo and MEME](https://git.embl.de/sharan/protein-bioinformatics-embl-hd/blob/master/TeachingMaterials/2016/motif_visualization.md) - Malvika
+     - [Jalview Alignment Viewer](https://docs.google.com/document/d/1Rd7KiqndSW3xqbW_GJc6gfU1dRkjoxR00gCi97F9VMU/pub) - Toby
+
+### Important references
+    - http://www.sciencedirect.com/science/book/9788131222973
+    - http://molbiol-tools.ca/Protein_Chemistry.htm
+    - http://www.ebi.ac.uk/Tools/pfa/
+    - https://toolkit.tuebingen.mpg.de/
+    - http://emboss.sourceforge.net/
+
+### [Post workshop survey](https://www.surveymonkey.de/r/2GCN32Q)
--- a/README.md
+++ b/README.md
@@ -9,24 +9,26 @@

 **Part-1 Protein databases and sequence analysis**

-1. Protein databases: Malvika and Ben
-    - NCBI (quick overview)
-    - UniProt
+1. Protein databases:
+    - Introduction to protein databases
+    - Quick overview of NCBI
+    - UniProt - Ben
        - Swissprot and Trembl 
        - Cross-references and link-outs
            - OMIM, Domains, GO, ...
-2. Study of similar sequences: Marc
+2. Study of similar sequences - Marc
    - BLAST
        - BLASTp, BLASTn, PSI-BLAST, ...
    - Diamond
    - HMMER  
    - HHPred
-3. Multiple sequence alignments: Malvika
-    - Muscle, Clustal omega, etc.
-4. Other resources: Toby & Malvika
+3. Multiple sequence alignments
+    - Clustal omega (EMBL-EBI)
+    - COBALT (NCBI)
+4. Other resources
    - Human Protein Atlas
    - Antibodypedia
-    - EMBOSS toolkits (EBI)
+    - EMBOSS toolkits
        - EMBOSS dot-plot
        - dotmatcher
        - Pepinfo

--- a/TeachingMaterials/2016/disprot.md
+++ b/TeachingMaterials/2016/disprot.md
+# DisProt
+
+DisProt is a collection of manually curated disordered protein regions, and
+contains over 800 entries. The DisProt homepage can be found here:
+http://www.disprot.org/
+
+## Exercise 1: Browsing DisProt
+
+Navigate to the DisProt homepage, and then to the "Browse" section to browse
+the database content.
+
+- **Question 1:** How many Rabbit proteins are annotated in DisProt?
+
+Find the DisProt entry for (human) **DNA topoisomerase 1**.
+
+- **Question 2:** How many disordered regions exist in this protein?
+- **Question 3:** Which method was used to determine that the region between
+  "175 - 214" is disordered?
--- a/TeachingMaterials/2016/elm.md
+++ b/TeachingMaterials/2016/elm.md
+# Short Linear Motifs
+
+This text was largely adapted from a [tutorial written by Holger
+Dinkel][elm_tutorial] for the [EMBO Practical Course on computational analysis
+of protein-protein interactions][embo_course_ppi]
+
+[elm_tutorial]: http://aidanbudd.github.io/course_EMBO_at_TGAC_PPI_Sep2015/trainingMaterial/holgerDinkel/linear_motifs/
+[embo_course_ppi]: http://aidanbudd.github.io/course_EMBO_at_TGAC_PPI_Sep2015//
+
+## Eukaryotic Linear Motifs
+
+Eukaryotic Linear Motifs (ELMs), also known as Short Linear Motifs
+(SLiMs), are short sequences, typically found in disordered regions, that have
+important roles in the function of a protein.
+
+## The ELM database
+
+The [ELM database][elm] is a project whose ultimate goal is to catalogue all occurrences
+of ELMs and their functions in all known proteins(!). 
+
+It consists of manually annotated entries carefully curated by experts in a
+particular field, working on a certain protein or a particular motif. These
+annotators are responsible for contributing ELM **classes**, which represent
+linear motifs with a known function, and experimentally verified **instances**
+of this motif.
+
+- **types** There are 6 types of motifs: LIG: ligand binding, MOD:
+  modification, TRG: targeting, DOC: docking, DEG: degradation, CLV: cleavage.
+
+- **class** is a sequence of amino acids with a given function, based on
+  binding partner, modifying enzyme, acting peptidase and targeted subcellular
+  localisation. Each **class** is defined by a **regular expression**
+
+- **instance** a manually annotated occurrence of a **class** in a protein,
+  verified by an experiment that can be cited from the literature.
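
Because each class is defined by a regular expression, scanning a sequence for candidate matches can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the pattern below is illustrative only (it is not copied from an ELM class page), and the toy sequence and the function name `find_motifs` are made up for this example. Always take the curated expression from the ELM class page itself.

```python
import re

# Illustrative pattern (an assumption, not the curated ELM expression):
# a basic residue, any residue, Leu, an optional residue, a hydrophobic residue.
# The lookahead lets overlapping candidate matches be reported as well.
PATTERN = re.compile(r"(?=([RK].L.{0,1}[FYLIVMP]))")


def find_motifs(sequence, pattern=PATTERN):
    """Return (start, end, matched_text) tuples with 1-based, inclusive positions."""
    hits = []
    for match in pattern.finditer(sequence):
        text = match.group(1)
        start = match.start(1) + 1          # convert 0-based index to 1-based
        hits.append((start, start + len(text) - 1, text))
    return hits


# Toy sequence (hypothetical, not a real protein)
print(find_motifs("MAAKKLMFGS"))
```

A regular-expression match is only a candidate: it becomes an **instance** in the ELM sense once an experiment supports it.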
+
+[elm]: http://elm.eu.org
+
+## Browsing content
+
+## Exercise 1: Browsing content 
+
+There are two main ways in which the ELM database content can be browsed.
+
+Click on "ELM DB" -> "ELM Classes", or follow the link to the ELM classes page:
+http://elm.eu.org/elms to browse the ELM **classes** that have been annotated.
+
+Use the search (or side filters) to find the ELM motif: **DOC_CYCLIN_1**
+
+- **Question 1:** What does this motif do? 
+- **Question 2:** How many instances are annotated in the database?
+- **Question 3:** Which Gene Ontology terms is this motif associated with?  
+
+This motif was identified in P53 in the sequence: **KKLMF**
+
+- **Question 4:** What is the starting and finishing position of this sequence
+  in P53?
+- **Question 5:** Which experimental protocols were used to infer the existence
+  of this instance? 
+- **Question 6:** How certain are we about this annotation? 
+- **Question 7:** What activates P53 in the pathway to induce apoptosis? 
+
+## Exercise 2: The ELM Prediction tool
+
+Navigate to the "ELM predictions" page.
+
+Search protein **SRC_HUMAN** (accession P12931) for ELMs using the following parameters:
+
+- Cell Compartment: Not specified
+- Motif Probability Cutoff: 100
+- Context information: (leave blank)
+
+Some questions:
+
+- **Question 1:** How many instances do you find?
+- **Question 2:** What can you say about the globularity of the protein? Does
+  it have globular and/or disordered regions?
+
+Redo the above search, this time using the following parameters:
+
+- Cell Compartment: cytosol
+- Motif Probability Cutoff: 0.01
+- Context information: Homo sapiens
+
+Some questions:
+
+- **Question 3:** How many instances do you find now?
+- **Question 4:** How many of the instances are manually annotated?
+- **Question 5:** Do the structural predictors/filters (SMART, GlobPlot,
+  IUPRED, Secondary Structure) agree in terms of which regions are
+structured/disordered?
+- **Question 6:** Compare the location of the annotated instances with
+  structural information at hand (IUPRED, Secondary Structure).
+- **Question 7:** How many detected instances were removed by the
+  SMART/Structure filter?
+- **Question 8:** For the annotated instances, which of the ELM classes require
+  a phosphorylation at a certain residue of the motif? (Hint: This information
+can be found in the description of the ELM class)
+- **Question 9:** Which residue in SRC_HUMAN corresponds to this and can you
+  find evidence for a phosphorylation of this residue (using Phospho.ELM)?
+
+
+## Exercise 3: The ELM Prediction tool
+
+Search ELM using the protein name **MDM4_HUMAN** and look for the ‘USP binding motif’ **DOC_USP7_MATH_1**
+
+- **Question 1:** How many such motif instances are found in this protein sequence?
+- **Question 2:** How many of these have been experimentally validated (i.e., are manually annotated?), and what are the "FP" annotations?
+
+## Exercise 4: Switches 
+
+Use the ELM "global search box" (on the top right) to search for the class
+**LIG_SH3_2**. (Just start typing, and wait for the autocomplete to finish).
+
+Click on "LIG_SH3_2" to visit the class page.
+
+- **Question 1:** How many switches are annotated for this class?
+- **Question 2:** What is the mechanism that results in the switching event in **SYNJ2_RAT**?
--- a/TeachingMaterials/2016/motif_visualization.md
+++ b/TeachingMaterials/2016/motif_visualization.md
--- a/TeachingMaterials/2016/multiple_sequence_alignment.md
+++ b/TeachingMaterials/2016/multiple_sequence_alignment.md
+# Multiple sequence alignment
+
+A multiple sequence alignment (MSA) is a method for the comparison of three or more biological sequences (protein, DNA, or RNA) by aligning them against each other. In practice, these query sequences would share an evolutionary relationship (common ancestor). With MSA the distances and similarities between the sequences can be inferred, which facilitates the analysis of phylogenetic association such as evolutionary origins. 
+
+An MSA lets you visualize the conserved positions in the sequences that are functionally relevant across species, as well as mutation events such as insertions and deletions (which appear as hyphens in one or more of the sequences in the alignment) and substitutions, which allow the rate of evolution to be calculated. 
+
+MSA is used to define a protein family by assessing sequence conservation of protein domains, tertiary and secondary structures.
+
+[PDF slides](https://git.embl.de/sharan/protein-bioinformatics-nov-2016/blob/master/TeachingMaterials/Multiple_Sequence_Alignment_slides.pdf)
+
+[external slide with comprehensive details on algorithm](http://player.slideplayer.com/17/5286187/#)
+
+## Hands-on session on [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/) for multiple sequence alignment
+
+Clustal Omega is the current version of the MSA tools from the Clustal series. It uses a progressive alignment heuristic to build the final MSA, beginning with the most similar pair and progressing to the most distantly related.
+
+The progressive alignment combines all the pairwise alignments in two stages: a first stage in which the relationships between the sequences are represented as a tree (clustering), called a guide tree, and a second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree. 
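
The first stage described above (clustering the sequences to decide the order in which they are joined) can be illustrated with a toy sketch. It assumes a simple k-mer based distance and greedy average-linkage merging; this is not Clustal Omega's actual implementation, and the sequences and function names are invented for illustration. Sequences must be at least `k` residues long.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """1 - Jaccard similarity of the k-mer sets (0.0 = same k-mers, 1.0 = none shared)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return 1.0 - len(ka & kb) / len(ka | kb)


def cluster_distance(members_x, members_y, seqs, k=2):
    """Average pairwise distance between the members of two clusters."""
    pairs = [(i, j) for i in members_x for j in members_y]
    return sum(kmer_distance(seqs[i], seqs[j], k) for i, j in pairs) / len(pairs)


def join_order(seqs, k=2):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = {name: [name] for name in seqs}
    order = []
    while len(clusters) > 1:
        x, y = min(
            combinations(sorted(clusters), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], seqs, k),
        )
        order.append((x, y))
        clusters[x + "+" + y] = clusters.pop(x) + clusters.pop(y)
    return order


# Toy peptides (hypothetical): "a" and "b" differ in one residue, "c" is unrelated
toy = {"a": "MKTAYIAK", "b": "MKTAYIAR", "c": "GGSPLWWQ"}
print(join_order(toy))
```

The most similar pair is merged first and the most distant sequence joins last, which mirrors the order in which a progressive aligner would add sequences to the growing MSA.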
+
+**Availability:**
+- Clustal Omega can be used via the web interface available at http://www.ebi.ac.uk/Tools/msa/clustalo/.
+
+**Input:**
+- It requires protein accession IDs or protein sequences in FASTA format. 
+
+[Frequently asked questions](http://www.ebi.ac.uk/Tools/msa/clustalo/help/faq.html#1)
+
+`What substitution matrix/default parameters are used by Clustal Omega?
+Clustal Omega uses the HHalign algorithm and its default settings as its core alignment engine. The algorithm is described in Söding, J. (2005) 'Protein homology detection by HMM–HMM comparison'. Bioinformatics 21, 951-960.
+The default transition matrix is Gonnet, gap opening penalty is 6 bits, gap extension is 1 bit.`
+
+HHalign:
+HHalign compares two alignments with each other by pairwise alignment of HMMs. It shows the optimal alignment and all significant non-overlapping suboptimal alignments. It also generates a dotplot for which the profile-profile column score is averaged over a window of variable size. If only one alignment is entered, this is compared to itself. Used in this way, HHalign is a very sensitive repeat-identification tool.
+
+### Examples:
+
+To extract examples, we will revisit our first session on NCBI using the following instructions:
+
+1. Search for P53 proteins in NCBI
+2. Select P53 protein from *Mus musculus*
+3. Run BLAST on this sequence to identify its homologs
+4. Randomly select 10 hits (avoid multiple sequences from same species)
+5. View GenPept report, and view the summary (top left) as FASTA (text)
+
+These sequences will be the set of queries for your MSA
+
+### Using Clustal Omega
+
+1. Select all the query sequences (Optionally: you can edit the FASTA header by keeping only species name)
+2. Go to the Clustal Omega web form, and paste your query sequences
+3. Choose output format as 'clustal w/ numbers'
+4. Submit your query
+5. Browse your output result
+    * Show colors
+    * Phylogenetic tree
+    * Summary: Percent Identity  Matrix
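
The optional header edit in step 1 can be scripted rather than done by hand. The sketch below assumes NCBI-style protein FASTA headers that end with the species name in square brackets; the accession in the example is a placeholder, and `species_headers` is a name invented here.

```python
import re


def species_headers(fasta_text):
    """Replace each FASTA header with the species name found in trailing [brackets]."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = re.search(r"\[([^\[\]]+)\]\s*$", line)
            # Keep the original header when no bracketed species name is present.
            out.append(">" + match.group(1).replace(" ", "_") if match else line)
        else:
            out.append(line)
    return "\n".join(out)


# Placeholder record (accession and sequence are made up for illustration)
example = ">XP_000000.1 hypothetical protein [Mus musculus]\nMEEPQ\n>no_species\nMTAM"
print(species_headers(example))
```

Spaces are replaced by underscores because some alignment tools truncate sequence names at the first space.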
+
+## Optional exercise: COBALT (NCBI)
+
+COBALT is a tool for multiple sequence alignment, integrated in the NCBI resource for sequence analysis. It aligns sequences using conserved protein domains and local sequence similarities.
+
+1. Go back to your NCBI page of P53 BLAST result
+    * Click on multiple alignment
+    * Browse the result: phylogenetic tree
+2. Randomly select a few sequences, go to the GenPept page
+    * In the 'Analyse these sequences', select the option 'Align sequences with COBALT'
+    * Browse your output result: Phylogenetic tree
+
+## List of a few other tools for MSA
+
+1. [T-Coffee](http://www.tcoffee.org/)
+2. [UGENE](http://ugene.net/)
+3. [Phylo: interactive video game](http://phylo.cs.mcgill.ca/)
+4. [MUSCLE](http://www.drive5.com/muscle/)
+5. [MAFFT](http://mafft.cbrc.jp/alignment/software/)
+6. [MAVID](http://baboon.math.berkeley.edu/mavid/)
+
+## MSA and MSA related tools on EMBL-EBI
+Link: http://www.ebi.ac.uk/Tools/msa/
+
+
+
--- a/TeachingMaterials/2016/protein_database.md
+++ b/TeachingMaterials/2016/protein_database.md
+# Proteins
+
+## Introduction
+
+Proteins are macromolecules made up of chains of amino acid residues of varying lengths, whose sequences are determined by the nucleotide sequences of their genes. Proteins are the building blocks of our body and are involved in a wide range of biological functions within organisms, including DNA replication, catalysis of metabolic reactions, response to stimuli, interaction with other biomolecules for pathway regulation, stability, transport, localization and degradation.
+
+## Protein databases
+
+A biological database is an organized collection of a particular type of dataset compiled from a large number of scientific publications and discoveries, for example, biological sequences or different -omics (transcriptomics, proteomics, metagenomics) data, specific types of annotations, structural data, chemical compounds, biological pathways etc.
+
+Protein databases contain entries for each protein sequence from all the known proteome sets. There are a few well-known protein databases, such as the National Center for Biotechnology Information Reference Sequence project, UniProtKB/SWISS-Prot and the DNA Databank of Japan Amino Acid Sequence Database. 
+
+Protein records are available mainly in text formats that include sequence entries as FASTA and their corresponding annotations in XML formats. The protein entries are generally linked to external resources, allowing users to find relevant data such as literature (Pubmed), genes (NCBI, GenBank database), biological pathways (KEGG database), structures (PDB database), corresponding DNA/RNA sequences, sequence homologs, and expression and variation data.
+
+## Hands-on sessions on protein databases
+
+#### 1. [National Center for Biotechnology Information - NCBI](https://www.ncbi.nlm.nih.gov/)
+
+The NCBI interface provides access to several journals and bioinformatics resources. 
+
+In this course, we will use several protein related resources of NCBI.
+
+###### Example proteins:
+    
+* **Tumor protein P53**: a tumor suppressor protein in human, the absence of which allows many cancers to proliferate.
+    
+###### Search method:
+    
+* Text/term search in [All fields] (simply type in your query)
+* Limiting the search using [filters]
+    - Organism [ORGN]
+    - Source database
+    - Genetic component
+    - Bio-chemical/physical properties etc.
+* Combining multiple search criteria by boolean AND, OR, NOT
+* Browsing by taxonomy (right side of the screen)
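
The same field tags and boolean operators also work outside the browser. As a minimal sketch, the helper below assembles a query string and the matching NCBI E-utilities `esearch` URL. It only builds the URL and does not send any request; the helper names, the default `retmax` and the example terms are assumptions made for illustration.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(text, organism=None, exclude=None):
    """Combine a free-text term with an organism filter and an optional NOT clause."""
    parts = [f"{text}[All Fields]"]
    if organism:
        parts.append(f'"{organism}"[ORGN]')
    term = " AND ".join(parts)
    if exclude:
        term += f" NOT {exclude}[All Fields]"
    return term


def esearch_url(term, db="protein", retmax=20):
    """Return the esearch URL for a query term (nothing is fetched here)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_term("p53", organism="Mus musculus", exclude="partial")
print(term)
print(esearch_url(term))
```

Pasting the printed term into the NCBI search box should give the same kind of filtered result list as clicking through the filters.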
+    
+###### Select one record of your choice
+    
+* Browse the GenPept entry
+    - Identical proteins
+    - FASTA entry
+    - Graphical representation of the features
+    - Other linked data
+        - Articles
+        - Pathways
+        - Reference sequences
+        - Homologs
+        - Related information
+        - Link-outs
+    - Analysis options (we will explore these later)
+        - BLAST
+        - Domains
+        - Sequence features
+        - Regular expression
+        - Tertiary structure
+        - Multiple alignment by COBALT
+
+#### 2. [UniProt Knowledgebase](https://www.ebi.ac.uk/uniprot)
+- Swissprot and Trembl
+- Cross-reference
+- Other resources for proteins
+
+
+
--- a/TeachingMaterials/2016/sequence_similarity/images/BLOSUM62.png
+++ b/TeachingMaterials/2016/sequence_similarity/images/BLOSUM62.png
--- a/TeachingMaterials/2016/sequence_similarity/images/alignments.png
+++ b/TeachingMaterials/2016/sequence_similarity/images/alignments.png
--- a/TeachingMaterials/2016/sequence_similarity/images/blosum.gif
+++ b/TeachingMaterials/2016/sequence_similarity/images/blosum.gif
--- a/TeachingMaterials/2016/sequence_similarity/images/descriptions.png
+++ b/TeachingMaterials/2016/sequence_similarity/images/descriptions.png
--- a/TeachingMaterials/2016/sequence_similarity/images/graphic.png
+++ b/TeachingMaterials/2016/sequence_similarity/images/graphic.png
--- a/TeachingMaterials/2016/sequence_similarity/images/graphicsummary.png
+++ b/TeachingMaterials/2016/sequence_similarity/images/graphicsummary.png
--- a/TeachingMaterials/2016/sequence_similarity/images/programselection.jpg
+++ b/TeachingMaterials/2016/sequence_similarity/images/programselection.jpg