


Multiple sequence alignment
A multiple sequence alignment (MSA) compares three or more biological sequences (protein, DNA, or RNA) by aligning them against each other. In practice, the query sequences are assumed to share an evolutionary relationship (a common ancestor). From an MSA, the distances and similarities between the sequences can be inferred, which supports phylogenetic analysis such as tracing evolutionary origins.
An MSA makes it possible to visualize the conserved positions in the sequences, which often carry functional relevance across species, as well as mutation events: substitutions appear as differing residues within a column, while insertions and deletions appear as hyphens (gaps) in one or more of the sequences in the alignment. These differences can be used to estimate the rate of evolution.
MSA is also used to define a protein family by assessing the sequence conservation of protein domains and of secondary and tertiary structures.
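To make the idea of conserved positions concrete, here is a minimal Python sketch that scores each column of an alignment by the fraction of sequences sharing its most common residue. The toy alignment and the function name `column_conservation` are illustrative assumptions, not part of any tool mentioned in this tutorial, and real conservation scores are usually more elaborate.

```python
from collections import Counter


def column_conservation(aligned):
    """Per column, return the fraction of sequences carrying the most
    common residue. Gaps ('-') never count as the conserved residue."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # column made of gaps only
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment (made up for illustration): column 1 holds a substitution,
# column 3 holds an insertion present in only one sequence.
toy = ["MKT-LV", "MKTALV", "MRT-LV"]
scores = column_conservation(toy)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)
```

Columns with a score of 1.0 are identical in every sequence; lower scores flag substitutions or gaps.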
PDF slides
external slide with comprehensive details on algorithm

Hands-on session on Clustal Omega for multiple sequence alignment
Clustal Omega is the current version of the Clustal series of MSA tools. It uses a progressive alignment heuristic to build the final MSA, beginning with the most similar pair and progressing to the most distantly related sequences.
Progressive alignment combines the pairwise alignments in two stages: a first stage in which the relationships between the sequences are represented as a tree (clustering), called a guide tree, and a second stage in which the MSA is built by adding the sequences one at a time to the growing MSA, following the guide tree.
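The first stage (building a guide tree) can be sketched in a few lines of Python. This is only an illustration of the clustering idea: it assumes a crude k-mer (Jaccard) distance and plain average-linkage clustering, which is not the method Clustal Omega itself uses, and the names `kmer_distance` and `guide_tree` and the toy sequences are invented for the example.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """1 - Jaccard similarity of the k-mer sets (0.0 = same k-mer content)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:  # both sequences shorter than k: no information
        return 1.0
    return 1.0 - len(ka & kb) / len(union)


def guide_tree(seqs):
    """Average-linkage clustering; returns a nested tuple of sequence names."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    dist = {frozenset(p): kmer_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters first (most similar pair).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0][0]


# Toy sequences (made up): A and B are nearly identical, D is unrelated.
toy = {"A": "MKTALVGG", "B": "MKTALVGA", "C": "MRSALIGG", "D": "QQWPHHYY"}
tree = guide_tree(toy)
print(tree)
```

In the second stage, a progressive aligner would walk this tree from the innermost pair outwards, aligning A with B first, then adding C, then D.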
Availability:

Clustal Omega can be used via the web interface available at http://www.ebi.ac.uk/Tools/msa/clustalo/.

Input:

It requires protein accession IDs or protein sequences in FASTA format.
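For readers new to FASTA, each record is a header line starting with `>` followed by one or more lines of sequence. The short parser below is a sketch for checking your input before pasting it into the web form; the record names and sequences are placeholders invented for the example, and `parse_fasta` is a hypothetical helper, not a function of any library.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Placeholder input: headers and sequences are made up for illustration.
example = """>seq1 example protein [Species one]
MKTALVGG
MKTA
>seq2 example protein [Species two]
MRSALIGG
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```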

Frequently asked questions
What substitution matrix/default parameters are used by Clustal Omega?
Clustal Omega uses the HHalign algorithm and its default settings as its core alignment engine. The algorithm is described in Söding, J. (2005) 'Protein homology detection by HMM–HMM comparison'. Bioinformatics 21, 951-960. The default transition matrix is Gonnet, gap opening penalty is 6 bits, gap extension is 1 bit.
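As a rough illustration of what separate gap opening and gap extension penalties mean, the sketch below computes an affine gap penalty using the default values quoted above. It assumes one common convention (the opening penalty covers the first gap position and each further position costs the extension penalty); this is an assumption for illustration and may not match HHalign's exact scoring.

```python
def affine_gap_penalty(length, gap_open=6.0, gap_extend=1.0):
    """Penalty for a single gap of `length` positions under an affine model.

    Assumed convention: gap_open is charged once for the first position,
    gap_extend for every additional position.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


for k in (1, 2, 4):
    print(k, affine_gap_penalty(k))
```

Under this model one long gap is cheaper than several short ones of the same total length, which favours alignments with fewer, longer indels.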
HHalign:
HHalign compares two alignments with each other by pairwise alignment of HMMs. It shows the optimal alignment and all significant non-overlapping suboptimal alignments. It also generates a dotplot for which the profile-profile column score is averaged over a window of variable size. If only one alignment is entered, this is compared to itself. Used in this way, HHalign is a very sensitive repeat-identification tool.

Examples:
To collect example sequences, we will revisit our first NCBI session and follow these steps:

Search for P53 proteins in NCBI
Select the P53 protein from Mus musculus

Run BLAST on this sequence to identify its homologs
Randomly select 10 hits (avoid multiple sequences from same species)
View the GenPept report, then display the summary (top left) as FASTA (text)

These sequences will be the set of queries for your MSA
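If you prefer to tidy the downloaded FASTA records with a script, the sketch below keeps one sequence per species and relabels each record with its species name. It assumes that each header ends with the species in square brackets, as NCBI protein FASTA headers typically do; the helper names, accession-like IDs, and sequences are placeholders invented for the example.

```python
import re

# Assumption: the species name is the last [bracketed] item in the header.
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_of(header):
    match = SPECIES_RE.search(header)
    return match.group(1) if match else None


def one_per_species(records):
    """records: list of (header, sequence). Keep the first record seen for
    each species and use the species name as the new label."""
    kept = {}
    for header, seq in records:
        species = species_of(header) or header  # fall back to full header
        if species not in kept:
            kept[species] = seq
    return kept


def to_fasta(records, width=60):
    """Write a {name: sequence} dict as FASTA text with short headers."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name.replace(" ", "_"))
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Placeholder hits (IDs and sequences are made up for illustration).
hits = [
    ("ID_1 cellular tumor antigen p53 [Mus musculus]", "MKTALVGG"),
    ("ID_2 cellular tumor antigen p53 [Homo sapiens]", "MKTALVGA"),
    ("ID_3 cellular tumor antigen p53 isoform b [Mus musculus]", "MKTALV"),
]
queries = one_per_species(hits)
fasta_text = to_fasta(queries)
print(fasta_text)
```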

Using Clustal Omega

Select all the query sequences (optionally, edit each FASTA header to keep only the species name)
Go to the Clustal Omega web form and paste your query sequences
Choose 'clustal w/ numbers' as the output format
Submit your query
Browse your output result

Show colors
Phylogenetic tree
Summary: Percent Identity Matrix
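To help read the Percent Identity Matrix, here is a minimal sketch of how such a matrix can be computed from aligned sequences. It assumes identity is counted over columns where both sequences have a residue (gap columns are skipped); Clustal Omega's own calculation may use a different denominator, so treat the numbers as illustrative. The toy alignment and function names are invented for the example.

```python
def percent_identity(a, b):
    """Percent of identical residues over columns where neither sequence
    has a gap (assumed convention, see the note above)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(aligned):
    """aligned: {name: aligned sequence}. Returns a nested dict of percentages."""
    names = list(aligned)
    return {n1: {n2: percent_identity(aligned[n1], aligned[n2]) for n2 in names}
            for n1 in names}


# Toy alignment (made up for illustration).
toy = {"mouse": "MKT-LV", "human": "MKTALV", "fish": "MRT-LV"}
matrix = identity_matrix(toy)
for name, row in matrix.items():
    print(name, [round(row[other], 1) for other in toy])
```

The matrix is symmetric with 100 on the diagonal; lower off-diagonal values indicate more distantly related sequences.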


Optional exercise: COBALT (NCBI)
COBALT is a tool for multiple sequence alignment, integrated in the NCBI resource for sequence analysis. It aligns sequences using conserved protein domains and local sequence similarities.

Go back to your NCBI page with the P53 BLAST results

Click on multiple alignment
Browse the result: phylogenetic tree


Randomly select a few sequences and go to the GenPept page

In the 'Analyze these sequences' section, select the option 'Align sequences with COBALT'
Browse your output result: phylogenetic tree


A few other tools for MSA

T-Coffee
UGENE
Phylo: interactive video game
MUSCLE
MAFFT
MAVID


MSA and MSA-related tools at EMBL-EBI
Link: http://www.ebi.ac.uk/Tools/msa/