Commit ccbb58cd authored by Martin Larralde's avatar Martin Larralde
Browse files

Add an example use of `TopHits.write` to the example notebooks [ci skip]

parent 6fd7817c
%% Cell type:markdown id: tags:
# Build an HMM from a multiple sequence alignment
%% Cell type:code id: tags:
``` python
import pyhmmer
pyhmmer.__version__
```
%% Cell type:code id: tags:
``` python
alphabet = pyhmmer.easel.Alphabet.amino()
```
%% Cell type:markdown id: tags:
## Loading the alignment
%% Cell type:markdown id: tags:
A new HMM can be built from a single sequence, or from a multiple sequence alignment. Let's load an alignment in digital mode so that we can build our HMM:
%% Cell type:code id: tags:
``` python
with pyhmmer.easel.MSAFile("data/msa/LuxC.sto", digital=True, alphabet=alphabet) as msa_file:
msa = next(msa_file)
```
%% Cell type:markdown id: tags:
<div class="alert alert-info">
Note
In this example, we load a multiple sequence alignment from a file, but if your program produces alignment and you wish to make an HMM out of them, you can instantiate a `TextMSA` object yourself, e.g.:
```python
seq1 = pyhmmer.easel.TextSequence(name=b"seq1", sequence="WVPKQDFT")
seq2 = pyhmmer.easel.TextSequence(name=b"seq2", sequence="WL--PQGE")
msa = pyhmmer.easel.TextMSA(name=b"msa", sequences=[seq1, seq2])
```
Because we need a `DigitalMSA` to build the HMM, you will have to convert it first:
```python
msa_d = msa.digitize(alphabet)
```
</div>
%% Cell type:markdown id: tags:
## Building an HMM
Now that we have a multiple alignment loaded in memory, we can build a pHMM using a `pyhmmer.plan7.Builder`. This also requires a Plan7 background model to compute the transition probabilities.
%% Cell type:code id: tags:
``` python
builder = pyhmmer.plan7.Builder(alphabet)
background = pyhmmer.plan7.Background(alphabet)
hmm, _, _ = builder.build_msa(msa, background)
```
%% Cell type:markdown id: tags:
<div class="alert alert-warning">
Warning
HMMs must always have a name. The `Builder` will attempt to use the `MSA` name, but in the event your alignment file does not contain a name for the whole alignment (for instance, a `#=GF ID` line in a [Stockholm file](https://en.wikipedia.org/wiki/Stockholm_format)), this method will fail:
```python
>>> anonymous_msa = pyhmmer.easel.TextMSA(sequences=[seq1, seq2])
>>> builder.build_msa(anonymous_msa)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Could not build HMM: Unable to name the HMM.
```
</div>
%% Cell type:markdown id: tags:
We can have a look at the consensus sequence of the HMM with the `consensus` property:
%% Cell type:code id: tags:
``` python
hmm.consensus
```
%% Cell type:markdown id: tags:
## Saving the resulting HMM
%% Cell type:markdown id: tags:
Now that we have an HMM, we can save it to a file to avoid having to rebuild it every time. Using the `HMM.write` method lets us write the HMM in ASCII or binary format to an arbitrary file. The resulting file will also be compatible with the `hmmsearch` binary if you wish to use that one instead of pyHMMER.
%% Cell type:code id: tags:
``` python
with open("LuxC.hmm", "wb") as output_file:
hmm.write(output_file)
```
%% Cell type:markdown id: tags:
## Applying the HMM to a sequence database
%% Cell type:markdown id: tags:
Once a pHMM has been obtained, it can be applied to a sequence database with the `pyhmmer.plan7.Pipeline` object. Let's iterate over the protein sequences in a FASTA to see if our new HMM gets any hits:
%% Cell type:code id: tags:
``` python
with pyhmmer.easel.SequenceFile("data/seqs/LuxC.faa", digital=True, alphabet=alphabet) as seq_file:
sequences = list(seq_file)
pipeline = pyhmmer.plan7.Pipeline(alphabet, background=background)
hits = pipeline.search_hmm(query=hmm, sequences=sequences)
```
%% Cell type:markdown id: tags:
We can then query the `TopHits` object to access the domain hits in the sequences:
%% Cell type:code id: tags:
``` python
ali = hits[0].domains[0].alignment
print(ali)
```
%% Cell type:markdown id: tags:
Finally, we can save the results in tabular format similar to what `hmmsearch` would have produced:
%% Cell type:code id: tags:
``` python
with open("LuxC.domtbl", "wb") as f:
hits.write(f, format="domains")
```
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment