Commit 32da861d authored by Niko Papadopoulos

now running MMseqs2 with -s 7

parent 30b67fc7
%% Cell type:code id:19a33b01-d8d4-49de-9e05-302c14fb7c42 tags:
``` python
from datetime import datetime, timezone
import pandas as pd
import pytz
utc_dt = datetime.now(timezone.utc) # UTC time
dt = utc_dt.astimezone()
tz = pytz.timezone('Europe/Berlin')
berlin_now = datetime.now(tz)
print(f'{berlin_now:%Y-%m-%d %H:%M}')
```
%% Output
2023-01-24 16:43
2023-01-25 21:33
%% Cell type:markdown id:55f808fc-0a34-48d5-96b9-659d27f16f13 tags:
The reviewers challenged us to look for the HGT candidates in the nearest non-metazoan outgroup, the choanoflagellates. We are using _Salpingoeca rosetta_ and _Monosiga brevicollis_, two model choanoflagellates with publicly available genomes.
%% Cell type:code id:c4ad4dc2-a115-4679-8947-c6ed78582bbd tags:
``` python
!curl "https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28%28proteome%3AUP000007799%29%29" -o salpingoeca.faa
!curl "https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28%28proteome%3AUP000001357%29%29" -o monosiga.faa
!rm -rf tmp/
```
%% Output
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 9376k 0 9376k 0 0 1607k 0 --:--:-- 0:00:05 --:--:-- 1472k
100 9376k 0 9376k 0 0 789k 0 --:--:-- 0:00:11 --:--:-- 321k
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 6562k 0 6562k 0 0 2390k 0 --:--:-- 0:00:02 --:--:-- 2394k
100 6562k 0 6562k 0 0 1575k 0 --:--:-- 0:00:04 --:--:-- 1577k
%% Cell type:code id:aae09518-9f38-4c3e-90e0-f8595589e748 tags:
``` python
%%bash --out salpingoeca.out --err salpingoeca.err
hgt_candidates="/Users/npapadop/Documents/data/coffe/hgt.pep"
mmseqs easy-search ${hgt_candidates} "./salpingoeca.faa" salpingoeca.m8 tmp --search-type 2
mmseqs easy-search ${hgt_candidates} "./salpingoeca.faa" salpingoeca.m8 tmp --search-type 2 -s 7.0
```
%% Cell type:code id:9f1aa456-1f6f-4523-b923-763b603b649c tags:
``` python
%%bash --out monosiga.out --err monosiga.err
hgt_candidates="/Users/npapadop/Documents/data/coffe/hgt.pep"
mmseqs easy-search ${hgt_candidates} "./monosiga.faa" monosiga.m8 tmp --search-type 2
mmseqs easy-search ${hgt_candidates} "./monosiga.faa" monosiga.m8 tmp --search-type 2 -s 7.0
```
%% Cell type:markdown id:80982650-856c-4b18-be0e-501b240533ab tags:
Let's have a look at the results:
%% Cell type:code id:169716b0-3ce2-4eb7-8e53-915c99ec5122 tags:
``` python
pd.read_csv("monosiga.m8", sep="\t", header=None)
```
%% Output
0 1 2 3 4 5 6 7 8 9 \
0 c103983_g1_i1_m.71422, A9V527 0.251 505 370 0 65 569 66 560
1 c97022_g1_i1_m.29482, A9V324 0.302 135 91 0 93 224 114 248
2 c97022_g1_i1_m.29482, A9V989 0.233 255 191 0 61 315 684 933
10 11
0 3.201000e-14 76
1 9.888000e-05 44
2 1.727000e-04 43
%% Cell type:code id:8c324422-faed-4249-89da-cb1dafeac4d1 tags:
``` python
pd.read_csv("salpingoeca.m8", sep="\t", header=None)
```
%% Output
0 1 2 3 4 5 6 7 8 9 \
0 c103983_g1_i1_m.71422, F2UGB0 0.253 490 354 0 63 552 159 633
1 c97022_g1_i1_m.29482, F2TZ54 0.262 260 185 0 56 315 19 270
2 c97022_g1_i1_m.29482, F2UQL9 0.319 135 89 0 93 224 111 245
10 11
0 2.262000e-16 83
1 5.263000e-07 52
2 8.588000e-06 48
%% Cell type:markdown id:8184c03c-4741-4499-a346-a31f4e4ed61c tags:
In both cases, the only relevant hit is c103983_g1, the gene that EggNOG v5.0 annotates as "metal-dependent hydrolase - Proteobacteria" and that MorF putatively identifies as an aminohydrolase.
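The tables above have unlabeled columns, so here is a minimal sketch of how the hits could be labeled and filtered. It assumes the `.m8` files use the default MMseqs2 `easy-search` column order (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). The helper names `read_m8` and `best_hits` and the `1e-10` e-value cutoff are hypothetical, chosen only for illustration; they are not part of the original analysis. The example rows are copied from the `salpingoeca.m8` output so that the block runs without the files.

```python
import io

import pandas as pd

# Default MMseqs2 easy-search (BLAST-tab style) column order; assumed here.
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def read_m8(path_or_buffer):
    """Read a tab-separated .m8 file and attach column names."""
    return pd.read_csv(path_or_buffer, sep="\t", header=None, names=M8_COLUMNS)


def best_hits(hits, max_evalue=1e-10):
    """Keep hits with evalue <= max_evalue, then the lowest-e-value hit per query.

    The default cutoff is an illustrative assumption, not a recommendation.
    """
    kept = hits[hits["evalue"] <= max_evalue]
    return kept.sort_values("evalue").drop_duplicates("query").reset_index(drop=True)


# Rows copied from the salpingoeca.m8 output shown above.
rows = [
    ["c103983_g1_i1_m.71422,", "F2UGB0", "0.253", "490", "354", "0",
     "63", "552", "159", "633", "2.262e-16", "83"],
    ["c97022_g1_i1_m.29482,", "F2TZ54", "0.262", "260", "185", "0",
     "56", "315", "19", "270", "5.263e-07", "52"],
    ["c97022_g1_i1_m.29482,", "F2UQL9", "0.319", "135", "89", "0",
     "93", "224", "111", "245", "8.588e-06", "48"],
]
example = "\n".join("\t".join(row) for row in rows)

hits = read_m8(io.StringIO(example))
top = best_hits(hits)
print(top[["query", "target", "fident", "evalue", "bits"]])
```

With the real files, `read_m8("salpingoeca.m8")` and `read_m8("monosiga.m8")` would replace the in-memory example. Under the assumed cutoff, only the c103983_g1 hit is retained.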