name: Galaxy
on:
  - push
  - pull_request
jobs:
  lint:
    name: Lint tool
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Update build dependencies
        run: pip install -U wheel pip setuptools
      - name: Install planemo
        run: pip install -U planemo
      - name: Lint repository
        run: planemo shed_lint --tools ./galaxy
  test:
    name: Test tool
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Update build dependencies
        run: pip install -U wheel pip setuptools
      - name: Install planemo
        run: pip install -U planemo
      - name: Test tool
        run: planemo test ./galaxy/gecco.xml
  deploy:
    environment: Tool Shed
    name: Deploy tool
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Update build dependencies
        run: pip install -U wheel pip setuptools
      - name: Install planemo
        run: pip install -U planemo
      - name: Deploy repository to Tool Shed
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          # Pass values through the environment so a multi-word commit
          # message reaches planemo as a single quoted argument.
          COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
          TOOL_SHED_API_KEY: ${{ secrets.TOOL_SHED_API_KEY }}
        run: planemo shed_update ./galaxy --owner althonos --name gecco -t toolshed -m "$COMMIT_MESSAGE" --shed_key "$TOOL_SHED_API_KEY"
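The deploy step only uploads to the Tool Shed when `startsWith(github.ref, 'refs/tags/v')` holds. GitHub Actions documents `startsWith` as case-insensitive, so the minimal Python sketch below lowercases the ref before comparing. `should_deploy` is a hypothetical helper used only to show which pushes would trigger an upload; it is not part of the repository.

```python
def should_deploy(ref: str) -> bool:
    """Return True if the workflow's Tool Shed deploy step would run for `ref`.

    Mirrors `startsWith(github.ref, 'refs/tags/v')`; GitHub's `startsWith`
    ignores case, so the comparison is done in lowercase.
    """
    return ref.lower().startswith("refs/tags/v")


if __name__ == "__main__":
    for ref in ("refs/tags/v0.8.5", "refs/heads/master", "refs/tags/0.8.5"):
        print(f"{ref}: {'deploy' if should_deploy(ref) else 'skip'}")
```

Because the `if:` sits on the step rather than on the `deploy` job, the job itself still starts after `test` passes on every push and pull request and only skips the upload; putting the condition at the job level would also skip the checkout and installs.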
......@@ -5,7 +5,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...master
[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...master
## [v0.8.5] - 2021-11-21
[v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5
### Added
- Minimal compatibility support for running GECCO inside of Galaxy workflows.
## [v0.8.4] - 2021-09-26
[v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4
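Each release in this changelog moves the `[Unreleased]` comparison to the newest tag and adds a `[vX.Y.Z]` link that compares against the previous tag, as the v0.8.5 entry above does. A small Python sketch of that rule follows; `compare_links` is a hypothetical helper, not a project script, and it assumes the tags are passed oldest to newest.

```python
from typing import List


def compare_links(base_url: str, tags: List[str], branch: str = "master") -> List[str]:
    """Build Keep a Changelog link references, newest entry first.

    `tags` must be ordered from oldest to newest.
    """
    if not tags:
        raise ValueError("need at least one tag")
    # The [Unreleased] section compares the newest tag against the branch head.
    links = [f"[Unreleased]: {base_url}/compare/{tags[-1]}...{branch}"]
    # Every released version compares against the tag released just before it.
    for older, newer in reversed(list(zip(tags, tags[1:]))):
        links.append(f"[{newer}]: {base_url}/compare/{older}...{newer}")
    return links


if __name__ == "__main__":
    base = "https://git.embl.de/grp-zeller/GECCO"
    for line in compare_links(base, ["v0.8.3-post1", "v0.8.4", "v0.8.5"]):
        print(line)
```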
......
......@@ -38,7 +38,7 @@ To bump the version of the internal HMMs (for instance, to switch to a newer
version of Pfam), simply edit the INI file for that HMM in the
``gecco/hmmer`` folder.
Then simply clean and rebuild data files to download the latest version of
Then clean and rebuild data files to download the latest version of
the HMMs:
```console
......@@ -48,7 +48,7 @@ $ python setup.py clean build_data --inplace
### Upgrading the internal CRF model
After having trained a new version of the model, simply run the following
After having trained a new version of the model, run the following
command to update the internal GECCO model as well as the hash signature file:
```console
......
......@@ -19,6 +19,7 @@ in genomic and metagenomic data using Conditional Random Fields (CRFs).
[![Preprint](https://img.shields.io/badge/preprint-bioRxiv-darkblue?style=flat-square&maxAge=2678400)](https://www.biorxiv.org/content/10.1101/2021.05.03.442509v1)
[![PyPI](https://img.shields.io/pypi/v/gecco-tool.svg?style=flat-square&maxAge=3600)](https://pypi.python.org/pypi/gecco-tool)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/gecco?style=flat-square&maxAge=3600)](https://anaconda.org/bioconda/gecco)
[![Galaxy](https://img.shields.io/badge/Galaxy-GECCO-darkblue?style=flat-square&maxAge=3600)](https://toolshed.g2.bx.psu.edu/repository?repository_id=c29bc911b3fc5f8c)
[![Versions](https://img.shields.io/pypi/pyversions/gecco-tool.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/gecco-tool/#files)
[![Wheel](https://img.shields.io/pypi/wheel/gecco-tool?style=flat-square&maxAge=3600)](https://pypi.org/project/gecco-tool/#files)
......@@ -69,7 +70,7 @@ Additional parameters of interest are:
- `--jobs`, which controls the number of threads that will be spawned by
GECCO whenever a step can be parallelized. The default, *0*, will
autodetect the number of CPUs on the machine using
[`multiprocessing.cpu_count`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count).
[`os.cpu_count`](https://docs.python.org/3/library/os.html#os.cpu_count).
- `--cds`, controlling the minimum number of consecutive genes a BGC region
must have to be detected by GECCO (default is 3).
- `--threshold`, controlling the minimum probability for a gene to be
......@@ -98,7 +99,7 @@ in a simple, easily reproducible situation.
### 🏗️ Contributing
Contributions are more than welcome! See [`CONTRIBUTING.md`](https://github.com/althonos/pyhmmer/blob/master/CONTRIBUTING.md)
Contributions are more than welcome! See [`CONTRIBUTING.md`](https://github.com/zellerlab/GECCO/blob/master/CONTRIBUTING.md)
for more details.
## ⚖️ License
......
......@@ -3,7 +3,7 @@ GECCO
*Biosynthetic Gene Cluster prediction with Conditional Random Fields.*
|GitLabCI| |License| |Coverage| |Source| |Mirror| |Issues| |Preprint| |PyPI| |Bioconda| |Versions| |Wheel|
|GitLabCI| |License| |Coverage| |Source| |Mirror| |Issues| |Preprint| |PyPI| |Bioconda| |Galaxy| |Versions| |Wheel|
.. |GitLabCI| image:: https://img.shields.io/gitlab/pipeline/grp-zeller/GECCO/master?gitlab_url=https%3A%2F%2Fgit.embl.de&logo=gitlab&style=flat-square&maxAge=600
   :target: https://git.embl.de/grp-zeller/GECCO/-/pipelines
......@@ -38,6 +38,9 @@ GECCO
.. |Wheel| image:: https://img.shields.io/pypi/wheel/gecco-tool?style=flat-square&maxAge=3600&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAABhGlDQ1BJQ0MgcHJvZmlsZQAAKJF9kT1Iw0AcxV9TS0UqCnYQdchQnSxIFXHUKhShQqgVWnUwufQLmjQkKS6OgmvBwY/FqoOLs64OroIg+AHi5uak6CIl/i8ptIj14Lgf7+497t4BQr3MNKtrAtB020wl4mImuyoGXxFAEP0YRkxmljEnSUl0HF/38PH1LsqzOp/7c/SqOYsBPpF4lhmmTbxBPL1pG5z3icOsKKvE58TjJl2Q+JHrisdvnAsuCzwzbKZT88RhYrHQxkobs6KpEU8RR1RNp3wh47HKeYuzVq6y5j35C0M5fWWZ6zRHkMAiliBBhIIqSijDRpRWnRQLKdqPd/APuX6JXAq5SmDkWEAFGmTXD/4Hv7u18pMxLykUBwIvjvMxCgR3gUbNcb6PHadxAvifgSu95a/UgZlP0mstLXIE9G0DF9ctTdkDLneAwSdDNmVX8tMU8nng/Yy+KQsM3AI9a15vzX2cPgBp6ip5AxwcAmMFyl7v8O7u9t7+PdPs7wdys3KnxRVKKQAAAAZiS0dEAP8A/wD/oL2nkwAABH9JREFUWMPFV81LJEcUr5kBD3PIQSYIEiYE8qG5iAgbCLiwJz/CXpR4WEMQFxL2H9hbTsldWMSFHILgJZBLUGHvHnKJOEO3joGNOOM6OuOsTPdMV3V3fbyXQ6onNT1+9ChLHjR0V3XVe1Xv937vPYKIJMlj23aOMbYopVxHxF0hhEAt+n1XSrnu+/7iwcFBLum+t/4AAKNCiA0A8DGhAIAvpdwAgNH7GJD1PO8lAOBdBQDQ87yXiJi9Tk8KEUlczs7ORoaGhrYymczHpFcuhRB/IuLrTCZzSAghSqnRVCr1STqd/jKTybwXX6CU+rterz8eHh7+q2e3uEXNZvMh57wdP00QBDuc89mlpaXI6AHbtscsyxojhAwgIpmbm0txzmeDINiJr+ect5vN5sMbXVCtVkfiygGgQimdRkTiOM44pXRNCHEUVyCEOKKUrrmuO46IhDE2DQCVuBHVanXkOgOyUsrXsX13CoXCoOu6ed/3N5P63vf9zVarlS8UCoOI2HUbWke2xwANli7lYRhmEXFSCOFFg0qpBmNs9fT09AdD4U+U0lWlVMO4EQ8RJ4MgyMaN0Lr+MwAARk20A0BFWz+JiFQPU8bY883NzSwiEinlhLHnBCKSra2tLGPsubkGESf39vYGTXcAAEYhShCRCCE2TAsppdOu6+ajkwPAm3K5/CAGoI4BjUajC1yVSuUBALyJbsJ13bznedMxzGwgIiH7+/s5k2Q0gonp83q9PhtHb6PReBTNu667FJ+v1WpLJiYQkZjRAQC+bds5whhbjCF1ViPZdAkVQqxwzjsU6/v+98Yv30XjpVLpszAMN5RSXYByHGc8DMNZc4wxtkg0t0cIdefn51OU0jWt+BIAqOkdpdSK7/sLiHhojB9SSp8wxn6RUoIB2DoiXmi3rk1NTaUQ8a2hb50g4q5x+lcaE0carauVSuUjIcSrfmmYMfarZVk5z/NWI57QgP/d+G2XmFktDMMXmtWi76+jq3Vd91sAuDROt9NsNvOO4+QBYMc41bnv+4+jdXoPxH/pcyAIghfGgQWJkcQz27bH4uFloHvCiJQPDOM+jMaPj48/vy5aLMsak1I+M3WmSR8CAJ13KWXnPZ1OdzJaq9Ua6GfPxC5ot9vfAMBb0wWO4+Rd1+1xQRAEfbngRhCen5/n9XhfQin9rVQq3Q7CeBguLCx0haFSqisMAWAlDMOeMPQ870kQBD8LIa4Nw5mZmd4wvIqIHMcZjx9IKbXCOX/fuNqnVxGRbdufBkGQnIgSUvF
Xcaq9uLjoUHGr1Xoanz87O7uVii3Lyl2ZjBhjfSWjWq32yJwrl8tfxJORLmp6k9F16bhYLN6YjjnnPel4e3v7ynRcKBRuTsc6zBIXJJTS1ZOTE7Mg+ZExdmVBovfoKki0rmQlWbFYHHQc504lmb7FZCXZTUUpY6xTlHqe986K0kRl+fLycqcstyzr3mV5342JUqoFAH+8s8bExES73b53a6YBl71zc6qU+t+a067Hsqyu9pxzLsysFrXnjLFF27YTt+f/AKtN0SMRWK0jAAAAAElFTkSuQmCC
   :target: https://pypi.org/project/gecco-tool/#files
.. |Galaxy| image:: https://img.shields.io/badge/Galaxy-GECCO-darkblue?style=flat-square&maxAge=3600&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAABhWlDQ1BJQ0MgcHJvZmlsZQAAKJF9kT1Iw1AUhU9TpaIVh1YQcchQnSxIFXHUKhShQqgVWnUweekfNGlIUlwcBdeCgz+LVQcXZ10dXAVB8AfEzc1J0UVKvC8ttIjxwuN9nHfP4b37AKFeZprVNQFoum2mEnExk10VA6/owyBC8CEmM8uYk6QkPOvrnvqo7qI8y7vvz+pXcxYDfCLxLDNMm3iDeHrTNjjvE4dZUVaJz4nHTbog8SPXlSa/cS64LPDMsJlOzROHicVCBysdzIqmRjxFHFE1nfKFTJNVzluctXKVte7JXxjM6SvLXKc1ggQWsQQJIhRUUUIZNqK066RYSNF53MM/7PolcinkKoGRYwEVaJBdP/gf/J6tlZ+MNZOCcaD7xXE+RoHALtCoOc73seM0TgD/M3Clt/2VOjDzSXqtrUWOgIFt4OK6rSl7wOUOMPRkyKbsSn5aQj4PvJ/RN2WB0C3Qu9acW+scpw9AmmaVvAEODoGxAmWve7y7p3Nu//a05vcDa5hypGma9BsAAAAGYktHRAD/AP8A/6C9p5MAAAKHSURBVHja7Zo/ixRJGIefX3XPeqvsIohygQcnggaCYCSYyKUGHmd8H+DCS/0oB3efwU8gaGJkpqIg/kETM7ldUKt2+v1d0LPLsudisIOzw7wPNEV1zdRUPzVd/b5FQ5IkSZIkq4r2V7a3t08Al4CyBGMP8IuNjc2do3TS76/Y/hv4fYkm8C/gj6N0cHCmf16yf/BPR+2grPoakAJSQApIASkgBaSAFLCy9Afq94HLSzT+h3PNBmcJUfe188cQSxoyoU+S+a0BW1tbf0rcBboFjeejze3Nzc3ni3oK3LE5t8AJOQNcB76bgPKtp0IGQikgBaSAFJACUkAKSAErIuD9gscTwIdFJkOngKsLTIb+nUz6p+vrJ515arKAW+DonNUN4qstj1gH1oAp8G7vfAcMABeBV3vFyAW48gaeHejrN+Ae+FgJePJgcgv457D1Q3gw7CAapgFV0Lxbyk2m7rYbVRg/NysbqIKbiVagghpSRTQFzXItlBamIrei0kTUKdE6T2rfT9t0UDvdn6xRt9r5Xxj6uS3foZvAj4eb9tSmUqiYOruYClSshhjrUsWuoImhyvTAxKIX6ow7oBgEEgIkKcCyMR4gQOFOg4JuajqLUgaVzyFtfJL4Mvc4YDn3UjIQSgEpIAWkgBSQAlLAyjK3UFjye40Jyv9fugAkwGgWvBZEwRRwQZKFNH53PITssRzfhdAY9Ho3+EXstgtUxt0UCTSLk8vsl7qAIqQCXYB67yUBc4tfXz6mvH09uTxEOUyqUSAVi7CNRbGJGB0VQ4SwjSyF5RIGg21hrBCYgh2yiHDIFLkwGEoEcmHHaM2YQFOrkztji1g/Ff6h4mu/RuRmQJIkSbLi/Ac/0AO/ft/DTwAAAABJRU5ErkJggg==
   :target: https://toolshed.g2.bx.psu.edu/repository?repository_id=c29bc911b3fc5f8c
Overview
--------
......
categories:
- Genome annotation
- Sequence Analysis
- Metagenomics
description: 'Biosynthetic Gene Cluster prediction with Conditional Random Fields.'
name: gecco
owner: althonos
homepage_url: https://gecco.embl.de
long_description: |
  GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
  It is developed in the Zeller group and is part of the suite of computational microbiome analysis tools hosted at EMBL.
remote_repository_url: https://github.com/zellerlab/GECCO/tree/master/galaxy
type: unrestricted
Hi, I’m GECCO!
==============
.. image:: https://raw.githubusercontent.com/zellerlab/GECCO/v0.6.2/static/gecco-square.png
   :target: https://github.com/zellerlab/GECCO/
🦎 ️Overview
---------------
GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast
and scalable method for identifying putative novel Biosynthetic Gene
Clusters (BGCs) in genomic and metagenomic data using Conditional Random
Fields (CRFs).
|GitLabCI| |License| |Coverage| |Docs| |Source| |Mirror| |Changelog|
|Issues| |Preprint| |PyPI| |Bioconda| |Galaxy| |Versions| |Wheel|
🔧 Installing GECCO
-------------------
GECCO is implemented in `Python <https://www.python.org/>`__, and
supports `all versions <https://endoflife.date/python>`__ from Python
3.6. It requires additional libraries that can be installed directly
from `PyPI <https://pypi.org>`__, the Python Package Index.
Use `pip <https://pip.pypa.io/en/stable/>`__ to install GECCO on
your machine::

    $ pip install gecco-tool

If you’d rather use `Conda <https://conda.io>`__, a package is available
in the `bioconda <https://bioconda.github.io/>`__ channel. You can
install it with::

    $ conda install -c bioconda gecco

This will install GECCO, its dependencies, and the data needed to run
predictions. This requires around 100MB of data to be downloaded, so it
could take some time depending on your Internet connection. Once done,
you will have a ``gecco`` command available in your ``$PATH``.
*Note that GECCO uses* `HMMER3 <http://hmmer.org/>`__, *which can
only run on PowerPC and recent x86-64 machines running a POSIX operating
system. Therefore, Linux and OSX are supported platforms, but GECCO will
not be able to run on Windows.*
🧬 Running GECCO
-----------------
Once ``gecco`` is installed, you can run it from the terminal by giving
it a FASTA or GenBank file with the genomic sequence you want to
analyze, as well as an output directory::

    $ gecco run --genome some_genome.fna -o some_output_dir

Additional parameters of interest are:

- ``--jobs``, which controls the number of threads that will be spawned
  by GECCO whenever a step can be parallelized. The default, *0*, will
  autodetect the number of CPUs on the machine using
  `os.cpu_count <https://docs.python.org/3/library/os.html#os.cpu_count>`__.
- ``--cds``, controlling the minimum number of consecutive genes a BGC
  region must have to be detected by GECCO (default is 3).
- ``--threshold``, controlling the minimum probability for a gene to be
  considered part of a BGC region. Using a lower number will increase
  the number (and possibly length) of predictions, but reduce accuracy.

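The three options above interact roughly as follows. This Python sketch is a simplified illustration under stated assumptions, not GECCO's implementation: it assumes every gene already has a BGC probability, treats a candidate region as a maximal run of consecutive genes at or above ``--threshold``, and drops runs shorter than ``--cds`` genes; GECCO's own cluster extraction may refine boundaries differently. ``resolve_jobs`` and ``find_regions`` are hypothetical names.

```python
import os
from typing import List, Optional, Tuple


def resolve_jobs(jobs: int) -> int:
    """Interpret a --jobs value: 0 means one thread per detected CPU."""
    if jobs > 0:
        return jobs
    # os.cpu_count() can return None when the count is unknown; fall back to 1.
    return os.cpu_count() or 1


def find_regions(probabilities: List[float], threshold: float, cds: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) gene-index ranges of candidate regions.

    A region is a maximal run of consecutive genes whose probability is at
    least `threshold`; runs shorter than `cds` genes are discarded.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, p in enumerate(list(probabilities) + [float("-inf")]):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= cds:
                regions.append((start, i))
            start = None
    return regions


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.95, 0.99, 0.2, 0.85, 0.9, 0.3]
    print(find_regions(probs, threshold=0.8))   # [(1, 4)]
    print(find_regions(probs, threshold=0.25))  # [(1, 4), (5, 8)]
    print(resolve_jobs(0))                      # number of CPUs (at least 1)
```

Lowering the threshold from 0.8 to 0.25 in the example turns the two-gene run at indices 5 and 6 into a three-gene run that passes the ``--cds`` filter, which is the "more (and possibly longer) predictions" effect described above.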
🔖 Reference
-------------
GECCO can be cited using the following preprint:
**Accurate de novo identification of biosynthetic gene clusters with
GECCO**. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby
Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller.
bioRxiv 2021.05.03.442509;
`doi:10.1101/2021.05.03.442509 <https://doi.org/10.1101/2021.05.03.442509>`__
💭 Feedback
------------
⚠️ Issue Tracker
~~~~~~~~~~~~~~~~
Found a bug? Have an enhancement request? Head over to the `GitHub
issue tracker <https://github.com/zellerlab/GECCO/issues>`__ if you need
to report or ask something. If you are filing an issue about a bug, please
include as much information as you can about the issue, and try to
recreate the same bug in a simple, easily reproducible situation.
🏗️ Contributing
~~~~~~~~~~~~~~~~
Contributions are more than welcome! See
`CONTRIBUTING.md <https://github.com/zellerlab/GECCO/blob/master/CONTRIBUTING.md>`__
for more details.
⚖️ License
----------
This software is provided under the `GNU General Public License v3.0 or
later <https://choosealicense.com/licenses/gpl-3.0/>`__. GECCO is
developed by the `Zeller
Team <https://www.embl.de/research/units/scb/zeller/index.html>`__ at
the `European Molecular Biology Laboratory <https://www.embl.de/>`__ in
Heidelberg.
.. |GitLabCI| image:: https://img.shields.io/gitlab/pipeline/grp-zeller/GECCO/master?gitlab_url=https%3A%2F%2Fgit.embl.de&style=flat-square&maxAge=600
   :target: https://git.embl.de/grp-zeller/GECCO/-/pipelines/
.. |License| image:: https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square&maxAge=2678400
   :target: https://choosealicense.com/licenses/gpl-3.0/
.. |Coverage| image:: https://img.shields.io/codecov/c/gh/zellerlab/GECCO?style=flat-square&maxAge=600
   :target: https://codecov.io/gh/zellerlab/GECCO/
.. |Docs| image:: https://img.shields.io/badge/docs-gecco.embl.de-green.svg?maxAge=2678400&style=flat-square
   :target: https://gecco.embl.de
.. |Source| image:: https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square
   :target: https://github.com/zellerlab/GECCO/
.. |Mirror| image:: https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400
   :target: https://git.embl.de/grp-zeller/GECCO/
.. |Changelog| image:: https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square
   :target: https://github.com/zellerlab/GECCO/blob/master/CHANGELOG.md
.. |Issues| image:: https://img.shields.io/github/issues/zellerlab/GECCO.svg?style=flat-square&maxAge=600
   :target: https://github.com/zellerlab/GECCO/issues
.. |Preprint| image:: https://img.shields.io/badge/preprint-bioRxiv-darkblue?style=flat-square&maxAge=2678400
   :target: https://www.biorxiv.org/content/10.1101/2021.05.03.442509v1
.. |PyPI| image:: https://img.shields.io/pypi/v/gecco-tool.svg?style=flat-square&maxAge=3600
   :target: https://pypi.python.org/pypi/gecco-tool
.. |Bioconda| image:: https://img.shields.io/conda/vn/bioconda/gecco?style=flat-square&maxAge=3600
   :target: https://anaconda.org/bioconda/gecco
.. |Versions| image:: https://img.shields.io/pypi/pyversions/gecco-tool.svg?style=flat-square&maxAge=3600
   :target: https://pypi.org/project/gecco-tool/#files
.. |Wheel| image:: https://img.shields.io/pypi/wheel/gecco-tool?style=flat-square&maxAge=3600
   :target: https://pypi.org/project/gecco-tool/#files
.. |Galaxy| image:: https://img.shields.io/badge/Galaxy-GECCO-darkblue?style=flat-square&maxAge=3600
   :target: https://toolshed.g2.bx.psu.edu/repository?repository_id=c29bc911b3fc5f8c
<?xml version='1.0' encoding='utf-8'?>
<tool id="gecco" name="GECCO" version="0.8.5" python_template_version="3.5">
<description>GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
<requirements>
<requirement type="package" version="0.8.5">gecco</requirement>
</requirements>
<version_command>gecco --version</version_command>
<command detect_errors="aggressive"><![CDATA[
#if str($input.ext) == 'genbank':
#set $file_extension = 'gbk'
#else:
#set $file_extension = $input.ext
#end if
ln -s '$input' input_tempfile.$file_extension &&
gecco -vv run
--format $input.ext
--genome input_tempfile.$file_extension
--postproc $postproc
--force-clusters-tsv
#if $cds:
--cds $cds
#end if
#if $threshold:
--threshold $threshold
#end if
#if $antismash_sideload:
--antismash-sideload
#end if
&& mv input_tempfile.features.tsv $features
&& mv input_tempfile.clusters.tsv $clusters
#if $antismash_sideload
&& mv input_tempfile.sideload.json $sideload
#end if
]]></command>
<inputs>
<param name="input" type="data" format="genbank,fasta,embl" label="Sequence file in GenBank, EMBL or FASTA format"/>
<param argument="--cds" type="integer" min="0" value="" optional="true" label="Minimum number of genes required for a cluster"/>
<param argument="--threshold" type="float" min="0" max="1" value="" optional="true" label="Probability threshold for cluster detection"/>
<param argument="--postproc" type="select" label="Post-processing method for gene cluster validation">
<option value="antismash">antiSMASH</option>
<option value="gecco" selected="true">GECCO</option>
</param>
<param argument="--antismash-sideload" type="boolean" checked="false" label="Generate an antiSMASH v6 sideload JSON file"/>
</inputs>
<outputs>
<collection name="records" type="list" label="${tool.name} detected Biosynthetic Gene Clusters on ${on_string} (GenBank)">
<discover_datasets pattern="(?P&lt;designation&gt;.*_cluster_\d+)\.gbk" ext="genbank" visible="false" />
</collection>
<data name="features" format="tabular" label="${tool.name} summary of detected features on ${on_string} (TSV)"/>
<data name="clusters" format="tabular" label="${tool.name} summary of detected BGCs on ${on_string} (TSV)"/>
<data name="sideload" format="json" label="antiSMASH v6 sideload file with ${tool.name} detected BGCs on ${on_string} (JSON)">
<filter>antismash_sideload</filter>
</data>
</outputs>
<tests>
<test>
<param name="input" value="BGC0001866.fna"/>
<output name="features" file="features.tsv"/>
<output name="clusters" file="clusters.tsv"/>
<output_collection name="records" type="list">
<element name="BGC0001866.1_cluster_1" file="BGC0001866.1_cluster_1.gbk" ftype="genbank" lines_diff="2"/>
</output_collection>
</test>
<test>
<param name="input" value="BGC0001866.fna"/>
<param name="antismash_sideload" value="True"/>
<output name="features" file="features.tsv"/>
<output name="clusters" file="clusters.tsv"/>
<output name="sideload" file="sideload.json"/>
<output_collection name="records" type="list">
<element name="BGC0001866.1_cluster_1" file="BGC0001866.1_cluster_1.gbk" ftype="genbank" lines_diff="2"/>
</output_collection>
</test>
</tests>
<help>
<![CDATA[
**Overview**
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
It is developed in the Zeller group and is part of the suite of computational microbiome analysis tools hosted at EMBL.
**Input**
GECCO works with DNA sequences, and loads them using Biopython, allowing it to support a large variety of formats, including the common FASTA and GenBank files.
**Output**
GECCO will create the following files once done (using the same prefix as the input file):

- features.tsv: The features file, containing the identified proteins and domains in the input sequences.
- clusters.tsv: A clusters file, containing the coordinates of the predicted clusters, along with their putative biosynthetic type. This wrapper always creates it, even when no clusters were found.
- {sequence}_cluster_{N}.gbk: If any BGCs were found, a GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains.
- sideload.json: If the antiSMASH sideload option is enabled, an antiSMASH v6 sideload JSON file describing the detected BGCs.
**Contact**
If you have any questions about GECCO, run into an issue, or would like to make a feature request, please create an issue in the GitHub repository.
You can also contact Martin Larralde directly via email. If you want to contribute to GECCO, please read the contribution guide first, and feel free to
open a pull request on the GitHub repository.
]]>
</help>
<citations>
<citation type="bibtex">
@article {Carroll2021.05.03.442509,
author = {Carroll, Laura M. and Larralde, Martin and Fleck, Jonas Simon and Ponnudurai, Ruby and Milanese, Alessio and Cappio, Elisa and Zeller, Georg},
title = {Accurate de novo identification of biosynthetic gene clusters with GECCO},
elocation-id = {2021.05.03.442509},
year = {2021},
doi = {10.1101/2021.05.03.442509},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Biosynthetic gene clusters (BGCs) are enticing targets for (meta)genomic mining efforts, as they may encode novel, specialized metabolites with potential uses in medicine and biotechnology. Here, we describe GECCO (GEne Cluster prediction with COnditional random fields; https://gecco.embl.de), a high-precision, scalable method for identifying novel BGCs in (meta)genomic data using conditional random fields (CRFs). Based on an extensive evaluation of de novo BGC prediction, we found GECCO to be more accurate and over 3x faster than a state-of-the-art deep learning approach. When applied to over 12,000 genomes, GECCO identified nearly twice as many BGCs compared to a rule-based approach, while achieving higher accuracy than other machine learning approaches. Introspection of the GECCO CRF revealed that its predictions rely on protein domains with both known and novel associations to secondary metabolism. The method developed here represents a scalable, interpretable machine learning approach, which can identify BGCs de novo with high precision.},
URL = {https://www.biorxiv.org/content/early/2021/05/04/2021.05.03.442509},
eprint = {https://www.biorxiv.org/content/early/2021/05/04/2021.05.03.442509.full.pdf},
journal = {bioRxiv}
}
</citation>
</citations>
</tool>
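The `<command>` template above symlinks the input under a fixed name, maps the Galaxy `genbank` datatype to a `.gbk` suffix, and appends each optional flag only when its parameter is set. To make that logic easier to review, here is a Python sketch that rebuilds the same `gecco run` argument list. `build_gecco_command` is a hypothetical helper; it leaves out the `ln -s` and `mv` steps around the call.

```python
from typing import List, Optional


def build_gecco_command(ext: str, postproc: str = "gecco", cds: Optional[int] = None,
                        threshold: Optional[float] = None,
                        antismash_sideload: bool = False) -> List[str]:
    """Rebuild the argument list produced by the wrapper's Cheetah template."""
    # Galaxy's datatype is "genbank", but the symlinked input gets a ".gbk" suffix.
    suffix = "gbk" if ext == "genbank" else ext
    argv = [
        "gecco", "-vv", "run",
        "--format", ext,
        "--genome", f"input_tempfile.{suffix}",
        "--postproc", postproc,
        "--force-clusters-tsv",
    ]
    # Like `#if $cds:` in the template, a falsy value (None or 0) is omitted.
    if cds:
        argv += ["--cds", str(cds)]
    if threshold:
        argv += ["--threshold", str(threshold)]
    if antismash_sideload:
        argv.append("--antismash-sideload")
    return argv


if __name__ == "__main__":
    print(" ".join(build_gecco_command("genbank", cds=5, antismash_sideload=True)))
```

One thing the sketch makes visible: `--cds` and `--threshold` both accept 0 in `<inputs>`, but if Galaxy evaluates the parameter's truthiness as Python would, a value of 0 is dropped and GECCO falls back to its own default.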
../../tests/test_cli/data/BGC0001866.1_cluster_1.gbk
\ No newline at end of file
../../tests/test_cli/data/BGC0001866.fna
\ No newline at end of file
../../tests/test_cli/data/BGC0001866.clusters.tsv
\ No newline at end of file
../../tests/test_cli/data/BGC0001866.features.tsv
\ No newline at end of file
../../tests/test_cli/data/BGC0001866.sideload.json
\ No newline at end of file
......@@ -10,4 +10,4 @@ See Also:
__author__ = "Martin Larralde"
__license__ = "GPLv3"
__version__ = "0.8.4"
__version__ = "0.8.5"
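This change bumps the version in three places: the package's `__version__` (the module path is not shown in this diff), and both the `version` attribute and the pinned `gecco` requirement in `galaxy/gecco.xml`. A pre-release check could compare them. The Python sketch below is one hypothetical way to do that with only the standard library; none of these helpers exist in the repository.

```python
import re
import xml.etree.ElementTree as ET
from typing import Set


def package_version(source: str) -> str:
    """Extract the string assigned to __version__ in a Python module's source."""
    match = re.search(r'''^__version__\s*=\s*["']([^"']+)["']''', source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def wrapper_versions(xml_text: str) -> Set[str]:
    """Collect the tool version and the pinned `gecco` requirement version."""
    # Parse bytes so an XML declaration with an encoding attribute is accepted.
    root = ET.fromstring(xml_text.encode("utf-8"))
    versions = {root.get("version")}
    for requirement in root.iter("requirement"):
        if (requirement.text or "").strip() == "gecco":
            versions.add(requirement.get("version"))
    return versions


def versions_match(source: str, xml_text: str) -> bool:
    """True when the module and every version in the wrapper agree."""
    return wrapper_versions(xml_text) == {package_version(source)}


if __name__ == "__main__":
    module = '__author__ = "Martin Larralde"\n__version__ = "0.8.5"\n'
    wrapper = ('<tool id="gecco" version="0.8.5"><requirements>'
               '<requirement type="package" version="0.8.5">gecco</requirement>'
               '</requirements></tool>')
    print(versions_match(module, wrapper))
```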
......@@ -171,7 +171,7 @@ class Annotate(Command): # noqa: D101
task = self.progress.add_task(description=f"{hmm.id} v{hmm.version}", total=hmm.size, unit="domains", precision="")
callback = lambda h, t: self.progress.update(task, advance=1)
self.info("Starting", f"annotation with [bold blue]{hmm.id} v{hmm.version}[/]", level=2)
features = PyHMMER(hmm, self.jobs, whitelist).run(genes, progress=callback)
PyHMMER(hmm, self.jobs, whitelist).run(genes, progress=callback)
self.success("Finished", f"annotation with [bold blue]{hmm.id} v{hmm.version}[/]", level=2)
self.progress.update(task_id=task, visible=False)
......
......@@ -63,6 +63,8 @@ class Run(Annotate): # noqa: D101
output files. [default: .]
--antismash-sideload write an AntiSMASH v6 sideload JSON
file next to the output files.
--force-clusters-tsv always write a ``clusters.tsv`` file
even when no clusters were found.
Parameters - Domain Annotation:
-e <e>, --e-filter <e> the e-value cutoff for protein domains
......@@ -117,6 +119,7 @@ class Run(Annotate): # noqa: D101
self.hmm = self._check_flag("--hmm")
self.output_dir = self._check_flag("--output-dir")
self.antismash_sideload = self._check_flag("--antismash-sideload", bool)
self.force_clusters_tsv = self._check_flag("--force-clusters-tsv", bool)
except InvalidArgument:
raise CommandExit(1)
......@@ -324,6 +327,8 @@ class Run(Annotate): # noqa: D101
self.success("Found", len(clusters), "potential gene clusters", level=1)
else:
self.warn("No gene clusters were found")
if self.force_clusters_tsv:
self._write_cluster_table(clusters)
return 0
# predict types for putative clusters
clusters = self._predict_types(clusters)
......
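The Galaxy wrapper's command always runs `mv input_tempfile.clusters.tsv $clusters`, so the clusters table has to exist even when GECCO finds nothing; the new `--force-clusters-tsv` flag covers that case. The Python sketch below illustrates the idea with the standard `csv` module. `write_cluster_table` here is a hypothetical stand-in, not GECCO's `_write_cluster_table` method, and the column names are copied from one of the `clusters.tsv` test files in this change.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names copied from a clusters.tsv test file in this change.
COLUMNS = [
    "sequence_id", "bgc_id", "start", "end", "average_p", "max_p", "type",
    "alkaloid_probability", "polyketide_probability", "ripp_probability",
    "saccharide_probability", "terpene_probability", "nrp_probability",
    "other_probability", "proteins", "domains",
]


def write_cluster_table(clusters: List[Dict[str, object]], handle: TextIO, force: bool = False) -> bool:
    """Write clusters as TSV and return True if a table was written.

    With no clusters, nothing is written unless `force` is set, in which
    case a header-only table is produced.
    """
    if not clusters and not force:
        return False
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(clusters)
    return True


if __name__ == "__main__":
    buffer = io.StringIO()
    write_cluster_table([], buffer, force=True)
    print(repr(buffer.getvalue()))
```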
sequence_id bgc_id start end average_p max_p bgc_types proteins domains
BGC0001866.1 BGC0001866.1_cluster_1 11550 32979 0.9160481049566951 0.9999881747850234 Polyketide BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00108;PF00550;PF00698;PF00975;PF01073;PF02801;PF05401;PF06080;PF06609;PF07690;PF08242;PF08493;PF08659;PF10294;PF12697;PF13489;PF13561;PF13602;PF13649;PF13847;PF14765;PF16073;PF16197;TIGR00128;TIGR00710;TIGR00880;TIGR00891;TIGR00893;TIGR00895;TIGR01829;TIGR01830;TIGR01831;TIGR01930;TIGR01934;TIGR01983;TIGR02072;TIGR02813;TIGR02824;TIGR02825;TIGR03131;TIGR03150;TIGR03206;TIGR03451;TIGR04316;TIGR04532
sequence_id bgc_id start end average_p max_p type alkaloid_probability polyketide_probability ripp_probability saccharide_probability terpene_probability nrp_probability other_probability proteins domains
BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9969495815733557 0.9999999447224028 Polyketide 0.0 0.98 0.0 0.0 0.0 0.14 0.0 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
sequence_id protein_id start end strand domain hmm i_evalue domain_start domain_end bgc_probability pvalue
BGC0001866.1 BGC0001866.1_1 347 1489 - PF00394 Pfam 1.9e-10 1 64 0.1855629937118057 6.613296206056387e-14
BGC0001866.1 BGC0001866.1_1 347 1489 - TIGR03388 Tigrfam 1.5e-07 1 88 0.1855629937118057 6.297229219143576e-11
BGC0001866.1 BGC0001866.1_1 347 1489 - TIGR03390 Tigrfam 4.3e-06 1 120 0.1855629937118057 1.8052057094878254e-09
BGC0001866.1 BGC0001866.1_1 347 1489 - TIGR03388 Tigrfam 1.3e-18 118 295 0.1855629937118057 5.4575986565911e-22
BGC0001866.1 BGC0001866.1_1 347 1489 - PF07731 Pfam 3.4e-25 125 284 0.1855629937118057 1.183431952662722e-28
BGC0001866.1 BGC0001866.1_1 347 1489 - TIGR03389 Tigrfam 3e-18 139 290 0.1855629937118057 1.2594458438287154e-21
BGC0001866.1 BGC0001866.1_1 347 1489 - TIGR03390 Tigrfam 1.5e-11 145 283 0.1855629937118057 6.2972292191435764e-15
BGC0001866.1 BGC0001866.1_3 2513 2722 - PF07732 Pfam 4.6e-08 25 65 0.1267424785460721 1.6011138183083884e-11
BGC0001866.1 BGC0001866.1_6 3946 4389 + PF00891 Pfam 4.1e-18 9 124 0.1119208771850534 1.427079707622694e-21
BGC0001866.1 BGC0001866.1_7 4683 5138 + PF00135 Pfam 4.0000000000000004e-23 38 144 0.0993439431693874 1.3922728854855552e-26
BGC0001866.1 BGC0001866.1_9 5823 6599 + PF00135 Pfam 1.2e-17 1 218 0.109571995839631 4.176818656456665e-21
BGC0001866.1 BGC0001866.1_10 7758 9029 + PF07992 Pfam 2.2e-06 2 152 0.2103665487056046 7.657500870170554e-10
BGC0001866.1 BGC0001866.1_10 7758 9029 + PF13434 Pfam 6.5e-11 4 171 0.2103665487056046 2.262443438914027e-14
BGC0001866.1 BGC0001866.1_10 7758 9029 + PF13738 Pfam 3.8e-08 10 176 0.2103665487056046 1.3226592412112776e-11
BGC0001866.1 BGC0001866.1_10 7758 9029 + PF00743 Pfam 4.4e-09 31 115 0.2103665487056046 1.5315001740341105e-12
BGC0001866.1 BGC0001866.1_13 11550 12662 + TIGR00895 Tigrfam 2.6e-10 1 162 0.4602142733167652 1.0915197313182199e-13
BGC0001866.1 BGC0001866.1_13 11550 12662 + TIGR00891 Tigrfam 3.6e-14 1 170 0.4602142733167652 1.5113350125944585e-17
BGC0001866.1 BGC0001866.1_13 11550 12662 + TIGR00710 Tigrfam 2.8e-12 1 179 0.4602142733167652 1.1754827875734678e-15
BGC0001866.1 BGC0001866.1_13 11550 12662 + PF07690 Pfam 5.0000000000000005e-39 1 365 0.4602142733167652 1.740341106856944e-42
BGC0001866.1 BGC0001866.1_13 11550 12662 + TIGR00893 Tigrfam 5.7e-12 2 178 0.4602142733167652 2.3929471032745593e-15
BGC0001866.1 BGC0001866.1_13 11550 12662 + PF06609 Pfam 8.2e-11 3 279 0.4602142733167652 2.8541594152453884e-14
BGC0001866.1 BGC0001866.1_13 11550 12662 + TIGR00880 Tigrfam 1.5e-14 24 159 0.4602142733167652 6.297229219143576e-18
BGC0001866.1 BGC0001866.1_15 14920 15912 + PF08493 Pfam 3.5e-09 6 253 0.7117195790726876 1.2182387747998608e-12
BGC0001866.1 BGC0001866.1_16 17173 19143 + TIGR02813 Tigrfam 7.9e-51 123 497 0.9951925265755368 3.3165407220822837e-54
BGC0001866.1 BGC0001866.1_16 17173 19143 + TIGR03150 Tigrfam 1.1e-37 126 416 0.9951925265755368 4.617968094038623e-41
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF00108 Pfam 8.8e-06 147 221 0.9951925265755368 3.0630003480682214e-09
BGC0001866.1 BGC0001866.1_16 17173 19143 + TIGR01930 Tigrfam 5.2e-06 152 234 0.9951925265755368 2.18303946263644e-09
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF02801 Pfam 1.2e-37 256 369 0.9951925265755368 4.1768186564566656e-41
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF16197 Pfam 4e-27 371 488 0.9951925265755368 1.3922728854855552e-30
BGC0001866.1 BGC0001866.1_16 17173 19143 + TIGR02813 Tigrfam 1.2e-15 483 655 0.9951925265755368 5.037783375314861e-19
BGC0001866.1 BGC0001866.1_16 17173 19143 + TIGR00128 Tigrfam 3.8e-18 509 655 0.9951925265755368 1.595298068849706e-21
BGC0001866.1 BGC0001866.1_16 17173 19143 + TIGR03131 Tigrfam 2.1e-16 510 655 0.9951925265755368 8.816120906801008e-20
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF00698 Pfam 9.3e-29 511 654 0.9951925265755368 3.2370344587539156e-32
BGC0001866.1 BGC0001866.1_17 19152 22424 + TIGR00128 Tigrfam 5e-06 1 147 0.9998554331059866 2.0990764063811924e-09
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF00698 Pfam 2.3e-18 1 175 0.9998554331059866 8.005569091541943e-22
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF14765 Pfam 4.499999999999999e-62 228 506 0.9998554331059866 1.5663069961712493e-65
BGC0001866.1 BGC0001866.1_17 19152 22424 + TIGR04532 Tigrfam 1.6e-06 254 429 0.9998554331059866 6.717044500419815e-10
BGC0001866.1 BGC0001866.1_17 19152 22424 + TIGR02072 Tigrfam 2.8e-14 638 816 0.9998554331059866 1.1754827875734677e-17
BGC0001866.1 BGC0001866.1_17 19152 22424 + TIGR01983 Tigrfam 3.7e-06 640 785 0.9998554331059866 1.5533165407220824e-09
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF10294 Pfam 2.5e-06 642 771 0.9998554331059866 8.70170553428472e-10
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13489 Pfam 8.7e-15 642 820 0.9998554331059866 3.0281935259310827e-18
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF06080 Pfam 4.8e-08 645 770 0.9998554331059866 1.6707274625826663e-11
BGC0001866.1 BGC0001866.1_17 19152 22424 + TIGR01934 Tigrfam 4.9e-08 651 781 0.9998554331059866 2.0570948782535685e-11
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF05401 Pfam 5e-06 656 775 0.9998554331059866 1.740341106856944e-09
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13847 Pfam 1.3e-12 662 815 0.9998554331059866 4.524886877828054e-16
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13649 Pfam 2.3e-15 667 764 0.9998554331059866 8.005569091541942e-19
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF08242 Pfam 3.0999999999999997e-24 668 766 0.9998554331059866 1.0790114862513051e-27
BGC0001866.1 BGC0001866.1_18 22762 23235 + TIGR02824 Tigrfam 1.6e-19 1 143 0.9969853467939896 6.717044500419815e-23
BGC0001866.1 BGC0001866.1_18 22762 23235 + TIGR02825 Tigrfam 1.3e-09 1 143 0.9969853467939896 5.4575986565911e-13
BGC0001866.1 BGC0001866.1_18 22762 23235 + TIGR03451 Tigrfam 2.9e-07 2 126 0.9969853467939896 1.2174643157010914e-10
BGC0001866.1 BGC0001866.1_18 22762 23235 + PF00107 Pfam 9.4e-18 12 143 0.9969853467939896 3.2718412808910542e-21
BGC0001866.1 BGC0001866.1_18 22762 23235 + PF13602 Pfam 7.1e-08 49 151 0.9969853467939896 2.4712843717368603e-11
BGC0001866.1 BGC0001866.1_19 23268 24623 + TIGR03206 Tigrfam 8.7e-07 62 211 0.9986558997392674 3.652392947103275e-10
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF08659 Pfam 3.4e-63 65 240 0.9986558997392674 1.1834319526627219e-66
BGC0001866.1 BGC0001866.1_19 23268 24623 + TIGR01829 Tigrfam 2.4e-12 66 226 0.9986558997392674 1.0075566750629722e-15
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF00106 Pfam 9.8e-10 66 240 0.9986558997392674 3.41106856943961e-13
BGC0001866.1 BGC0001866.1_19 23268 24623 + TIGR04316 Tigrfam 7.7e-06 67 186 0.9986558997392674 3.2325776658270365e-09
BGC0001866.1 BGC0001866.1_19 23268 24623 + TIGR01831 Tigrfam 5.3e-07 67 220 0.9986558997392674 2.2250209907640637e-10
BGC0001866.1 BGC0001866.1_19 23268 24623 + TIGR01830 Tigrfam 1.7e-14 67 225 0.9986558997392674 7.136859781696054e-18
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF01073 Pfam 9.5e-06 68 227 0.9986558997392674 3.306648103028194e-09
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF13561 Pfam 5.1e-08 71 222 0.9986558997392674 1.775147928994083e-11
BGC0001866.1 BGC0001866.1_19 23268 24623 + TIGR02813 Tigrfam 3.9e-10 99 235 0.9986558997392674 1.63727959697733e-13
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF00550 Pfam 2.9e-12 374 437 0.9986558997392674 1.0093978419770276e-15
BGC0001866.1 BGC0001866.1_20 25769 26056 + PF16073 Pfam 5.5e-26 8 95 0.9981600477582812 1.9143752175426385e-29
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF16073 Pfam 1e-12 1 47 0.9999881747850234 3.4806822137138876e-16
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR02813 Tigrfam 4.9e-12 172 308 0.9999881747850234 2.0570948782535684e-15
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR03150 Tigrfam 4.6e-39 177 607 0.9999881747850234 1.931150293870697e-42
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR02813 Tigrfam 4.7e-68 313 1015 0.9999881747850234 1.9731318219983205e-71
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR01930 Tigrfam 2.1e-08 331 458 0.9999881747850234 8.816120906801008e-12
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF00108 Pfam 2.2e-07 335 418 0.9999881747850234 7.657500870170554e-11
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF02801 Pfam 1.9000000000000002e-36 434 556 0.9999881747850234 6.613296206056388e-40
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF16197 Pfam 2.4e-09 564 676 0.9999881747850234 8.353637312913331e-13
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR00128 Tigrfam 1.1e-34 705 999 0.9999881747850234 4.6179680940386226e-38
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR03131 Tigrfam 2.6000000000000003e-33 707 1018 0.9999881747850234 1.0915197313182201e-36
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF00698 Pfam 3.9e-40 708 1027 0.9999881747850234 1.3574660633484163e-43
BGC0001866.1 BGC0001866.1_21 26544 29999 + TIGR04532 Tigrfam 4.3e-13 1104 1151 0.9999881747850234 1.8052057094878253e-16
BGC0001866.1 BGC0001866.1_22 30150 30890 + TIGR04532 Tigrfam 2.6e-59 1 245 0.999954895962412 1.09151973131822e-62
BGC0001866.1 BGC0001866.1_22 30150 30890 + PF14765 Pfam 5.5e-13 16 246 0.999954895962412 1.9143752175426382e-16
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 6.1e-16 64 129 0.9997548724570005 2.1232161503654714e-19
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 4.2e-12 173 239 0.9997548724570005 1.4618865297598329e-15
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 1.2e-10 298 363 0.9997548724570005 4.176818656456665e-14
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00975 Pfam 5.8e-26 442 580 0.9997548724570005 2.018795683954055e-29
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF12697 Pfam 3.3e-07 445 673 0.9997548724570005 1.1486251305255831e-10
sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end bgc_probability
BGC0001866.1 BGC0001866.1_1 347 1489 - PF00394 Pfam 2.1941888078432915e-08 8.178117062405111e-12 1 63 0.9852038761627908
BGC0001866.1 BGC0001866.1_1 347 1489 - PF07731 Pfam 3.9374169295176556e-23 1.467542649838858e-26 150 281 0.9852038761627908
BGC0001866.1 BGC0001866.1_6 3946 4389 + PF00891 Pfam 4.743887678074703e-16 1.7681280946979883e-19 17 121 0.9910535094227727
BGC0001866.1 BGC0001866.1_7 4683 5138 + PF00135 Pfam 4.674605664377319e-21 1.7423055029360116e-24 48 140 0.9913598896683397
BGC0001866.1 BGC0001866.1_8 5384 5812 + PF00135 Pfam 3.9706994470948554e-30 1.4799476135277136e-33 2 114 0.9925093258822111
BGC0001866.1 BGC0001866.1_9 5823 6599 + PF00135 Pfam 1.4185801852307574e-15 5.287291037013632e-19 2 209 0.9946019708257335
BGC0001866.1 BGC0001866.1_10 7758 9029 + PF13434 Pfam 5.777178703900199e-08 2.153253337271785e-11 13 124 0.9978201609931655
BGC0001866.1 BGC0001866.1_10 7758 9029 + PF00743 Pfam 5.089108077410868e-07 1.8967976434628658e-10 36 102 0.9978201609931655
BGC0001866.1 BGC0001866.1_13 11550 12662 + PF07690 Pfam 5.839871260376694e-37 2.1766199255969786e-40 1 362 0.9990971143689635
BGC0001866.1 BGC0001866.1_13 11550 12662 + PF06609 Pfam 9.543170598318239e-09 3.55690294383833e-12 17 244 0.9990971143689635
BGC0001866.1 BGC0001866.1_15 14920 15912 + PF08493 Pfam 2.6165794251055913e-17 9.752439154325723e-21 139 224 0.9999977987864139
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF00109 Pfam 9.025888536170949e-60 3.364103069761815e-63 2 248 0.9999994272691842
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF02801 Pfam 2.2171445990751238e-35 8.263677223537547e-39 257 368 0.9999994272691842
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF16197 Pfam 3.8698172759236842e-25 1.4423471024687604e-28 371 487 0.9999994272691842
BGC0001866.1 BGC0001866.1_16 17173 19143 + PF00698 Pfam 1.0799913424517567e-26 4.025312495161225e-30 512 648 0.9999994272691842
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF00698 Pfam 2.639223271303753e-16 9.836836642950999e-20 2 151 0.9999940983719267
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF14765 Pfam 2.520598829779557e-60 9.394703055458656e-64 228 504 0.9999940983719267
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13489 Pfam 1.0131254482174088e-12 3.776091868123029e-16 661 817 0.9999940983719267
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13847 Pfam 8.939870258494623e-11 3.332042586095648e-14 666 776 0.9999940983719267
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13649 Pfam 2.319131521369124e-13 8.643799930559537e-17 667 764 0.9999940983719267
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF08242 Pfam 3.6288099491186147e-22 1.3525195486837923e-25 668 766 0.9999940983719267
BGC0001866.1 BGC0001866.1_17 19152 22424 + PF08241 Pfam 5.245291385894328e-12 1.9550098344742185e-15 668 767 0.9999940983719267
BGC0001866.1 BGC0001866.1_18 22762 23235 + PF00107 Pfam 1.0960342036668699e-15 4.085106983476965e-19 12 117 0.9999176675645223
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF08659 Pfam 1.5141662612831146e-61 5.643556695054471e-65 65 239 0.9999724741067139
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF00106 Pfam 1.1379002942545491e-07 4.2411490654288077e-11 68 221 0.9999724741067139
BGC0001866.1 BGC0001866.1_19 23268 24623 + PF00550 Pfam 3.359618716013185e-10 1.2521873708584363e-13 384 437 0.9999724741067139
BGC0001866.1 BGC0001866.1_20 25769 26056 + PF16073 Pfam 1.3071857188363548e-23 4.872104803713585e-27 8 94 0.999988513111687
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF16073 Pfam 8.208876065249628e-11 3.059588544632735e-14 2 47 0.9999999447224028
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF00109 Pfam 2.667462237983852e-82 9.942088102809735e-86 178 426 0.9999999447224028
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF02801 Pfam 2.4031043351141288e-34 8.956780973217029e-38 434 555 0.9999999447224028
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF16197 Pfam 2.535893425129411e-07 9.451708628883381e-11 567 673 0.9999999447224028
BGC0001866.1 BGC0001866.1_21 26544 29999 + PF00698 Pfam 4.597134671955754e-38 1.7134307387088164e-41 709 1012 0.9999999447224028
BGC0001866.1 BGC0001866.1_22 30150 30890 + PF14765 Pfam 7.778696660229127e-11 2.8992533209948296e-14 39 244 0.9999460955852995
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 5.884377030377924e-14 2.193207987468477e-17 67 128 0.9997314383315643
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 3.9212317886052276e-10 1.461510170930014e-13 174 238 0.9997314383315643
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 1.367829688372301e-08 5.098135252971677e-12 299 360 0.9997314383315643
BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00975 Pfam 6.711355516947163e-24 2.5014370171252933e-27 443 550 0.9997314383315643