Commits (40)
@@ -13,21 +13,27 @@ variables:
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- ci/cache
- build/
before_script:
- ./ci/gitlab/test.before_script.sh
- python -m pip install -U wheel coverage tqdm pyhmmer
script:
- ./ci/gitlab/test.script.sh
- python setup.py build_data --inplace bdist_wheel
- python -m pip install --find-links=dist gecco[train]
- python -m coverage run -p -m unittest discover -vv
after_script:
- python -m coverage combine
- python -m coverage xml
- python -m coverage report
# - python ci/gitlab/after_script.hmmfilter.py
# artifacts:
# paths:
# - ci/artifacts
artifacts:
reports:
cobertura: coverage.xml
# --- Stages -------------------------------------------------------------------
test:python3.6:
image: python:3.6
<<: *test
test:python3.7:
image: python:3.7
<<: *test
@@ -36,11 +42,18 @@ test:python3.8:
image: python:3.8
<<: *test
test:python3.9:
image: python:3.9
<<: *test
docs:
image: python:3.8
image: python:3.9
stage: pages
before_script:
- python -m pip install -U wheel coverage tqdm pyhmmer
- pip install -U -r docs/requirements.txt
- python setup.py build_data --inplace bdist_wheel
- python -m pip install --find-links=dist gecco
script:
- sphinx-build -b html docs public
artifacts:
@@ -48,7 +61,7 @@ docs:
- public
lint:pydocstyle:
image: python:3.8
image: python:3.9
stage: lint
before_script:
- pip install -U pydocstyle
@@ -56,7 +69,7 @@ lint:pydocstyle:
- pydocstyle gecco
lint:mypy:
image: python:3.8
image: python:3.9
stage: lint
allow_failure: true
before_script:
@@ -65,7 +78,7 @@ lint:mypy:
- mypy gecco
lint:pylint:
image: python:3.8
image: python:3.9
stage: lint
allow_failure: true
before_script:
@@ -74,7 +87,7 @@ lint:pylint:
- pylint gecco
lint:radon:
image: python:3.8
image: python:3.9
stage: lint
allow_failure: true
before_script:
......
@@ -5,7 +5,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...master
[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...master
## [v0.5.0] - 2021-01-11
[v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0
### Added
- Explicit support for Python 3.9.
### Changed
- [`pyhmmer`](https://pypi.org/project/pyhmmer) is used to annotate protein sequences instead of HMMER3 binary `hmmsearch`.
- HMM files are stored in binary format to speed up parsing and reduce storage size.
- `tqdm` is now a *training*-only dependency.
- `gecco cv` now requires *training* dependencies.
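The switch from the `hmmsearch` binary to `pyhmmer` means annotation now runs in-process. A minimal sketch of the new call pattern, using the `pyhmmer.plan7.HMMFile` and `pyhmmer.hmmsearch` API from this release (`search_hmms` and `hmm_path` are illustrative names, and the import is guarded since `pyhmmer` may not be installed):

```python
# Sketch of in-process annotation with pyhmmer, replacing a `hmmsearch`
# subprocess call. `hmm_path` is a placeholder filename.
try:
    import pyhmmer
except ImportError:
    pyhmmer = None  # optional here: the sketch only illustrates the call pattern

def search_hmms(hmm_path, digital_sequences, cpus=0):
    """Return the hits of `digital_sequences` against the profiles in `hmm_path`."""
    if pyhmmer is None:
        return []
    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        # the search runs inside the Python process: no temporary files
        # and no output parsing, and profiles load straight from binary HMMs
        return list(pyhmmer.hmmsearch(hmm_file, digital_sequences, cpus=cpus))
```

No subprocess management is needed anymore, which is also why `hmmsearch` binaries are no longer a hard requirement.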
## [v0.4.5] - 2020-11-23
[v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5
@@ -19,15 +29,15 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
## [v0.4.4] - 2020-09-30
[v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4
### Added
- `gecco cv loto` command to run LOTO cross-validation using BGC types
for stratification.
- `header` keyword argument to `FeatureTable.dump` and `ClusterTable.dump`
to write the table without the column header, allowing rows to be appended
to an existing table.
- `__getitem__` implementation for `FeatureTable` and `ClusterTable`
that returns a single row or a sub-table from a table.
### Fixed
- `gecco cv` command now writes results iteratively instead of holding
the tables for every fold in memory.
### Changed
- Bumped `pandas` training dependency to `v1.0`.
@@ -43,7 +53,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
## [v0.4.2] - 2020-08-07
[v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2
### Fixed
- `TypeClassifier.predict_types` using inverse type probabilities when
given several clusters to process.
## [v0.4.1] - 2020-08-07
......
@@ -6,12 +6,6 @@
## Requirements
* [Python](https://www.python.org/downloads/) 3.6 or higher
* [HMMER](http://hmmer.org/) v3.2 or higher
HMMER can be installed through [conda](https://anaconda.org/). It has to
be in the `$PATH` variable before running GECCO. GECCO also requires
additional Python libraries, but they are normally installed automatically
by `pip` or `conda` when installing GECCO.
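Whether HMMER is discoverable can be checked up front; a small sketch using POSIX `command -v` (the `hmmsearch` binary ships with HMMER):

```sh
# Check that the HMMER `hmmsearch` binary is on $PATH before running GECCO.
if command -v hmmsearch >/dev/null 2>&1; then
    echo "HMMER found at $(command -v hmmsearch)"
else
    echo "HMMER not found on PATH; install it (e.g. through conda) first"
fi
```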
## Installing GECCO
@@ -24,13 +18,8 @@ $ pip install git+https://git.embl.de/grp-zeller/GECCO/
```
Note that this command can take a long time to complete as it needs to download
around 250MB of data from the EBI FTP server. You will need write access to
the Python site-packages folder; if this is not the case, use `pip` with
the `--user` flag to install into a local folder. Another option is to use
a virtual environment, either with `virtualenv` or with `conda`.
Once the install is finished, a `gecco` command will be available in your path
automatically.
around 250MB of data from the EBI FTP server. Once the install is finished, a
`gecco` command should be available in your path.
## Running GECCO
@@ -46,6 +35,13 @@ $ gecco run --genome some_genome.fna -o some_output_dir
## Training GECCO
By default, GECCO is only installed with prediction support; to be able to train GECCO,
you need to install it with its training requirements:
```console
$ pip install GECCO[train]
```
### Resources
For this, you need to get the FASTA file from MIBiG containing all the proteins
......
#!/bin/sh

log() {
    tput bold
    tput setaf 2
    printf "%12s " "$1"
    tput sgr0
    shift 1
    echo "$@"
}

error() {
    tput bold
    tput setaf 1
    printf "%12s " "$1"
    tput sgr0
    shift 1
    echo "$@"
}
import contextlib
import gzip
import re
import os
import io
import sys
import tqdm
import pkg_resources
sys.path.insert(0, os.path.realpath(os.path.join(__file__, "..", "..", "..")))
from gecco.hmmer import embedded_hmms
from gecco.interpro import InterPro
# Create the artifact folder
os.makedirs(os.path.join("ci", "artifacts"), exist_ok=True)
# Load InterPro to know how many entries we have to process
interpro = InterPro.load()
# Load the domains used by the CRF and compile a regex that matches the domains
# known to the CRF (i.e. useful domains for us to annotate with)
with pkg_resources.resource_stream("gecco.types", "domains.tsv") as f:
    domains = [domain.strip() for domain in f]

rx = re.compile(b"|".join(domains))

# Filter the HMMs
for hmm in embedded_hmms():
    out = os.path.join("ci", "artifacts", "{}.hmm.gz".format(hmm.id))
    in_ = os.path.join("ci", "cache", "{}.{}.hmm.gz".format(hmm.id, hmm.version))
    size = sum(1 for e in interpro.entries if e.source_database.upper().startswith(hmm.id.upper()))
    pbar = tqdm.tqdm(desc=hmm.id, total=size)
    with contextlib.ExitStack() as ctx:
        pbar = ctx.enter_context(pbar)
        src = ctx.enter_context(gzip.open(in_, "rb"))
        dst = ctx.enter_context(gzip.open(out, "wb"))
        blocklines = []
        for line in src:
            blocklines.append(line)
            if line == b"//\n":
                if any(rx.search(line) is not None for line in blocklines):
                    dst.writelines(blocklines)
                blocklines.clear()
                pbar.update(1)
#!/bin/sh
set -e
. $(dirname $(dirname $0))/functions.sh
# --- Install software dependencies ------------------------------------------
if [ ! -x "$(command -v hmmsearch)" ]; then
    log Installing executable dependencies with aptitude
    apt update
    apt install -y hmmer
fi
log Installing Python dependencies with pip
pip install -U coverage tqdm
# --- Install data dependencies ----------------------------------------------
mkdir -p ci/cache
mkdir -p build/lib/gecco/data/hmms
if [ "$CI_SERVER" = "yes" ]; then
    QUIET="-q"
else
    QUIET=""
fi
for ini_file in gecco/hmmer/*.ini; do
    url=$(grep "url" $ini_file | cut -d'=' -f2 | sed 's/ //g')
    hmm=$(grep "id" $ini_file | cut -d'=' -f2 | sed 's/ //g')
    version=$(grep "version" $ini_file | cut -d'=' -f2 | sed 's/ //g')
    hmm_file=${ini_file%.ini}.hmm.gz
    cache_file="ci/cache/${hmm}.${version}.hmm.gz"
    if ! [ -e "$cache_file" ]; then
        if [ "$hmm" = "Panther" ]; then
            log Extracting $hmm v$version
            wget "$url" $QUIET -O- \
                | tar xz --wildcards --no-wildcards-match-slash --no-anchored PTHR\*/hmmer.hmm -O \
                | gzip > "$cache_file"
        else
            log Downloading $hmm v$version
            wget "$url" $QUIET -O "$cache_file"
        fi
    else
        log Using cached $hmm v$version
    fi
    mkdir -p $(dirname "build/lib/${hmm_file}")
    cp "$cache_file" "build/lib/${hmm_file}"
done
#!/bin/sh
set -e
. $(dirname $(dirname $0))/functions.sh
python setup.py bdist_wheel
python -m pip install --find-links=dist gecco[train]
python -m coverage run -p -m unittest discover -vv
@@ -19,7 +19,7 @@ from Bio import SeqIO
from ._base import Command
from .._utils import guess_sequences_format
from ...crf import ClusterCRF
from ...hmmer import HMMER, HMM, embedded_hmms
from ...hmmer import PyHMMER, HMMER, HMM, embedded_hmms
from ...model import FeatureTable, ClusterTable
from ...orf import PyrodigalFinder
from ...types import TypeClassifier
@@ -128,10 +128,10 @@ class Run(Command): # noqa: D101
self.logger.debug(
"Starting annotation with HMM {} v{}", hmm.id, hmm.version
)
features = HMMER(hmm, self.args["--jobs"]).run(genes)
features = PyHMMER(hmm, self.args["--jobs"]).run(genes)
self.logger.debug("Finished running HMM {}", hmm.id)
with multiprocessing.pool.ThreadPool(self.args["--jobs"]) as pool:
with multiprocessing.pool.ThreadPool(min(self.args["--jobs"], 2)) as pool:
pool.map(annotate, embedded_hmms())
# Count number of annotated domains
......
@@ -11,11 +11,12 @@ import re
import subprocess
import tempfile
import typing
from typing import Dict, Optional, Iterable, Iterator, List, Mapping, Type, Sequence
from typing import Callable, Dict, Optional, Iterable, Iterator, List, Mapping, Type, Sequence
import pkg_resources
from Bio import SeqIO
from .._meta import requires
from .._base import BinaryRunner
from ..model import Gene, Domain
from ..interpro import InterPro
@@ -174,10 +175,67 @@ class HMMER(BinaryRunner):
return list(gene_index.values())
class PyHMMER(object):

    def __init__(self, hmm: HMM, cpus: Optional[int] = None) -> None:
        self.hmm = hmm
        self.cpus = cpus

    @requires("pyhmmer")
    def run(self, genes: Iterable[Gene], callback: Optional[Callable[..., None]] = None) -> List[Gene]:
        # collect genes and build an index of genes by protein id
        gene_index = collections.OrderedDict([(gene.id, gene) for gene in genes])
        # convert to Easel sequences
        esl_abc = pyhmmer.easel.Alphabet.amino()
        esl_sqs = [
            pyhmmer.easel.TextSequence(
                name=gene.protein.id.encode(),
                sequence=str(gene.protein.seq),
            ).digitize(esl_abc)
            for gene in gene_index.values()
        ]
        # run the HMM search in-process instead of spawning a `hmmsearch` subprocess
        with pyhmmer.plan7.HMMFile(self.hmm.path) as hmm_file:
            hmms_hits = pyhmmer.hmmsearch(hmm_file, esl_sqs, cpus=self.cpus, callback=callback)
        # load InterPro metadata for the annotation
        interpro = InterPro.load()
        # transcribe HMMER hits to the GECCO model
        for hits in hmms_hits:
            for hit in hits:
                target_name = hit.name.decode("utf-8")
                for domain in hit.domains:
                    raw_acc = domain.alignment.hmm_name
                    accession = self.hmm.relabel(raw_acc.decode("utf-8"))
                    entry = interpro.by_accession.get(accession)
                    # extract qualifiers
                    qualifiers: Dict[str, List[str]] = {
                        "inference": ["protein motif"],
                        "note": ["e-value: {}".format(domain.i_evalue)],
                        "db_xref": ["{}:{}".format(self.hmm.id.upper(), accession)],
                        "function": [] if entry is None else [entry.name],
                    }
                    if entry is not None and entry.integrated is not None:
                        qualifiers["db_xref"].append("InterPro:{}".format(entry.integrated))
                    # add the domain to the protein domains of the right gene
                    assert domain.env_from < domain.env_to
                    domain = Domain(accession, domain.env_from, domain.env_to, self.hmm.id, domain.i_evalue, None, qualifiers)
                    gene_index[target_name].protein.domains.append(domain)
        # return the updated list of genes that was given in argument
        return list(gene_index.values())
def embedded_hmms() -> Iterator[HMM]:
"""Iterate over the embedded HMMs that are shipped with GECCO.
"""
for ini in glob.glob(pkg_resources.resource_filename(__name__, "*.ini")):
cfg = configparser.ConfigParser()
cfg.read(ini)
yield HMM(path=ini.replace(".ini", ".hmm"), **dict(cfg.items("hmm")))
yield HMM(path=ini.replace(".ini", ".h3m"), **dict(cfg.items("hmm")))
@@ -2,6 +2,7 @@
"""
import csv
import datetime
import enum
import functools
import itertools
@@ -9,11 +10,11 @@ import operator
import re
import typing
from array import array
from collections import OrderedDict
from collections.abc import Sized
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping, Optional, Sequence, TextIO, NamedTuple, Union, Iterator
import Bio.Alphabet
import numpy
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation
@@ -288,7 +289,9 @@ class Cluster:
# ---
def domain_composition(
self, all_possible: Optional[Sequence[str]] = None
self,
all_possible: Optional[Sequence[str]] = None,
normalize: bool = True,
) -> numpy.ndarray:
"""Compute weighted domain composition with respect to ``all_possible``.
@@ -313,7 +316,9 @@
for i, dom in enumerate(all_possible):
if dom in unique_names:
composition[i] = numpy.sum(weights[names == dom])
return composition / (composition.sum() or 1) # type: ignore
if normalize:
return composition / (composition.sum() or 1) # type: ignore
return composition
# ---
@@ -330,13 +335,27 @@
# but slicing expects 0-based ranges with exclusive ends
bgc = self.source[self.start - 1 : self.end]
bgc.id = bgc.name = self.id
bgc.seq.alphabet = Bio.Alphabet.generic_dna
# copy sequence annotations
bgc.annotations = self.source.annotations.copy()
bgc.annotations["topology"] = "linear"
bgc.annotations["organism"] = self.source.annotations.get("organism")
bgc.annotations["source"] = self.source.annotations.get("source")
bgc.annotations["comment"] = ["Detected with GECCO v{}".format(__version__)]
bgc.annotations["molecule_type"] = "DNA"
bgc.annotations.setdefault("comment", []).append(f"Detected with GECCO v{__version__}")
# add GECCO-specific annotations as a structured comment
structured_comment = bgc.annotations.setdefault("structured_comment", OrderedDict())
structured_comment['GECCO-Data'] = {
"version": f"GECCO v{__version__}",
"creation_date": datetime.datetime.now().isoformat(),
"biosyn_class": ",".join(ty.name for ty in self.type.unpack()),
"alkaloid_probability": self.type_probabilities.get(ProductType.Alkaloid, 0.0),
"polyketide_probability": self.type_probabilities.get(ProductType.Polyketide, 0.0),
"ripp_probability": self.type_probabilities.get(ProductType.RiPP, 0.0),
"saccharide_probability": self.type_probabilities.get(ProductType.Saccharide, 0.0),
"terpene_probability": self.type_probabilities.get(ProductType.Terpene, 0.0),
"nrp_probability": self.type_probabilities.get(ProductType.NRP, 0.0),
"other_probability": self.type_probabilities.get(ProductType.Other, 0.0),
}
# add proteins as CDS features
for gene in self.genes:
@@ -454,7 +473,7 @@ class FeatureTable(Dumpable, Sized):
Raises:
ImportError: if the `pandas` module could not be imported.
"""
frame = pandas.DataFrame() # type: ignore
for column in self.__annotations__:
......
@@ -9,9 +9,9 @@ import queue
import threading
import tempfile
import typing
from multiprocessing.sharedctypes import Value
from typing import Callable, Iterable, Iterator, List, Optional
import Bio.Alphabet
import Bio.SeqIO
import pyrodigal
from Bio.Seq import Seq
@@ -59,7 +59,7 @@ class PyrodigalFinder(ORFFinder):
def __init__(
self,
metagenome: bool,
record_count: multiprocessing.Value,
record_count: "Value",
record_queue: "queue.Queue[typing.Optional[SeqRecord]]",
genes_queue: "queue.Queue[Gene]",
callback: Optional[Callable[[SeqRecord, int], None]],
@@ -81,7 +81,7 @@ class PyrodigalFinder(ORFFinder):
orfs = self.pyrodigal.find_genes(str(record.seq))
for j, orf in enumerate(orfs):
# wrap the protein into a Protein object
prot_seq = Seq(orf.translate(), Bio.Alphabet.generic_protein)
prot_seq = Seq(orf.translate())
protein = Protein(id=f"{record.id}_{j+1}", seq=prot_seq)
# wrap the gene into a Gene
self.genes_queue.put(Gene(
@@ -134,7 +134,7 @@ class PyrodigalFinder(ORFFinder):
_cpus = self.cpus if self.cpus > 0 else multiprocessing.cpu_count()
# create the queue to pass the objects around
record_count = multiprocessing.Value('i')
record_count = Value('i')
record_queue = typing.cast("queue.Queue[typing.Optional[SeqRecord]]", queue.Queue())
genes_queue = typing.cast("queue.Queue[SeqRecord]", queue.Queue())
......
@@ -21,7 +21,7 @@ class TypeBinarizer(sklearn.preprocessing.MultiLabelBinarizer):
def __init__(self):
self.classes_ = sorted(x for x in ProductType.__members__.values() if x)
super().__init__(self.classes_)
super().__init__(classes=self.classes_)
def transform(self, y: Iterable[ProductType]) -> Iterable[Iterable[int]]:
matrix = numpy.zeros((len(y), len(self.classes_)))
......
@@ -15,51 +15,53 @@ classifiers =
License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System :: POSIX
Programming Language :: Python :: 3 :: Only
Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Topic :: Scientific/Engineering :: Bio-Informatics
Topic :: Scientific/Engineering :: Medical Science Apps.
Typing :: Typed
[options]
zip_safe = false
packages = find:
include_package_data = true
python_requires = >= 3.6
setup_requires =
setuptools >=39.2
tqdm ~=4.41
pyhmmer ~=0.1.1
install_requires =
better-exceptions ~=0.2.2
biopython ~=1.73, <1.78
coloredlogs ~=14.0
dataclasses ~=0.7 ; python_version < '3.7'
biopython ~=1.78
coloredlogs ~=15.0
dataclasses ~=0.8 ; python_version < '3.7'
docopt ~=0.6.2
numpy ~=1.18
scikit-learn ~=0.22.1
numpy ~=1.16
pyhmmer ~=0.1.2
pyrodigal ~=0.4.1
scikit-learn ~=0.24.0
scipy ~=1.4
sklearn-crfsuite ~=0.3.6
pyrodigal ~=0.2.1
tqdm ~=4.41
verboselogs ~=1.7
[options.extras_require]
train =
fisher ~=0.1.9
statsmodels ~=0.11.1
pandas ~=1.0
statsmodels ~=0.12.1
tqdm ~=4.41
[options.packages.find]
include = gecco
include = gecco, gecco.crf, gecco.hmmer, gecco.types, gecco.cli
[options.package_data]
gecco = _version.txt, py.typed
[options.entry_points]
gecco.cli.commands =
cv = gecco.cli.commands.cv:Cv
cv = gecco.cli.commands.cv:Cv [train]
embed = gecco.cli.commands.embed:Embed [train]
run = gecco.cli.commands.run:Run
train = gecco.cli.commands.train:Train [train]
@@ -68,7 +70,7 @@ console-scripts =
gecco = gecco.cli:main
[bdist_wheel]
universal = false
universal = true
[coverage:report]
include = gecco/*
......
@@ -19,9 +19,10 @@ from functools import partial
from xml.etree import ElementTree as etree
import setuptools
from setuptools.command.build_ext import build_ext as _build_ext
from distutils.command.build import build as _build
from setuptools.command.sdist import sdist as _sdist
from tqdm import tqdm
from pyhmmer.plan7 import HMMFile
class sdist(_sdist):
@@ -110,39 +111,54 @@ class update_model(setuptools.Command):
return entries
class build_ext(_build_ext):
"""A hacked `build_ext` command to download data before wheel creation.
class build_data(setuptools.Command):
"""A custom `setuptools` command to download data before wheel creation.
"""
Using ``build_ext`` switches `setuptools` to build in non-universal mode,
and to generate platform-specific wheels. We do not have platform-specific
code in GECCO, but we have platform-specific data: binary HMMs pressed with
``hmmpress`` are CPU-architecture-specific, so we can only install them
on an x86-64 machine if they were pressed on an x86-64 machine.
description = "download the HMM libraries used by GECCO to annotate proteins"
user_options = [
("inplace", "i", "ignore build-lib and put data alongside your Python code")
]
"""
def initialize_options(self):
self.inplace = False
def run(self):
for ext in self.extensions:
in_ = ext.sources[0]
pressed = os.path.join(self.build_lib, in_).replace(".ini", ".hmm.h3i")
self.make_file([in_], pressed, self.download_and_press, (in_,))
def finalize_options(self):
_build_py = self.get_finalized_command("build_py")
self.build_lib = _build_py.build_lib
def download_and_press(self, in_):
def info(self, msg):
self.announce(msg, level=2)
def run(self):
self.mkpath(self.build_lib)
domains_file = os.path.join("gecco", "types", "domains.tsv")
self.info("loading domain accessions from {}".format(domains_file))
with open(domains_file, "rb") as f:
domains = [line.strip() for line in f]
for in_ in glob.iglob(os.path.join("gecco", "hmmer", "*.ini")):
local = os.path.join(self.build_lib, in_).replace(".ini", ".h3m")
self.mkpath(os.path.dirname(local))
self.make_file([in_], local, self.download, (in_, domains))
if self.inplace:
copy = in_.replace(".ini", ".h3m")
self.make_file([local], copy, shutil.copy, (local, copy))
def download(self, in_, domains):
cfg = configparser.ConfigParser()
cfg.read(in_)
out = os.path.join(self.build_lib, in_.replace(".ini", ".hmm"))
out = os.path.join(self.build_lib, in_.replace(".ini", ".h3m"))
try:
self.download_hmm(out, dict(cfg.items("hmm")))
self.download_hmm(out, domains, dict(cfg.items("hmm")))
except:
if os.path.exists(out):
os.remove(out)
raise
self.spawn(["hmmpress", out])
os.remove(out)
def download_hmm(self, output, options):
def download_hmm(self, output, domains, options):
base = "https://github.com/althonos/GECCO/releases/download/v{version}/{id}.hmm.gz"
url = base.format(id=options["id"], version=self.distribution.get_version())
# attempt to use the GitHub releases URL, otherwise fallback to official URL
@@ -160,14 +176,28 @@ class build_ext(_build_ext):
)
with tqdm.wrapattr(response, "read", **format) as src:
with open(output, "wb") as dst:
shutil.copyfileobj(gzip.open(src), dst)
nwritten = 0
for hmm in HMMFile(gzip.open(src)):
if hmm.accession.split(b".")[0] in domains:
hmm.write(dst, binary=True)
nwritten += 1
class build(_build):
"""A hacked `build` command that will also run `build_data`.
"""
def run(self):
self.run_command("build_data")
_build.run(self)
if __name__ == "__main__":
setuptools.setup(
cmdclass={"build_ext": build_ext, "sdist": sdist, "update_model": update_model,},
ext_modules=[
setuptools.Extension("Pfam", [os.path.join("gecco", "hmmer", "Pfam.ini")]),
setuptools.Extension("Tigrfam", [os.path.join("gecco", "hmmer", "Tigrfam.ini")]),
]
cmdclass={
"build": build,
"build_data": build_data,
"sdist": sdist,
"update_model": update_model,
},
)
@@ -55,7 +55,9 @@ class TestRun(TestCommand, unittest.TestCase):
mock.patch.object(gecco.cli.commands.run.PyrodigalFinder, "find_genes", new=_find_genes)
)
argv = ["-vv", "--traceback", "run", "--genome", sequence, "--output", self.tmpdir]
self.assertEqual(main(argv, stream=io.StringIO()), 0)
with io.StringIO() as stderr:
retcode = main(argv, stream=stderr)
self.assertEqual(retcode, 0, stderr.getvalue())
# make sure we have generated the files we want
# and that we found one cluster
......