Code refactor and new CLI for training GECCO
- Protein will raise a
domains not sharing the same length
- Removed redundant arguments for
- Made code consistently refer to domain instead of previously using
- gecco.data.realpath now raises an error when asked for the real path to a data file that does not exist.
- HMMER results are now collected from the domain table into a pandas.DataFrame directly instead of using a temporary (and potentially buggy) TSV file as an intermediate.
- gecco.refine.ClusterRefiner no longer changes its attributes depending on a function call (i.e. it should be pure now).
- Changed the typing and names of some attributes in gecco.bgc.Protein to be more consistent.
- Made instantiating a BGC from a list of
- gecco.interface modules that were not relevant anymore.
- CRF.fit internally storing its weight each time it was called (doing it in a very buggy manner, and not at all explicit to the end user).
- --debug flags to control the program logging level more intuitively.
- --version flag to simply emit the version number of the program.
- gecco help subcommand to display the help about another subcommand.
- gecco annotate subcommand to run one or several HMMs on a genome or on proteins (plus a specific case when training on MIBiG sequences to extract the strand, sequence coordinates and sequence id properly).
- gecco embed to embed a BGC feature table into a feature table of non-BGC sequences.
- gecco train to train the internal CRF model.
- Help/usage messages are now displayed to STDOUT when they are explicitly asked for (i.e. not when they are shown because the program was not used properly).
- --min-orfs flag to the gecco run command to control the minimum length of a BGC (default is still 5).
- Proper API doc to
- Short guide on how to train the CRF model in
- Guide on how to update the model or the HMMs stored inside GECCO in
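
The HMMER change listed above (collecting results from the domain table directly rather than through a temporary TSV file) can be illustrated with a rough sketch. This is not GECCO's actual implementation: the function name `parse_domtbl`, the column names, and the sample row are all assumptions made for illustration. It only relies on the documented layout of HMMER's `--domtblout` output, which has 22 whitespace-separated fields followed by a free-text description.

```python
import io

# Names for the 22 fixed fields of a HMMER `--domtblout` table, plus the
# trailing free-text description. These names are this sketch's own choice,
# not identifiers taken from GECCO.
DOMTBL_COLUMNS = [
    "target_name", "target_accession", "target_length",
    "query_name", "query_accession", "query_length",
    "full_evalue", "full_score", "full_bias",
    "domain_index", "domain_count",
    "c_evalue", "i_evalue", "domain_score", "domain_bias",
    "hmm_from", "hmm_to", "ali_from", "ali_to", "env_from", "env_to",
    "accuracy", "description",
]
INT_COLUMNS = {
    "target_length", "query_length", "domain_index", "domain_count",
    "hmm_from", "hmm_to", "ali_from", "ali_to", "env_from", "env_to",
}
FLOAT_COLUMNS = {
    "full_evalue", "full_score", "full_bias",
    "c_evalue", "i_evalue", "domain_score", "domain_bias", "accuracy",
}


def parse_domtbl(handle):
    """Parse a domain table into a list of typed row dictionaries."""
    rows = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and the '#' comment lines HMMER writes.
        if not line or line.startswith("#"):
            continue
        # Limit the split so a description containing spaces stays whole.
        fields = line.split(maxsplit=len(DOMTBL_COLUMNS) - 1)
        row = dict(zip(DOMTBL_COLUMNS, fields))
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


# Made-up sample row, for illustration only.
SAMPLE = """\
# comment lines such as the column header are ignored
prot_1 - 320 7tm_1 PF00001.21 250 1.2e-30 105.3 0.1 1 1 3.4e-33 1.2e-30 105.0 0.1 5 240 10 300 8 305 0.95 Hypothetical protein
"""

rows = parse_domtbl(io.StringIO(SAMPLE))
```

If pandas is installed, `pandas.DataFrame(rows)` builds a table from these dictionaries without writing anything to disk, which is the point of skipping the intermediate TSV file.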