Change internal data layer and output clusters as GenBank files
Added
- Export of cluster sequences with genes and domain annotations in GenBank format.
- A proper data model that retains more information from the input.
- Dedicated functions to load the KNN training matrix and the CRF model.
Fixed
- Sorting issue causing ClusterCRF to be passed a feature table with potentially scrambled proteins.
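The fix above concerns row order: a CRF labels a sequence, so the feature table must follow gene order along each contig. The sketch below is a minimal illustration of that idea, not GECCO's actual code. `FeatureRow`, its fields and `sort_features` are hypothetical names chosen for this example; the real feature table may use different columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureRow:
    # Hypothetical row of a domain feature table: one domain hit on one protein.
    sequence_id: str  # contig / record the protein comes from
    protein_id: str
    start: int        # protein start coordinate on the contig
    end: int          # protein end coordinate on the contig
    domain: str


def sort_features(rows: List[FeatureRow]) -> List[FeatureRow]:
    """Order rows by contig, then genomic position, so that consecutive
    rows follow the gene order a sequence model such as a CRF expects.

    Python's sort is stable, so several domains on the same protein keep
    their original relative order.
    """
    return sorted(rows, key=lambda r: (r.sequence_id, r.start, r.end, r.protein_id))


if __name__ == "__main__":
    scrambled = [
        FeatureRow("contig1", "p3", 500, 800, "PF1"),
        FeatureRow("contig1", "p1", 10, 300, "PF2"),
        FeatureRow("contig1", "p2", 320, 480, "PF3"),
        FeatureRow("contig1", "p1", 10, 300, "PF4"),
    ]
    print([(r.protein_id, r.domain) for r in sort_features(scrambled)])
```

Sorting on coordinates rather than on protein identifiers avoids relying on names that may not encode position (for example `p10` sorting before `p2` lexicographically).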
Changed
- All internal classes but ClusterCRF expect objects from gecco.model instead of a pandas.DataFrame.
- The KNN training matrix is now stored as a compressed sparse matrix.
- Internal data is stored in the relevant module, next to the code that uses it, instead of in a dedicated gecco.data module.
- gecco.orf.ORFFinder works at the level of a single record.
- Reduced code complexity in gecco.hmmer.DomainRow and gecco.refiner.ClusterRefiner.
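Storing the KNN training matrix as a compressed sparse matrix pays off when most entries are zero, because only the non-zero values and their positions are kept. The stdlib sketch below shows the compressed sparse row (CSR) layout to illustrate the idea; it assumes, without confirmation from these notes, that the training matrix is mostly zeros. In practice the project may well rely on `scipy.sparse.csr_matrix` and `scipy.sparse.save_npz`; `to_csr` and `csr_row` here are hypothetical helpers, not GECCO functions.

```python
from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Compress a dense row-major matrix into CSR arrays.

    data    -- the non-zero values, row by row
    indices -- the column index of each value in `data`
    indptr  -- indptr[i]:indptr[i+1] is the slice of `data` holding row i
    """
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(j)
        indptr.append(len(data))  # end of this row's slice
    return data, indices, indptr


def csr_row(data: List[float], indices: List[int], indptr: List[int],
            i: int, n_cols: int) -> List[float]:
    """Rebuild row i of the original matrix as a dense list."""
    row = [0.0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


if __name__ == "__main__":
    dense = [[0, 1, 0, 0], [0, 0, 0, 0], [2, 0, 0, 3]]
    data, indices, indptr = to_csr(dense)
    print(data, indices, indptr)  # [1, 2, 3] [1, 0, 3] [0, 1, 1, 3]
```

Row access stays cheap (one slice of `data` per row), which suits a nearest-neighbour model that compares query vectors against training rows.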
Removed
- Export of individual cluster proteins in FASTA files.
- The unused ProdigalFinder (PyrodigalFinder is used instead).
Edited by Martin Larralde