Zeller Group / GECCO / Issues / #5

Created Jan 20, 2020 by Martin Larralde (@larralde, Maintainer)

Improve management of required database(s)

Currently, GECCO does a site install and unpacks the PFAM HMMs into its data folder. However, it should be possible to make the installation position-independent by managing the setup of PFAM ourselves. Possible solutions are the following:

Vendoring on PyPI

Gzip-compressed, the PFAM HMMs take only about 250 MB. That is above the PyPI upload size limit, but the limit could be raised on request. With a raised limit, we could distribute the HMMs directly with the source and stop worrying about the availability of the PFAM servers.
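If the HMMs were vendored as package data, GECCO could read them in place without knowing where it is installed. A minimal sketch, assuming the gzip-compressed file ships inside a Python package (the package and file names passed in are hypothetical) and Python 3.9+ for `importlib.resources.files`:

```python
import gzip
from importlib import resources
from typing import Iterator


def read_vendored_hmms(package: str, name: str) -> Iterator[str]:
    """Yield the lines of a gzip-compressed HMM file shipped as package data.

    `package` and `name` are placeholders; GECCO's real layout may differ.
    """
    resource = resources.files(package).joinpath(name)
    # Decompress on the fly instead of unpacking to a fixed data folder,
    # so nothing depends on where the package was installed.
    with resource.open("rb") as raw, gzip.open(raw, "rt", encoding="ascii") as text:
        yield from text
```

For example, `read_vendored_hmms("gecco.data", "Pfam-A.hmm.gz")` (hypothetical names) would stream the HMMs line by line without ever writing the uncompressed database to disk.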

Compression benchmarks (compressed size of the PFAM HMMs):

  • Brotli: 219 MB
  • Gzip: 253 MB
  • LZMA: 190 MB
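Comparable figures can be measured with the codecs in Python's standard library. A rough sketch: Brotli is omitted because it needs the third-party `brotli` package, and the maximum compression levels used here are an assumption, since the settings behind the benchmark above are not stated.

```python
import gzip
import lzma
from typing import Dict


def compressed_sizes(data: bytes) -> Dict[str, int]:
    """Return the size in bytes of `data` after compression with each codec."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


def benchmark_file(path: str) -> Dict[str, int]:
    """Compress a whole file in memory and report raw and compressed sizes."""
    # Reads everything at once: simple, but memory-hungry for a file as large
    # as the uncompressed PFAM HMMs; a streaming version may be preferable.
    with open(path, "rb") as f:
        data = f.read()
    sizes = compressed_sizes(data)
    sizes["raw"] = len(data)
    return sizes
```

Calling `benchmark_file` on an uncompressed `Pfam-A.hmm` would give sizes directly comparable to the list above, apart from the missing Brotli entry.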

Installing in site-packages

During the install step, automatically download PFAM from the FTP server and place it somewhere in the source tree before the package is built. This would let the database be uninstalled together with GECCO without having to distribute it on PyPI, but it would not allow GECCO to be released in wheel format: installing a wheel only unpacks prebuilt files, so no download step would run on the user's machine.
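One way to hook a download into the build is a custom setuptools `build_py` command. A minimal sketch under several assumptions: the URL points at the current PFAM-A release on the EBI FTP server (verify before relying on it), and the destination path is hypothetical and would also need to be declared in `package_data` so the standard build copies it.

```python
import os
import shutil
import urllib.request

from setuptools.command.build_py import build_py

# Assumed location of the PFAM-A HMMs; check the current release layout.
PFAM_URL = "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz"
# Hypothetical path inside the source tree (must be listed in package_data).
PFAM_DEST = os.path.join("gecco", "data", "hmms", "Pfam-A.hmm.gz")


def download_file(url: str, dest: str) -> str:
    """Stream `url` into `dest`, creating parent directories as needed."""
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


class build_py_with_pfam(build_py):
    """`build_py` variant that fetches PFAM into the source tree first."""

    def run(self) -> None:
        if not os.path.exists(PFAM_DEST):
            self.announce(f"downloading {PFAM_URL}", level=2)
            download_file(PFAM_URL, PFAM_DEST)
        # The regular build then copies the file with the other package data.
        super().run()
```

It would be registered with `setup(cmdclass={"build_py": build_py_with_pfam}, ...)`. As noted above, this only helps for source installs: a prebuilt wheel never runs `build_py` on the user's machine.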

Downloading in cache

Using a cache directory, PFAM could be downloaded on first use if it is not already present. The advantage is that the PyPI size limit no longer matters. The drawback is that the database would stay on the filesystem after GECCO is uninstalled, because pip has no uninstall hook that could erase it.
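A minimal sketch of the download-on-first-use approach, assuming an XDG-style per-user cache directory (`~/.cache/gecco` unless `XDG_CACHE_HOME` is set). The URL and directory layout are assumptions; a real implementation might rely on a library such as `platformdirs` to pick the cache location per platform.

```python
import os
import shutil
import urllib.request
from typing import Optional

# Assumed location of the PFAM-A HMMs; check the current release layout.
PFAM_URL = "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz"


def cache_dir(app: str = "gecco") -> str:
    """Return a per-user cache directory following the XDG convention."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(base, app)


def get_pfam(url: str = PFAM_URL, cache: Optional[str] = None) -> str:
    """Return the local path of the PFAM HMMs, downloading them on first use."""
    folder = cache or cache_dir()
    path = os.path.join(folder, os.path.basename(url))
    if not os.path.exists(path):
        os.makedirs(folder, exist_ok=True)
        # Download under a temporary name, then rename, so an interrupted
        # download never leaves a truncated file at `path`.
        tmp = path + ".part"
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(tmp, path)
    return path
```

Since pip cannot remove this file on uninstall, users would have to delete it themselves, for instance with `shutil.rmtree(cache_dir())`.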

Edited Jan 20, 2020 by Martin Larralde