Commit e32302cc authored 2 years ago by Martin Larralde
Fix capitalization of project name in documentation [ci skip]
Parent: 4cfeb88f
Pipeline #36630: skipped
Showing 4 changed files with 27 additions and 21 deletions:
README.md: 2 additions, 2 deletions
docs/benchmarks.rst: 2 additions, 2 deletions
docs/index.rst: 12 additions, 6 deletions
docs/performance.rst: 11 additions, 11 deletions
README.md (+2 −2)
-# 🐍🟡♦️🟦 pyHMMER [](https://github.com/althonos/pyhmmer/stargazers)
+# 🐍🟡♦️🟦 PyHMMER [](https://github.com/althonos/pyhmmer/stargazers)
 *[Cython](https://cython.org/) bindings and Python interface to [HMMER3](http://hmmer.org/).*
@@ -159,7 +159,7 @@ A possible explanation for this observation would be that HMMER
 platform-specific code requires too many [SIMD](https://en.wikipedia.org/wiki/SIMD)
 registers per thread to benefit from [simultaneous multi-threading](https://en.wikipedia.org/wiki/Simultaneous_multithreading).
-To read more about how pyHMMER achieves better parallelism than HMMER for
+To read more about how PyHMMER achieves better parallelism than HMMER for
 many-to-many searches, have a look at the [Performance page](https://pyhmmer.readthedocs.io/en/stable/performance.html)
 of the documentation.
docs/benchmarks.rst (+2 −2)
@@ -50,7 +50,7 @@ v0.4.0 - 2021-06-05
 .. image:: _images/bench-v0.4.0.svg
-The overhead of pyHMMER has been reduced, and has a much smaller effect when
+The overhead of PyHMMER has been reduced, and has a much smaller effect when
 using a high number of threads.
 The main thread has been updated so that it only loads the next `pyhmmer.plan7.HMM`
@@ -79,5 +79,5 @@ total number of physical CPUs (6 cores). This could be a hint of hindrance
 between the different threads.
 Loading from a pressed HMM saves a constant time, independently of the number
-of threads. pyHMMER also has a constant overhead compared to HMMER for a
+of threads. PyHMMER also has a constant overhead compared to HMMER for a
 higher number of threads.
docs/index.rst (+12 −6)
-pyHMMER |Stars|
+PyHMMER |Stars|
 ===============
 .. |Stars| image:: https://img.shields.io/github/stars/althonos/pyhmmer.svg?style=social&maxAge=3600&label=Star
@@ -125,12 +125,18 @@ Library
 Changelog <changes>
-Related Project
----------------
+Related Projects
+----------------
-If despite of all the advantages listed earlier, you would rather use HMMER through its CLI,
-this package will not be of great help. You should then check the
-`hmmer-py <https://github.com/EBI-Metagenomics/hmmer-py>`_ package developed
+Building a HMM from scratch? Then you may be interested in the `PyFAMSA <https://pypi.org/project/pyfamsa/>`_
+package, providing bindings to `FAMSA <https://github.com/refresh-bio/FAMSA>`_,
+a very fast multiple sequence aligner. In addition, you may want to trim alignments:
+in that case, consider `PytrimAl <https://pypi.org/project/pytrimal>`_, which
+wraps `trimAl 2.0 <https://github.com/inab/trimal/tree/2.0_RC>`_.
+If despite of all the advantages listed earlier, you would rather use HMMER
+through its CLI, this package will not be of great help. You can instead check
+the `hmmer-py <https://github.com/EBI-Metagenomics/hmmer-py>`_ package developed
 by `Danilo Horta <https://github.com/horta>`_ at the `EMBL-EBI <https://www.ebi.ac.uk>`_.
docs/performance.rst (+11 −11)
@@ -4,21 +4,21 @@ Performance
 Background
 ----------
-Benchmarks of pyHMMER conducted against the ``hmmsearch`` and ``hmmscan`` binaries
+Benchmarks of PyHMMER conducted against the ``hmmsearch`` and ``hmmscan`` binaries
 suggest that running a domain search pipeline takes about the same time in
 single-threaded mode, and are faster when the right number of CPUs is used.
 This comes from several changes in the implementation of the search pipeline
-with the pyHMMER API compared to the original HMMER C code, both of which have
+with the PyHMMER API compared to the original HMMER C code, both of which have
 absolutely no effect on the final result.
 Parallelisation strategy
 ------------------------
-Both pyHMMER and HMMER support searching / scanning several targets with
+Both PyHMMER and HMMER support searching / scanning several targets with
 several queries in parallel using multithreading. However, benchmarks suggest
-that pyHMMER takes a greater advantage of the number of available CPUs.
+that PyHMMER takes a greater advantage of the number of available CPUs.
 Querying modes
 ^^^^^^^^^^^^^^
@@ -47,7 +47,7 @@ sequence targets in either of two modes:
 Although the threaded mode removes the potential I/O bottleneck, it only works for
 a sufficiently large number of targets (:math:`1000 \times n_{cpus}`). To achieve
-true parallelism, pyHMMER improves on the threaded mode by switching the worker
+true parallelism, PyHMMER improves on the threaded mode by switching the worker
 thread logic. Target sequences are pre-fetched in memory before looping
 over the queries, and are passed by reference to all the worker threads. Each
 worker then receives a HMM from the main thread, and process the entirety of
@@ -58,8 +58,8 @@ other before moving on to the next query**.
 .. admonition:: Note
-Obviously, the pyHMMER parallelisation strategy will only work for multiple
-queries. But one main motivation to develop pyHMMER was to annotate protein
+Obviously, the PyHMMER parallelisation strategy will only work for multiple
+queries. But one main motivation to develop PyHMMER was to annotate protein
 sequences with a subset of the `Pfam <http://pfam.xfam.org/>`_ HMM library,
 which is why we benchmark this particular use case.
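The worker-thread logic described in the performance.rst context lines (targets pre-fetched in memory and shared by reference, the main thread handing one query at a time to a worker, and that worker processing the entirety of the targets for it) can be illustrated with a minimal pure-Python sketch. This is not PyHMMER's implementation or API: the `search_all` and `count_hits` functions, the string queries and targets, and the toy scoring are hypothetical stand-ins for HMMs, sequences, and the search pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence


def search_all(
    queries: Sequence[str],
    targets: Iterable[str],
    score: Callable[[str, str], int],
    cpus: int = 4,
) -> Dict[str, List[int]]:
    """Score every query against every target (hypothetical helper).

    Targets are loaded once and shared by reference with all workers;
    each worker handles one whole query at a time.
    Duplicate queries collapse into a single dictionary key.
    """
    # Pre-fetch all targets into memory before looping over the queries.
    shared_targets = list(targets)

    def process_query(query: str) -> List[int]:
        # A worker processes the entirety of the targets for its query.
        return [score(query, target) for target in shared_targets]

    with ThreadPoolExecutor(max_workers=cpus) as pool:
        # The main thread hands out queries; map() keeps results in query order.
        return dict(zip(queries, pool.map(process_query, queries)))


def count_hits(query: str, target: str) -> int:
    """Toy scoring function standing in for a real HMM search (hypothetical)."""
    return target.count(query)


hits = search_all(["AC", "GT"], ["ACGT", "ACAC", "TTTT"], count_hits, cpus=2)
print(hits)  # {'AC': [1, 2, 0], 'GT': [1, 0, 0]}
```

In this sketch the amount of parallel work grows with the number of queries rather than the number of targets, which is why, as the Note says, the strategy only pays off for multiple queries. Pure-Python threads are also limited by the GIL, so the sketch only shows how work is divided, not real CPU speedup.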
@@ -71,7 +71,7 @@ other before moving on to the next query**.
 Example
 ^^^^^^^
-To check how well pyHMMER and HMMER3 handle parallelism on a real dataset,
+To check how well PyHMMER and HMMER3 handle parallelism on a real dataset,
 we annotated proteins from representative genomes of the
 `proGenomes <https://progenomes.embl.de/>`_ database with the
 `Pfam <http://pfam.xfam.org/>`_ collection of HMMs. ``hmmsearch`` runs
@@ -87,8 +87,8 @@ and we measured the runtime of either the* ``hmmsearch`` *binary or the*
 Memory allocation
 -----------------
-pyHMMER is slightly more conservative with memory: in several places where
-the original HMMER binary would reallocate memory within loops, pyHMMER tries
+PyHMMER is slightly more conservative with memory: in several places where
+the original HMMER binary would reallocate memory within loops, PyHMMER tries
 to simply clear the original buffers instead to allow reusing a previous
 object.
@@ -119,7 +119,7 @@ For instance, the ``hmmsearch`` binary will reallocate a new ``P7_PROFILE`` and
 These ``struct`` are not so large by themselves, but they in turn allocate a
 buffer of sufficient size to store the :math:`N` nodes of a HMM.
-In pyHMMER, the pipeline will cache memory to be used for the profile and optimized
+In PyHMMER, the pipeline will cache memory to be used for the profile and optimized
 profiles, and only reallocate if the new HMM to be processed is larger than what the
 currently cached ``P7_OPROFILE`` can store.
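The caching policy described in the Memory allocation context lines (reuse the cached profile buffer, and reallocate only when the incoming HMM is larger than what it can hold) can be sketched in Python as follows. This is a hypothetical illustration of the policy only: `ProfileCache`, `configure`, and the list-of-floats buffer are assumptions rather than PyHMMER classes, and the real pipeline manages HMMER's C `P7_PROFILE` and `P7_OPROFILE` structs.

```python
from typing import List


class ProfileCache:
    """Hypothetical stand-in for a cached profile buffer (not a PyHMMER class).

    The buffer grows only when a model needs more nodes than it can hold;
    otherwise the existing storage is cleared and reused.
    """

    def __init__(self) -> None:
        self.buffer: List[float] = []
        self.allocations = 0  # number of times a new buffer was created

    def configure(self, n_nodes: int) -> List[float]:
        if n_nodes > len(self.buffer):
            # Cached buffer too small: allocate one sized for this model.
            self.buffer = [0.0] * n_nodes
            self.allocations += 1
        else:
            # Cached buffer large enough: reset the slots needed and reuse it.
            for i in range(n_nodes):
                self.buffer[i] = 0.0
        return self.buffer


# Usage: five models of varying size only trigger two allocations.
cache = ProfileCache()
for n_nodes in [120, 80, 300, 150, 300]:
    cache.configure(n_nodes)
print(cache.allocations)  # 2
```

The buffer only ever grows to the size of the largest model seen so far, so a search over many similarly sized HMMs pays the allocation cost a handful of times instead of once per query.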