# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
[Unreleased]: https://github.com/althonos/pyhmmer/compare/v0.4.7...HEAD


## [v0.4.7] - 2021-09-28
[v0.4.7]: https://github.com/althonos/pyhmmer/compare/v0.4.6...v0.4.7

### Added
- `TraceAligner`, `Trace` and `Traces` classes to `pyhmmer.plan7` to get tracebacks after aligning several sequences against an HMM.
- `pyhmmer.hmmalign` function with the same features as the `hmmalign` binary from HMMER3.
- Support for out-of-band pickling in `easel.Vector` and `easel.Matrix`.

### Changed
- Allow creating an empty `Vector` or `Matrix` by calling their constructor without arguments.

### Fixed
- Potential unreported exceptions in `plan7.OptimizedProfile.write` and several `plan7.SSIWriter` methods.


## [v0.4.6] - 2021-09-10
[v0.4.6]: https://github.com/althonos/pyhmmer/compare/v0.4.5...v0.4.6

### Added
- `pickle` protocol for `easel.Alphabet`, `easel.Bitfield`, `easel.KeyHash`, `easel.Vector`, `easel.Matrix` and `plan7.HMM`.
- `taxonomy_id` and `residue_markups` properties to `easel.Sequence`.
- `sum_score` property to `plan7.Hit`.
- `plan7.EvalueParameters` class to expose the e-value parameters of a `plan7.HMM` or a `plan7.Profile`.
- Equality checks and slicing for `easel.Matrix` and `easel.Vector`.
- Support for creating and manipulating zero-sized `easel` matrices and vectors.
- `plan7.Cutoffs` class to expose the Pfam score cutoffs of a `plan7.HMM` or a `plan7.Profile`.
- Keyword arguments to configure E-value thresholds when creating a `plan7.Pipeline` object.
- Support for using model-specific thresholding options in `plan7.Pipeline`.

### Changed
- Use the *replace* error handler when decoding error messages to skip potential decoding issues when already building an exception.
- Improve `pyhmmer.hmmer` to ensure background threads exit on a `KeyboardInterrupt`.
- `easel.VectorU8.__eq__` accepts any object implementing the buffer protocol.
- `plan7.HMM.creation_time` now takes and returns a `datetime.datetime` object, assuming the field is only ever set with `asctime`.
- Refactor `easel.Vector` and `easel.Matrix` and mark exposed memory as C-contiguous.
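
For the `creation_time` change above, the `asctime` layout can be parsed with the standard library alone. The following is a minimal sketch that does not call into pyhmmer; `ASCTIME_FORMAT` and `parse_asctime` are hypothetical names introduced for illustration, and the weekday and month names are assumed to be in the C/English locale.

```python
import datetime

# Layout written by the C `asctime` function, e.g. "Tue Sep 28 12:34:56 2021".
# Assumption: weekday and month abbreviations use the C/English locale.
ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_asctime(text: str) -> datetime.datetime:
    """Parse an asctime-style timestamp (hypothetical helper, not pyhmmer API)."""
    # Whitespace in the format matches one or more spaces, so the
    # space-padded day that `asctime` emits ("Jun  6") is accepted too.
    return datetime.datetime.strptime(text.strip(), ASCTIME_FORMAT)


created = parse_asctime("Tue Sep 28 12:34:56 2021")
padded = parse_asctime("Sun Jun  6 01:02:03 2021\n")
```

A field written by anything other than `asctime` would raise `ValueError` in this sketch, which is why the entry states the assumption explicitly.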

### Fixed
- `easel.Alphabet` not reporting potential allocation errors.
- Potential buffer overflow in `easel.Matrix` and `easel.Vector` when calling `__init__` more than once.


## [v0.4.5] - 2021-07-19
[v0.4.5]: https://github.com/althonos/pyhmmer/compare/v0.4.4...v0.4.5

### Added
- `OptimizedProfile.convert` method to configure an optimized profile from a `Profile` without reallocating a new `P7_OPROFILE` struct.

### Changed
- Rewrite the `plan7.Pipeline` search loop to avoid reacquiring the GIL between reference sequences.
- Require the reference sequences to be stored in a collection (instead of an iterable) when passing them to the `search_hmm`, `search_msa` and `search_seq` methods of `plan7.Pipeline`.
- Avoid reallocating a new `OptimizedProfile` every time a new HMM is passed to `Pipeline.search_hmm`.
- Relax the GIL while sorting and thresholding `TopHits` in `Pipeline` search methods.


## [v0.4.4] - 2021-07-07
[v0.4.4]: https://github.com/althonos/pyhmmer/compare/v0.4.3...v0.4.4

### Added
- `ignore_gaps` parameter to `pyhmmer.plan7.SequenceFile`, to allow skipping gap characters when reading a sequence from an ungapped format.
- `__sizeof__` implementation for some
- Dedicated check for sequence length before running the platform-specific code in `pyhmmer.plan7.Pipeline`.

### Fixed
- Score system not being set in `pyhmmer.plan7.Builder.build_msa`.
- Alphabet not being checked after the first sequence in `Pipeline` search and scan methods.


## [v0.4.3] - 2021-07-03
[v0.4.3]: https://github.com/althonos/pyhmmer/compare/v0.4.2...v0.4.3

### Fixed
- File object wrappers not reporting exceptions raised when seeking on OSX/BSD platforms.


## [v0.4.2] - 2021-06-20
[v0.4.2]: https://github.com/althonos/pyhmmer/compare/v0.4.1...v0.4.2

### Added
- `pyhmmer.easel.Randomness` class exposing a deterministic random number generator.
- `pyhmmer.plan7.Builder.randomness` and `pyhmmer.plan7.Pipeline.randomness` attributes exposing the internal random number generator used by each object.
- `pyhmmer.plan7.Hit.best_domain` property mapping to the highest scoring domain of a hit.
- `pyhmmer.plan7.OptimizedProfile.rbv` property exposing match scores.
- `pyhmmer.plan7.Domain.pvalue` and `pyhmmer.plan7.Hit.pvalue` reporting the p-value for a domain or hit bitscore.

### Fixed
- Dimensions of the `pyhmmer.plan7.OptimizedProfile.sbv` matrix not being properly set.


## [v0.4.1] - 2021-06-06
[v0.4.1]: https://github.com/althonos/pyhmmer/compare/v0.4.0...v0.4.1

### Fixed
- Main buffer not being freed in `MatrixF.__dealloc__` and `MatrixU8.__dealloc__` when created without owner.

### Added
- Additional configuration values for `pyhmmer.plan7.Pipeline` as both constructor arguments and mutable properties.
- `consensus`, `consensus_structure` and `offsets` properties to `pyhmmer.plan7.Profile` objects.

### Changed
- Make `OptimizedProfile.ssv_filter` check the alphabet of the given sequence.


## [v0.4.0] - 2021-06-05 - YANKED
[v0.4.0]: https://github.com/althonos/pyhmmer/compare/v0.3.1...v0.4.0

### Added
- Linear algebra primitives to expose 1D (`Vector`) and 2D (`Matrix`) contiguous buffers containing numerical values to `pyhmmer.easel`.
- Documentation for the `Z` and `domZ` parameters of the `pyhmmer.plan7.Pipeline` constructor.
- `pyhmmer.errors.AlphabetMismatch` exception deriving from `ValueError` to specifically report mismatching Easel alphabets where applicable.
- `scale` and `normalize` methods to `pyhmmer.plan7.HMM` objects.
- Property to access `pyhmmer.plan7.Background` residue frequencies as a `VectorF` object.
- Property to access `pyhmmer.plan7.HMM` mean residue composition as a `VectorF` object.
- Property to access `pyhmmer.plan7.HMM` probabilities and emissions as `MatrixF` objects.
- `ssv_filter` methods to `pyhmmer.plan7.OptimizedProfile` to get the SSV filter score of the profile for a given sequence.
- Several additional properties to access the `pyhmmer.plan7.OptimizedProfile` internals.

### Removed
- Unused `report_e` parameter of `pyhmmer.plan7.Pipeline` constructor.
- `pyhmmer.plan7.TopHits.clear` method which could lead to segfault if it was called while a `Hit` is being held.

### Changed
- Multithreaded loop in `pyhmmer.hmmer` to reduce memory consumption while still yielding hits in order.
- `pyhmmer.easel.DigitalSequence.sequence` property is now a `VectorU8`.

### Fixed
- Type annotations in `pyhmmer.hmmer`.
- Potential double free in `pyhmmer.plan7.HMM.command_line` property setter.
- Minor floating-point precision issues in `pyhmmer.plan7.Builder` constructor.
- Segfault in `TextMSA.digitize` caused by `esl_msa_Copy` not digitizing on-the-fly like `esl_sq_Copy`.
- Exceptions not being raised in some methods of `pyhmmer.plan7.Profile` and `pyhmmer.plan7.TopHits`.


## [v0.3.1] - 2021-05-08
[v0.3.1]: https://github.com/althonos/pyhmmer/compare/v0.3.0...v0.3.1

### Added
- `Pipeline.scan_seq` method to query a database of profiles with one or more sequences.
- `transition_probabilities`, `match_emissions`, `insert_emissions` properties to the `HMM` class, providing access to the numerical parameters of the HMM.
- `consensus_structure` and `consensus_accessibility` properties to the `HMM` class to get consensus lines from the source alignment if the HMM was created from a MSA.
- `nseq` and `nseq_effective` properties to the `HMM` class to get the number of training sequences and effective sequences used to build the HMM.

### Changed
- `HMM.checksum` is now `None` if the `p7H_CHKSUM` flag is not set.
- `Builder` methods will now record `sys.argv` when creating a HMM.

### Fixed
- `HMM.write(..., binary=False)` crashing on HMMs without a consensus line. ([#5](https://github.com/althonos/pyhmmer/issues/5)). Fixed upstream in ([EddyRivasLab/HMMER#236](https://github.com/EddyRivasLab/hmmer/pull/236)).
- `Pipeline.reset` mishandling the `Z` and `domZ` values if those were detected from the number of targets.
- `pyhmmer.hmmer` functions will not block until all results have been collected anymore when run in multithreaded mode.


## [v0.3.0] - 2021-03-11
[v0.3.0]: https://github.com/althonos/pyhmmer/compare/v0.2.2...v0.3.0

### Added
- `easel.MSAFile` to read from a file containing
- `accession`, `author`, `name` and `description` properties to `easel.MSA` objects.
- `plan7.Builder.build_msa` to build a pHMM from a sequence alignment.
- Additional methods to `easel.KeyHash`, allowing it to be used as a `dict`/`set` hybrid.
- `Sequence.write` and `MSA.write` methods to format a sequence or an alignment to a file handle.
- `plan7.TopHits.to_msa` method to convert all the top hits of a query against a database into a multiple sequence alignment.
- `easel.MSA.sequences` attribute to access individual sequences of an alignment using the `collections.abc.Sequence` interface.
- `easel.DigitalMSA.textize` method to convert a multiple sequence alignment in digital mode to its text-mode counterpart.
- Read-only `name`, `accession` and `description` properties to `plan7.Profile` showing attributes inherited from the HMM it was configured with.
- `plan7.HMM.consensus` property to access the consensus sequence of a pHMM.
- `plan7.HMM` equality implementation, using zero tolerance.
- `plan7.Pipeline.search_msa` to query a MSA against a sequence database.
- `easel.Sequence.reverse_complement` method to reverse-complement a sequence in place or build a reverse-complemented copy.
- `errors.AlphabetMismatch` exception for use in cases where an alphabet is expected but not matched by the input.
- `hmmer.nhmmer` function with the same behaviour as `hmmer.phmmer`, except it expects inputs with a DNA alphabet.

### Fixed
- `plan7.Builder.copy` not copying some parameters correctly, causing `pyhmmer.hmmer.phmmer` to give inconsistent results in multithreaded mode.
- `easel.Bitfield` not properly handling index overflows.
- Documentation not rendering for the `__init__` method of all classes.

### Changed
- `plan7.Builder` gap-open and gap-extend probabilities are now set on instantiation and depend on the alphabet type.
- Constructors for `easel.TextMSA` and `easel.DigitalMSA`, which can now be given an iterable of `easel.Sequence` objects to store in the alignment.

### Removed
- Unimplemented `easel.SequenceFile.fetch` and `easel.SequenceFile.fetchinto` methods.


## [v0.2.2] - 2021-03-04
[v0.2.2]: https://github.com/althonos/pyhmmer/compare/v0.2.1...v0.2.2

### Fixed
- Linking issues on OSX caused by aggressive stripping of intermediate libraries.
- `plan7.Builder` RNG not reseeding between different HMMs.


## [v0.2.1] - 2021-01-29
[v0.2.1]: https://github.com/althonos/pyhmmer/compare/v0.2.0...v0.2.1

### Added
- `pyhmmer.plan7.HMM.checksum` property to get the 32-bit checksum of an HMM.


## [v0.2.0] - 2021-01-21
[v0.2.0]: https://github.com/althonos/pyhmmer/compare/v0.1.4...v0.2.0

### Added
- `pyhmmer.plan7.Builder` class to handle building a HMM from a sequence.
- `Pipeline.search_seq` to query a sequence against a sequence database.
- `psutil` dependency to detect the most efficient thread count for `hmmsearch` based on the number of *physical* CPUs.
- `pyhmmer.hmmer.phmmer` function to run a search of query sequences against a sequence database.

### Changed
- `Pipeline.search` was renamed to `Pipeline.search_hmm` for disambiguation.
- `libeasel.random` sequences do not require the GIL anymore.
- Public API now has proper signature annotations.

### Fixed
- Inaccurate exception messages in `Pipeline.search_hmm`.
- Unneeded RNG reallocation, replaced with re-initialisation where possible.
- `SequenceFile.__next__` not working after being set in digital mode.
- `sequences` argument of `hmmsearch` now only requires a `typing.Collection[DigitalSequence]` instead of a `typing.Collection[Sequence]` (`__getitem__` is no longer needed).
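
To illustrate what dropping the `__getitem__` requirement means in practice, here is a minimal sketch using only `collections.abc`. The `TargetSet` class is hypothetical and plain strings stand in for sequence objects; it shows a container that is sized and iterable (a collection) without being indexable (a sequence).

```python
from collections.abc import Collection, Sequence


class TargetSet(Collection):
    """Hypothetical container: sized and iterable, but not indexable."""

    def __init__(self, items):
        self._items = frozenset(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# Strings are placeholders for sequence objects in this sketch.
targets = TargetSet(["seq1", "seq2", "seq3"])
is_collection = isinstance(targets, Collection)  # has __len__, __iter__, __contains__
is_sequence = isinstance(targets, Sequence)      # would also need __getitem__
```

An argument typed as a collection accepts such a container, whereas one typed as a sequence would also demand positional indexing.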

### Removed
- `hits` argument to `Pipeline.search_hmm` to reduce risk of issues with `TopHits` reuse.
- Broken alignment coordinates on `Domain` classes.


## [v0.1.4] - 2021-01-15
[v0.1.4]: https://github.com/althonos/pyhmmer/compare/v0.1.3...v0.1.4

### Added
- `DigitalSequence.textize` to convert a digital sequence to a text sequence.
- `DigitalSequence.__init__` method to create a digital sequence from any object implementing the buffer protocol.
- `Alignment.hmm_accession` property to retrieve the accession of the HMM in an alignment.


## [v0.1.3] - 2021-01-08
[v0.1.3]: https://github.com/althonos/pyhmmer/compare/v0.1.2...v0.1.3

### Fixed
- Compilation issues in OSX-specific Cython code.


## [v0.1.2] - 2021-01-07
[v0.1.2]: https://github.com/althonos/pyhmmer/compare/v0.1.1...v0.1.2

### Fixed
- Required Cython files not being included in source distribution.


## [v0.1.1] - 2020-12-02
[v0.1.1]: https://github.com/althonos/pyhmmer/compare/v0.1.0...v0.1.1

### Fixed
- `HMMFile` calling `file.peek` without arguments, causing it to crash when passed some types, e.g. `gzip.GzipFile`.
- `HMMFile` failing to work with PyPy file objects because of a bug with their implementation of `readinto`.
- C/Python file object implementation using `strcpy` instead of `memcpy`, causing issues when null bytes were read.


## [v0.1.0] - 2020-12-01
[v0.1.0]: https://github.com/althonos/pyhmmer/compare/v0.1.0-a5...v0.1.0

Initial beta release.

### Fixed
- `TextSequence` uses the sequence argument it's given on instantiation.
- Segmentation fault in `Sequence.__eq__` caused by implicit type conversion.
- Segmentation fault on `SequenceFile.read` failure.
- Missing type annotations for the `pyhmmer.easel` module.


## [v0.1.0-a5] - 2020-11-28
[v0.1.0-a5]: https://github.com/althonos/pyhmmer/compare/v0.1.0-a4...v0.1.0-a5

### Added
- `Sequence.__len__` magic method so that `len(seq)` returns the number of letters in `seq`.
- Python file-handle support when opening a `pyhmmer.plan7.HMMFile`.
- Context manager protocol to `pyhmmer.easel.SSIWriter`.
- Type annotations for `pyhmmer.easel.SSIWriter`.
- `add_alias` to `pyhmmer.easel.SSIWriter`.
- `write` method to `pyhmmer.plan7.OptimizedProfile` to write an optimized profile in binary format.
- `offsets` property to interact with the disk offsets of a `pyhmmer.plan7.OptimizedProfile` instance.
- `pyhmmer.hmmer.hmmpress` emulating the `hmmpress` binary from HMMER.
- `M` property to `pyhmmer.plan7.HMM` exposing the number of nodes in the model.

### Changed
- Bumped vendored Easel to `v0.48`.
- Bumped vendored HMMER to `v3.3.2`.
- `pyhmmer.plan7.HMMFile` will raise an `EOFError` when given an empty file.
- Renamed `length` property to `L` in `pyhmmer.plan7.Background`.

### Fixed
- Segmentation fault when `close` method of `pyhmmer.easel.SSIWriter` was called more than once.
- `close` method of `pyhmmer.easel.SSIWriter` not writing the index contents.


## [v0.1.0-a4] - 2020-11-24
[v0.1.0-a4]: https://github.com/althonos/pyhmmer/compare/v0.1.0-a3...v0.1.0-a4

### Added
- `MSA`, `TextMSA` and `DigitalMSA` classes representing a multiple sequence alignment to `pyhmmer.easel`.
- Methods and protocol to copy a `Sequence` and a `MSA`.
- `pyhmmer.plan7.OptimizedProfile` wrapping a platform-specific optimized profile.
- `SSIReader` and `SSIWriter` classes interacting with sequence/subsequence indices to `pyhmmer.easel`.
- Exception handler using Python exceptions to report Easel errors.

### Changed
- `pyhmmer.hmmsearch` returns an iterator of `TopHits`, with one instance per `HMM` in the input.
- `pyhmmer.hmmsearch` properly raises errors happening in the background threads without deadlock.
- `pyhmmer.plan7.Pipeline` recycles memory between `Pipeline.search` calls.

### Fixed
- Missing type annotations for the `pyhmmer.errors` module.

### Removed
- Unneeded or private methods from `pyhmmer.plan7`.


## [v0.1.0-a3] - 2020-11-19
[v0.1.0-a3]: https://github.com/althonos/pyhmmer/compare/v0.1.0-a2...v0.1.0-a3

### Added
- `TextSequence` and `DigitalSequence` representing a `Sequence` in a given mode.
- E-value properties to `Hit` and `Domain`.
- `TopHits` now stores a reference to the pipeline it was obtained from.
- `Pipeline.Z` and `Pipeline.domZ` properties.
- Experimental pickling support to `Alphabet`.
- Experimental freelist to `Sequence` class to avoid allocation bottlenecks when iterating on a `SequenceFile` without recycling sequence buffers.

### Changed
- Made `Sequence` an abstract base class.
- Additional `Pipeline` parameters can be passed as keyword arguments to `pyhmmer.hmmsearch`.
- `SequenceFile.read` can now be configured to skip reading the metadata or the content of a sequence.

### Removed
- Redundant `SequenceFile` methods.

### Fixed
- `doctest` loader crashing on Python 3.5.
- `TopHits.threshold` segfaulting when being called without prior `TopHits.sort` call.
- Unknown `format` argument to `SequenceFile` constructor not raising the right error.


## [v0.1.0-a2] - 2020-11-12
[v0.1.0-a2]: https://github.com/althonos/pyhmmer/compare/v0.1.0-a1...v0.1.0-a2

### Added
- Support for compilation on PowerPC big-endian platforms.
- Type annotations and stub files for Cython modules.

### Changed
- [`distutils`](https://docs.python.org/3/library/distutils.html) is now used to compile the package, instead of calling `autotools` and letting HMMER configure itself.
- `Bitfield.count` now allows passing an argument (for compatibility with [`collections.abc.Sequence`](https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence)).


## [v0.1.0-a1] - 2020-11-10
[v0.1.0-a1]: https://github.com/althonos/pyhmmer/compare/fe4c279...v0.1.0-a1

Initial alpha release (test deployment to PyPI).