# πβ©π§¬ pyFastANI [](https://github.com/althonos/pyfastani/stargazers) *[Cython](https://cython.org/) bindings and Python interface to [FastANI](https://github.com/ParBLiSS/FastANI/), a method for fast whole-genome similarity estimation.* [](https://github.com/althonos/pyfastani/) [](https://github.com/althonos/pyfastani/issues) ## πΊοΈ Overview FastANI is a method published in 2018 by Jain *et al.* for high-throughput computATION of whole-genome [Average Nucleotide Identity (ANI)](https://img.jgi.doe.gov/docs/ANI.pdf). It uses [MashMap](https://github.com/marbl/MashMap) to compute orthologous mappings without the need for expensive alignments. `pyfastani` is a Python module, implemented using the [Cython](https://cython.org/) language, that provides bindings to FastANI. It directly interacts with the FastANI internals, which has the following advantages over CLI wrappers: - **simpler compilation**: FastANI requires several additional libraries, which make compilation of the original binary non-trivial. In pyFastANI, libraries that were needed for threading or I/O are provided as stubs, so you only need to have `boost::math` to build. Or even better, just install from one of the provided wheels! - **single dependency**: If your software or your analysis pipeline is distributed as a Python package, you can add `pyfastani` as a dependency to your project, and stop worrying about the FastANI binary being present on the end-user machine. - **sans I/O**: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to FastANI without needing to write them to a temporary file. *This library is still a work-in-progress, and in an experimental stage, but it should already pack enough features to run one-to-one computations.* ## π‘ Example The following snippets show how to compute the ANI between two genomes, with the reference being a draft genome. For one-to-many or many-to-many searches, simply add additional references with `m.add_draft` before indexing. *Note that any name can be given to the reference sequences, this will just affect the `name` attribute of the hits returned for a query.* ### π¬ [Biopython](https://github.com/biopython/biopython) Biopython does not let us access to the sequence directly, so we need to convert it to bytes first with the `bytes` builtin function. For older versions of Biopython (earlier than 1.79), use `record.seq.encode()` instead of `bytes(record.seq).` ```python import pyfastani import Bio.SeqIO m = pyfastani.Mapper() # add a single draft genome to the mapper, and index it ref = list(Bio.SeqIO.parse("vendor/FastANI/data/Shigella_flexneri_2a_01.fna", "fasta")) m.add_draft("Shigella_flexneri_2a_01", (bytes(record.seq) for record in ref)) m.index() # read the query and query the mapper query = Bio.SeqIO.read("vendor/FastANI/data/Escherichia_coli_str_K12_MG1655.fna", "fasta") hits = m.query_sequence(bytes(query.seq)) for hit in hits: print("Escherichia_coli_str_K12_MG1655", hit.name, hit.identity, hit.matches, hit.fragments) ``` ### π§ͺ [Scikit-bio](https://github.com/biocore/scikit-bio) Scikit-bio lets us access to the sequence directly as a `numpy` array, but shows the values as byte strings by default. To make them readable as `char` (for compatibility with the C code), they must be cast with `seq.values.view('B')`. ```python import pyfastani import skbio.io m = pyfastani.Mapper() ref = list(skbio.io.read("vendor/FastANI/data/Shigella_flexneri_2a_01.fna", "fasta")) m.add_draft("Shigella_flexneri_2a_01", (seq.values.view('B') for seq in ref)) m.index() # read the query and query the mapper query = next(skbio.io.read("vendor/FastANI/data/Escherichia_coli_str_K12_MG1655.fna", "fasta")) hits = m.query_genome(query.values.view('B')) for hit in hits: print("Escherichia_coli_str_K12_MG1655", hit.name, hit.identity, hit.matches, hit.fragments) ``` ## π Feedback ### β οΈ Issue Tracker Found a bug ? Have an enhancement request ? Head over to the [GitHub issue tracker](https://github.com/althonos/pyFastANI/issues) if you need to report or ask something. If you are filing in on a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation. ### ποΈ Contributing Contributions are more than welcome! See [`CONTRIBUTING.md`](https://github.com/althonos/pyFastANI/blob/master/CONTRIBUTING.md) for more details. ## βοΈ License This library is provided under the [MIT License](https://choosealicense.com/licenses/mit/). The FastANI code was written by [Chirag Jain](https://github.com/cjain7) and is distributed under the terms of the [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/) license, unless otherwise specified in vendored sources. See `vendor/FastANI/LICENSE` for more information. *This project is in no way not affiliated, sponsored, or otherwise endorsed by the [original FastANI authors](https://github.com/cjain7). It was developed by [Martin Larralde](https://github.com/althonos/) during his PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/) in the [Zeller team](https://github.com/zellerlab).*