Welcome to ProPhyle
ProPhyle brings metagenomic classification from clusters to laptops. This is possible thanks to a novel indexing strategy that propagates k-mers bottom-up through the phylogenetic/taxonomic tree, assembles contigs at each node, and matches queries using a full-text search.
Compared to other state-of-the-art classifiers, ProPhyle provides several unique features:
- Low memory requirements. Compared to Kraken, ProPhyle has a 7x smaller memory footprint for index construction and a 5x smaller footprint for querying, while providing a more expressive index.
- Flexibility. ProPhyle is easy to use with any user-provided phylogenetic tree and reference genomic sequences (e.g., reads or assemblies). It can classify short reads, long reads, or even assembled contigs.
- Standard bioinformatics formats. Newick/NHX is used for representing phylogenetic trees and SAM/BAM for reporting assignments.
- Lossless k-mer indexing. ProPhyle stores a list of all genomes containing a k-mer. The classification is, therefore, accurate even with trees containing similar genomes (e.g., phylogenetic trees for a single species).
- Reproducibility. ProPhyle is fully deterministic, with mathematically well-defined behavior. Databases are versioned and distributed via Zenodo.
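To make the indexing idea more concrete, here is a minimal Python sketch of bottom-up k-mer propagation and of why the resulting index can stay lossless. It is an illustration under stated assumptions, not ProPhyle's implementation: it assumes that the k-mers shared by all children of a node are moved up to that node, it uses plain Python sets instead of assembled contigs and a full-text index, and it ignores reverse complements. The tree, the genome names, and the function names (`propagate`, `genomes_with`) are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def propagate(tree, node, genomes, k, index):
    """Post-order traversal that fills index[node] with the k-mers kept at node.

    Assumption: k-mers shared by ALL children of a node are moved up to it.
    """
    children = tree.get(node, [])
    if not children:  # leaf = one reference genome
        index[node] = kmers(genomes[node], k)
        return
    for child in children:
        propagate(tree, child, genomes, k, index)
    shared = set.intersection(*(index[c] for c in children))
    for child in children:
        index[child] -= shared  # each k-mer is stored once per subtree
    index[node] = shared


def leaves_under(tree, node):
    """Return the set of leaves (genomes) in the subtree rooted at node."""
    children = tree.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves_under(tree, c) for c in children))


def genomes_with(tree, index, kmer):
    """Recover the exact set of genomes containing a k-mer."""
    hits = set()
    for node, node_kmers in index.items():
        if kmer in node_kmers:
            hits |= leaves_under(tree, node)
    return hits


# Hypothetical toy tree: root -> (A -> (g1, g2), g3)
tree = {"root": ["A", "g3"], "A": ["g1", "g2"]}
genomes = {"g1": "ACGTA", "g2": "ACGTT", "g3": "ACGGG"}

index = {}
propagate(tree, "root", genomes, 3, index)

print(index["root"])                      # k-mers shared by all three genomes
print(genomes_with(tree, index, "CGT"))   # genomes under node A
```

In this sketch, a k-mer stored at a node is present in every genome below that node, and every k-mer of a genome is stored either at its leaf or at one of its ancestors, so the list of genomes containing a k-mer can be reconstructed exactly.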
Documentation
- Quick example
- Contents
- Search page
Releases
Auxiliary tools
- ProphEx
- ProphAsm
Cite
[1] Břinda K, Lima L, Pignotti S, Quinones-Olvera N, Salikhov K, Chikhi R, Kucherov G, Iqbal Z, Baym M. Efficient and robust search of microbial genomes via phylogenetic compression. bioRxiv 2023.04.15.536996, 2023. https://doi.org/10.1101/2023.04.15.536996
[2] Břinda K, Salikhov K, Pignotti S, Kucherov G. ProPhyle 0.3.1.0, Zenodo, 2017. https://doi.org/10.5281/zenodo.1045429
[3] Břinda K, Salikhov K, Pignotti S, Kucherov G. ProPhyle: a phylogeny-based metagenomic classifier using the Burrows-Wheeler Transform. Poster at HiTSeq 2017. https://doi.org/10.5281/zenodo.1045427
[4] Břinda K. Novel computational techniques for mapping and classifying Next-Generation Sequencing data. PhD Thesis, Université Paris-Est, 2016. https://doi.org/10.5281/zenodo.1045317
[5] Salikhov K. Efficient algorithms and data structures for indexing DNA sequence data. PhD Thesis, Université Paris-Est, 2017.
[1] introduces phylogenetic compression, the fundamental concept behind ProPhyle. [2] is the main reference for the entire ProPhyle package. [3] contains a summary of the ProPhyle algorithm, [4] provides a thorough description (see Chapter 12), and [5] explains details of the BWT-indexing technique.
Authors
- Karel Břinda <karel.brinda@inria.fr>
- Kamil Salikhov <kamil.salikhov@univ-mlv.fr>
- Simone Pignotti <pignottisimone@gmail.com>
- Gregory Kucherov <gregory.kucherov@univ-mlv.fr>