Welcome to ProPhyle

ProPhyle brings metagenomic classification from clusters to laptops. This is made possible by a novel indexing strategy: k-mers are propagated bottom-up in the phylogenetic/taxonomic tree, contigs are assembled at each node, and queries are matched against these contigs using a full-text search.
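To make the idea of bottom-up k-mer propagation concrete, here is a minimal toy sketch in Python. It is not ProPhyle's actual implementation: it assumes that k-mers shared by all children of a node are moved up to that node, and it skips contig assembly and the full-text index entirely. The names `kmers`, `propagate`, `tree`, and `leaf_seqs` are hypothetical and chosen for illustration only.

```python
def kmers(seq, k):
    """Return the set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def propagate(tree, node, leaf_seqs, k, out):
    """Fill out[node] with the k-mers kept at each node after propagation.

    Assumption (toy model): k-mers shared by ALL children of a node are
    moved up to that node and removed from the children.
    """
    children = tree.get(node, [])
    if not children:
        # Leaf: start from the k-mers of its reference sequence.
        out[node] = kmers(leaf_seqs[node], k)
        return out
    for child in children:
        propagate(tree, child, leaf_seqs, k, out)
    shared = set.intersection(*(out[child] for child in children))
    for child in children:
        out[child] -= shared
    out[node] = shared
    return out


# Hypothetical two-leaf tree: root -> A, B
tree = {"root": ["A", "B"]}
leaf_seqs = {"A": "ACGTAC", "B": "ACGTTT"}
node_kmers = propagate(tree, "root", leaf_seqs, 4, {})
```

In this toy example, the only 4-mer common to both leaves ends up at the root, while each leaf keeps the k-mers specific to it, so every k-mer is stored exactly once.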

Compared to other state-of-the-art classifiers, ProPhyle provides several unique features:

  • Low memory requirements. Compared to Kraken, ProPhyle has a 7x smaller memory footprint for index construction and a 5x smaller footprint for querying, while providing a more expressive index.
  • Flexibility. ProPhyle is easy to use with any user-provided phylogenetic tree and reference genomic sequences (e.g., reads or assemblies). It can classify short reads, long reads, or even assembled contigs.
  • Standard bioinformatics formats. Newick/NHX is used for representing phylogenetic trees and SAM/BAM for reporting assignments.
  • Lossless k-mer indexing. ProPhyle stores a list of all genomes containing a k-mer. The classification is, therefore, accurate even with trees containing similar genomes (e.g., phylogenetic trees for a single species).
  • Reproducibility. ProPhyle is fully deterministic, with a mathematically well-defined behavior. Databases are versioned and distributed via Zenodo.
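The lossless k-mer indexing property can be illustrated with a small Python sketch. It is a toy model, not ProPhyle's real data structures: it assumes each tree node holds a plain set of k-mers, and that a k-mer stored at a node is present in every genome (leaf) below that node. The names `leaves_below`, `genomes_with_kmer`, `tree`, and `node_kmers` are hypothetical.

```python
def leaves_below(tree, node):
    """Return the set of leaves (genomes) in the subtree rooted at node."""
    children = tree.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_below(tree, child)
    return result


def genomes_with_kmer(tree, node_kmers, kmer):
    """Recover every genome containing kmer from the per-node k-mer sets.

    Assumption (toy model): a k-mer stored at a node occurs in all
    genomes below that node.
    """
    hits = set()
    for node, stored in node_kmers.items():
        if kmer in stored:
            hits |= leaves_below(tree, node)
    return hits


# Hypothetical index for a two-genome tree: root -> A, B
tree = {"root": ["A", "B"]}
node_kmers = {
    "root": {"ACGT"},          # shared by A and B
    "A": {"CGTA", "GTAC"},     # specific to A
    "B": {"CGTT", "GTTT"},     # specific to B
}

hits_shared = genomes_with_kmer(tree, node_kmers, "ACGT")
hits_unique = genomes_with_kmer(tree, node_kmers, "GTTT")
hits_absent = genomes_with_kmer(tree, node_kmers, "AAAA")
```

Because the full list of genomes is recoverable for each k-mer, a query k-mer that occurs in only one of several closely related genomes can still be attributed to that genome rather than collapsed to a common ancestor.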

Documentation

Auxiliary tools

Cite

[1] Břinda K, Lima L, Pignotti S, Quinones-Olvera N, Salikhov K, Chikhi R, Kucherov G, Iqbal Z, Baym M. Efficient and robust search of microbial genomes via phylogenetic compression. bioRxiv 2023.04.15.536996, 2023. https://doi.org/10.1101/2023.04.15.536996

[2] Břinda K, Salikhov K, Pignotti S, Kucherov G. ProPhyle 0.3.1.0. Zenodo, 2017. https://doi.org/10.5281/zenodo.1045429

[3] Břinda K, Salikhov K, Pignotti S, Kucherov G. ProPhyle: a phylogeny-based metagenomic classifier using the Burrows-Wheeler Transform. Poster at HiTSeq 2017. https://doi.org/10.5281/zenodo.1045427

[4] Břinda K. Novel computational techniques for mapping and classifying Next-Generation Sequencing data. PhD Thesis, Université Paris-Est, 2016. https://doi.org/10.5281/zenodo.1045317

[5] Salikhov K. Efficient algorithms and data structures for indexing DNA sequence data. PhD Thesis, Université Paris-Est, 2017.

[1] introduces phylogenetic compression, the fundamental concept behind ProPhyle; [2] is the main reference for the entire ProPhyle package; [3] summarizes the ProPhyle algorithm; [4] provides a thorough description (see Chapter 12); and [5] explains the details of the BWT-indexing technique.

Authors