10. Command-line interface

10.1. ProPhyle – the main program

prophyle

$ prophyle -h

usage: prophyle.py [-h] [-v]  ...

Program: prophyle (phylogeny-based metagenomic classification)
Version: 0.3.1.0
Authors: Karel Brinda, Kamil Salikhov, Simone Pignotti, Gregory Kucherov
Contact: kbrinda@hsph.harvard.edu

Usage:   prophyle <command> [options]

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

subcommands:

    download     download a genomic database
    index        build index
    classify     classify reads
    analyze      analyze results (experimental)
    compress     compress a ProPhyle index
    decompress   decompress a compressed ProPhyle index
    compile      compile auxiliary ProPhyle programs

prophyle download

$ prophyle download -h

usage: prophyle.py download [-h] [-d DIR] [-l STR] [-F] [-c [STR [STR ...]]]
                            <library> [<library> ...]

positional arguments:
  <library>           genomic library ['bacteria', 'viruses', 'plasmids',
                      'hmp', 'all']

optional arguments:
  -h, --help          show this help message and exit
  -d DIR              directory for the tree and the sequences [~/prophyle]
  -l STR              log file
  -F                  rewrite library files if they already exist
  -c [STR [STR ...]]  advanced configuration (a JSON dictionary)
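
As an illustration of the options above, the following sketch fetches libraries into a non-default directory. The directory name `db`, the log file name, and the `run` helper are assumptions made for this example and are not part of ProPhyle. The helper only prints each command unless `PROPHYLE_RUN=1` is set, so the snippet can be tried without downloading anything.

```shell
# Hypothetical helper: print each command; execute it only if PROPHYLE_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${PROPHYLE_RUN:-0}" = "1" ]; then "$@"; fi
}

LIB_DIR="db"   # assumed location; the documented default is ~/prophyle

# Download the bacterial and viral libraries and write a log file.
run prophyle download -d "$LIB_DIR" -l download.log bacteria viruses

# Fetch the plasmid library again, rewriting files that already exist (-F).
run prophyle download -d "$LIB_DIR" -F plasmids
```

Several libraries can be listed in one call because `<library>` may be repeated.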

prophyle index

$ prophyle index -h

usage: prophyle.py index [-h] [-g DIR] [-j INT] [-k INT] [-l STR] [-s FLOAT]
                         [-F] [-M] [-P] [-K] [-T] [-A] [-c [STR [STR ...]]]
                         <tree.nw> [<tree.nw> ...] <index.dir>

positional arguments:
  <tree.nw>           phylogenetic tree (in Newick/NHX)
  <index.dir>         index directory (will be created)

optional arguments:
  -h, --help          show this help message and exit
  -g DIR              directory with the library sequences [dir. of the first
                      tree]
  -j INT              number of threads [auto (4)]
  -k INT              k-mer length [31]
  -l STR              log file [<index.dir>/log.txt]
  -s FLOAT            rate of sampling of the tree [no sampling]
  -F                  rewrite index files if they already exist
  -M                  mask repeats/low complexity regions (using DustMasker)
  -P                  do not add prefixes to node names when multiple trees
                      are used
  -K                  skip k-LCP construction (then restarted search only)
  -T                  keep temporary files from k-mer propagation
  -A                  autocomplete tree (names of internal nodes and FASTA
                      paths)
  -c [STR [STR ...]]  advanced configuration (a JSON dictionary)
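
A minimal sketch of how the index options might be combined. The index directory names and the choice of k = 25 and 8 threads are illustrative assumptions. The tree paths assume that the libraries were downloaded to the default directory (`~/prophyle`), and the `run` helper is hypothetical: it prints commands and executes them only when `PROPHYLE_RUN=1`.

```shell
# Hypothetical helper: print each command; execute it only if PROPHYLE_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${PROPHYLE_RUN:-0}" = "1" ]; then "$@"; fi
}

LIB_DIR="$HOME/prophyle"   # default directory of `prophyle download`
INDEX_DIR="idx_bv_k25"     # assumed name; the directory will be created

# One index from two trees, k = 25, 8 threads, repeats masked (-M uses DustMasker).
run prophyle index -k 25 -j 8 -M \
    "$LIB_DIR/bacteria.nw" "$LIB_DIR/viruses.nw" "$INDEX_DIR"

# Same tree with a sampling rate of 0.1 and without k-LCP construction (-K).
run prophyle index -s 0.1 -K "$LIB_DIR/bacteria.nw" idx_sampled
```

Since `-g` defaults to the directory of the first tree, it is omitted here; it would only be needed if the sequences were stored elsewhere.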

prophyle classify

$ prophyle classify -h

usage: prophyle.py classify [-h] [-k INT] [-m {h1,c1,h2,c2}] [-f {kraken,sam}]
                            [-l STR] [-P] [-A] [-L] [-X] [-M] [-C] [-K]
                            [-c [STR [STR ...]]]
                            <index.dir> <reads1.fq> [<reads2.fq>]

positional arguments:
  <index.dir>         index directory
  <reads1.fq>         first file with reads in FASTA/FASTQ (- for standard
                      input)
  <reads2.fq>         second file with reads in FASTA/FASTQ

optional arguments:
  -h, --help          show this help message and exit
  -k INT              k-mer length [detect automatically from the index]
  -m {h1,c1,h2,c2}    measure: h1=hit count, c1=coverage, h2=norm.hit count,
                      c2=norm.coverage [h1]
  -f {kraken,sam}     output format [sam]
  -l STR              log file
  -P                  incorporate sequences and qualities into SAM records
  -A                  annotate assignments (using tax. information from NHX)
  -L                  replace read assignments by their LCA
  -X                  replace k-mer matches by their LCA
  -M                  mimic Kraken (equivalent to "-m h1 -f kraken -L -X")
  -C                  use C++ impl. of the assignment algorithm (experimental)
  -K                  force restarted search mode
  -c [STR [STR ...]]  advanced configuration (a JSON dictionary)
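
The following sketch shows a few typical option combinations for classification. The index directory and read file names are placeholders, and the `run` helper is a hypothetical dry-run wrapper (commands are executed only when `PROPHYLE_RUN=1`). Where the classification output is written is not stated in the help text above, so the sketch does not redirect it.

```shell
# Hypothetical helper: print each command; execute it only if PROPHYLE_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${PROPHYLE_RUN:-0}" = "1" ]; then "$@"; fi
}

INDEX_DIR="idx_bv_k25"   # assumed index directory
READS="reads.fq"         # assumed read file

# Defaults: measure h1 (hit count), SAM output, k detected from the index.
run prophyle classify "$INDEX_DIR" "$READS"

# Two read files, normalized coverage (c2), annotated assignments (-A).
run prophyle classify -m c2 -A "$INDEX_DIR" reads_1.fq reads_2.fq

# Kraken-like behaviour: per the help text, these two calls are equivalent.
run prophyle classify -M "$INDEX_DIR" "$READS"
run prophyle classify -m h1 -f kraken -L -X "$INDEX_DIR" "$READS"
```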

prophyle analyze

$ prophyle analyze -h

usage: prophyle.py analyze [-h] [-s ['w', 'u', 'wl', 'ul']]
                           [-f ['sam', 'bam', 'cram', 'uncompressed_bam', 'kraken', 'histo']]
                           [-c [STR [STR ...]]]
                           {index_dir, tree.nw} <out.pref> <classified.bam>
                           [<classified.bam> ...]

positional arguments:
  {index_dir, tree.nw}     index directory or phylogenetic tree
  <out.pref>               output prefix
  <classified.bam>         classified reads (use '-' for stdin)

optional arguments:
  -h, --help               show this help message and exit
  -s ['w', 'u', 'wl', 'ul']
                           statistics to use for the computation of
                           histograms: w (default) => weighted assignments; u
                           => unique assignments, non-weighted; wl => weighted
                           assignments, propagated to leaves; ul => unique
                           assignments, propagated to leaves.
  -f ['sam', 'bam', 'cram', 'uncompressed_bam', 'kraken', 'histo']
                           Input format of assignments [auto]
  -c [STR [STR ...]]       advanced configuration (a JSON dictionary)
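
As a sketch of how the analysis step could be invoked for several samples, the snippet below uses placeholder file names and a hypothetical `run` helper that only prints commands unless `PROPHYLE_RUN=1`. The output file names follow the `<out_prefix>_rawhits.tsv` and `<out_prefix>_otu.tsv` pattern described in the help of `prophyle_analyze.py`.

```shell
# Hypothetical helper: print each command; execute it only if PROPHYLE_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${PROPHYLE_RUN:-0}" = "1" ]; then "$@"; fi
}

INDEX_DIR="idx_bv_k25"   # assumed index directory
OUT_PREFIX="results"     # assumed output prefix

# Weighted assignments propagated to leaves (wl), one SAM file per sample.
run prophyle analyze -s wl -f sam "$INDEX_DIR" "$OUT_PREFIX" sample1.sam sample2.sam

# Files expected afterwards, according to the documented naming pattern.
RAWHITS="${OUT_PREFIX}_rawhits.tsv"
OTU="${OUT_PREFIX}_otu.tsv"
echo "expected outputs: $RAWHITS $OTU"

# Merge previously computed histograms instead of re-reading assignments.
run prophyle analyze -f histo "$INDEX_DIR" merged a_rawhits.tsv b_rawhits.tsv
```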

prophyle compress

$ prophyle compress -h

usage: prophyle.py compress [-h] [-c [STR [STR ...]]]
                            <index.dir> [<archive.tar.gz>]

positional arguments:
  <index.dir>         index directory
  <archive.tar.gz>    output archive [<index.dir>.tar.gz]

optional arguments:
  -h, --help          show this help message and exit
  -c [STR [STR ...]]  advanced configuration (a JSON dictionary)

prophyle decompress

$ prophyle decompress -h

usage: prophyle.py decompress [-h] [-K] [-c [STR [STR ...]]]
                              <archive.tar.gz> [<output.dir>]

positional arguments:
  <archive.tar.gz>    input archive
  <output.dir>        output directory [./]

optional arguments:
  -h, --help          show this help message and exit
  -K                  skip k-LCP construction
  -c [STR [STR ...]]  advanced configuration (a JSON dictionary)
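
A compress/decompress round trip might look as follows. The directory and archive names are illustrative assumptions, and `run` is a hypothetical dry-run helper that executes commands only when `PROPHYLE_RUN=1`.

```shell
# Hypothetical helper: print each command; execute it only if PROPHYLE_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${PROPHYLE_RUN:-0}" = "1" ]; then "$@"; fi
}

INDEX_DIR="idx_bv_k25"            # assumed index directory
ARCHIVE="${INDEX_DIR}.tar.gz"     # same name as the documented default

# Pack the index into an archive for transfer.
run prophyle compress "$INDEX_DIR" "$ARCHIVE"

# Unpack it into a chosen directory (the default would be ./).
run prophyle decompress "$ARCHIVE" restored

# Unpack without k-LCP construction (-K).
run prophyle decompress -K "$ARCHIVE" restored_no_klcp
```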

10.2. prophyle_assembler

$ prophyle_assembler -h


Program:  prophyle_assembler (greedy assembler for ProPhyle)
Contact:  Karel Brinda <karel.brinda@gmail.com>

Usage:    prophyle_assembler [options]

Examples: prophyle_assembler -k 15 -i f1.fa -i f2.fa -x fx.fa
             - compute intersection of f1 and f2
          prophyle_assembler -k 15 -i f1.fa -i f2.fa -x fx.fa -o g1.fa -o g2.fa
             - compute intersection of f1 and f2, and subtract it from them
          prophyle_assembler -k 15 -i f1.fa -o g1.fa
             - re-assemble f1 to g1

Command-line parameters:
 -k INT   K-mer size.
 -i FILE  Input FASTA file (can be used multiple times).
 -o FILE  Output FASTA file (if used, must be used as many times as -i).
 -x FILE  Compute intersection, subtract it, save it.
 -s FILE  Output file with k-mer statistics.
 -S       Silent mode.

Note that '-' can be used for standard input/output.

10.3. prophyle_index

$ prophyle_index -h


Program: prophyle_index (alignment of k-mers)
Contact: Kamil Salikhov <kamil.salikhov@univ-mlv.fr>

Usage:   prophyle_index command [options]

Command: build     construct index
         query     query reads against index

$ prophyle_index build -h


Usage:   prophyle_index build <prefix>

Options: -k INT    length of k-mer
         -s        construct k-LCP and SA in parallel
         -i        sampling distance for SA

$ prophyle_index query -h


Usage:   prophyle_index query [options] <prefix> <in.fq>

Options: -k INT    length of k-mer
         -u        use k-LCP for querying
         -v        output set of chromosomes for every k-mer
         -p        do not check whether k-mer is on border of two contigs, and show such k-mers in output
         -b        print sequences and base qualities
         -l STR    log file name to output statistics
         -t INT    number of threads [1]

10.4. prophyle_assignment.py

$ prophyle_assignment.py -h

usage: prophyle_assignment.py [-h] [-f {kraken,sam}] [-m {h1,c1,c2,h2}] [-A]
                              [-L] [-X] [-c [STR [STR ...]]]
                              <tree.nhx> <k> <assignments.txt>

Implementation of assignment algorithm

positional arguments:
  <tree.nhx>          phylogenetic tree (Newick/NHX)
  <k>                 k-mer length
  <assignments.txt>   assignments in generalized Kraken format

optional arguments:
  -h, --help          show this help message and exit
  -f {kraken,sam}     format of output [sam]
  -m {h1,c1,c2,h2}    measure: h1=hit count, c1=coverage, h2=norm.hit count,
                      c2=norm.coverage [h1]
  -A                  annotate assignments
  -L                  use LCA when tie (multiple assignments with the same
                      score)
  -X                  use LCA for k-mers (multiple hits of a k-mer)
  -c [STR [STR ...]]  configuration (a JSON dictionary)

10.5. prophyle_analyze.py

$ prophyle_analyze.py -h

usage: prophyle_analyze.py [-h] [-s ['w', 'u', 'wl', 'ul']]
                           [-f ['sam', 'bam', 'cram', 'uncompressed_bam', 'kraken', 'histo']]
                           {index_dir, tree.nw} <out_prefix> <input_fn>
                           [<input_fn> ...]

Program: prophyle_analyze.py

Analyze results of ProPhyle's classification.
Stats:
w: weighted assignments
u: unique assignments (ignore multiple assignments)
wl: weighted assignments, propagated to leaves
ul: unique assignments, propagated to leaves

positional arguments:
  {index_dir, tree.nw}  Index directory or phylogenetic tree
  <out_prefix>          Prefix for output files (the complete file names will
                        be <out_prefix>_rawhits.tsv for the raw hit counts
                        table and <out_prefix>_otu.tsv for the OTU table)
  <input_fn>            ProPhyle output files whose format is chosen with the
                        -f option. Use '-' for stdin or multiple files with
                        the same format (one per sample)

optional arguments:
  -h, --help            show this help message and exit
  -s ['w', 'u', 'wl', 'ul']
                        Statistics to use for the computation of histograms: w
                        (default) => weighted assignments; u => unique
                        assignments, non-weighted; wl => weighted assignments,
                        propagated to leaves; ul => unique assignments,
                        propagated to leaves.
  -f ['sam', 'bam', 'cram', 'uncompressed_bam', 'kraken', 'histo']
                        Input format of assignments [auto]. If 'histo' is
                        selected, the program expects hit count histograms
                        (*_rawhits.tsv) previously computed using prophyle
                        analyze, merges them, and computes the OTU table from
                        the result (assignment files are not required)

10.6. prophyle_propagation_makefile.py

$ prophyle_propagation_makefile.py -h

usage: prophyle_propagation_makefile.py [-h] -k int
                                        <tree.nw> <library.dir> <output.dir>
                                        <Makefile>

Create Makefile for parallelized ProPhyle k-mer propagation.

positional arguments:
  <tree.nw>      phylogenetic tree (in Newick/NHX).
  <library.dir>  directory with the library
  <output.dir>   output directory for the index
  <Makefile>     output Makefile

optional arguments:
  -h, --help     show this help message and exit
  -k int         k-mer length

10.7. prophyle_propagation_preprocessing.py

$ prophyle_propagation_preprocessing.py -h

usage: prophyle_propagation_preprocessing.py [-h] [-s FLOAT] [-A] [-V] [-P]
                                             <in_tree.nw{@node_name}>
                                             [<in_tree.nw{@node_name}> ...]
                                             <out_tree.nw>

Merge multiple ProPhyle trees. Specific subtrees might be extracted before merging. Examples:
        $ prophyle_propagation_preprocessing.py ~/prophyle/bacteria.nw ~/prophyle/viruses.nw bv.nw
        $ prophyle_propagation_preprocessing.py ~/prophyle/bacteria.nw@562 ecoli.nw

positional arguments:
  <in_tree.nw{@node_name}>
                        input tree
  <out_tree.nw>         output tree

optional arguments:
  -h, --help            show this help message and exit
  -s FLOAT              rate of sampling the tree [no sampling]
  -A                    autocomplete tree (names of internal nodes and FASTA paths)
  -V                    verbose
  -P                    do not add prefixes to node names

10.8. prophyle_propagation_postprocessing.py

$ prophyle_propagation_postprocessing.py -h

usage: prophyle_propagation_postprocessing.py [-h]
                                              <propagation.dir> <index.fa>
                                              <in.tree.nw> <counts.tsv>
                                              <out.tree.nw>

K-mer propagation postprocessing: merging FASTA files and k-mer annotation.

positional arguments:
  <propagation.dir>  directory with FASTA files
  <index.fa>         output FASTA file
  <in.tree.nw>       input phylogenetic tree
  <counts.tsv>       table with k-mer counts (TSV)
  <out.tree.nw>      output phylogenetic tree

optional arguments:
  -h, --help         show this help message and exit

10.9. prophyle_validate_tree.py

$ prophyle_validate_tree.py -h

usage: prophyle_validate_tree.py [-h] <tree.nw> [<tree.nw> ...]

Verify a Newick/NHX tree

positional arguments:
  <tree.nw>   phylogenetic tree (in Newick/NHX)

optional arguments:
  -h, --help  show this help message and exit