Download and install
The current version is 1.2.6. Download it, then follow their instructions (http://www.loucalab.com/archive/FAPROTAX/lib/php/index.php?section=Instructions), from which I learnt a lot.
No installation is needed: the whole package is basically a Python script plus a text-format data file. However, make sure you have Cython and biom-format installed:
$ pip install Cython
$ pip install biom-format
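Before running FAPROTAX, a quick check that both dependencies can be imported may save a confusing error later. This is a minimal sketch; it assumes the import names are `Cython` and `biom` (the module provided by biom-format), and `missing_dependencies` is a hypothetical helper, not part of FAPROTAX.

```python
import importlib.util

# pip package name -> importable module name (assumed: biom-format imports as "biom")
REQUIRED_MODULES = {"Cython": "Cython", "biom-format": "biom"}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing), "(install with pip install <name>)")
    else:
        print("Cython and biom-format are available.")
```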
Options
The input file differs from the one used by PICRUSt2: this one needs the taxonomy annotation, similar to the one I received from Rhonin. The format is clearly explained on their instructions page under "Taxonomy format in the input table". See the function ps2total_table in amplicon_functions.R for generating the input file from a phyloseq object.
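Outside R, an equivalent table could be written along the lines below. This is a sketch, not a port of ps2total_table: the function name `write_faprotax_input` is hypothetical, and it assumes a layout that fits the command used later (column 0 holds ASV ids and is skipped with `--omit_columns 0`, a column named `taxonomy` holds semicolon-separated taxonomic paths, and the header line starts with `#` to match `-c "#"`). Check the exact path format against the FAPROTAX instructions.

```python
import csv


def write_faprotax_input(counts, taxonomy, samples, path):
    """Write a tab-separated OTU table with a 'taxonomy' column.

    counts:   {asv_id: [count per sample, in the order of `samples`]}
    taxonomy: {asv_id: ["Bacteria", "Proteobacteria", ...]}  (root rank first)
    samples:  list of sample names
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header written as a '#' line (assumed to pair with -c "#");
        # column 0 holds ASV ids, skipped later with --omit_columns 0.
        writer.writerow(["#ASV_ID", *samples, "taxonomy"])
        for asv_id, row in counts.items():
            # Taxonomic path joined with semicolons, root rank first (assumed format).
            writer.writerow([asv_id, *row, ";".join(taxonomy[asv_id])])
```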
Then we run the script.
$ path_to/FAPROTAX_1.2.6/collapse_table.py -i input_table.txt -o func_table.tsv \
    -g path_to/FAPROTAX_1.2.6/FAPROTAX.txt -d "taxonomy" -c "#" -v \
    --omit_columns 0 -r out_report.tsv -s sub_table \
    --group_leftovers_as "Unassigned" -f
Help on the options:
-i, --input_table Path to input OTU table listing OTU abundances per sample, in classical (tabular) or BIOM format. By default columns should represent samples and rows should represent OTUs or taxa.
-g, --input_groups_file Path to FAPROTAX database file, or any other similar specification of groups by which to collapse the OTU table.
-o, --out_collapsed Path to output function table, listing functional group abundances per sample. (optional)
-r, --out_report Path to output report file, listing OTUs associated with each functional group and some other summary statistics (optional).
-d, --row_names_are_in_column Column listing the taxonomic paths in the input OTU table (if in classical format). If column names are available as a header (see option --column_names_are_in), this specifies a column name, otherwise it specifies a column index (first column is 0).
-s, --out_sub_tables_dir Path to output directory, to which sub-tables of the original OTU table (one per functional group) shall be saved. Each sub-table will only list OTUs included in the particular functional group. (optional)
--omit_columns Comma-separated list of any column indices to ignore in the input OTU table (if in classical format). For example, if the first column lists OTU IDs (not taxonomic paths), you should pass '--omit_columns 0', otherwise the first column will be treated as another sample.
--group_leftovers_as Optional group name for listing all OTUs not assigned to any functional group.
-f, --force (Flag) Replace all existing output files without warning.
My understanding of the options:
-i Discussed above.
-g Comes with the package (FAPROTAX.txt).
-o Abundance matrix for downstream analysis. But why is this optional?
-r Records the verbose output and reports the group assignment of each ASV, organized by the functional groups in -g.
-d Name of the column that contains the taxonomy information.
-s A directory to hold separate sub-tables, one for each functional group found in -i.
--omit_columns Indices of the columns that are neither taxonomy info nor samples, such as ASV IDs or other metadata.
--group_leftovers_as In my data, only 20% of the ASVs were assigned to groups, so it is worth labelling the rest. I call them "Unassigned".
-f Very helpful when re-running the script.
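If the run is scripted, passing the options as an argument list sidesteps shell quoting problems such as curly quotes or an en-dash pasted in place of `--`. A minimal sketch, assuming `collapse_table.py` is executable as in the command above; `build_faprotax_command` and `run_faprotax` are hypothetical helpers, and they only use flags that appear in that command.

```python
import subprocess


def build_faprotax_command(faprotax_dir, input_table, out_table, report, sub_dir,
                           leftovers="Unassigned"):
    """Assemble the collapse_table.py call shown above as an argument list."""
    return [
        f"{faprotax_dir}/collapse_table.py",
        "-i", input_table,
        "-o", out_table,
        "-g", f"{faprotax_dir}/FAPROTAX.txt",
        "-d", "taxonomy",            # column holding the taxonomic paths
        "-c", "#",
        "-v",
        "--omit_columns", "0",       # column 0 = ASV ids, not a sample
        "-r", report,
        "-s", sub_dir,
        "--group_leftovers_as", leftovers,
        "-f",                        # overwrite existing outputs without asking
    ]


def run_faprotax(cmd):
    # No shell involved, so every argument reaches the script exactly as written.
    subprocess.run(cmd, check=True)
```

Usage: `run_faprotax(build_faprotax_command("path_to/FAPROTAX_1.2.6", "input_table.txt", "func_table.tsv", "out_report.tsv", "sub_table"))`.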
There is also a normalization option
-n, --normalize_collapsed How to normalize the output function table. Options include 'none' (no normalization, default), 'columns_before_collapsing' (TSS of the OTU table), 'columns_after_collapsing' (TSS of the function table), 'columns_before_collapsing_excluding_unassigned' (TSS of the OTU table restricted to functionally assigned OTUs).
TSS stands for total sum scaling, which divides feature read counts (the number of reads from a particular sample that cluster within the same OTU) by the total number of reads in each sample, i.e., it converts feature counts to appropriately scaled ratios. It is also known as naive percentage. I don't think it is very helpful.
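TSS itself is a one-liner per sample. The sketch below is illustrative only (the function `tss` is hypothetical, not FAPROTAX's code) and assumes a nested-dict table of sample to feature counts.

```python
def tss(table):
    """Total sum scaling: divide each count by its sample's total.

    table: {sample: {feature: count}}; returns {sample: {feature: proportion}}.
    """
    scaled = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        # Samples with zero reads are left as zeros instead of dividing by 0.
        scaled[sample] = {f: (c / total if total else 0.0) for f, c in counts.items()}
    return scaled


print(tss({"S1": {"ASV1": 30, "ASV2": 10}}))  # {'S1': {'ASV1': 0.75, 'ASV2': 0.25}}
```

Because an ASV can belong to several functional groups, or to none, normalizing before collapsing ('columns_before_collapsing') generally gives function columns that do not sum to 1, whereas 'columns_after_collapsing' does.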
Output files
There are two output files I use.
The first is --out_collapsed, with rows for functions and columns for samples, which can be used directly for downstream analysis. However, my pipeline for functional composition is the same as for taxonomic composition, so I also need a stratified (rather than collapsed) version. This can be generated by concatenating all the files in --out_sub_tables_dir. Note that PICRUSt2 can also provide both collapsed and stratified results.
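The concatenation step could look roughly like this. It is a sketch under assumptions I have not checked against FAPROTAX's documentation: each file in the sub-table directory holds one functional group, the file stem is the group name, and each file is tab-separated with a single header line (a leading `#` is stripped). `stratify_sub_tables` and `write_stratified` are hypothetical helpers.

```python
import csv
from pathlib import Path


def stratify_sub_tables(sub_dir):
    """Stack per-group sub-tables into one long table with a leading 'group' column."""
    header, rows = None, []
    for path in sorted(Path(sub_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            file_header = next(reader, None)
            if not file_header:
                continue  # skip empty files
            file_header = ["group", file_header[0].lstrip("#"), *file_header[1:]]
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"column mismatch in {path.name}")
            for row in reader:
                if row:  # ignore blank lines
                    rows.append([path.stem, *row])
    return header, rows


def write_stratified(header, rows, out_path):
    """Write the stacked table as tab-separated text."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

Write the stratified file outside the sub-table directory; otherwise a re-run would pick it up as another group.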
Downstream analysis
To follow my plotting convention, I turn the collapsed output file into a phyloseq object and work from it. See picrust2_allthree.R for details.