FAPROTAX | Huan Fan

Download and install

The current version is 1.2.6. Download it, then follow their instructions (http://www.loucalab.com/archive/FAPROTAX/lib/php/index.php?section=Instructions), from which I learnt a lot.

No installation is needed. The whole package is basically a Python script plus a plain-text data file. But make sure you have Cython and biom-format installed.

$ pip install Cython
$ pip install biom-format

Options

The input file is different from the one PICRUSt2 takes. This one needs the taxonomy annotation, similar to the one I received from Rhonin. The format is clearly explained on their instruction page under "Taxonomy format in the input table". See function ps2total_table in amplicon_functions.R for generating the input file from a phyloseq object.
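To make the layout concrete, here is a minimal sketch of such a table written from Python. The column names (ASV_ID, sample1, sample2), the counts, and the taxonomy strings are all made up for illustration. The layout only mirrors what the command in this post expects: ASV ids in column 0 and a column named "taxonomy". Check the instruction page for the exact header and taxonomy conventions.

```python
import csv
import io


def write_input_table(handle, sample_names, records):
    """Write ASV id, per-sample counts, and taxonomy as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Assumed layout: ASV ids first, one column per sample, taxonomy last.
    writer.writerow(["ASV_ID", *sample_names, "taxonomy"])
    for asv_id, counts, taxonomy in records:
        writer.writerow([asv_id, *counts, taxonomy])


# Hypothetical records, for illustration only.
records = [
    ("ASV1", [120, 30],
     "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia"),
    ("ASV2", [0, 45],
     "Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium"),
]

buffer = io.StringIO()
write_input_table(buffer, ["sample1", "sample2"], records)
table_text = buffer.getvalue()
print(table_text)
```

In practice I generate this file from the phyloseq object in R. The Python version is only meant to show the shape of the table.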

Then we run the script.

$ path_to/FAPROTAX_1.2.6/collapse_table.py -i input_table.txt -o func_table.tsv -g path_to/FAPROTAX_1.2.6/FAPROTAX.txt -d "taxonomy" -c "#" -v --omit_columns 0 -r out_report.tsv -s sub_table --group_leftovers_as "Unassigned" -f
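If you prefer to launch the script from Python (for example in a loop over several tables), the same command can be assembled as an argument list. This is only a sketch: the helper name build_collapse_command is my own, the paths are the placeholders used above, and the flags are exactly the ones in the command line, nothing more.

```python
import shlex


def build_collapse_command(faprotax_dir, input_table, out_table, out_report, out_sub_dir):
    """Assemble the collapse_table.py call used in this post as an argument list."""
    return [
        f"{faprotax_dir}/collapse_table.py",
        "-i", input_table,
        "-o", out_table,
        "-g", f"{faprotax_dir}/FAPROTAX.txt",
        "-d", "taxonomy",
        "-c", "#",
        "-v",
        "--omit_columns", "0",
        "-r", out_report,
        "-s", out_sub_dir,
        "--group_leftovers_as", "Unassigned",
        "-f",
    ]


command = build_collapse_command(
    "path_to/FAPROTAX_1.2.6",  # placeholder path, as in the command above
    "input_table.txt",
    "func_table.tsv",
    "out_report.tsv",
    "sub_table",
)
# Print a shell-quoted version for inspection.
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing a list avoids quoting problems with arguments such as "#".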

Help on the options:

	-i, --input_table		Path to input OTU table listing OTU abundances per sample, in classical (tabular) or BIOM format. By default columns should represent samples and rows should represent OTUs or taxa. 
	-g, --input_groups_file		Path to FAPROTAX database file, or any other similar specification of groups by which to collapse the OTU table.
	-o, --out_collapsed		Path to output function table, listing functional group abundances per sample. (optional)
	-r, --out_report		Path to output report file, listing OTUs associated with each functional group and some other summary statistics (optional).
	-d, --row_names_are_in_column		Column listing the taxonomic paths in the input OTU table (if in classical format). If column names are available as a header (see option --column_names_are_in), this specifies a column name, otherwise it specifies a column index (first column is 0).
	-s, --out_sub_tables_dir		Path to output directory, to which sub-tables of the original OTU table (one per functional group) shall be saved. Each sub-table will only list OTUs included in the particular functional group. (optional)
	--omit_columns		Comma-separated list of any column indices to ignore in the input OTU table (if in classical format). For example, if the first column lists OTU IDs (not taxonomic paths), you should pass '--omit_columns 0', otherwise the first column will be treated as another sample.
	--group_leftovers_as		Optional group name for listing all OTUs not assigned to any functional group.
	-f, --force		(Flag) Replace all existing output files without warning.

My understanding of the options:

	-i	Discussed
	-g	Came with the package
	-o	Abundance matrix for downstream analysis. But why is this optional?
	-r	Records the verbose output and reports the grouping of each ASV, organized by the functional groups in -g.
	-d	Name of the column that contains the taxonomy information.
	-s	A directory to hold separate sub-tables, one for each group found in -i.
	--omit_columns	Index of the columns that are neither taxonomy info nor sample, such as ASV ids or other metadata.
	--group_leftovers_as	In my data, only 20% of my ASVs were assigned to groups, so it is worth identifying the rest. I call them "Unassigned".
	-f	Very helpful
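To picture what the collapsing does, here is a conceptual sketch. It is not FAPROTAX's actual code. I assume that counts of all ASVs belonging to a group are summed per sample, that an ASV may count toward more than one group, and that everything left over goes into one extra row. The ASV ids, group names and memberships are hypothetical.

```python
def collapse(otu_counts, group_members, leftovers_name=None):
    """Sum per-sample counts of the OTUs in each group.

    otu_counts: {otu_id: [count per sample]}
    group_members: {group_name: set of otu_ids}
    """
    n_samples = len(next(iter(otu_counts.values())))
    collapsed = {}
    for group, members in group_members.items():
        totals = [0] * n_samples
        for otu in members:
            for j, count in enumerate(otu_counts.get(otu, [0] * n_samples)):
                totals[j] += count
        collapsed[group] = totals
    if leftovers_name is not None:
        # OTUs that appear in no group at all are pooled into one row.
        assigned = set().union(*group_members.values())
        totals = [0] * n_samples
        for otu, counts in otu_counts.items():
            if otu not in assigned:
                for j, count in enumerate(counts):
                    totals[j] += count
        collapsed[leftovers_name] = totals
    return collapsed


# Hypothetical counts for three ASVs in two samples.
otu_counts = {"ASV1": [10, 0], "ASV2": [5, 5], "ASV3": [1, 2]}
# Hypothetical memberships; ASV1 belongs to both groups.
group_members = {"group_A": {"ASV1"}, "group_B": {"ASV1", "ASV2"}}

func_table = collapse(otu_counts, group_members, leftovers_name="Unassigned")
print(func_table)
```

Because ASV1 sits in both groups, the column sums of the function table can exceed those of the ASV table. This is worth remembering when normalizing.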

There is also a normalization option:

	-n, --normalize_collapsed		How to normalize the output function table. Options include 'none' (no normalization, default), 'columns_before_collapsing' (TSS of the OTU table), 'columns_after_collapsing' (TSS of the function table), 'columns_before_collapsing_excluding_unassigned' (TSS of the OTU table restricted to functionally assigned OTUs).

TSS stands for total sum scaling: each feature's read count (the number of reads from a particular sample that cluster within the same OTU) is divided by the total number of reads in that sample, which converts counts to relative abundances, a.k.a. naive percentages. I don’t think it is very helpful.
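As a quick worked example of TSS, the sketch below scales each sample (column) of a small made-up count table so that it sums to 1. The function name and the counts are mine, purely for illustration.

```python
def total_sum_scale(table):
    """Divide every count by its sample (column) total.

    table: {feature_id: [count per sample]}
    """
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[j] for counts in table.values()) for j in range(n_samples)]
    return {
        feature: [
            count / totals[j] if totals[j] else 0.0  # leave empty samples at 0
            for j, count in enumerate(counts)
        ]
        for feature, counts in table.items()
    }


# Hypothetical counts: two OTUs, two samples with totals of 100 and 20 reads.
counts = {"OTU1": [30, 5], "OTU2": [70, 15]}
scaled = total_sum_scale(counts)
print(scaled)
```

Here OTU1 becomes 30/100 in the first sample and 5/20 in the second, so the two samples are comparable only as proportions, not as absolute abundances.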

Output files

There are two output files.

The first one is --out_collapsed, with rows for functions and columns for samples. This can be used directly for downstream analysis. However, my pipeline for functional composition is the same as for taxonomic composition, so I need a stratified version (instead of collapsed), which can be generated by concatenating all the files in --out_sub_tables_dir. Note that PICRUSt2 can provide both collapsed and stratified results as well.
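The concatenation could look roughly like the sketch below. It rests on assumptions I have not verified against the real output: that each sub-table is a tab-separated file with a single header line, and that the file name (without extension) is the functional group name. The demo files, group names and counts are made up. Adjust the parsing to whatever your sub-tables actually look like.

```python
import csv
import tempfile
from pathlib import Path


def stratify(sub_tables_dir):
    """Stack all sub-tables, tagging each row with the file stem as its group."""
    header = None
    rows = []
    for path in sorted(Path(sub_tables_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            file_header = next(reader, None)  # assumed: one header line per file
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = ["function", *file_header]
            for row in reader:
                if row:
                    rows.append([path.stem, *row])
    return header, rows


# Demo with two hypothetical sub-tables written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "group_A.txt").write_text(
        "taxonomy\tsample1\tsample2\ntaxon_x\t10\t0\n"
    )
    Path(tmp, "group_B.txt").write_text(
        "taxonomy\tsample1\tsample2\ntaxon_x\t10\t0\ntaxon_y\t5\t5\n"
    )
    header, stratified = stratify(tmp)

print(header)
for row in stratified:
    print(row)
```

The result is a long table with one row per (function, taxon) pair, which is the stratified shape my taxonomic pipeline expects.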

Downstream analysis

To follow my plotting convention, I will make the collapsed output file into a phyloseq object and work from it. See picrust2_allthree.R for details.

Huan Fan / 2022-12-04
Published under (CC) BY-NC-SA in categories analysis tagged with amplicon microbiome