Differential abundance testing after HUMAnN 3

Dear All,
Thank you for this help forum. I am working with metagenomics for the first time, and I have a question about a downstream processing step.
I downloaded the processed IBDMDB functional profiles (3.0) from https://ibdmdb.org/tunnel/public/HMP2/WGS/1818/products. The downloaded gene_families files are CPM-normalized.

The next step is differential abundance analysis. I want to compare the results of ALDEx2 with those of a simple Wilcoxon test.

For the Wilcoxon test, can I use the CPM-normalized values directly as input, or should I convert them to relative abundances?

ALDEx2 requires raw counts because it performs an internal CLR transformation. Can I simply round the CPM values to integers and use them as input?

If this approach is not appropriate, please suggest a suitable method.

Also, is it possible to get the relative abundances of the functional profiles for the IBD dataset?

Thank you in advance.

  • We don’t offer true count data outputs from HUMAnN (for reasons I’ve described in other posts). The closest we get are the initial RPK (reads per kilobase) outputs, which are similar to fold coverage: they adjust the read mass hitting a gene (or set of genes) for its alignable length.

  • You could apply a CLR transformation to RPK-based data, since the CLR is an alternative to sum-normalization as a means to adjust for differences in sample read depth.

  • Note that the CPM-based data come from sum-normalizing RPKs to relative abundances and then multiplying by 1e6 to yield bigger numbers. Because multiplying by a constant does not change the ranks of the values, a Wilcoxon test on CPMs is equivalent to one on relative abundances.

  • Simply rounding CPMs to the nearest integer to simulate counts might violate the assumptions of a truly count-based statistical method, so I would not recommend this approach.

  • MaAsLin 2 is appropriate for analysis of CPM-based data from HUMAnN.
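To illustrate the point about CPMs and relative abundances, here is a minimal Python sketch on a made-up RPK table (the sample names, gene names, groups, and values are all hypothetical). It sum-normalizes each sample, scales by 1e6 to get CPMs, and computes the Mann-Whitney U statistic of the Wilcoxon rank-sum test by hand. Because every value is multiplied by the same constant, the ranks and therefore U are identical on both scales. In a real analysis you would more likely call a library routine such as `scipy.stats.mannwhitneyu` on the actual HUMAnN tables.

```python
"""Sketch: a rank-based test gives the same result on CPM and on
relative abundance, because CPM = relative abundance x 1e6.

Assumption: `rpk` is a tiny, made-up table (sample -> feature -> RPK);
real HUMAnN outputs are TSV files with one column per sample.
"""

rpk = {
    "S1": {"geneA": 120.0, "geneB": 30.0, "geneC": 50.0},
    "S2": {"geneA": 400.0, "geneB": 90.0, "geneC": 10.0},
    "S3": {"geneA": 80.0, "geneB": 60.0, "geneC": 60.0},
    "S4": {"geneA": 900.0, "geneB": 50.0, "geneC": 50.0},
}
group_a, group_b = ["S1", "S3"], ["S2", "S4"]  # hypothetical phenotype groups


def sum_normalize(sample, scale):
    """Divide by the sample total, then multiply by `scale`.

    scale=1.0 gives relative abundance; scale=1e6 gives CPM.
    """
    total = sum(sample.values())
    return {feat: val / total * scale for feat, val in sample.items()}


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic of the Wilcoxon rank-sum (Mann-Whitney) test for group a."""
    ranks = average_ranks(list(a) + list(b))
    return sum(ranks[: len(a)]) - len(a) * (len(a) + 1) / 2


relab = {s: sum_normalize(v, 1.0) for s, v in rpk.items()}
cpm = {s: sum_normalize(v, 1e6) for s, v in rpk.items()}

u_relab, u_cpm = {}, {}
for feat in ["geneA", "geneB", "geneC"]:
    u_relab[feat] = mann_whitney_u([relab[s][feat] for s in group_a],
                                   [relab[s][feat] for s in group_b])
    u_cpm[feat] = mann_whitney_u([cpm[s][feat] for s in group_a],
                                 [cpm[s][feat] for s in group_b])
    print(feat, u_relab[feat], u_cpm[feat])
```

Each printed line shows the same U for relative abundance and CPM, so the p-values would also match.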
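A similarly hedged sketch of the CLR point: the centered log-ratio of a sample is log(x_i) minus the mean log across that sample's features, so multiplying the whole sample by a constant (for example, deeper sequencing, or sum-normalizing to CPM) cancels out. That is why the CLR can stand in for sum-normalization as a read-depth adjustment. The RPK values below are hypothetical and contain no zeros. Real gene-family tables are sparse, so a pseudocount is needed; its size is an analysis choice (not something HUMAnN prescribes), and it makes the invariance only approximate.

```python
import math


def clr(values, pseudocount=0.0):
    """Centered log-ratio: log(x_i) minus the mean of the logs.

    Zeros have no logarithm, so a positive pseudocount must be supplied
    when they occur; its value is an analyst's choice (an assumption
    here, not a HUMAnN default) and breaks exact scale invariance.
    """
    if pseudocount == 0.0 and any(v <= 0 for v in values):
        raise ValueError("zeros present: supply a positive pseudocount")
    logs = [math.log(v + pseudocount) for v in values]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


# Hypothetical RPK values for one sample (no zeros), the same sample
# as if sequenced twice as deeply, and its sum-normalized CPM version.
sample_rpk = [120.0, 30.0, 50.0, 8.0]
deeper_rpk = [v * 2 for v in sample_rpk]
sample_cpm = [v / sum(sample_rpk) * 1e6 for v in sample_rpk]

for label, vals in [("RPK", sample_rpk), ("2x depth", deeper_rpk), ("CPM", sample_cpm)]:
    print(label, [round(x, 6) for x in clr(vals)])
```

All three rows print the same CLR values, which sum to zero within each sample.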