Hi,
We performed metagenomic sequencing on a set of samples, followed by functional annotation with the GMM and MetaCyc databases. The data processing was done by a company, which provided us with GMM and MetaCyc functional potential profiles derived from the species-level profiles. Specifically, the GMM and MetaCyc profiles were generated from counts, relative abundances, rarefied counts, and rarefied relative abundances, by summing the values (counts or relative abundances) of all species annotated to each GMM or MetaCyc pathway. As a result, the “relative abundance” values in these profiles do not sum to 1 within a sample, because each value is the sum of the relative abundances of the species annotated to a given functional pathway.
MaAsLin 3 worked very well for differential abundance and prevalence analysis across different taxonomic levels. However, I am uncertain how to correctly handle the functional profiles (MetaCyc and GMM) in MaAsLin 3.
Therefore, I have the following questions:
1. The MetaCyc and GMM relative abundance values I have are semi-compositional (if that is the correct term: they do not sum to 1, but they are calculated from relative abundances). Is it appropriate to set normalization = “None” in MaAsLin 3?
2. Is it correct to include the sample read depth (the total number of reads per sample, calculated as the sum of counts across all species) as a covariate for MetaCyc and GMM as well, in order to control for sequencing depth?
Kind regards, Pegah
Hi Pegah,
I’ll answer 2 first since it’s easier. For anything that’s not rarefied to the same number of reads, you should include reads per sample as a covariate, since increasing the number of reads makes detection more likely in the prevalence model.
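As a minimal sketch of how that covariate could be derived, the snippet below sums species counts within each sample, as described in question 2. The sample names, counts, and the column name `reads` are made up for illustration and are not from the actual data.

```python
# Hypothetical species-level count table: sample -> {species: read count}.
species_counts = {
    "sample_1": {"A": 300, "B": 200, "C": 500},
    "sample_2": {"A": 1200, "B": 0, "C": 800},
}

# Read depth per sample = sum of counts across all species in that sample.
read_depth = {
    sample: sum(counts.values()) for sample, counts in species_counts.items()
}

# The per-sample value would be added to the metadata table as an extra
# column (called "reads" here, an invented name) next to the other covariates.
metadata = {sample: {"reads": depth} for sample, depth in read_depth.items()}

print(metadata)
```

The same per-sample depth would be reused for the GMM and MetaCyc tables, since those were derived from the same species counts.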
For 1, I’m not sure that I follow what was done with the GMM and MetaCyc profiles. Can you give a short numeric example? Are these values still bounded between 0 and 1 as a proportion? Is it that species can contribute to multiple GMM and MetaCyc annotations and therefore each sample sums up to something like (total number of species) * (average proportion of a species’ genes mapped)? If that’s the case, it probably doesn’t fit a relative abundance framework, and you’re probably best off using normalization “None” and interpreting as best you can.
Will
Hi Will,
Thank you so much for taking the time to answer my question; it was a huge help. Your explanation was exactly right: one species contributes to multiple GMM and MetaCyc pathways.
For each sample, I received relative abundance values at the species level (TSS normalized), where the sum for each sample equals 1. For GMM and MetaCyc, the pathway abundances were calculated by summing the relative abundances of all species contributing to that pathway.
For example, in sample 1:
Species A and B contribute to pathway 1, so its abundance is 0.3 + 0.2 = 0.5.
Species A and C contribute to pathway 2, so its abundance is 0.3 + 0.5 = 0.8.
Species A, B, and C contribute to pathway 3, so its abundance is 0.3 + 0.2 + 0.5 = 1.
And so on.
The total across the GMM and MetaCyc pathways is therefore no longer 1 (for these three pathways alone it is already 0.5 + 0.8 + 1 = 2.3).
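The arithmetic in this example can be reproduced with a short sketch. The species, pathway memberships, and values are the ones given for sample 1; the variable names are hypothetical, and this is only an illustration of the summing, not the company's actual pipeline.

```python
# Species-level relative abundances for sample 1 (TSS normalized, sum to 1).
species_abundance = {"A": 0.3, "B": 0.2, "C": 0.5}

# Which species are annotated to each pathway (from the example).
pathway_members = {
    "pathway_1": ["A", "B"],
    "pathway_2": ["A", "C"],
    "pathway_3": ["A", "B", "C"],
}

# Pathway value = sum of the relative abundances of its contributing species.
# Rounding only guards against floating-point noise in the sums.
pathway_abundance = {
    pathway: round(sum(species_abundance[s] for s in members), 6)
    for pathway, members in pathway_members.items()
}

species_total = round(sum(species_abundance.values()), 6)
pathway_total = round(sum(pathway_abundance.values()), 6)

print(pathway_abundance)
print("species total:", species_total)
print("pathway total:", pathway_total)
```

Because each species is counted once for every pathway it is annotated to, the pathway total grows with the number of pathways rather than staying at 1.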
I’m sorry if my question is stupid. I’m not a bioinformatician, just a wet-lab scientist trying to learn these things on my own.
Kind regards,
Pegah
Thanks Pegah, this makes sense.
For the taxonomic profiles, I’d use MaAsLin 3 with its standard configuration (TSS, LOG, median comparison for abundance).
For the GMM and MetaCyc pathways, I’d use normalization NONE, the LOG transformation, and median comparison turned off for both abundance and prevalence. Normalization is NONE because the values are no longer relative abundances, and the log transformation is there because I expect they’ll still be pretty right-skewed. Median comparison is off for the same reason: since the values are no longer relative abundances, there’s no reason to use the median comparison framework to deal with compositionality. I think you would then interpret the results as “the aggregate species abundance contributing to the XXX pathway was [increased/decreased] in [condition A] compared to [condition B].”
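To make the "NONE plus LOG" choice concrete, here is a small sketch of what it means for the values of a single pathway. The numbers are invented, the base-2 logarithm and the handling of zeros are assumptions of this sketch, and none of this is MaAsLin 3's internal code.

```python
import math

# Hypothetical values of one pathway across five samples (aggregate species
# relative abundance, as in the summed profiles discussed in this thread).
pathway_values = [0.5, 0.8, 0.0, 0.02, 1.0]

# Normalization NONE: the values are used as provided, with no rescaling.
# Abundance side: log-transform the non-zero values only (base 2 assumed).
log_values = [math.log2(v) for v in pathway_values if v > 0]

# Prevalence side: only presence or absence of the pathway in each sample.
present = [v > 0 for v in pathway_values]

print(log_values)
print(present)
```

On the log scale a coefficient describes a fold change in the aggregate abundance, which matches the "[increased/decreased]" wording of the interpretation.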
Will
Thank you so much for your help.
Kind regards,
Pegah