We are a bioinformatics research group at the University of Padua. We are interested in carrying out analyses on WGS data from patients with inflammatory bowel disease. Following your research, we downloaded the HMP2 data (https://ibdmdb.org/). Specifically, from the “download data” tab (https://ibdmdb.org/tunnel/public/summary.html) we downloaded:
- the metadata, from the “download HMP2 Metadata” button
- the “Merged Tables” for the row “HMP2-metagenomes-2018.18” (https://ibdmdb.org/tunnel/public/HMP2/WGS/1818/products).
Both the pathway and the taxa abundance matrices contain relative abundances for each subject. From the metadata we selected subjects with an age greater than 18 and “reads_filtered” greater than 10^7. We then split the taxa abundance matrix into the two taxonomic levels of interest (species and genus) and verified that the relative abundances of each subject add up to 1 in both cases. Next, we split the pathway abundance matrix into community-level and species-level rows. The community-level relative abundances of each subject add up to 1, as expected. The problem is that the species-level relative abundances of each subject do not add up to 1 as we expected, but to around 0.5.
- Are the matrices we downloaded the output tables of HUMAnN 2.0 (with relative abundance output), without any post-filtering?
- Is it correct to expect the species-level pathway relative abundances to add up to 1 for every subject?
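For reference, the check described above can be sketched roughly as follows. This is a minimal Python illustration, not our exact script. It assumes that species-level (stratified) rows are the ones whose feature name contains a `|` between the pathway and the species, and the function name, feature names and values in `toy` are invented for the example.

```python
from collections import defaultdict


def column_sums_by_level(table):
    """Sum relative abundances per sample, separately for each level.

    table: dict mapping feature name -> {sample: relative abundance}.
    Assumption: a feature name containing "|" is a species-level row,
    anything else is a community-level row.
    Returns (community_sums, species_sums), each a dict sample -> sum.
    """
    community = defaultdict(float)
    species = defaultdict(float)
    for feature, values in table.items():
        target = species if "|" in feature else community
        for sample, abundance in values.items():
            target[sample] += abundance
    return dict(community), dict(species)


# Hypothetical toy table: names and numbers are made up for illustration.
toy = {
    "PWY-A": {"S1": 0.6},
    "PWY-B": {"S1": 0.4},
    "PWY-A|species_1": {"S1": 0.3},
    "PWY-B|species_1": {"S1": 0.2},
}

community_sums, species_sums = column_sums_by_level(toy)
print("community-level sums:", community_sums)
print("species-level sums:", species_sums)
```

On the real tables we ran the same kind of per-subject sum on each of the two sets of rows.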
We thank you for your support.