Queries regarding HMP2 project

We are a small group working on a government project titled "Effect of urbanization on gut microbiome, mycobiome and virome in patients with Inflammatory Bowel Disease from Northern India". We have been following your excellent research and downloaded the HMP2 data (Lloyd-Price et al., 2019, https://doi.org/10.1038/s41586-019-1237-9) to study the distribution and abundance of ‘redox enzymes’ across different groups. We have some queries and would appreciate your help with them.

1. Are the supplementary data provided in the products files at
https://ibdmdb.org/tunnel/public/summary.html normalized? If so, which
type of normalization was performed?

2. Why do some sample product folders not have ecs.tsv files?

We are facing some problems in the statistical analysis (normalization of data) and therefore need your help.

Hi Krishna,

Thanks for reaching out to bioBakery Lab.

  1. The different data types are normalized differently (usually as relative abundances, with exceptions for some data types).
  2. Only certain data types (mostly metagenomics and metatranscriptomics) are analyzed as ECs, so the others have different feature types.
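
To illustrate what "relative abundance" means here, below is a minimal sketch of total-sum scaling plus a rough check for whether a profile already sums to 1. The feature names and values are made up, and the function names are hypothetical. Tables scaled to percent or to another total would need a different target than 1, so treat the check as a heuristic rather than a statement about how any particular HMP2 product file was produced.

```python
def to_relative_abundance(counts):
    """Total-sum scaling: divide each feature value by the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has zero total abundance")
    return {feature: value / total for feature, value in counts.items()}


def looks_normalized(profile, tol=1e-6):
    """Heuristic: True when the profile already sums to (about) 1."""
    return abs(sum(profile.values()) - 1.0) < tol


# Toy sample with made-up feature names and values (not HMP2 data).
sample = {"EC_A": 10.0, "EC_B": 30.0, "EC_C": 60.0}
rel = to_relative_abundance(sample)

print(looks_normalized(sample))  # raw values do not sum to 1
print(looks_normalized(rel))     # scaled values do
```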

Hi sagunmaharjann,

  1. Some sample folders in the metagenomics and metatranscriptomics data also don’t have ecs.tsv files (for example, sample ID CSM79HGV). Is there any specific reason for that?
  2. For statistical validation, did you use the same normalized data, or did you use a different statistical method, such as an Aitchison log-ratio transform (clr, ilr, alr)?

Thank you!
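
For readers unfamiliar with the centered log-ratio (clr) transform mentioned in the question, here is a minimal sketch for a single sample. The thread does not say which transform, if any, was used in the HMP2 analysis, so this only shows what clr computes. The pseudocount used to handle zeros is an assumption chosen for illustration, and the feature names and values are made up.

```python
import math


def clr(profile, pseudocount=1e-6):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount (an illustrative choice) avoids log(0) for absent features.
    """
    log_vals = {k: math.log(v + pseudocount) for k, v in profile.items()}
    mean_log = sum(log_vals.values()) / len(log_vals)
    return {k: lv - mean_log for k, lv in log_vals.items()}


# Toy relative-abundance profile (not HMP2 data).
profile = {"EC_A": 0.1, "EC_B": 0.3, "EC_C": 0.6, "EC_D": 0.0}
transformed = clr(profile)

for feature, value in transformed.items():
    print(feature, round(value, 3))
```

By construction the clr values of one sample sum to zero, and the ordering of features by abundance is preserved.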

Hello, could you direct me to the explanations of the variables in the HMP2 metadata? For example, what does “interval_days” mean? Thank you!

interval_days means the interval, in days, between the previous sampling and the current sampling (the sampling in that row).
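
As a small sketch of that definition: given one participant's sampling dates, the interval for each row is the number of days since the previous sampling. The dates are made up, the function name is hypothetical, and using None for the first visit is an assumption; the actual metadata may encode the first row differently.

```python
from datetime import date


def interval_days(visit_dates):
    """Days between each sampling and the one before it, in date order."""
    ordered = sorted(visit_dates)
    intervals = [None]  # assumption: no previous sampling for the first visit
    for previous, current in zip(ordered, ordered[1:]):
        intervals.append((current - previous).days)
    return intervals


# Made-up sampling dates for a single participant.
visits = [date(2015, 1, 5), date(2015, 1, 19), date(2015, 2, 20)]
print(interval_days(visits))  # [None, 14, 32]
```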