Questions about IBDMDB datasets

I have been working on the development of statistical models for microbiome data analysis. I am currently developing a statistical model for omics data analysis with one of my students.

While looking for interesting datasets to illustrate our model, I found the datasets used in your paper, Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases (Nature, 2019). I am especially interested in the metagenomics and metatranscriptomics datasets, which showed high association in the paper. I see that merged tables are available for the metagenomics and metatranscriptomics datasets.

I see that your merged tables have relative abundances estimated by MetaPhlAn for the metagenomics data and RPKs for the metatranscriptomics data.

I wonder if you have the data as estimated or raw counts. I think the way each sample's sequencing depth is normalized may affect the final inferences; I found this from my experience analyzing 16S sequencing data. Also, our method does model-based sample normalization, so normalization prior to analysis is not required, and working with counts gives us more flexibility. Can I find count data on the website?

Hi user,
The MetaPhlAn estimates can be “back-calculated” to count-like values based on the sequencing depths. Unfortunately, there is no simple count interpretation like there is for 16S, since the reference sequences are of different lengths (unlike amplicons).
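
As a rough illustration of that back-calculation, here is a minimal sketch. It assumes the relative abundances are percentages that sum to 100 within a sample and that you have a per-sample read total to scale by. The function name, taxon labels, and numbers are made up for the example and do not come from the IBDMDB tables, and which read total to use (all reads or only mapped reads) is your choice. The results are count-like values only, not true read counts, for the reference-length reason given above.

```python
def back_calculate_counts(rel_abundance_pct, total_reads):
    """Scale percent relative abundances to count-like values for one sample.

    rel_abundance_pct: dict mapping taxon name -> relative abundance in percent
    total_reads: sequencing depth chosen for this sample (an assumption of this sketch)
    """
    return {
        taxon: round(pct / 100.0 * total_reads)
        for taxon, pct in rel_abundance_pct.items()
    }


# Hypothetical sample: three taxa with invented abundances and an invented depth.
sample = {
    "s__Bacteroides_vulgatus": 60.0,
    "s__Faecalibacterium_prausnitzii": 30.0,
    "s__Escherichia_coli": 10.0,
}
counts = back_calculate_counts(sample, total_reads=1_000_000)
print(counts)
```

Because each taxon is rounded separately, the values will not always sum exactly to the depth you supplied.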

Also, the raw data itself is available on the following pages:
MTX Raw Files | IBDMDB
Metagenomes Raw Files | IBDMDB