IBD data analysis

A substantial part of my thesis work consists of analyzing metagenomics datasets and their taxonomic classification. In particular, I am interested in studying how the human gut microbiome differs between healthy subjects and IBD/UC/CD patients, and in understanding those differences with statistical physics tools. To perform this analysis I am collecting diverse datasets, and the dataset used in your article “Gut microbiome structure and metabolic activity in inflammatory bowel disease” is a good candidate for the study. We would like to know whether it is possible to access the diagnostic metadata in order to distinguish control/healthy subjects from clinical patients. Ideally, we would like to obtain a labeling of the samples available online at NCBI [1], extending the metadata CSV for the 220 samples, like the following:

RUN SUBJECT_ID CLINIC
SRR6468499 XYZ IBD/CONTROL
SRR6468500 XYT IBD/CONTROL

Please see the supporting information from the paper presenting this study:

https://www.nature.com/articles/s41564-018-0306-4

Let us know if you have trouble accessing it. Specifically, Table S4 should allow you to link subject metadata to the raw sequencing data via the sample “G” number (referred to as SRA_metagenome_name in the table).

Merging the SRA metadata with Table S4 takes only a few lines of pandas and yields a clean label for each experimental sample. A personal tip for easier file handling: transpose Table S4 so that the patient ID becomes a column like every other feature. This should make the table easier for pandas (or whatever tool you prefer) to read and manipulate.
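
For anyone who wants a starting point, the transpose-and-merge idea could look roughly like the sketch below. It is only an illustration under assumptions: the two tables are tiny in-memory stand-ins with placeholder values (not real metadata), the SRA run table is assumed to have "Run" and "Sample Name" columns with the "G" number in "Sample Name", and the "feature" and "Diagnosis" labels are hypothetical names for how Table S4 might look once exported to CSV. Only SRA_metagenome_name comes from the table itself.

```python
import io

import pandas as pd

# Toy stand-in for the SRA run table (placeholder values, not real metadata).
# Assumption: "Sample Name" holds the sample "G" number.
sra_csv = io.StringIO(
    "Run,Sample Name\n"
    "SRR6468499,G_A\n"
    "SRR6468500,G_B\n"
)

# Toy stand-in for Table S4 in its assumed original orientation:
# one row per feature, one column per subject.
# "feature" and "Diagnosis" are hypothetical labels.
s4_csv = io.StringIO(
    "feature,XYZ,XYT\n"
    "SRA_metagenome_name,G_A,G_B\n"
    "Diagnosis,IBD,CONTROL\n"
)


def label_runs(sra, s4_wide):
    """Return one row per SRA run with its subject ID and clinical label."""
    # Transpose so each subject becomes a row and each feature a column.
    s4 = s4_wide.set_index("feature").T
    # Turn the subject ID (now the index) into an ordinary column.
    s4 = s4.rename_axis("SUBJECT_ID").reset_index()
    # Link runs to subjects through the sample "G" number.
    merged = sra.merge(
        s4,
        left_on="Sample Name",
        right_on="SRA_metagenome_name",
        how="left",               # keep runs that have no match in Table S4
        validate="many_to_one",   # fail if a "G" number appears twice in S4
    )
    out = merged[["Run", "SUBJECT_ID", "Diagnosis"]]
    return out.rename(columns={"Run": "RUN", "Diagnosis": "CLINIC"})


sra = pd.read_csv(sra_csv)
s4_wide = pd.read_csv(s4_csv)
labels = label_runs(sra, s4_wide)
print(labels.to_string(index=False))
```

With how="left", any run whose "G" number is missing from Table S4 stays in the output with empty SUBJECT_ID and CLINIC values, so it is worth checking for missing labels before using the result.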