Meta-Data for metaproteomics samples (Multi-omics of the gut microbial ecosystem in IBD)

Unfortunately, after looking at the HMP2 metadata file, I still could not figure out how the raw file names map to the entries in the metadata table. For example, there are these raw files:

"161014_pool0802C-7.raw 161014_SM-6EFPH_534.raw 161014_SM-7I9I9_611.raw 161014_SM-7MCW3_571.raw 161014_SM-9JGD6_552.raw 161014_SM-ARGGH_608.raw 161014_SM-AZAHS_584.raw 161014_SM-CHS7O_541.raw
161014_pool0802C-8.raw 161014_SM-77FYM_616.raw 161014_SM-7L41Y_614.raw 161014_SM-7PAR7_568.raw 161014_SM-9WOHR_572.raw 161014_SM-AVAFD_585.raw 161014_SM-BYMCQ_589.raw
161014_SM-6CAJG_529.raw 161014_SM-7CS3Y_593.raw 161014_SM-7M8TF_602.raw 161014_SM-7T2LO_557.raw 161014_SM-A77X7_538.raw 161014_SM-AXQRR_588.raw 161014_SM-C1MZD_545.raw "

In the metadata file, the only column I can find that is somewhat related is the “PDO Number” column, where some rows have a PDO Number value of 161014, but I still cannot figure out how to relate the individual raw files to entries in the metadata table. For reference, I am attaching the metadata subtable that I extracted, which covers only the proteomics datasets.

Furthermore, the paper mentions that 447 stool samples were analyzed by metaproteomics, and the metadata table has 451 entries, whereas I downloaded 641 raw files from the FTP site. I am really confused by this. I’m guessing some of the samples were fractionated, so their files need to be combined later for analysis. Could you also tell me which files (by file name) are fractions of the same sample?


Hi user,

A good way to relate the individual raw files to entries in the metadata table is to match the interior part of each raw file name (the SM-XXXXX ID) against the “Tube B: Proteomics” field of the master metadata table.

For example:
Raw sample: 161014_SM-6EFPH_534.raw

Tube B: Proteomics:  SM-6EFPH      SM-6EFPH      SM-6EFPH      SM-6EFPH
Project:             M2028C4_MBX   G79974        1369333       G79169
External ID:         MSM5LLGR      MSM5LLGR_P    MSM5LLGR      MSM5LLGR
Participant ID:      M2028         M2028         M2028         M2028
site_sub_coll:       M2028C4       M2028C4       M2028C4       M2028C4
data_type:           metabolomics  metagenomics  proteomics    stool_16S
week_num:            6.0           6.0           6.0           6.0
date_of_receipt:     2014-06-12    2014-06-12    2014-06-12    2014-06-12
interval_days:       13.0          13.0          13.0          13.0
visit_num:           7             7             7             7
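
The matching described above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the file name layout (six-digit prefix, SM-... ID, run number) is inferred from the file names quoted in this thread, the helper names `sample_id` and `lookup` are made up for the example, and the two metadata rows are a toy stand-in built from the values shown above. The exact column header should be checked against the real metadata file.

```python
import re

# Assumed layout, inferred from names such as 161014_SM-6EFPH_534.raw:
# <six digits>_<SM-... ID>_<run number>.raw
RAW_NAME = re.compile(r"^\d{6}_(SM-[A-Z0-9]+)_\d+\.raw$")


def sample_id(raw_name):
    """Return the interior SM-... ID of a raw file name, or None if it does not fit."""
    match = RAW_NAME.match(raw_name)
    return match.group(1) if match else None


def lookup(raw_name, rows, data_type="proteomics"):
    """Return metadata rows whose 'Tube B: Proteomics' value matches the file's ID."""
    sid = sample_id(raw_name)
    if sid is None:
        return []  # e.g. the pool files, which carry no SM-... ID
    return [
        row for row in rows
        if row["Tube B: Proteomics"] == sid and row["data_type"] == data_type
    ]


# Toy stand-in for the metadata table (values copied from the example above).
metadata = [
    {"Tube B: Proteomics": "SM-6EFPH", "External ID": "MSM5LLGR_P",
     "Participant ID": "M2028", "data_type": "metagenomics"},
    {"Tube B: Proteomics": "SM-6EFPH", "External ID": "MSM5LLGR",
     "Participant ID": "M2028", "data_type": "proteomics"},
]

hits = lookup("161014_SM-6EFPH_534.raw", metadata)
print(hits)
```

Filtering on `data_type` is a choice made for this sketch, since the same tube ID appears once per data type in the table.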

I am looking into the second part of the question.

Regards,
Sagun

@sagunmaharjann Yep, I figured that part out, and by doing so I was able to match 458 of the raw files to their metadata. However, there are still 183 raw files (641 - 458) for which I cannot figure out which subject they come from. Thanks.

Hi @sagunmaharjann, did you figure out the metadata for the unmapped files? For example, there is a file named 160825_pool0713-2.raw. Can you tell me the metadata for this file and how you extract it?

Another example is this file:

160901_pool0713-8.raw

There are a total of 184 such cases. Please let me know how to get the metadata for these.
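
For anyone tallying these cases, here is a minimal Python sketch of how the unmapped file names could be listed and grouped by name pattern alone. The two patterns are assumptions inferred from the file names quoted in this thread, and the `classify` helper is hypothetical. It says nothing about what the pool files actually contain, which is the open question here.

```python
import re
from collections import Counter

# Patterns assumed from the names quoted in this thread.
SAMPLE = re.compile(r"^\d{6}_(SM-[A-Z0-9]+)_\d+\.raw$")
POOL = re.compile(r"^\d{6}_(pool[0-9A-Za-z]+)-\d+\.raw$")


def classify(raw_name):
    """Label a raw file name as 'sample', 'pool', or 'other' by its name only."""
    if SAMPLE.match(raw_name):
        return "sample"
    if POOL.match(raw_name):
        return "pool"
    return "other"


# A few of the file names mentioned in this thread.
names = [
    "161014_pool0802C-7.raw",
    "161014_SM-6EFPH_534.raw",
    "160825_pool0713-2.raw",
    "160901_pool0713-8.raw",
]

# How many files fall into each class.
counts = Counter(classify(name) for name in names)

# How many files share each pool label (the part before the final "-<n>").
pool_labels = Counter(
    POOL.match(name).group(1) for name in names if classify(name) == "pool"
)

print(counts)
print(pool_labels)
```

Running this over the full list of downloaded file names would give the unmapped names grouped by pool label, which may make it easier to ask about them in bulk.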

@sagunmaharjann I still have not heard from you. Could you please give me a definitive answer about this issue with your datasets? It's been FAR too long, unfortunately!!