Metagenomic raw data size

Sheng · November 2, 2020, 3:34pm

Dear bioBakery team,

I have downloaded the raw metagenomic (MGX) data from ftp://ftp.broadinstitute.org/raw/HMP2/MGX/2018-05-04/*.tar.
However, I found that the file sizes vary greatly. For example:
-rw-rw-r-- 1 carze broad 1228011520 May 4 2018 HSM6XRR3.tar
-rw-rw-r-- 1 carze broad 334336000 May 4 2018 HSM6XRR5.tar
-rw-rw-r-- 1 carze broad 681635840 May 4 2018 HSM6XRR7.tar
-rw-rw-r-- 1 carze broad 143360 May 4 2018 HSM6XRR9.tar
-rw-rw-r-- 1 carze broad 1518469120 May 4 2018 HSM6XRRB.tar

Prior to Illumina sequencing, were the library sizes of the samples close to each other? Or are the data on the HMP2 servers already cleaned, with low-quality reads and human reads removed, so that the cleaning accounts for the difference in data size?
Thank you for your time.

Best regards
Sheng

franzosa · November 2, 2020, 3:53pm

While the target sequencing depths were similar, that doesn’t always produce the same number of reads per sample in the end. That, combined with the QC procedures, explains the variance in file size / read count.
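
To get a rough feel for how wide that spread is, the sizes from the listing in the question can be compared against their median. This is only a minimal sketch: the byte counts are copied from the listing, archive size is just a proxy for read count, and the 1% cutoff (`MIN_FRACTION_OF_MEDIAN`) is a hypothetical threshold chosen for illustration, not a bioBakery or HMP2 rule.

```python
import statistics

# Archive sizes in bytes, copied from the listing in the question.
sizes = {
    "HSM6XRR3.tar": 1228011520,
    "HSM6XRR5.tar": 334336000,
    "HSM6XRR7.tar": 681635840,
    "HSM6XRR9.tar": 143360,
    "HSM6XRRB.tar": 1518469120,
}

# Hypothetical cutoff: treat archives below 1% of the median size as outliers.
MIN_FRACTION_OF_MEDIAN = 0.01

median_size = statistics.median(sizes.values())

# One row per archive: (name, size in MB, size relative to the median),
# sorted from smallest to largest.
rows = [
    (name, size / 1e6, size / median_size)
    for name, size in sorted(sizes.items(), key=lambda item: item[1])
]

# Archives small enough that they may deserve a closer look.
flagged = [name for name, _, ratio in rows if ratio < MIN_FRACTION_OF_MEDIAN]

for name, megabytes, ratio in rows:
    print(f"{name}\t{megabytes:10.1f} MB\t{ratio:.4f} x median")
print("Below cutoff:", flagged)
```

Most of the archives here fall within a few-fold of each other, which fits ordinary variation in read yield plus QC. A file far below the cutoff is not necessarily wrong, but it may be worth checking separately.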
