I have two batches of shotgun metagenomic sequences. Batch 1 has 20 samples with an average depth of 10 million reads, and batch 2 has 50 samples with an average depth of 25 million reads. Will this difference in sequencing depth cause an issue when I merge all the samples and run a diversity analysis (case vs. control)? Also, is there an option to rarefy the sequences before diversity analysis, like the one in QIIME 2?
Hi @KSK
Yes, sequencing depth is a technical confounder. To make sure you’re not introducing any bias, you can use MetaPhlAn's subsampling options. The following is from the documentation for MetaPhlAn v4.2.*:

It is possible to subsample the reads before the MetaPhlAn run by passing the number of reads to use (which must be smaller than the total number of reads of the sample) to --subsampling. For example, --subsampling 10000 subsamples the input to 10,000 reads.
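With two batches like these, the subsampling target has to be below the read count of the shallowest sample across both batches, because the value must be smaller than each sample's total. The Python sketch below is only an illustration, not MetaPhlAn's code. It picks such a common depth and shows what uniform subsampling without replacement does, using reservoir sampling so a large FASTQ never has to be loaded into memory in full. The helper names (choose_common_depth, parse_fastq, reservoir_subsample), the rounding step of one million reads, and the example read counts are all hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string)
FastqRecord = Tuple[str, str, str, str]


def choose_common_depth(read_counts: Sequence[int], step: int = 1_000_000) -> int:
    """Largest multiple of `step` strictly below the shallowest sample's read count."""
    target = (min(read_counts) - 1) // step * step
    if target <= 0:
        raise ValueError("shallowest sample has fewer reads than one step")
    return target


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group lines into 4-line records (assumes unwrapped FASTQ)."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    if buf:
        raise ValueError("truncated FASTQ record at end of input")


def reservoir_subsample(records: Iterable[FastqRecord], k: int, seed: int = 42) -> List[FastqRecord]:
    """Uniformly sample k records without replacement in one pass (Algorithm R)."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    reservoir: List[FastqRecord] = []
    n = 0
    for n, rec in enumerate(records, start=1):
        if n <= k:
            reservoir.append(rec)  # fill the reservoir with the first k reads
        else:
            j = rng.randrange(n)  # keep the n-th read with probability k/n
            if j < k:
                reservoir[j] = rec
    if n <= k:
        # Mirrors the documented rule: the target must be smaller than the sample's total.
        raise ValueError(f"requested {k} reads but the sample has only {n}")
    return reservoir


if __name__ == "__main__":
    # Hypothetical per-sample read counts, for illustration only.
    counts = [9_800_000, 11_200_000, 24_500_000, 26_100_000]
    print(choose_common_depth(counts))  # 9000000

    # Tiny synthetic FASTQ with 10 reads, subsampled to 3.
    lines: List[str] = []
    for i in range(10):
        lines += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]
    print(len(reservoir_subsample(parse_fastq(lines), k=3)))  # 3
```

In practice you would pass the chosen depth to --subsampling for every sample, rather than subsampling the FASTQ files yourself. The sketch just makes explicit what that value means and why it is bounded by the smallest library.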
Since MetaPhlAn 4.1.1, it is possible to use paired-end information during subsampling (with --subsampling, paired-end reads are treated as single-end, i.e., as independent reads). For that, use --subsampling_paired instead of --subsampling.
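To illustrate why using paired-end information matters, here is a minimal Python sketch (not MetaPhlAn's implementation). It draws one random set of pair indices and applies it to both mate files, so every retained R1 read keeps its R2 mate. The function name subsample_pairs is hypothetical, and reads are assumed to appear in the same order in both files. Here k counts pairs; check the MetaPhlAn documentation for whether --subsampling_paired counts reads or pairs.

```python
import random
from typing import List, Sequence, Tuple


def subsample_pairs(
    r1: Sequence[str], r2: Sequence[str], k: int, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Draw k read pairs without replacement, keeping each R1 read with its R2 mate.

    Assumes r1[i] and r2[i] are mates (same order in both files).
    """
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    if not 0 < k < len(r1):
        raise ValueError("k must be positive and smaller than the number of pairs")
    rng = random.Random(seed)
    # One draw of pair indices, shared by both mate files; sorted to keep file order.
    keep = sorted(rng.sample(range(len(r1)), k))
    return [r1[i] for i in keep], [r2[i] for i in keep]


if __name__ == "__main__":
    # Synthetic read IDs; "/1" and "/2" mark the two mates of each pair.
    r1 = [f"read{i}/1" for i in range(1000)]
    r2 = [f"read{i}/2" for i in range(1000)]
    s1, s2 = subsample_pairs(r1, r2, k=100)
    # Every retained R1 read still has its own mate at the same position in R2.
    print(all(a[:-2] == b[:-2] for a, b in zip(s1, s2)))  # True
    print(len(s1), len(s2))  # 100 100
```

Drawing the indices once and indexing both files is what keeps pairs intact. Sampling each file separately, or pooling both files as single-end reads, would generally keep only one mate of many pairs.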