I have two batches of shotgun metagenomic sequences. Batch 1 has 20 samples with an average depth of 10 million sequences, and batch 2 has 50 samples with an average depth of 25 million sequences. Will this difference in sequencing depth cause an issue when I merge all the samples and do a diversity analysis (case vs control)? Further, is there an option to rarefy the sequences before diversity analysis, like there is in qiime2?
Hi @KSK
Yes, sequencing depth is a technical confounder. To avoid introducing bias, you can use MetaPhlAn's subsampling options so that every sample is profiled at the same depth. From the documentation for MetaPhlAn v4.2.*: It is possible to subsample the reads before the MetaPhlAn run by passing the number of reads to use (which must be less than the total number of reads in the sample) to --subsampling. The documentation's example subsamples to 10,000 reads.
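To apply this across two batches, one option is to pick a single depth below the smallest sample and pass that same number to --subsampling for every sample. Here is a minimal Python sketch of that idea. The sample names, read counts, the 90% margin, and the output file names are all made up for illustration, and the `--input_type fastq` and `-o` arguments are my assumption about a typical invocation, so check them against `metaphlan --help` for your installed version.

```python
import shlex


def choose_depth(read_counts, percent=90):
    """Pick one depth for all samples, strictly below the smallest sample.

    The 90% default is an arbitrary illustrative margin, not a MetaPhlAn rule.
    """
    smallest = min(read_counts.values())
    depth = smallest * percent // 100  # integer arithmetic, no float rounding
    if not 0 < depth < smallest:
        raise ValueError("cannot pick a depth below the smallest sample")
    return depth


def build_command(fastq, depth, out):
    """Assemble a metaphlan call; only --subsampling comes from the quoted docs."""
    return [
        "metaphlan", fastq,
        "--input_type", "fastq",       # assumed input format
        "--subsampling", str(depth),   # same depth for every sample
        "-o", out,                     # assumed output option
    ]


# Hypothetical read counts per sample (count your own FASTQ files instead).
read_counts = {
    "batch1_S01": 9_500_000,
    "batch1_S02": 10_400_000,
    "batch2_S01": 24_800_000,
}

depth = choose_depth(read_counts)
for sample in read_counts:
    cmd = build_command(f"{sample}.fastq.gz", depth, f"{sample}_profile.txt")
    print(shlex.join(cmd))
```

This only prints the commands so you can inspect them before running anything. Samples with far fewer reads than the rest will pull the common depth down, so you may prefer to drop them rather than subsample everything to their level.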
Since MetaPhlAn 4.1.1, it is possible to use paired-end information during subsampling (with --subsampling, paired-end reads are treated as single-end, i.e., independent). For that, use --subsampling_paired instead.
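To make the difference concrete, here is a toy Python illustration of independent versus pair-aware subsampling. This is not MetaPhlAn's implementation, just a sketch of the concept; the read names are invented, and here `n` counts pairs, which may not match how MetaPhlAn interprets the number you pass.

```python
import random


def subsample_independent(mates1, mates2, n, seed=0):
    """Draw n reads from each file separately, so mates can get split up."""
    rng = random.Random(seed)
    return rng.sample(mates1, n), rng.sample(mates2, n)


def subsample_paired(mates1, mates2, n, seed=0):
    """Draw n positions once and take both mates at each position."""
    if len(mates1) != len(mates2):
        raise ValueError("paired files must have the same number of reads")
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(mates1)), n))
    return [(mates1[i], mates2[i]) for i in positions]


# Toy read IDs standing in for the forward and reverse FASTQ files.
forward = [f"read{i}/1" for i in range(10)]
reverse = [f"read{i}/2" for i in range(10)]

kept_fwd, kept_rev = subsample_independent(forward, reverse, 3)
print("independent:", kept_fwd, kept_rev)

for fwd, rev in subsample_paired(forward, reverse, 3):
    print("paired:", fwd, rev)
```

In the paired version every kept forward read still has its mate, which is the information that independent sampling throws away.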