Difference in sequencing depth

I have two batches of shotgun metagenomic sequences. Batch 1 has 20 samples with an average depth of 10 million reads, and batch 2 has 50 samples with an average depth of 25 million reads. Will this difference in sequencing depth cause an issue when I merge all the samples and run a diversity analysis (case vs. control)? Also, is there an option to rarefy the sequences before the diversity analysis, as there is in qiime2?

Thank you

Kindly requesting some insights.

Hi @KSK
Yes, sequencing depth is a technical confounder. To make sure you're not introducing bias, you can use the subsampling options. This is from the documentation for MetaPhlAn v4.2.*: It is possible to subsample the reads before the MetaPhlAn run by passing the number of reads to use (which must be less than the total number of reads of the sample) to --subsampling. In the following example, the reads are subsampled to 10,000:

$ metaphlan metagenome.fastq --input_type fastq --subsampling 10000 -o profiled_metagenome_subsampled_10000.txt
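To apply this to your merged data set, every sample from both batches should be subsampled to the same depth, and that depth has to be lower than the read count of your shallowest sample. Below is a minimal bash sketch of that loop, not an official MetaPhlAn helper. It assumes single-end, uncompressed *.fastq files in one directory; the script name (subsample_all.sh), the functions count_reads and subsample_all, the output file naming and the DRY_RUN switch are all hypothetical. It only uses the metaphlan flags from the documented command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (subsample_all.sh): profile every sample at ONE common depth.
# Assumptions: single-end, uncompressed *.fastq files in a single directory;
# the depth must be lower than each sample's read count (per the MetaPhlAn docs).
set -euo pipefail

# Count reads in an uncompressed FASTQ (4 lines per record).
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Run (or, with DRY_RUN=1, only print) one metaphlan command per FASTQ file.
subsample_all() {
  local depth="$1" dir="$2"
  local fq n out
  for fq in "$dir"/*.fastq; do
    [ -e "$fq" ] || continue                # glob matched nothing
    n=$(count_reads "$fq")
    if [ "$n" -le "$depth" ]; then
      # Too shallow for this depth: skip rather than profile it unevenly.
      echo "skipping $fq: $n reads, need more than $depth" >&2
      continue
    fi
    out="${fq%.fastq}_subsampled_${depth}.txt"   # hypothetical naming
    local cmd=(metaphlan "$fq" --input_type fastq --subsampling "$depth" -o "$out")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Only run when called with: <depth> <fastq_dir>
if [ "$#" -ge 2 ]; then
  subsample_all "$1" "$2"
fi
```

Check the printed commands first with DRY_RUN=1 bash subsample_all.sh 5000000 /path/to/fastqs, then rerun without DRY_RUN. The 5000000 is only a placeholder: pick a depth below your smallest sample, and note that any sample skipped by the script has to be dropped or the depth lowered.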

Since MetaPhlAn 4.1.1, it is possible to use paired-end information during subsampling (in the example above, paired-end reads are treated as single-end, i.e., independently). For that, use --subsampling_paired instead:

$ metaphlan --subsampling_paired <N_PAIRED_READS> -1 <R1_FASTQ> -2 <R2_FASTQ> --input_type fastq --subsampling_out <SUBSAMPLED_READS_OUTPUT> -o <METAPHLAN_OUTPUT> --mapout <MAPOUT>
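If your samples are paired-end, the same one-depth-for-all idea can be applied with the paired command. This is again a hedged bash sketch, not part of MetaPhlAn. Assumptions: uncompressed files named <sample>_R1.fastq and <sample>_R2.fastq in one directory; <N_PAIRED_READS> is assumed to count read pairs, so pairs are counted from R1; the "depth must be lower than the total" rule is assumed to apply here too; the script name (subsample_pairs.sh), the functions and the output names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (subsample_pairs.sh): paired-end profiling at ONE common depth.
# Assumptions: <sample>_R1.fastq / <sample>_R2.fastq, uncompressed, one directory;
# --subsampling_paired is given a number of read PAIRS.
set -euo pipefail

# Count records in an uncompressed FASTQ (4 lines per record).
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Run (or, with DRY_RUN=1, only print) one paired metaphlan command per sample.
subsample_pairs() {
  local depth="$1" dir="$2"
  local r1 r2 sample n
  for r1 in "$dir"/*_R1.fastq; do
    [ -e "$r1" ] || continue                # glob matched nothing
    r2="${r1%_R1.fastq}_R2.fastq"
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: no matching $r2" >&2
      continue
    fi
    sample="${r1%_R1.fastq}"
    n=$(count_reads "$r1")                  # pairs = records in R1
    if [ "$n" -le "$depth" ]; then
      echo "skipping $sample: $n pairs, need more than $depth" >&2
      continue
    fi
    # Output names below are hypothetical.
    local cmd=(metaphlan --subsampling_paired "$depth" -1 "$r1" -2 "$r2"
      --input_type fastq
      --subsampling_out "${sample}_subsampled_${depth}"
      -o "${sample}_profile_${depth}.txt"
      --mapout "${sample}_mapout_${depth}")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Only run when called with: <depth> <fastq_dir>
if [ "$#" -ge 2 ]; then
  subsample_pairs "$1" "$2"
fi
```

As before, preview with DRY_RUN=1 bash subsample_pairs.sh 5000000 /path/to/fastqs and choose the depth from your shallowest sample rather than the placeholder.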