Difference in sequencing depth

I have two batches of shotgun metagenomic sequences. Batch 1 has 20 samples with an average depth of 10 million sequences, and batch 2 has 50 samples with an average depth of 25 million sequences. Will this difference in sequencing depth cause an issue when I merge all the samples and do a diversity analysis (case vs control)? Further, is there an option for rarefying the sequences before diversity analysis, as there is in QIIME 2?

Thank you

Kindly requesting some insights.

Hi @KSK,
Yes, sequencing depth is a technical confounder. To avoid introducing bias, you can use the subsampling options. This is extracted from the documentation for MetaPhlAn v4.2.*: It is possible to subsample the reads before the MetaPhlAn run by passing the number of reads to use (which must be lower than the total number of reads of the sample) to --subsampling. In the following example, the sample is subsampled to 10,000 reads:

$ metaphlan metagenome.fastq --input_type fastq --subsampling 10000 -o profiled_metagenome_subsampled_10000.txt

Since MetaPhlAn 4.1.1, it is possible to use paired-end information during subsampling (above, paired-end reads would be treated as single-end, i.e., independent). For that, use --subsampling_paired instead:

metaphlan --subsampling_paired <N_PAIRED_READS> -1 <R1_FASTQ> -2 <R2_FASTQ> --input_type fastq --subsampling_out <SUBSAMPLED_READS_OUTPUT> -o <METAPHLAN_OUTPUT> --mapout <MAPOUT>
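
To apply the same depth to both batches, you first need a value that is lower than the read count of your smallest sample. Below is a minimal Python sketch of one way to pick it. It is not part of MetaPhlAn: the function names, the 90% margin, and the example read counts are all assumptions for illustration, and the read counter assumes standard 4-line FASTQ records.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming 4 lines per record (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def common_subsampling_depth(read_counts, margin_percent=90):
    """Pick one depth for all samples, strictly below the smallest read count.

    margin_percent is an arbitrary safety margin (an assumption, not a
    MetaPhlAn recommendation).
    """
    smallest = min(read_counts.values())
    return min(smallest * margin_percent // 100, smallest - 1)


# Illustrative read counts only; replace with count_fastq_reads() on your files.
example_counts = {"batch1_sample": 10_000_000, "batch2_sample": 25_000_000}
depth = common_subsampling_depth(example_counts)
print(f"--subsampling {depth}")
```

The resulting number would then be passed to --subsampling (or --subsampling_paired) for every sample in both batches. Samples with far fewer reads than the rest may be better excluded than used to set the depth for everyone.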

Hi, a follow-up question: if I specify --subsampling 10000 in my command, does this mean it will subsample 5000 reads from R1 and 5000 reads from R2, or 10000 reads from R1 and 10000 reads from R2?

Thank you