The bioBakery help forum

Normalising the input reads of the samples

Hi bioBakery forum,
I have analyzed my shotgun metagenomic datasets (.fastq.gz files of varying sizes) with MetaPhlAn3, and I get a different number of clades for each file. My doubt is: are the read files normalized before analysis, or do I have to normalize them myself? Should I make all the files the same size?
Could you also recommend a method for normalizing the total reads across all the samples?
For example, a file with a total yield of 10.3 Mbp gives 542 clades, while a file with 1.2 Mbp gives 402 clades. Is there a threshold beyond which the total yield no longer matters?

Thanks in Advance

You could try having a look at the profiles of a rarefied version of the 10.3 M metagenome, but if you run the analysis with --unknown_estimation you can obtain profiles normalized by the total metagenome size.
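Rarefying here just means downsampling every FASTQ to the same number of reads before profiling, so the deeper sample no longer has a detection advantage. A minimal Python sketch of one way to do that with reservoir sampling (the function name, read count, and seed are my own choices, not part of MetaPhlAn; dedicated tools such as seqtk sample do the same job):

```python
import gzip
import random
from itertools import islice

def subsample_fastq(path, n_reads, seed=42):
    """Reservoir-sample n_reads 4-line FASTQ records from a (gzipped) file.

    Hypothetical helper: MetaPhlAn does not rarefy for you; this shows
    one way to downsample each sample to the same read count before
    profiling. If the file has fewer than n_reads records, all are kept.
    """
    random.seed(seed)
    opener = gzip.open if path.endswith(".gz") else open
    reservoir = []
    with opener(path, "rt") as fh:
        # Read the file one 4-line FASTQ record at a time until EOF.
        for i, rec in enumerate(iter(lambda: list(islice(fh, 4)), [])):
            if i < n_reads:
                reservoir.append(rec)
            else:
                # Replace an existing record with decreasing probability,
                # so every record ends up equally likely to be kept.
                j = random.randint(0, i)
                if j < n_reads:
                    reservoir[j] = rec
    return reservoir
```

You would run this once per sample with the same n_reads (typically the size of your smallest sample) and then profile the subsampled files with MetaPhlAn as usual.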