The bioBakery help forum

Normalising the input reads of the samples

Hi bioBakery forum,
I have analyzed my shotgun metagenomic datasets (.fastq.gz files of varying sizes) with MetaPhlAn3, and I get a different number of clades for each file. My doubt is: are the read files normalized before analysis, or do I have to normalize them myself? Should I make all the files the same size?
Could you also recommend a method for normalizing the total reads across all the samples?
For example, a file with a total yield of 10.3 Mbp gives 542 clades, while a file with 1.2 Mbp gives 402 clades. Is there a threshold beyond which the total yield no longer matters?

Thanks in Advance

You could try having a look at the profiles of a rarefied version of the 10.3 M metagenome, but if you run the analysis with --unknown_estimation you can obtain profiles normalized by the total metagenome size.
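Rarefying here just means downsampling every FASTQ to the same number of reads before profiling, so the deeper sample no longer has a detection advantage. A minimal Python sketch of one way to do that with reservoir sampling (the function name, read count, and seed are my own choices, not part of MetaPhlAn; dedicated tools such as seqtk sample do the same job):

```python
import gzip
import random
from itertools import islice

def subsample_fastq(path, n_reads, seed=42):
    """Reservoir-sample n_reads 4-line FASTQ records from a (gzipped) file.

    Hypothetical helper: MetaPhlAn does not rarefy for you; this shows
    one way to downsample each sample to the same read count before
    profiling. If the file has fewer than n_reads records, all are kept.
    """
    random.seed(seed)
    opener = gzip.open if path.endswith(".gz") else open
    reservoir = []
    with opener(path, "rt") as fh:
        # Read the file one 4-line FASTQ record at a time until EOF.
        for i, rec in enumerate(iter(lambda: list(islice(fh, 4)), [])):
            if i < n_reads:
                reservoir.append(rec)
            else:
                # Replace an existing record with decreasing probability,
                # so every record ends up equally likely to be kept.
                j = random.randint(0, i)
                if j < n_reads:
                    reservoir[j] = rec
    return reservoir
```

You would run this once per sample with the same n_reads (typically the size of your smallest sample) and then profile the subsampled files with MetaPhlAn as usual.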