Rarefy metagenome sequence data before MetaPhlAn3 analysis

DEEPCHANDA7 · July 9, 2020, 1:19pm

Hi community!!! @fbeghini,
Should I rarefy WGS metagenome sequence data before MetaPhlAn3 analysis?
Is this step super necessary? If yes, can I do this step on bowtie2out.txt or profile.txt files?
Thanks, DC7

fbeghini · July 10, 2020, 8:30am

Usually no, but it depends on what you need to do. If you need to perform alpha-diversity analyses and the metagenome sizes are quite different, you should go for it.

DEEPCHANDA7 · July 10, 2020, 8:57am

Thanks @fbeghini

In that case, at which step should I rarefy? Can I do it on the bowtie2out.txt file?

And,

I don’t really understand what you mean by “quite different”. My samples range from 600 MB to 2.2 GB. Should I rarefy?

Thanks,
DC7

paolinomanghi · July 10, 2020, 10:23am

Hi @DEEPCHANDA7, a common rarefaction procedure is to rarefy each sample to the 10th percentile of the dataset’s read counts, meaning that 10% of the samples will be below the threshold and will be excluded.
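
As a rough illustration of the threshold described above, here is a minimal Python sketch. The function name `rarefaction_plan`, the `read_counts` dictionary, and the nearest-rank definition of the percentile are assumptions made for this example, not part of MetaPhlAn or of any script referred to in this thread.

```python
def rarefaction_plan(read_counts, percentile=10):
    """Pick a rarefaction depth and per-sample retain probabilities.

    read_counts: dict mapping sample name -> number of reads (hypothetical input).
    percentile:  integer percentile of the read-count distribution to rarefy to.
    Uses the nearest-rank percentile, which is an assumption of this sketch.
    """
    if not read_counts:
        raise ValueError("read_counts is empty")
    ordered = sorted(read_counts.values())
    # Nearest-rank percentile with integer ceiling division.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    depth = ordered[rank - 1]
    # Samples at or above the depth are kept and subsampled down to it.
    retain = {s: depth / n for s, n in read_counts.items() if n >= depth}
    # Samples below the depth are excluded from the rarefied analysis.
    dropped = sorted(s for s, n in read_counts.items() if n < depth)
    return depth, retain, dropped


if __name__ == "__main__":
    # Made-up read counts for 20 samples, purely for illustration.
    counts = {f"sample{i}": (i + 1) * 1_000_000 for i in range(20)}
    depth, retain, dropped = rarefaction_plan(counts)
    print("depth:", depth)
    print("excluded samples:", dropped)
```

With few samples, the nearest-rank percentile can coincide with the smallest sample, in which case nothing is excluded; other percentile definitions (for example interpolation) would give slightly different depths.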

You can subsample either the raw reads or the bowtie2 alignments to the 10th percentile (and then pass the rarefied bowtie2 outputs as input to MetaPhlAn). I can’t think of any procedure capable of rarefying the taxonomic profiles directly.

Since by rarefying you will not only lose 10% of the samples (if you apply this type of rarefaction) but also a large part of the sample diversity, I would personally use the rarefied profiles just for alpha-diversity. Since 2.2 GB is a normal size and 600 MB is a medium-small size for a metagenome, in your case I would personally go with rarefaction, but just for the alpha-diversity computation.

Thanks,
Paolo

DEEPCHANDA7 · July 10, 2020, 2:54pm

Hi Paolo, many thanks for your reply.
Can you please tell me which software I can use to rarefy the bowtie2out.txt file? I have already run MetaPhlAn on a large dataset, so I would rather not repeat it. Also, what you said about the 10th percentile is not clear to me. Do you have any article about this?

Thanks and regards,
DC7

paolinomanghi · July 10, 2020, 3:25pm

Hi @DEEPCHANDA7,
I don’t know of any specific software. I would point you to a repository with a custom Python script, but at the moment I feel confident only in the version that rarefies the raw reads.
As for the procedure, it’s easy: the 10th percentile of the distribution of the number of reads across samples is the depth you want to reduce each sample to. For each sample with a number of reads higher than or equal to it, you compute the proportion of that sample’s reads which makes it equal to the percentile, that is, percentile / number of reads in the sample. Then you iterate over the reads, generate a random number in [0, 1] for each one, and keep the read if the random number is less than or equal to the retain probability. If you do this with the bowtie2 alignments, it’s the same, but you subset the lines of the bowtie2 output according to the retain probability instead of the reads themselves.
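
The procedure described above could be sketched in Python roughly as follows. This is not the custom script mentioned in the post; the function names are hypothetical, the FASTQ input is assumed to use exactly four lines per record, and the alignment file is assumed to hold one read per line.

```python
import random


def retain_probability(depth, n_reads):
    """Probability of keeping each read: percentile / number of reads, capped at 1."""
    return min(1.0, depth / n_reads)


def subsample_fastq(lines, prob, seed=0):
    """Keep each 4-line FASTQ record independently with probability `prob`.

    `lines` is a list of lines; 4 lines per record is an assumption of this sketch.
    """
    rng = random.Random(seed)
    kept = []
    usable = len(lines) - len(lines) % 4  # ignore a trailing incomplete record
    for i in range(0, usable, 4):
        # random() is in [0, 1), so `<` keeps a record with probability `prob`.
        if rng.random() < prob:
            kept.extend(lines[i:i + 4])
    return kept


def subsample_lines(lines, prob, seed=0):
    """Same idea for a one-read-per-line file such as a bowtie2 output (assumed layout)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < prob]


if __name__ == "__main__":
    # Toy data: 1000 fake reads, rarefied as if depth were 1/4 of the sample size.
    fastq = []
    for i in range(1000):
        fastq += [f"@read{i}", "ACGT", "+", "IIII"]
    prob = retain_probability(depth=250, n_reads=1000)
    print(len(subsample_fastq(fastq, prob)) // 4, "reads kept")
```

Because every read is kept or dropped independently, the resulting sample size is only approximately equal to the target depth. For paired-end data, reusing the same seed on both mate files should keep matching records, provided both files list the same reads in the same order.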

Welcome!

plicht · January 25, 2021, 5:14pm

Hi @paolinomanghi, @fbeghini, @DEEPCHANDA7

I would like to come back to this discussion. Most of my samples are between 2 GB and 3.8 GB, with some samples having only 100-300 MB. Therefore I would like to subsample in order to calculate reliable alpha-diversity metrics. Do you know of a tool for random subsampling of FASTQ data? Or should I use a tool like phyloseq, which subsamples after the taxonomic classification?

Best,
Philipp

Scott · September 26, 2022, 5:22pm

I am curious about rarefaction too. I would like to generate rarefaction curves for my samples to determine if we are sequencing deep enough to capture the diversity of each sample. Is there a way to extract the taxon count data from MetaPhlAn3 prior to the relative abundance normalization?
