Rarecurve on long reads from MetaPhlAn 4.2.2

GNas · October 24, 2025, 1:42pm

Hello,

I want to generate rarefaction curves from the MetaPhlAn output to check whether my sequencing depth captures the majority of my taxa. I was going to use the summary table, but I noticed it reports the number of bases in each clade, not the number of reads. How does MetaPhlAn arrive at this number? How does this affect making a rarecurve? Thank you!

lindacova · October 27, 2025, 2:36pm

Hi @GNas !

When running MetaPhlAn with the option for long-read sequencing, the clade coverage is computed by counting the number of bases mapping to the markers instead of the number of reads. Long reads vary widely in length, and counting reads would not take that into account. Consequently, all computations in the MetaPhlAn workflow that would involve the number of reads are done with the number of bases for long reads (e.g. subsampling is performed by number of bases). This should not affect your rarefaction curves as long as you are aware that the unit is number of bases and not reads.

Hope this is helpful!

Linda

GNas · October 27, 2025, 3:30pm

Thank you for the explanation @lindacova !

To clarify then: when I am using the summary table, what I am doing is “sampling” using the probability of each clade, so it's as simple as a (bases for that clade) / (total number of bases) calculation to simulate sampling at various depths?

lindacova · October 27, 2025, 4:10pm

Hi @GNas ,

If you want to simulate various sequencing depths, I would suggest running MetaPhlAn on different subsamplings of your sample.
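As a rough illustration of what subsampling by bases could look like before running MetaPhlAn on each subset, here is a minimal Python sketch. The helper `subsample_by_bases` and the toy reads are hypothetical and not part of MetaPhlAn; the sketch assumes the reads are already loaded in memory as `(id, sequence)` pairs, and it simply draws reads in random order until their combined length reaches the target.

```python
import random


def subsample_by_bases(reads, target_bases, seed=0):
    """Randomly pick reads until their combined length reaches target_bases.

    reads: list of (read_id, sequence) tuples (hypothetical in-memory format).
    Returns the picked reads and the total number of bases they contain.
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)  # random read order, reproducible via the seed

    picked, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        picked.append(reads[i])
        total += len(reads[i][1])  # depth is counted in bases, not reads
    return picked, total


# Toy long reads with very different lengths (made-up data).
lengths = [500, 12000, 3000, 800, 25000, 1500]
reads = [(f"read{i}", "A" * n) for i, n in enumerate(lengths)]

for depth in (1000, 10000, 40000):
    subset, total = subsample_by_bases(reads, depth, seed=1)
    print(depth, len(subset), total)
```

Each subset would then be written back to a FASTQ file and profiled separately, so that the number of detected taxa can be plotted against the number of bases in the subset.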

Regarding the use of the summary table: if you are referring to the MetaPhlAn output obtained with the --rel_ab_w_read_stats option, that file provides an estimate of the number of bases covering each clade. The value is calculated by multiplying the clade coverage by the clade-specific average genome length.
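To make the arithmetic concrete, here is a small Python sketch of that estimate, together with the proportion-based sampling described in the question. All clade names and numbers are made up, the function names are hypothetical, and each draw is assumed to stand for one fixed-size unit of bases. Because the inputs are estimates, and taxa missing from the table can never appear, this is only an approximation and not a replacement for profiling real subsamples.

```python
import random

# Made-up example values; real ones would come from the MetaPhlAn output.
clades = {
    "clade_A": {"coverage": 12.0, "avg_genome_length": 4_000_000},
    "clade_B": {"coverage": 0.5, "avg_genome_length": 2_500_000},
    "clade_C": {"coverage": 0.01, "avg_genome_length": 5_000_000},
}


def estimated_bases(coverage, avg_genome_length):
    """Estimated bases for a clade: coverage x average genome length."""
    return coverage * avg_genome_length


def clade_proportions(clades):
    """Fraction of all estimated bases assigned to each clade."""
    bases = {name: estimated_bases(**v) for name, v in clades.items()}
    total = sum(bases.values())
    return {name: b / total for name, b in bases.items()}


def clades_seen(proportions, n_draws, seed=0):
    """Number of distinct clades hit when drawing n_draws units at random,
    each clade being chosen with probability equal to its proportion."""
    rng = random.Random(seed)
    names = list(proportions)
    weights = [proportions[name] for name in names]
    return len(set(rng.choices(names, weights=weights, k=n_draws)))


props = clade_proportions(clades)
for depth in (10, 1_000, 100_000):
    print(depth, clades_seen(props, depth, seed=1))
```

Plotting the number of clades seen against the number of draws gives a simulated curve, which can be compared with the curve obtained from real subsamples.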
