To get the absolute counts of each taxa

molly · May 17, 2021, 8:42pm

I have read your paper on bioBakery 3 and I am using MetaPhlAn 3. Thanks for making it available; it is a really good tool. I want to run alpha diversity and DESeq2 analyses on the taxa, so I need absolute read counts. I therefore ran it with the additional flag “-t rel_ab_w_read_stats”:

metaphlan r1.fastq.gz,r2.fastq.gz -t rel_ab_w_read_stats -o s1.tsv --input_type fastq --bowtie2out s1.bowtie2.bz2 --nproc 4

In the output file, there is

My questions are:

Are the values in “estimated_number_of_reads_from_the_clade” the absolute read counts for specific taxa?
There is also a “coverage” column. What does it mean? Should I filter the output table by coverage to get a reliable taxa table? If so, what threshold would be good to use?

Best

franzosa · May 17, 2021, 9:13pm

The read count is an estimate of the number of reads contributed by a given clade. It is computed by extending the “reads per kilobase” estimate from a species’ marker genes over the species’ average genome length, and then summing species counts to higher-level clades. You could use these counts for count-based models (like DESeq2) but I would not use them directly in alpha diversity estimates (they are not equivalent to organism/cell counts).
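To make the arithmetic in the first point concrete, here is a minimal Python sketch under simplified assumptions: the per-species marker “reads per kilobase” rate and the average genome length are taken as given inputs, and clade names are shortened MetaPhlAn-style lineage strings joined with “|”. The function names and numbers are hypothetical, and this is not MetaPhlAn’s source code, which may compute the marker-level rate differently.

```python
# Simplified, hypothetical model of the read-count estimate: NOT MetaPhlAn's
# source. Marker RPK and average genome length are assumed to be known.
from collections import defaultdict


def species_read_estimate(marker_rpk, avg_genome_len_bp):
    """Extend a reads-per-kilobase rate over the species' average genome."""
    return marker_rpk * avg_genome_len_bp / 1000.0


def sum_to_clades(species_reads):
    """Add each species' estimate to every ancestor clade.

    Lineages are '|'-delimited (k__...|p__...|s__...), so each prefix of the
    split name is an ancestor clade.
    """
    clade_reads = defaultdict(float)
    for lineage, reads in species_reads.items():
        ranks = lineage.split("|")
        for depth in range(1, len(ranks) + 1):
            clade_reads["|".join(ranks[:depth])] += reads
    return dict(clade_reads)


# Made-up numbers for illustration only.
species = {
    "k__Bacteria|p__Firmicutes|s__A": species_read_estimate(2.0, 3_000_000),
    "k__Bacteria|p__Firmicutes|s__B": species_read_estimate(1.0, 2_000_000),
    "k__Bacteria|p__Bacteroidetes|s__C": species_read_estimate(0.5, 4_000_000),
}
clades = sum_to_clades(species)
print(clades["k__Bacteria|p__Firmicutes"])  # 8000.0
print(clades["k__Bacteria"])                # 10000.0
```

Because the estimate is scaled by genome length, a species with a larger genome contributes more reads at the same cell count, which is one reason these values are not organism counts.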
“Coverage” is a measure of sequencing depth per unit length of a genome (or summed over genomes within a clade). Coverage is proportional to cell count (if A has twice B’s coverage, we infer that twice as many A cells were present vs. B cells). Relative abundance is sum-normalized species coverage (which can then be summed to higher taxonomic levels). We typically filter on relative abundance and prevalence, e.g. keeping taxa that exceed 0.1% abundance in 10% of samples and collapsing everything else into a lower-confidence “other” bin.
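The abundance-plus-prevalence filter from the second point can be sketched as below. This assumes a hypothetical table mapping each sample to {taxon: relative abundance in percent} at a single taxonomic rank (mixing ranks would double-count, since higher clades are sums of lower ones). The default thresholds mirror the 0.1% / 10% example; the function name is illustrative.

```python
# Hypothetical abundance + prevalence filter. Abundances are percentages
# (0-100, MetaPhlAn's default scale) for taxa at ONE rank, e.g. species only.

def filter_taxa(table, min_abundance=0.1, min_prevalence=0.10):
    """Keep taxa above min_abundance (%) in at least min_prevalence of the
    samples; collapse every other taxon into a per-sample 'other' bin."""
    n_samples = len(table)
    all_taxa = {t for profile in table.values() for t in profile}
    keep = set()
    for taxon in all_taxa:
        # Number of samples in which this taxon exceeds the abundance cutoff.
        hits = sum(1 for profile in table.values()
                   if profile.get(taxon, 0.0) > min_abundance)
        if hits / n_samples >= min_prevalence:
            keep.add(taxon)

    filtered = {}
    for sample, profile in table.items():
        kept = {t: a for t, a in profile.items() if t in keep}
        # Lower-confidence remainder, so the per-sample total is preserved.
        kept["other"] = sum(a for t, a in profile.items() if t not in keep)
        filtered[sample] = kept
    return filtered


# Made-up example: s__B never exceeds 0.1%, so it ends up in "other".
table = {
    "s1": {"s__A": 5.0, "s__B": 0.05, "s__C": 94.95},
    "s2": {"s__A": 3.0, "s__B": 0.02, "s__C": 96.98},
}
print(filter_taxa(table))
```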

molly · May 17, 2021, 9:32pm

Thanks for your reply.

If I want to get the count table for each clade, is there any other way to do it with MetaPhlAn 3 besides adding the flag “-t rel_ab_w_read_stats”?
I am checking the results with relative abundances, and I see numbers bigger than 1.


I thought these were already relative abundances and should be between 0 and 1. Are these numbers actually already multiplied by 100?
Thanks

franzosa · May 18, 2021, 5:55pm

I believe that’s the best way to get the estimated read counts. MetaPhlAn is more focused on relative abundance estimation (hence focusing on those numbers in the primary output).
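Once profiles are produced with “-t rel_ab_w_read_stats”, the estimated counts still have to be gathered into a clade-by-sample table for a tool like DESeq2. A minimal sketch, assuming each profile is tab-separated with a “#”-prefixed column header line starting with “#clade_name” and containing an “estimated_number_of_reads_from_the_clade” column (check this against your own files). The helper names and toy input are hypothetical; values are rounded because DESeq2 expects non-negative integer counts.

```python
# Hypothetical helpers: collect 'estimated_number_of_reads_from_the_clade'
# from rel_ab_w_read_stats profiles into a clade-by-sample count table.

def read_counts(lines):
    """Return {clade_name: estimated reads} for one profile.

    Assumes tab-separated lines; the '#clade_name...' line is the column
    header and any other '#' line is treated as metadata and skipped.
    """
    header = None
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # blank line
        if fields[0].startswith("#"):
            if fields[0] == "#clade_name":
                header = ["clade_name"] + fields[1:]
            continue
        if header is None:
            raise ValueError("no '#clade_name' header line before data rows")
        record = dict(zip(header, fields))
        counts[record["clade_name"]] = float(
            record["estimated_number_of_reads_from_the_clade"])
    return counts


def count_table(profiles):
    """Merge {sample: {clade: reads}} into a header and integer rows.

    Clades missing from a sample get 0; values are rounded because
    count models such as DESeq2 expect integer counts.
    """
    samples = sorted(profiles)
    clades = sorted({c for p in profiles.values() for c in p})
    header = ["clade_name"] + samples
    rows = [[clade] + [round(profiles[s].get(clade, 0.0)) for s in samples]
            for clade in clades]
    return header, rows


# Made-up input in the assumed layout (real files: open("s1.tsv")).
toy_lines = [
    "#made-up metadata line\n",
    "#clade_name\tclade_taxid\trelative_abundance\tcoverage"
    "\testimated_number_of_reads_from_the_clade\n",
    "k__Bacteria\t2\t100.0\t1.5\t1200\n",
    "k__Bacteria|p__Firmicutes\t2|1239\t60.0\t0.9\t700\n",
]
header, rows = count_table({"s1": read_counts(toy_lines),
                            "s2": {"k__Bacteria": 10.4}})
print(header)
for row in rows:
    print(row)
```

With real data you would pass an open file handle to read_counts, and typically keep rows from a single rank (e.g. species) before modeling, since higher clades are sums of lower ones.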
I believe the numbers add to 100% rather than 1.0 in the default output.
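Since relative abundance is sum-normalized species coverage and the default output is on a 0 to 100 scale, getting fractions between 0 and 1 is just a division by 100. A small sketch with made-up coverage values (function names are hypothetical):

```python
# Sketch only: relative abundance as sum-normalized species coverage,
# reported on a 0-100 scale; dividing by 100 gives fractions in [0, 1].

def relative_abundance_percent(species_coverage):
    """Normalize coverage so values sum to 100 (percent)."""
    total = sum(species_coverage.values())
    return {sp: 100.0 * cov / total for sp, cov in species_coverage.items()}


def percent_to_fraction(profile):
    """Rescale a percent profile to fractions that sum to 1."""
    return {taxon: value / 100.0 for taxon, value in profile.items()}


cov = {"s__A": 3.0, "s__B": 1.0}  # made-up coverage values
pct = relative_abundance_percent(cov)
frac = percent_to_fraction(pct)
print(pct)   # {'s__A': 75.0, 's__B': 25.0}
print(frac)  # {'s__A': 0.75, 's__B': 0.25}
```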
