To get the absolute counts of each taxon

franzosa · May 17, 2021, 9:13pm

The read count is an estimate of the number of reads contributed by a given clade. It is computed by extending the “reads per kilobase” estimate from a species’ marker genes over the species’ average genome length, and then summing species counts to higher-level clades. You could use these counts for count-based models (like DESeq2) but I would not use them directly in alpha diversity estimates (they are not equivalent to organism/cell counts).
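
As a rough illustration of the estimate described above (not MetaPhlAn's actual code), the arithmetic might look like the sketch below. The function name, the input layout, and the RPK and genome-length values are all made up for the example.

```python
# Sketch only: scale a species' marker-gene RPK over its average genome
# length, then sum species estimates up to a higher-level clade (genus).
# All names and numbers here are hypothetical, for illustration.

def estimate_clade_read_counts(species):
    """species maps name -> (genus, marker_rpk, avg_genome_length_bp)."""
    species_counts = {}
    genus_counts = {}
    for name, (genus, marker_rpk, genome_bp) in species.items():
        # reads per kilobase * genome length in kilobases
        estimate = marker_rpk * genome_bp / 1000.0
        species_counts[name] = estimate
        # higher-level clades are the sum of their species
        genus_counts[genus] = genus_counts.get(genus, 0.0) + estimate
    return species_counts, genus_counts


# Made-up inputs, not real measurements.
example = {
    "Bacteroides_ovatus": ("Bacteroides", 2.0, 6_500_000),
    "Bacteroides_fragilis": ("Bacteroides", 1.0, 5_200_000),
    "Escherichia_coli": ("Escherichia", 0.5, 5_000_000),
}
species_counts, genus_counts = estimate_clade_read_counts(example)
```

Note that a species with a larger genome gets a larger estimated read count at the same RPK, which is one reason these numbers should not be read as cell counts.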
“Coverage” is a measure of sequencing depth per unit length of a genome (or summed over genomes within a clade). Coverage is proportional to cell count (if A has twice B’s coverage, we infer that twice as many A cells as B cells were present). Relative abundance is sum-normalized species coverage (which can then be summed to higher taxonomic levels). We typically filter on relative abundance and prevalence, e.g. keeping taxa that exceed 0.1% abundance in at least 10% of samples and collapsing everything else into a lower-confidence “other” bin.
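
A minimal sketch of that normalization and filtering, assuming each sample is a plain dict of species-level coverage. The function names, the data layout, and the toy values are hypothetical; the default thresholds follow the 0.1% / 10% example above.

```python
# Sketch only: sum-normalize coverage to relative abundance, then keep taxa
# that exceed an abundance threshold in enough samples and collapse the rest
# into an "other" bin. Names and data are hypothetical.

def relative_abundance(coverage):
    """Sum-normalize one sample's coverage values so they add up to 1."""
    total = sum(coverage.values())
    if total == 0:
        return {taxon: 0.0 for taxon in coverage}
    return {taxon: cov / total for taxon, cov in coverage.items()}


def filter_taxa(samples, min_abundance=0.001, min_prevalence=0.10):
    """Keep taxa above min_abundance in at least min_prevalence of samples."""
    all_taxa = set()
    for sample in samples:
        all_taxa.update(sample)
    kept = set()
    for taxon in all_taxa:
        hits = sum(1 for s in samples if s.get(taxon, 0.0) > min_abundance)
        if hits / len(samples) >= min_prevalence:
            kept.add(taxon)
    filtered = []
    for sample in samples:
        out = {t: a for t, a in sample.items() if t in kept}
        # everything that failed the filter goes into one bin
        out["other"] = sum(a for t, a in sample.items() if t not in kept)
        filtered.append(out)
    return filtered


# Toy coverage values, not real data.
coverage_by_sample = [
    {"A": 7.999, "B": 2.0, "rare": 0.001},
    {"A": 5.0, "C": 5.0},
]
abundances = [relative_abundance(c) for c in coverage_by_sample]
filtered = filter_taxa(abundances)
```

Here `rare` sits at roughly 0.01% in one sample and is absent from the other, so it is folded into `other`, while each filtered sample still sums to 1.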
