To get the absolute counts of each taxon

  • The read count is an estimate of the number of reads contributed by a given clade. It is computed by extending the “reads per kilobase” estimate from a species’ marker genes over the species’ average genome length, and then summing species counts to higher-level clades. You could use these counts for count-based models (like DESeq2), but I would not use them directly in alpha diversity estimates, because they are not equivalent to organism/cell counts.

  • “Coverage” is a measure of sequencing depth per unit length of a genome (or summed over genomes within a clade). Coverage is proportional to cell count (if A has twice B’s coverage, we infer that twice as many A cells were present as B cells). Relative abundance is sum-normalized species coverage (which can then be summed to higher taxonomic levels). We typically filter on relative abundance and prevalence, e.g. keeping taxa that exceed 0.1% abundance in at least 10% of samples and collapsing everything else into a lower-confidence “other” bin.
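
To make the arithmetic above concrete, here is a minimal Python sketch. It is not the tool's actual code: the function names, the toy coverage values, and the exact form of the read-count formula (reads per kilobase times genome length in kilobases) are my own assumptions for illustration, and I am reading "10% of samples" as "at least 10%".

```python
def estimated_read_count(marker_rpk, genome_length_bp):
    """Extend a marker-gene reads-per-kilobase estimate over a whole genome.

    Assumed form: reads ~= RPK * (genome length in kb).
    """
    return marker_rpk * genome_length_bp / 1000.0


def relative_abundance(coverage):
    """Sum-normalize per-species coverage within one sample (fractions sum to 1)."""
    total = sum(coverage.values())
    if total == 0:
        return {taxon: 0.0 for taxon in coverage}
    return {taxon: value / total for taxon, value in coverage.items()}


def filter_taxa(samples, min_abundance=0.001, min_prevalence=0.10):
    """Keep taxa above min_abundance in at least min_prevalence of samples.

    Everything else is collapsed into a single "other" bin per sample.
    """
    n_samples = len(samples)
    all_taxa = {taxon for sample in samples for taxon in sample}
    keep = set()
    for taxon in all_taxa:
        hits = sum(1 for sample in samples if sample.get(taxon, 0.0) > min_abundance)
        if hits / n_samples >= min_prevalence:
            keep.add(taxon)

    filtered = []
    for sample in samples:
        row = {taxon: sample.get(taxon, 0.0) for taxon in sorted(keep)}
        row["other"] = sum(v for taxon, v in sample.items() if taxon not in keep)
        filtered.append(row)
    return filtered


# Toy per-sample species coverage (made-up numbers, not real data).
coverage_by_sample = [
    {"A": 20.0, "B": 10.0, "C": 0.01},
    {"A": 5.0, "B": 15.0},
]

rel_ab = [relative_abundance(sample) for sample in coverage_by_sample]
filtered = filter_taxa(rel_ab)

# Hypothetical species: 2 reads per kilobase over a 5 Mb average genome.
reads = estimated_read_count(2.0, 5_000_000)
```

In the toy data, A has twice B's coverage in the first sample, so its relative abundance is also twice B's; C never exceeds 0.1% and ends up in the "other" bin. Note that the filter runs on relative abundances, not on the estimated read counts.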