Hi bioBakery community,
I am trying to learn more about how to interpret MetaPhlAn 4.2 results produced with the parameter -t rel_ab_w_read_stats.
Using raw reads from a pure Escherichia coli sample, I get the following profile:
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli 2|1224|1236|91347|543|561|562 100.0 0.55438 2741585
How should I interpret the coverage value of 0.55? What is the scaling factor for this normalized coverage value? What is the minimum average coverage needed to identify a marker gene?
Thanks for your advice!
Michael
0.55438 is the coverage of the clade, in units of reads per nucleotide of total marker length. Multiplying this value by the average read length gives an estimate of the total fold-coverage of the clade. Multiplying it by the average genome length of the clade gives an estimate of the number of reads the clade contributed (that's the final number in the output, 2741585).
I don't believe there is a minimum coverage required to detect a marker, but for a clade to be reported in the output, a minimum fraction of its markers must recruit at least one read (see the stat_q parameter in the MetaPhlAn CLI).
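To make the arithmetic above concrete, here is a minimal Python sketch applied to the profile line from the question. The column order (clade name, taxid path, relative abundance, coverage, estimated reads) is read off that single line, and the 150 bp read length is purely an assumption for illustration, so substitute the real average read length of your library. The helper name `parse_profile_line` is hypothetical and not part of MetaPhlAn.

```python
# Profile line copied from the question. Columns, as read off this line:
# clade name, NCBI taxid path, relative abundance (%), coverage,
# estimated number of reads from the clade.
line = (
    "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|"
    "f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli "
    "2|1224|1236|91347|543|561|562 100.0 0.55438 2741585"
)


def parse_profile_line(text):
    """Split one profile line into named fields (hypothetical helper)."""
    clade, taxids, rel_ab, coverage, n_reads = text.split()
    return {
        "clade": clade,
        "taxids": taxids,
        "rel_ab": float(rel_ab),
        "coverage": float(coverage),  # reads per nucleotide of marker length
        "n_reads": int(n_reads),      # estimated reads from the clade
    }


# ASSUMPTION: average read length in bp. Not stated in the thread.
ASSUMED_READ_LENGTH = 150

rec = parse_profile_line(line)

# coverage (reads/nt) * read length (nt/read) -> estimated fold-coverage
fold_coverage = rec["coverage"] * ASSUMED_READ_LENGTH

# estimated reads = coverage * average genome length, so dividing the
# read estimate by the coverage recovers the genome length that was used
implied_genome_length = rec["n_reads"] / rec["coverage"]

print(f"estimated fold-coverage: {fold_coverage:.1f}x")
print(f"implied average genome length: {implied_genome_length:,.0f} bp")
```

With 150 bp reads this works out to roughly 83x fold-coverage, and the implied average genome length is about 4.95 Mb, which is in the expected range for E. coli.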
Great. Thanks for the explanations.