Coverage Data from MetaPhlAn 4.1

MetaPhlAn version 4.1.0 (23 Aug 2023)

Hi BioBakery team,

I’ve been reading through the MetaPhlAn 4 documentation and related forum threads, and I’m trying to better understand how marker-level coverage is estimated and whether those per-marker coverage values are available in the output files.

From the publication, it’s stated:

Using the quality-controlled mapping results, MetaPhlAn estimates the coverage of each marker and computes the clade’s coverage as the robust average of the coverage across the markers of the same clade, but excluding the top and bottom quantiles of the marker abundances.

To better understand and potentially make use of this information, I’m wondering whether it’s possible to access the raw coverage values for each marker..

After looking at the -t options, it seems marker_counts and marker_ab_table are somewhat relevant, but I’m not entirely sure which is most appropriate or safe to use if I want to:

  1. Inspect or export the raw coverage for each marker per sample,
  2. Understand which markers were included/excluded during the robust averaging process.

Would -t marker_counts or -t marker_ab_table be the best way to get this information?

Additionally, I’d like to ask:
Is it meaningful to use samtools coverage or samtools depth on the .sam file generated by MetaPhlAn to gain further insight into read mapping rate, per-marker coverage, or sequencing depth?

Any insights or suggestions would be greatly appreciated. Thanks in advance!

Hi,

When running MetaPhlAn 4.1 with:

metaphlan read1.fq.gz,read2.fq.gz --input_type fastq -t rel_ab_w_read_stats \
–index mpa_vJun23_CHOCOPhlAnSGB_202307 --bowtie2db /metaphlan_analysis/bt2db/ \
–bowtie2out output.bowtie2.bz2 -s output.sam -o profile_abs.txt

I notice there is a coverage column in profile_abs.txt, however when I compare the value I realized it is much lower than coverage estimated from the SAM file (via samtools coverage).

For example, the range of values for coverage in the SAM are 36 to 50; but the range of values for coverage in the profile_abs.txt are range from 0.003 to 0.02

Is this lower value expected compared to raw SAM coverage?

Thanks!