MetaPhlAn viral module

Hello,

I’m running MetaPhlAn v4.1.1 [Conda-installed in a new environment] with its corresponding “latest” database, mpa_vJun23_CHOCOPhlAnSGB_202403. I am particularly interested in profiling the viral genomes: a Kraken 2 analysis gave me a hunch, and given MetaPhlAn’s robust read assignment and abundance calculations, I wanted to check Kraken 2’s results against the VSG marker gene database.

Following is my command:

metaphlan singleton.fq.gz,R1.fq.gz,R2.fq.gz --bowtie2db $index --bowtie2out sample.bowtie2.bz2 --nproc 35 --input_type fastq -o sample_metagenome.txt --sample_id sample --profile_vsc --vsc_out sample_vsc.txt

I have the following questions, which I could not find answered in the paper or the docs:

  1. The output table [sample_vsc.txt] has columns "breadth_of_coverage", "depth_of_coverage_mean" and "depth_of_coverage_median". What do they mean in terms of reads and the reference?

Example: for a kVSG [an E. coli phage] of length 1595, the values are breadth_of_coverage=0.9410658307210031, depth_of_coverage_mean=7122.4763491006 and depth_of_coverage_median=7798.0.

How do I interpret these numbers with respect to the sample?
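
For context, this is how I currently read the three columns, using the generic definitions of coverage rather than anything I could confirm from MetaPhlAn's code. The Python sketch below is only my assumption: the function names `coverage_stats` and `approx_read_count`, the `covered_only` switch, and the toy depth vector are all made up for illustration. In particular, I do not know whether MetaPhlAn averages depth over every position of the marker or only over the covered ones, so the sketch allows both.

```python
from statistics import mean, median


def coverage_stats(depths, covered_only=False):
    """Summarise per-base read depth for one reference sequence.

    depths       -- list of ints, one entry per reference position
    covered_only -- if True, depth statistics ignore zero-depth positions
                    (which convention MetaPhlAn uses is unknown to me)
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = [d for d in depths if d > 0]
    # Breadth: fraction of reference positions hit by at least one read.
    breadth = len(covered) / len(depths)
    values = covered if covered_only else depths
    if not values:
        return {"breadth": 0.0, "depth_mean": 0.0, "depth_median": 0.0}
    return {
        "breadth": breadth,
        "depth_mean": mean(values),      # average number of reads per position
        "depth_median": median(values),  # middle value, less sensitive to pile-ups
    }


def approx_read_count(depth_mean, ref_length, read_length):
    """Rough read count implied by a mean depth over ALL positions.

    Uses the usual relation depth = reads * read_length / ref_length.
    """
    return depth_mean * ref_length / read_length


# Toy reference of 10 positions, 2 of which are uncovered.
toy = [0, 0, 4, 6, 6, 8, 8, 8, 10, 10]
all_positions = coverage_stats(toy)
covered_positions = coverage_stats(toy, covered_only=True)
print(all_positions)
print(covered_positions)
```

If breadth really is covered positions divided by marker length, then 0.9410658307210031 × 1595 rounds to 1501 covered bases for the phage above. Is that the right reading, and can the mean depth be turned into an approximate read count in the way `approx_read_count` does?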

  2. Are any of the above terms an abundance measure for the VSG? If not, why doesn’t MetaPhlAn provide a read count [normalised or otherwise]?

  3. Because I do not understand these output columns, I also do not understand why the merged table uses “breadth_of_coverage” by default.

  4. On the biological side: does connecting a VSG to its CRISPR-based host mean that the host has CRISPR-mediated immunity to the VSG because of previous exposure to the phage? Sorry if I’m rambling here; I am just trying to make sense of the information in the MetaPhlAn documentation.
    Referring to the previous example, the VSG-to-species and VSG-to-SGBs tables point to Parasutterella_excrementihominis/kSGB9262, with the column "Label"=37 and "(kVSG only) species hits genomes of the species bin(s)"=732.
    A kind explanation would be helpful.

  5. Lastly, including the MetaPhlAn parameters “--add_viruses” and “--mpa3” gives the error: No MetaPhlAn BowTie2 database found (--index option)! Expecting location bowtie2db. Hence I could not profile the viral organisms from the SGB marker gene database. Why? What am I doing wrong? [Previously reported: Metaphlan4 --mpa3 --add_viruses failed]

I suspect Q1, 2 and 3 share a single answer; I have asked them all together because of my lack of understanding. Any explanation or reference to published work would be much appreciated. Thank you.
