Virus detection with MetaPhlAn 4.1

Hello MetaPhlAn community

I am a bit confused about the new virus detection pipeline in MetaPhlAn 4.1. Usually, we are interested in the abundance of a certain taxon in a metagenomic sample. In the case of virus detection, I am interested in the relative abundance of certain viruses compared to the whole sample. However, the viral detection module is somewhat isolated: I get the coverage and breadth for each virus reference, but it is not integrated into the main profiling analysis. Do you have plans to integrate this, so that I get a single profile per sample that also includes viruses?

How do you suggest proceeding with the current version to get the relative abundance of the viral taxa in the same format as the bacterial taxa? I see the following issues:

  1. The format of the virus taxonomy is different from the rest (e.g. not d__[…];k__[…];etc.). Instead, we have something like: NC_007924_Lactobacillus_phage_KC5a

  2. The statistics are either breadth or mean/median coverage. However, these do not easily translate to relative abundance. How would you suggest using those numbers?

I just wanted to revive this post in the hope of getting an answer or starting a discussion about this. Does anybody have suggestions for this issue?

Hi
I have a similar problem: we are interested in how abundant viruses are in comparison to other microbes in our sample. With the output from MetaPhlAn 4.1, it is not possible to merge the two separate files.
Is anything planned to address this? What is the reason for reporting the breadth/depth of viruses instead of the relative abundance?

Thanks for the help

Hi @Makrez and @NinaEld

As you correctly stated, for viruses we only report depth and breadth of coverage. The reason is that in MetaPhlAn we use markers to estimate the abundance of microbes, whereas for phages we use the full genome. Defining viral genomic markers that work is very difficult because of the high diversity and mosaicity of viral genomes. We therefore consider a phage to be present if reads map against more than 75% of the phage genome, but we can't really estimate the abundance from this information.
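To make the presence criterion concrete, here is a minimal sketch of how breadth and depth of coverage could be derived from per-base read depths of one viral reference. This is not MetaPhlAn's actual code: the function name `coverage_stats`, the input format, and the choice to average depth over all positions (including uncovered ones) are assumptions for illustration. Only the "more than 75%" threshold comes from the explanation above.

```python
from statistics import mean, median


def coverage_stats(depths, min_breadth=0.75):
    """Summarize coverage of one viral reference genome.

    depths: list of per-base read depths (one integer per genome position).
    min_breadth: fraction of the genome that must be exceeded for a
                 presence call (0.75 mirrors the threshold described above).
    NOTE: hypothetical helper, not part of MetaPhlAn.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")

    # Breadth: fraction of positions covered by at least one read.
    covered = sum(1 for d in depths if d > 0)
    breadth = covered / len(depths)

    return {
        "breadth": breadth,
        # Assumption: depth is averaged over ALL positions, covered or not.
        "mean_depth": mean(depths),
        "median_depth": median(depths),
        # Present only if strictly more than min_breadth is covered.
        "present": breadth > min_breadth,
    }


if __name__ == "__main__":
    # Toy genome of 10 bases: 8 covered at depth 2, 2 uncovered.
    print(coverage_stats([2] * 8 + [0] * 2))
```

Breadth alone only says whether the genome was seen, which is why it works as a presence call but not as an abundance estimate.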
There will soon be an update to the code to include at least an estimate of the reads assigned to each virus (e.g. RPKM) and the full taxonomy for each reported virus.
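Until that update is available, RPKM can be computed by hand from read counts using the standard definition (reads per kilobase of reference per million reads in the sample). The sketch below is not taken from MetaPhlAn: the function name `rpkm` and its parameters are hypothetical, and you would need to obtain the number of reads mapped to each viral genome from your own alignments.

```python
def rpkm(mapped_reads, genome_length_bp, total_reads):
    """Reads Per Kilobase of reference per Million reads in the sample.

    mapped_reads: reads assigned to one viral genome.
    genome_length_bp: length of that viral genome in base pairs.
    total_reads: total number of reads in the sample.
    NOTE: hypothetical helper using the standard RPKM definition.
    """
    if genome_length_bp <= 0 or total_reads <= 0:
        raise ValueError("genome_length_bp and total_reads must be positive")
    # Equivalent to mapped_reads / (length / 1e3) / (total_reads / 1e6).
    return mapped_reads * 1e9 / (genome_length_bp * total_reads)


if __name__ == "__main__":
    # 500 reads on a 50 kb phage genome in a 10-million-read sample.
    print(rpkm(500, 50_000, 10_000_000))
```

RPKM normalizes for genome length and sequencing depth, so values are comparable between viruses and between samples, but they are not on the same scale as the marker-based relative abundances in the main profile.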