List of viruses included when --add_viruses is used

I am analysing paired-end sequencing data from human microbiome samples and am interested in viruses as well as bacteria. We know that various HPV (human papillomavirus) types are abundant in the samples.

While designing our pipeline we have been trying out MetaPhlAn along with BLAST and Kraken2. When comparing results for a particular sample, it seems that MetaPhlAn doesn’t report all of the HPV types picked up by the other methods and omits some major ones. I imagine this could be handled by using a custom database. However, I’d be interested to see which HPV types are included in the default database.

I have been searching ‘mpa_v30_CHOCOPhlAn_201901_marker_info.txt’, which you had linked in another post, but it contains very few entries clearly labeled as HPV, so I’m wondering whether there is another list, such as a list of the viruses included?

Also, I wonder if it’s possible to dig deeper into the taxonomy? The metaphlan output stops at s__Alphapapillomavirus_5, which I think is the S level, but the actual HPV type designation happens at S1 level and is therefore omitted in the output. I apologise if this is described somewhere in the documentation and I missed it.

Thanks for your help!

Hi @emmaivansson
Unfortunately, the marker gene database used by MetaPhlAn is designed at the S level, which is why there is no S1-level information in the table. This means that, while profiling, MetaPhlAn does not distinguish which S1 a detected virus belongs to; it only gives you information down to the S level.
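If it helps to see which S-level papillomavirus labels are present in the marker info file, here is a minimal sketch. It makes no assumption about the exact column layout of the file; it only assumes that species labels appear somewhere on each line as text of the form `s__Name`, as in the `s__Alphapapillomavirus_5` label from your output. The function names `species_matching` and `species_in_file` are made up for illustration and are not part of MetaPhlAn.

```python
import re
from collections import Counter

# Matches species-level labels such as s__Alphapapillomavirus_5.
# The lookbehind avoids matching "s__" in the middle of a longer token.
SPECIES_LABEL = re.compile(r"(?<![A-Za-z0-9_])s__[A-Za-z0-9_.\-]+")


def species_matching(lines, keyword):
    """Count lines mentioning each s__ label that contains `keyword`.

    The comparison is case-insensitive. A label that occurs several
    times on the same line is counted once for that line.
    """
    keyword = keyword.lower()
    counts = Counter()
    for line in lines:
        for label in set(SPECIES_LABEL.findall(line)):
            if keyword in label.lower():
                counts[label] += 1
    return counts


def species_in_file(path, keyword):
    """Same as species_matching, but reads the lines from a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return species_matching(handle, keyword)
```

For example, calling `species_in_file("mpa_v30_CHOCOPhlAn_201901_marker_info.txt", "papillomavirus")` and printing the sorted keys would list every S-level label containing "papillomavirus", with the number of lines that mention it. Since the database stops at the S level, individual HPV types would not be expected to appear as separate labels.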