Hello @LEEzhu0110 ,
I believe this is a problem with an older MetaPhlAn database. Some viral “VDB” markers were duplicated, which produced duplicate entries in the SAM header and in turn caused sample2markers to fail. I think this affected “mpa_vOct22_CHOCOPhlAnSGB_202212” and was then fixed in “mpa_vOct22_CHOCOPhlAnSGB_202403”. I see you’re using the newer one in sample2markers, but maybe you ran MetaPhlAn with the older 2022 one? You can check by looking at the first lines of the “*_profile.tsv” file.
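A quick way to do that check is sketched below. It assumes the database name (a string starting with “mpa_v”) appears in the leading “#” comment lines of the profile; `profile_db_version` is just a hypothetical helper name and the path is a placeholder.

```shell
# Print the first MetaPhlAn database name found in the profile header.
# Assumption: the name (e.g. mpa_vOct22_CHOCOPhlAnSGB_202212) is written
# in one of the first few '#' comment lines of the *_profile.tsv file.
profile_db_version() {
    head -n 5 "$1" | grep -o 'mpa_v[A-Za-z0-9_]*' | head -n 1
}

# Usage (placeholder path):
# profile_db_version /your/sample_profile.tsv
```

If this prints the 202212 name, the profile was made with the older database.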
The most correct solution would be to re-profile your samples with a newer MetaPhlAn database; I would suggest the newest one, Jan25. If you want to stick with Oct22, you can use the fixed 2024 version.
The simplest but “hacky” solution is to filter the SAM file to remove the VDB entries. As you pointed out in the sample2markers code, they are not used anyway. Something like the following:
bzcat /your/sample.sam.bz2 | grep -v "VDB|" | bzip2 -zc > /your/sample__no_VDB.sam.bz2
and then use the filtered SAM file for sample2markers.
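If you want to confirm the filtering worked before running sample2markers, something like this sketch lists any reference names that still appear more than once in the SAM header. `sam_duplicate_refs` is a hypothetical helper, not part of MetaPhlAn, and it assumes the reference name is the second tab-separated field of each @SQ line.

```shell
# Read an uncompressed SAM stream on stdin and print every @SQ reference
# name (the SN: field) that occurs more than once in the header.
# Empty output means no duplicated header entries remain.
sam_duplicate_refs() {
    grep '^@SQ' | cut -f 2 | sort | uniq -d
}

# Usage (placeholder path):
# bzcat /your/sample__no_VDB.sam.bz2 | sam_duplicate_refs
```

Running the same check on the original SAM file should show the duplicated VDB markers, if they are indeed the cause.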
Btw, your SAM file does not look like it came from MetaPhlAn/bowtie2. Maybe it was processed somehow?
Let me know if it helps.
Michal