Strainphlan sample2markers.py ERROR

LEEzhu0110 · December 2, 2025, 3:07am

Dear developers,

I ran into a problem when running StrainPhlAn (under MetaPhlAn version 4.2.4) with the command "sample2markers.py -i vdb3.sam -f sam -o consensus_markers2 -n 1 -d /mpa_vOct22_CHOCOPhlAnSGB_202403.pkl". It shows:

[E::sam_hrecs_refs_from_targets_array] Duplicate entry "VDB|003B_0000_0_01C2|M1_c0_c0_c0" in target list
[E::sam_parse1] failed to parse header

I wonder whether sample2markers.py cannot handle the entries in the SAM file that start with "VDB". I reviewed the script and found that lines 58-59 read "if marker.startswith("VDB"): return False". In another test, after deleting all the entries starting with "VDB" from the SAM file, the script ran normally.

LEEzhu0110 · December 2, 2025, 3:09am

Thanks for your help! Looking forward to your reply!

Michal_Puncochar · December 11, 2025, 1:59pm

Hello @LEEzhu0110 ,

I believe this is a problem with an older MetaPhlAn database. Some viral "VDB" markers were duplicated, producing duplicate entries in the SAM header, which subsequently caused sample2markers to fail. I think this was a problem with "mpa_vOct22_CHOCOPhlAnSGB_202212", which was then fixed in "mpa_vOct22_CHOCOPhlAnSGB_202403". I see you're using the newer one in sample2markers, but maybe you ran MetaPhlAn with the older 2022 one? You can check by looking at the first lines of the "*_profile.tsv" file.
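To confirm that duplicated header entries are the cause, something like the following could be used. This is only a rough sketch: the function name duplicate_sq_names is made up for illustration, the paths are placeholders, and it assumes the reference names sit in the SN: field of the @SQ header lines, as in a standard SAM header.

```shell
# Print every reference name that occurs more than once in the @SQ
# header lines of a SAM stream read from stdin.
# (duplicate_sq_names is a hypothetical helper, not part of MetaPhlAn.)
duplicate_sq_names() {
    awk -F'\t' '
        # Collect the SN: value of each @SQ header line.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if (substr($i, 1, 3) == "SN:") print substr($i, 4)
        }
        # The header ends at the first line that does not start with "@".
        $1 !~ /^@/ { exit }
    ' | sort | uniq -d
}

# Usage (placeholder paths):
#   bzcat /your/sample.sam.bz2 | duplicate_sq_names
#   head -n 5 /your/sample_profile.tsv   # the first lines should name the database
```

If the first command prints any names, the header contains duplicates and samtools-based tools will most likely refuse to parse it.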

The most correct solution would be to re-profile your samples with a newer MetaPhlAn database; I would suggest using the newest one, Jan25. If you want to stick to Oct22, you can use the fixed 2024 version.

The simplest but "hacky" solution is to filter the SAM file to remove the VDB entries. As you pointed out in the sample2markers code, they are not used anyway. Something like the following:

bzcat /your/sample.sam.bz2 | grep -v "VDB|" | bzip2 -zc > /your/sample__no_VDB.sam.bz2

and then use the filtered SAM file for sample2markers.

By the way, your SAM file does not look like it came from MetaPhlAn/bowtie2. Or maybe it was processed somehow?
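The grep above drops any line that contains "VDB|" anywhere, including in a read name. If a stricter filter is wanted, a sketch along these lines might work. It only removes @SQ header lines whose SN: field starts with "VDB|" and alignment lines whose reference name (column 3) starts with "VDB|". The function name filter_vdb is hypothetical, and the sketch assumes unpaired alignments, so the mate reference column is not checked.

```shell
# Remove VDB markers from a SAM stream (stdin to stdout).
# (filter_vdb is a hypothetical helper, not part of MetaPhlAn.)
filter_vdb() {
    awk -F'\t' '
        # Header: drop @SQ lines whose SN: field names a VDB marker.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if (index($i, "SN:VDB|") == 1) next
            print
            next
        }
        # Keep every other header line unchanged.
        /^@/ { print; next }
        # Alignments: drop reads whose reference name is a VDB marker.
        index($3, "VDB|") == 1 { next }
        { print }
    '
}

# Usage (placeholder paths):
#   bzcat /your/sample.sam.bz2 | filter_vdb | bzip2 -c > /your/sample__no_VDB.sam.bz2
```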

Let me know if it helps

Michal

LEEzhu0110 · December 18, 2025, 1:06am

Hello @Michal_Puncochar,

Thank you for your help! I did indeed remove the "VDB" entries from the SAM file in the subsequent calculations. Afterwards I followed the instructions in strainphlan4 · biobakery/biobakery Wiki, because that version seems more consistent with the MetaPhlAn version I used.

Once again, I would like to express my gratitude to you.

Best wishes!

LEEzhu0110 · December 18, 2025, 1:20am

Hello @Michal_Puncochar,

I'm currently facing a new problem, and I hope I can take up a little more of your time.

I was following the guide at strainphlan4 · biobakery/biobakery Wiki. At "Step 4: Generate trees from alignments", I used the following command:

strainphlan -s consensus_markers5/* -m clade_markers/t__SGB6173.fna -r reference_genomes/*.fna.bz2 -o output3 -c t__SGB6173 --phylophlan_mode fast --nproc 4

At first, I did not add any filtering parameters. Although the info file in the tree output indicates that 22 samples were retained in the end, the final tree still contains fewer than 10 samples. After I adjusted the filtering thresholds, there was still no significant improvement.

Is this because of my data, or do I need to modify more parameters?

I'm very sorry to bother you again. Looking forward to your reply.

Wish you all the best!

LEEzhu0110 · December 18, 2025, 1:26am

Hello @Michal_Puncochar,

I'm very sorry that I didn't reply sooner; my account was only unmuted today. Thank you very much for your explanations and clarifications.

Best wishes!
