Hi @leonard.dubois,
Some pangenomes have duplicated fasta entries.
bowtie2-inspect Cutibacterium_acnes | seqkit rmdup > /dev/null
[INFO] 58 duplicated records removed
This is a problem for the BAM to SAM conversion:
[E::sam_hrecs_update_hashes] Duplicate entry "GL383855.1" in sam header
samtools view: failed to add PG line to the header
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"
Could you fix this?
We are currently working on a new version of the database that will fix the issue along with expanding the pangenomes.
In the meantime, you can use the panphlan_clean_pangenome.py script from the GitHub repo. That should do the job.
I wasn’t aware of this script, so I performed the cleanup with my own.
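For anyone hitting the same error, this kind of cleanup can be sketched in a few lines of Python. This is only an illustration, not the panphlan_clean_pangenome.py script: it assumes duplicates are identified by sequence ID (the first token of the header line, which is what ends up in the SAM header) and that keeping the first occurrence is acceptable. The function name and the file names in the usage note are hypothetical.

```python
import sys


def dedup_fasta(lines):
    """Yield FASTA lines, keeping only the first record seen for each sequence ID."""
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited token after ">".
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = seq_id not in seen
            seen.add(seq_id)
        if keep:
            yield line


# Hypothetical command-line use: python dedup_fasta.py input.fasta output.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(dedup_fasta(src))
```

The input would be the FASTA dumped by bowtie2-inspect, and the bowtie2 index would presumably need to be rebuilt from the deduplicated file afterwards.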
I guess you already know this, but the metaphlan4 vOct22 database has the same issue.
Btw, could you upload a new panphlan release to bioconda? The current version is quite old and buggy.