Panphlan_pangenome_generation.py

Hi, and thanks for developing this great tool!
I used it with the older pangenome versions that you generated (v16) and it's great!
But now I want to generate a new pangenome (from the most up-to-date RefSeq), and I am using PanPhlAn v1.2.2 with the pangenome generation script. I have run into some issues:

The first, which I found and solved, is fixed by changing the sort_cmd (line 945) from --output to --fastaout (otherwise you get an error with version v11.0.667_i86linux32 of usearch7).
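
In case it helps anyone hitting the same error, here is a rough sketch of that change. It is not the actual code from panphlan_pangenome_generation.py: the helper name build_sort_cmd and its parameters are hypothetical, and the --sortbylength argument is my assumption about what the sort step calls. The only part taken from my fix is the output flag. I have only tested v11, so treating every version above 7 as needing --fastaout is also an assumption.

```python
def build_sort_cmd(usearch_bin, in_fasta, out_fasta, major_version):
    """Build the argument list for the usearch sort-by-length step.

    Hypothetical helper for illustration only; the real script builds
    sort_cmd inline around line 945.
    """
    # Assumption: usearch v7 expects --output, newer builds expect --fastaout.
    out_flag = "--output" if major_version <= 7 else "--fastaout"
    return [usearch_bin, "--sortbylength", in_fasta, out_flag, out_fasta]


# Example: the command I ended up with for the v11 binary.
cmd_v11 = build_sort_cmd("usearch7", "merged.ffn", "sorted.ffn", 11)
print(" ".join(cmd_v11))
```

The file names merged.ffn and sorted.ffn are placeholders, not the names the script uses.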

The second is gene2family[seq.id] (line 863), which raises a:
KeyError: 'GCF_000157935.1_ASM15793v1_genomic:NZ_GG703855.1:262267-259368'
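
To narrow this down, a check like the one below could list every sequence ID that is missing from the mapping instead of stopping at the first one. This is only a sketch on toy data: find_unclustered_ids is a hypothetical helper, I am assuming gene2family is a plain dict keyed by sequence ID, and the family label "g000001" and the first ID are made up.

```python
def find_unclustered_ids(seq_ids, gene2family):
    """Return the sequence IDs that have no entry in gene2family."""
    return [sid for sid in seq_ids if sid not in gene2family]


# Toy data: one ID that was assigned to a family, and the ID from my KeyError.
gene2family = {"genomeA:contig1:1-300": "g000001"}
seq_ids = [
    "genomeA:contig1:1-300",
    "GCF_000157935.1_ASM15793v1_genomic:NZ_GG703855.1:262267-259368",
]

missing = find_unclustered_ids(seq_ids, gene2family)
print(len(missing), "sequence ID(s) without a gene family:")
for sid in missing:
    print("  ", sid)
```

If many IDs turn up missing, that would suggest the clustering output itself is incomplete rather than a problem with a single gene.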

The data I'm working on belongs to Prevotella_copri
the fna: GCF_000157935.1_ASM15793v1_genomic.fna.gz
the gff: GCF_000157935.1_ASM15793v1_genomic.gff.gz

I chose only 2 for the example set.

The command I am using is:
panphlan_pangenome_generation.py -c 165179 --i_fna fna/ --i_gff gff/ -o database/ --verbose

I would appreciate any help,
Many Thanks,

Hello,

Thanks for using PanPhlAn, and sorry for the late reply.
Indeed, the only pangenome database available is the one from 2016, which is quite old by now. We are currently working on generating a brand new one that includes the most recent reference genomes. It will be included in PanPhlAn v3.0, which will be released in the coming weeks.

At the moment, one can use the script panphlan_pangenome_generation.py for custom pangenome generation. However, this script was designed to work with Usearch v7 (rather old, to be honest; the current version is Usearch v11). I guess the issues you are experiencing come from the version difference. Can you try using the older version and tell me what comes out?

Many thanks for your patience, and sorry again for the delay.