Pangenome generation

I cannot find a guide to pangenome generation for PanPhlAn 3. I’m especially interested in using Roary. Is there a resource available?

Thanks,

Andrew

Hello,

Indeed, this functionality (custom pangenomes) was available in PanPhlAn 1.3 using USEARCH clustering. However, it now has some disadvantages: it relies on a rather old version of USEARCH and, on top of that, it groups sequences into label-free clusters. By that, I mean that clusters are named “Gene family 1” or “Gene family 2”, whereas using the pangenome database gives you UniRef90 IDs and a mapping file from these IDs to other databases such as GO, KEGG, Pfam, eggNOG…

As a medium-term project, we aim to add functionality for adding custom user-provided genomes to the PanPhlAn-provided pangenome database, by clustering each coding sequence with the known ones and assigning new ones if needed.
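To make the "cluster with the known ones, assign new ones if needed" idea concrete, here is a toy sketch in Python. It is not the PanPhlAn implementation: the similarity measure (`difflib.SequenceMatcher`), the 0.9 threshold, the `NEW_n` naming, and the family IDs and sequences in the example are all assumptions made up for illustration. A real pipeline would use a proper aligner or clustering tool.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity (an assumption, not an alignment score)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def assign_families(known, new_seqs, threshold=0.9):
    """Assign each new sequence to the most similar family, or create a new one.

    known:    dict mapping family ID -> representative sequence
    new_seqs: dict mapping gene name -> sequence
    Returns (assignments, families) where families includes any newly created ones.
    """
    families = dict(known)  # copy so the caller's dict is not modified
    assignments = {}
    next_new = 1
    for name, seq in new_seqs.items():
        best_id, best_score = None, 0.0
        for fam_id, representative in families.items():
            score = similarity(seq, representative)
            if score > best_score:
                best_id, best_score = fam_id, score
        if best_id is not None and best_score >= threshold:
            assignments[name] = best_id
        else:
            # No known family is close enough: open a new, unlabeled family.
            new_id = f"NEW_{next_new}"
            next_new += 1
            families[new_id] = seq
            assignments[name] = new_id
    return assignments, families


if __name__ == "__main__":
    # Hypothetical family IDs and sequences, for illustration only.
    known = {
        "UniRef90_A": "MKTAYIAKQRQISFVKSHFSRQ",
        "UniRef90_B": "MSDNELLKKAVEEAGYTLGD",
    }
    new_seqs = {
        "g1": "MKTAYIAKQRQISFVKSHFSRQ",
        "g2": "MWWWPPPCCCHHHGGGNNNYYY",
        "g3": "MWWWPPPCCCHHHGGGNNNYYY",
    }
    assignments, families = assign_families(known, new_seqs)
    print(assignments)
```

Sequences that match a known representative keep its existing ID (and therefore its annotations), while unmatched sequences start a new family that later sequences can join.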

OK, thanks. I look forward to more details!

Best wishes,

Andrew

Hello Leonard,

Any update on this? I’ll definitely be looking to use custom pangenomes, and it would be ideal to be able to use the up-to-date PanPhlAn.

Many thanks,

Andrew

Hello Andrew,

We’ve been working on this part of the code but reorganized things a bit because the software covers more than PanPhlAn pangenome generation. At the moment the code is ready (but the repository is still private). We still need a bit of time to write minimal documentation. Thanks for reminding us that there is demand for this outside of our lab :grin: We’ll try to make things available next week. I’ll keep you updated.

Hello,

The repo is ready: https://github.com/SegataLab/PanPhlAn_pangenome_exporter
This is an independent tool that requires more dependencies than PanPhlAn and can take quite long to run if you have a lot of genomes; however, it’s worth it :grin: The results are really nice.

With all dependencies installed correctly and the databases downloaded (they are rather big, so be patient with the download), this should do the job.

Let me know if you run into trouble while running it, or if you have questions about the overall workflow.

Best wishes and have fun with the tool

That’s great, thanks for letting us know. I look forward to giving it a go!

Best wishes,

Andrew

Hi, I tried SegataLab/PanPhlAn_pangenome_exporter from GitHub.
However, download_databases.py failed.
I also tried to download directly from the link it uses, and it returned a 404 error. (Attached figures)


Without the database files, how can I generate the database myself? I found that HUMAnN 3 has similar DIAMOND indexes, but I could not find UniRef90to50_201906.tsv.bz2 anywhere.

Best,

Yuxiang

Hi, it should be fixed now. Let me know if something’s still wrong.

Hello Leonard,
thank you for this script. I am trying to use it following the instructions on the GitHub page, but I’m getting the following error message:

Wed Jan 17 17:19:12 2024 Writing PanPhlAn tsv...Traceback (most recent call last):
  File "PanPhlAn_pangenome_exporter/panphlan_exporter.py", line 521, in <module>
    panphlan_exporter(args.input, args.tmp, args.output, args.clade_name, args.nprocs, args.db_path)
  File "PanPhlAn_pangenome_exporter/panphlan_exporter.py", line 502, in panphlan_exporter
    write_panphlan_tsv(inputdir, tmp_dir, ppa_outdir, clade_name, contigs_names_dict, contigs_names_dict_prokka, extend_pangenome)
TypeError: __init__() got an unexpected keyword argument 'strand'

How can I fix that? Thank you.
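
One way to narrow down an error like this is to check whether the constructor being called still accepts the keyword in the installed version of its library. Below is a minimal sketch using only the Python standard library. `OldFeature` and `NewFeature` are hypothetical stand-ins, since the traceback above does not show which class raised the error; you would pass the real class once you have identified it from the full traceback.

```python
import inspect


def accepts_keyword(callable_obj, keyword):
    """Return True if callable_obj can be called with keyword=... ."""
    params = inspect.signature(callable_obj).parameters
    param = params.get(keyword)
    if param is not None and param.kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    # A **kwargs parameter accepts any keyword at call time.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical classes mimicking a library that dropped a keyword between versions.
class OldFeature:
    def __init__(self, location, strand=None):
        self.location = location
        self.strand = strand


class NewFeature:
    def __init__(self, location):
        self.location = location


if __name__ == "__main__":
    print(accepts_keyword(OldFeature, "strand"))
    print(accepts_keyword(NewFeature, "strand"))
```

If the check fails for the class named in the innermost frame of the traceback, the installed version of that dependency probably differs from the one the script was written against.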