PanPhlAn Identifier to Uniprot

Hi, I’m looking at some previously published data that reports genes identified with PanPhlAn, using what they call a PanPhlAn identifier in the form “xg000002”. However, I cannot find anything in the documentation that refers to identifiers in this form, and I don’t find values like this in the precomputed pangenomes either. So I have two questions:

  1. Does PanPhlAn actually use this ID structure anywhere?
  2. If so, can these IDs be used to unambiguously refer to a sequence, or is it impossible to recover the original sequence/annotation associated with this feature?

I fear this may be something unique to the study in question, but would be very happy to be proven wrong.

Thanks in advance.

Hi, sorry for the delay, I was out of the lab last week.

It seems that what you are dealing with is a user-generated pangenome built with PanPhlAn 1.1 or 1.2 using UCLUST. In that case, the original gene-family information is lost, and custom unique family identifiers are assigned based on the clustering results. If a pangenome file was provided with the data, it can solve your problem. By pangenome file, I mean a TSV-like file mapping these identifiers to reference genomes, contigs, and locations.
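If such a file is available, recovering what “xg000002” refers to is just a lookup. Below is a minimal sketch, assuming a tab-separated file with six columns in the order family ID, gene ID, genome, contig, start, stop. That column layout is an assumption, so check it against the actual file. The function and type names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, NamedTuple


class GeneLocation(NamedTuple):
    gene: str
    genome: str
    contig: str
    start: int
    stop: int


def parse_pangenome(lines: Iterable[str]) -> Dict[str, List[GeneLocation]]:
    """Group gene locations by family ID.

    ASSUMPTION: tab-separated columns are
    family, gene, genome, contig, start, stop.
    """
    families: Dict[str, List[GeneLocation]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        family, gene, genome, contig, start, stop = row[:6]
        families.setdefault(family, []).append(
            GeneLocation(gene, genome, contig, int(start), int(stop))
        )
    return families


def load_pangenome(path: str) -> Dict[str, List[GeneLocation]]:
    """Read a pangenome file from disk (hypothetical helper)."""
    with open(path, newline="") as handle:
        return parse_pangenome(handle)
```

For example, `load_pangenome("pangenome.tsv")["xg000002"]` would list every genome, contig, and coordinate pair assigned to that family. Those coordinates can then be used to extract the sequence from the matching reference genome FASTA. Without the file, there is nothing to look the identifier up in.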

Again, I’m assuming that these data come from an older version of PanPhlAn and a custom pangenome. If you can share the link to the published data, I can check it.

Hope this answers your question. Feel free to ask if something remains unclear.


Thanks a bunch, Léonard. This is what I assumed, but wanted to be sure. The paper in question doesn’t even report the version of PanPhlAn they used nor does it publish their pangenome file, so looks like it won’t be too helpful. Thanks for offering to check up on it, but you’ve already been more than enough help!