I read the following in the BioBakery paper:
“Moreover, new functionalities include a script for quick visualization of the presence/absence matrix with functionalities for clustering of gene family’s profiles across samples. An empirical p-value can be computed for each cluster based on the ratio between the sum of the genes’ lengths of one group and its total span along the contig. Thus, a significantly ‘close’ genes group can be identified and computation of empirical p-values assessing whether or not the genetic proximity of these families along the contigs could be considered significant. This eases the detection and identification of mobile elements in metagenomic samples.”
However, I cannot find the script anywhere.
Indeed, there was one more script at the time we submitted the paper. It was called panphlan_find_gene_grp.py. You can still find it in the 3.0.1 release of PanPhlAn here.
We removed it in the latest release because it was still very experimental and was changing a lot. We are currently working on improving and expanding these functionalities for a new release, and the code still needs some work.
This script (the old version in PanPhlAn 3.0.1) performs clustering of gene families based on co-presence/absence.
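To give a rough idea of what "clustering on co-presence/absence" means, here is a minimal sketch in plain Python. It is not the code from panphlan_find_gene_grp.py: the function names, the toy matrix, the Jaccard distance, the single-linkage grouping and the 0.4 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two 0/1 presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two families absent from every sample are treated as identical.
    return 1.0 - both / either if either else 0.0


def cluster_gene_families(profiles, max_distance=0.4):
    """Single-linkage grouping (illustrative, not PanPhlAn's method).

    Two families end up in the same cluster when a chain of pairs with
    distance <= max_distance connects them.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for f1, f2 in combinations(profiles, 2):
        if jaccard_distance(profiles[f1], profiles[f2]) <= max_distance:
            parent[find(f1)] = find(f2)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy matrix: rows are gene families, columns are samples
# (1 = present, 0 = absent).
profiles = {
    "fam_A": [1, 1, 0, 0, 1],
    "fam_B": [1, 1, 0, 0, 1],
    "fam_C": [0, 0, 1, 1, 0],
    "fam_D": [0, 0, 1, 1, 1],
    "fam_E": [1, 0, 1, 0, 1],
}

clusters = cluster_gene_families(profiles)
for members in clusters:
    print(members)
```

On a real PanPhlAn presence/absence matrix you would load the rows from the profile file instead of the toy dictionary, and you would probably want to try several thresholds or a proper hierarchical clustering.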
It can also provide a basic heatmap visualization of the PanPhlAn profile. If you are more interested in the visualization part, see also Tutorial; guidance on vizualisation - #2 by leonard.dubois
Let me know if you have any questions regarding this script.