Script for visualisation - BioBakery paper (Beghini, 2021)

I read the following in the BioBakery paper:

“Moreover, new functionalities include a script for quick visualization of the presence/absence matrix with functionalities for clustering of gene family’s profiles across samples. An empirical p-value can be computed for each cluster based on the ratio between the sum of the genes’ lengths of one group and its total span along the contig. Thus, a significantly ‘close’ genes group can be identified and computation of empirical p-values assessing whether or not the genetic proximity of these families along the contigs could be considered significant. This eases the detection and identification of mobile elements in metagenomic samples.”

However, I cannot find the script anywhere.
Indeed, there was one more script at the time we submitted the paper. You can find it in the 3.0.1 release of PanPhlAn here.

We removed it in the latest release because it was still very experimental and was changing a lot. We are currently working on improving and expanding these functionalities for a new release, and the code still needs some work.

This script (the old version in PanPhlAn 3.0.1) performs clustering of gene families based on co-presence/absence.
It can also provide a basic heatmap visualization of the PanPhlAn profile. If you are more interested in the visualization part, see also Tutorial; guidance on vizualisation - #2 by leonard.dubois
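
As a rough illustration of the two ideas mentioned in this thread (clustering gene families by co-presence/absence, and the length-to-span ratio with an empirical p-value described in the quoted passage), here is a minimal Python sketch. It is not the PanPhlAn 3.0.1 script. The function names, the toy data, the choice of Jaccard distance with average linkage, and the null model (random gene sets of the same size drawn from the same contig) are all assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy presence/absence matrix (made-up values):
# rows = gene families, columns = samples.
presence = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
], dtype=bool)

# Toy gene coordinates on a single contig (made-up, non-overlapping genes).
starts = np.array([0, 1000, 2000, 10000, 20000, 30000])
ends = np.array([900, 1900, 2900, 10900, 20900, 30900])


def cluster_families(matrix, max_distance=0.5):
    """Group gene families whose presence/absence profiles are similar.

    Assumption: Jaccard distance + average linkage; the original script
    may well use a different metric or method.
    """
    tree = linkage(matrix, method="average", metric="jaccard")
    return fcluster(tree, t=max_distance, criterion="distance")


def compactness(gene_starts, gene_ends):
    """Sum of gene lengths divided by the total span the genes cover."""
    gene_starts = np.asarray(gene_starts)
    gene_ends = np.asarray(gene_ends)
    span = gene_ends.max() - gene_starts.min()
    return (gene_ends - gene_starts).sum() / span


def empirical_p_value(group_idx, gene_starts, gene_ends, n_draws=999, seed=0):
    """Fraction of random same-size gene sets at least as compact as the group.

    Assumption: the null model draws genes uniformly from the same contig.
    """
    rng = np.random.default_rng(seed)
    group_idx = np.asarray(group_idx)
    observed = compactness(gene_starts[group_idx], gene_ends[group_idx])
    hits = 0
    for _ in range(n_draws):
        draw = rng.choice(len(gene_starts), size=len(group_idx), replace=False)
        if compactness(gene_starts[draw], gene_ends[draw]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


labels = cluster_families(presence)
ratio = compactness(starts[:3], ends[:3])
p_value = empirical_p_value([0, 1, 2], starts, ends)
print(labels, ratio, p_value)
```

In this toy setup, families with identical profiles end up in the same cluster, and the first three genes sit close together on the contig, so their ratio is high compared with most random draws. With real data, the gene coordinates would have to come from the pangenome annotation, and the ratio only stays below 1 if the genes do not overlap.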

Let me know if you have any questions regarding this script.
