Tutorial; guidance on visualization

Hello,

Sorry for the late answer; I was away from work this past week.

Indeed, visualization is not included in PanPhlAn yet, partly because it can be done in many different ways. The heatmap in the tutorial was made with the heatmap.2 function from the gplots package in R. That figure was made quite a while ago, though, and it is not the best option in the end.

To visualize your PanPhlAn profiles (and possibly link them to metadata, if you have some), I advise you to use:

  • Heatmap from the seaborn package in Python. You’ll find some tutorials here. The advantage is that if your species has a big pangenome and/or you have a lot of samples (several hundred or more), Python will be quite efficient.

  • Heatmap from the ComplexHeatmap package in R. Extensive documentation can be found here. It’s a very complete and powerful package, and also very flexible and efficient if you want to visualize your pangenome matrix alongside some metadata.

In general, I would say that Python is better suited to big matrices and R is easier to use for complex visualizations. So choose depending on the size of your matrix and your personal preference between R and Python.
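For the Python route, here is a minimal sketch of a seaborn heatmap. It assumes the profile is a 0/1 presence/absence table with gene families as rows and samples as columns; the toy matrix, the gene and sample names, and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Toy presence/absence matrix standing in for a real profile (assumption).
# A real tab-separated table could be loaded with something like:
#   profile = pd.read_csv("my_profile.tsv", sep="\t", index_col=0)  # hypothetical path
rng = np.random.default_rng(0)
profile = pd.DataFrame(
    rng.integers(0, 2, size=(50, 12)),
    index=[f"gene_{i}" for i in range(50)],
    columns=[f"sample_{j}" for j in range(12)],
)

fig, ax = plt.subplots(figsize=(6, 8))
# Two-colour map: white = gene absent, black = gene present.
sns.heatmap(profile, cmap="Greys", cbar=False,
            xticklabels=True, yticklabels=False, ax=ax)
ax.set_xlabel("Samples")
ax.set_ylabel("Gene families")
fig.savefig("panphlan_heatmap.png", dpi=150, bbox_inches="tight")
```

If you want rows and columns reordered by hierarchical clustering, as heatmap.2 does, seaborn's clustermap function is the closest equivalent (it needs scipy installed).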

If your matrix is “hard” to visualize, what I usually do (it was not necessary for the tutorial example, but can be useful with some datasets) is prune the matrix. Basically, I remove all the genes of the pangenome that are present in more than 99% of the samples and/or in less than 1% of them. That way I focus on the variable part of the pangenome and avoid big single-colour chunks in the final figure.
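The pruning step can be sketched in a few lines of pandas. This is only an illustration, again assuming a 0/1 matrix with gene families as rows and samples as columns; the function name prune_matrix and the toy data are hypothetical.

```python
import pandas as pd

def prune_matrix(matrix: pd.DataFrame, low: float = 0.01, high: float = 0.99) -> pd.DataFrame:
    """Drop genes present in more than `high` or less than `low` of the samples."""
    # With 0/1 values, the row mean is the fraction of samples carrying the gene.
    prevalence = matrix.mean(axis=1)
    keep = (prevalence >= low) & (prevalence <= high)
    return matrix.loc[keep]

# Toy example: 4 genes across 4 samples (made-up data).
toy = pd.DataFrame(
    {
        "sample_1": [1, 1, 0, 0],
        "sample_2": [1, 0, 0, 0],
        "sample_3": [1, 1, 1, 0],
        "sample_4": [1, 0, 0, 0],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

pruned = prune_matrix(toy)
# geneA (present everywhere) and geneD (absent everywhere) are dropped.
print(list(pruned.index))
```

With only a handful of samples, the 1% and 99% cut-offs just remove genes that are present in every sample or in none; the thresholds only start to matter with a hundred samples or more.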

I hope this helps with your visualization. Feel free to ask if you have other questions.
Léonard
