I would like to use PanPhlAn and have followed the tutorial. I am having some trouble recreating the heatmap from the tutorial. I realize this might be outside the scope of PanPhlAn itself, but could you provide guidance on how this clustered heatmap was constructed? I tried to recreate it with hclust2, but was unsuccessful.
Sorry for the late answer, I was away from work this past week.
Indeed, the visualization is not included in PanPhlAn yet, partly because it can be done in many different ways. The heatmap from the tutorial was made using the heatmap.2 function from the gplots package in R. That figure was made quite a while ago, and in the end it is not the best option.
To visualize your PanPhlAn profiles (and possibly their link to metadata, if you have some), I advise you to use one of the following:
Heatmap from the seaborn package in Python. You'll find some tutorials here. The advantage is that if you have a species with a big pangenome and/or a lot of samples (several hundred or more), Python will handle it quite efficiently.
Heatmap from the ComplexHeatmap package in R. Extensive documentation can be found here. It's a very complete and powerful package, and also very flexible and efficient if you want to visualize your pangenome matrix alongside some metadata.
In general, I would say that Python is better suited to big matrices and R is easier to use for complex visualizations. So choose depending on the size of your matrix and your personal preference between R and Python.
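As an illustration of the seaborn option, here is a minimal sketch, not the code behind the tutorial figure. It assumes the PanPhlAn profile is a tab-separated table of 0/1 values with gene families as rows and samples as columns. The function name plot_presence_absence, the file name in the comment, the colors, and the toy data are all made up for the example. A two-color ListedColormap keeps the color scale discrete, and Hamming distance is just one reasonable choice for binary data.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.colors import ListedColormap


def plot_presence_absence(matrix):
    """Cluster rows (genes) and columns (samples) of a 0/1 matrix and draw it."""
    # Exactly two colors: the first is used for 0 (absent), the second for 1 (present).
    two_colors = ListedColormap(["#ffd92f", "#8c510a"])
    return sns.clustermap(
        matrix,
        metric="hamming",    # fraction of positions where two profiles differ
        method="average",    # average-linkage hierarchical clustering
        cmap=two_colors,
        vmin=0,
        vmax=1,
        yticklabels=False,   # too many genes to label individually
    )


# With real data, something like this (hypothetical file name):
# profile = pd.read_csv("result_profile.tsv", sep="\t", index_col=0)

# Toy stand-in: 30 gene families x 8 samples of random presence/absence.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.integers(0, 2, size=(30, 8)),
    index=[f"gene_{i}" for i in range(30)],
    columns=[f"sample_{j}" for j in range(8)],
)

grid = plot_presence_absence(toy)
# grid.savefig("heatmap.png")  # uncomment to write the figure to disk
```

The returned ClusterGrid keeps the reordered matrix in grid.data2d, which is handy if you want to check which samples ended up next to each other.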
If your matrix is "hard" to visualize, what I usually do is prune it (this was not necessary for the tutorial example, but it can be useful with some datasets). Basically, I remove all the genes of the pangenome that are present in more than 99% of the samples and/or in less than 1% of them. That way I focus on the variable part of the pangenome and avoid big single-colour chunks in the final figure.
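The pruning step can be sketched in a few lines of pandas. This is only an illustration under assumptions: the matrix is taken to hold 0/1 values with genes as rows and samples as columns, and the function name prune_pangenome and its low/high parameters are made up here, with defaults matching the 1% and 99% cutoffs mentioned above.

```python
import pandas as pd


def prune_pangenome(matrix, low=0.01, high=0.99):
    """Keep only genes whose prevalence lies between `low` and `high`."""
    # For a 0/1 matrix, the row mean is the fraction of samples carrying the gene.
    prevalence = matrix.mean(axis=1)
    # Drop genes present in more than `high` or in less than `low` of the samples.
    keep = (prevalence >= low) & (prevalence <= high)
    return matrix.loc[keep]


# Toy example with 200 samples and four genes.
n = 200
toy_matrix = pd.DataFrame(
    [
        [1] * n,                 # core gene: present everywhere, removed
        [1] + [0] * (n - 1),     # rare gene: 1 sample out of 200 (0.5%), removed
        [1] * 100 + [0] * 100,   # variable gene: 50% of samples, kept
        [1] * 150 + [0] * 50,    # variable gene: 75% of samples, kept
    ],
    index=["core", "rare", "variable_a", "variable_b"],
    columns=[f"sample_{i}" for i in range(n)],
)

pruned = prune_pangenome(toy_matrix)
print(list(pruned.index))
```

Note that with fewer than 100 samples the 1% and 99% cutoffs only remove genes that are present in none or in all of the samples, so you may want to adjust them to your dataset.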
I am using the ComplexHeatmap package in R. The heatmap created from my PanPhlAn matrix does not have solid colors. I used yellow for 0 and brown for 1, but the figure shows the presence of genes ("1") as a gradient of brown.
Could you suggest how I should fix it?
I think that the 0 and 1 being handled as a continuous scale instead of a discrete one comes from the type of your matrix object in R (matrix_ecoli in your code). Try converting it to a matrix of the characters "0" and "1" instead of the numbers 0 and 1, or converting the values to factors.