hclust2.py error: the condensed distance matrix must contain only finite values

Hi,
I was trying to create a heatmap using this command:
hclust2.py -i merged_abundance_table_species.txt -o abundance_heatmap_species.png --f_dist_f braycurtis --s_dist_f braycurtis --cell_aspect_ratio 0.5 -l --flabel_size 10 --max_flabel_len 100 --max_slabel_len 100 --minv 0.1 --dpi 300 --ftop 25

But I get this error:
Traceback (most recent call last):
  File "/home/rakesh/anaconda3/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/home/rakesh/anaconda3/bin/hclust2.py", line 803, in hclust2_main
    cl.shcluster()
  File "/home/rakesh/anaconda3/bin/hclust2.py", line 380, in shcluster
    self.shclusters = sph.linkage(self.s_dm, method=self.args.slinkage)
  File "/home/rakesh/anaconda3/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 1057, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Could you explain how I can resolve this error? I saw a similar issue where the cause was said to be rows with zero abundance, but how should I remove those rows?
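One possible way to drop all-zero rows and columns before running hclust2.py is sketched below with pandas. This is only a sketch under assumptions: the function name `drop_all_zero` and the toy table `demo` are hypothetical, and the table is assumed to be purely numeric with features as rows and samples as columns. Whether all-zero rows are the actual cause in a given dataset is not confirmed here.

```python
import pandas as pd


def drop_all_zero(table: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without rows or columns whose values are all zero.

    Hypothetical helper: assumes every column of `table` is numeric.
    """
    # Keep features (rows) that are non-zero in at least one sample.
    table = table.loc[(table != 0).any(axis=1)]
    # Keep samples (columns) that have at least one non-zero feature.
    table = table.loc[:, (table != 0).any(axis=0)]
    return table


# Toy abundance table (made-up values, for illustration only).
demo = pd.DataFrame(
    {
        "sampleA": [1.0, 0.0, 2.0],
        "sampleB": [0.0, 0.0, 0.0],
        "sampleC": [3.0, 0.0, 0.5],
    },
    index=["s__one", "s__two", "s__three"],
)

cleaned = drop_all_zero(demo)
print(cleaned)
```

For a real table, something like `pd.read_csv(path, sep="\t", index_col=0)` could load the file and `cleaned.to_csv(out_path, sep="\t")` could write it back, assuming a tab-separated layout with feature names in the first column. Comment lines or non-numeric columns, if present, would need to be handled first.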

Thank You
Saraswati Awasthi


Hello, I have had the same problem for a long time.
See: "hclust2-1.0.0 :: test failure", Issue #5, SegataLab/hclust2 on GitHub (among others).

After digging into the data, I noticed that HClustering.s_dm may contain some NaN values.
I did not dig into the code to find out where these NaN values come from.

A workaround that I would like you to investigate and validate is to replace the NaN values with 0.0 in the __init__ method of the HClustering class, changing it from

def __init__( self, s_dm, f_dm, args = None ):
    self.s_dm = s_dm
    self.f_dm = f_dm
    self.args = args
    self.sclusters = None
    self.fclusters = None
    self.sdendrogram = None
    self.fdendrogram = None

to

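def __init__( self, s_dm, f_dm, args = None ):
    # Workaround: replace NaN distances with 0.0 so that linkage() accepts them.
    # Relies on numpy being available as np in hclust2.py.
    self.s_dm = [0.0 if np.isnan(x) else x for x in s_dm]
    self.f_dm = [0.0 if np.isnan(x) else x for x in f_dm]
    self.args = args
    self.sclusters = None
    self.fclusters = None
    self.sdendrogram = None
    self.fdendrogram = None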

As I am not able to evaluate how this change may affect the validity of the results, I would like you to review its potential impact.
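To make the question about impact more concrete, here is a small sketch on toy data (not taken from hclust2 or from any real abundance table). It assumes the NaN values come from Bray-Curtis distances between two all-zero vectors, where the formula reduces to 0/0. I have not confirmed that this is the only source of NaN in hclust2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Toy abundance matrix (hypothetical): rows are samples, columns are features.
# The first two samples are entirely zero.
samples = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
])

# Condensed Bray-Curtis distance matrix; the pair of all-zero samples is 0/0.
dm = pdist(samples, metric="braycurtis")
nan_count = int(np.isnan(dm).sum())

# linkage() rejects a condensed matrix that contains non-finite values.
try:
    linkage(dm, method="average")
    linkage_failed = False
except ValueError:
    linkage_failed = True

# Same idea as the workaround: treat undefined distances as 0.0.
patched = np.where(np.isnan(dm), 0.0, dm)
tree = linkage(patched, method="average")

print("NaN entries:", nan_count, "| linkage failed before patch:", linkage_failed)
```

Under that assumption, the replacement only affects pairs of all-zero vectors, which are then treated as identical (distance 0.0) and merged first. The distances involving non-empty vectors are left unchanged.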

NB: after patching hclust2.py with the above changes, I am able to generate the heatmap from the HMP.species.txt example file.

best regards

Eric

PS: matplotlib.cbook.iterable has been deprecated since matplotlib 3.1; np.iterable should be used instead.
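As a minimal illustration of the replacement call (the exact place where hclust2.py uses it is not shown here, so the values below are made up):

```python
import numpy as np

# np.iterable reports whether an object can be iterated over.
checks = [np.iterable([1, 2, 3]), np.iterable("abc"), np.iterable(5)]
print(checks)
```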

regards

Eric