Hclust2.py error distance matrix finite values

Hello, I have had the same problem for a long time…
See: hclust2-1.0.0 :: test failure · Issue #5 · SegataLab/hclust2 · GitHub (among others)

After digging into the data, I noticed that HClustering.s_dm may contain some NaN values.
I did not dig into the code to find out where these NaN values come from.
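
For anyone who wants to check their own data, here is a minimal sketch of how the non-finite entries could be counted before clustering. The helper name count_non_finite and the example vector are made up for illustration and are not part of hclust2; I assume the distance data can be converted to a float NumPy array.

```python
import numpy as np

def count_non_finite(dm):
    """Count the NaN or infinite entries in a distance vector (hypothetical helper)."""
    arr = np.asarray(dm, dtype=float)
    # np.isfinite is False for NaN, +inf and -inf
    return int(np.count_nonzero(~np.isfinite(arr)))

# Made-up example data, not taken from HMP.species.txt
example = np.array([0.1, np.nan, 0.3, np.inf])
print(count_non_finite(example))
```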

A workaround that I would like you to investigate and validate is to filter out the NaN values in the __init__ method of the HClustering class, changing it from

def __init__( self, s_dm, f_dm, args = None ):
        self.s_dm=s_dm
        self.f_dm=f_dm
        self.args = args
        self.sclusters = None
        self.fclusters = None
        self.sdendrogram = None
        self.fdendrogram = None

to

def __init__( self, s_dm, f_dm, args = None ):
        self.s_dm=[0.0 if np.isnan(x) else x for x in s_dm]
        self.f_dm=[0.0 if np.isnan(x) else x for x in f_dm]
        self.args = args
        self.sclusters = None
        self.fclusters = None
        self.sdendrogram = None
        self.fdendrogram = None
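
The list comprehension above turns the distance data into a plain Python list. A possible variant that keeps a NumPy array is sketched below. The function name replace_nan_with_zero is hypothetical, and I assume the input can be converted to a float array. Keep in mind that a distance of 0.0 means "identical", so the choice of replacement value may influence the clustering.

```python
import numpy as np

def replace_nan_with_zero(dm):
    """Return a float copy of dm with every NaN replaced by 0.0 (hypothetical helper)."""
    arr = np.array(dm, dtype=float)  # np.array copies, so the input is left untouched
    arr[np.isnan(arr)] = 0.0         # boolean mask selects only the NaN entries
    return arr

# Made-up example data
cleaned = replace_nan_with_zero([0.5, float("nan"), 1.0])
print(cleaned)
```

Inside __init__ this would be used as self.s_dm = replace_nan_with_zero(s_dm), and likewise for f_dm.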

As I am not able to evaluate how this change may affect the relevance of the results, I would like you to review its potential impact.

NB: after patching hclust2.py with the above changes, I am able to generate the heatmap from the HMP.species.txt example file.

best regards

Eric

PS: matplotlib.cbook.iterable has been deprecated since matplotlib 3.1; np.iterable should be used instead.
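
A short sketch of what the replacement call looks like. The values are made up, and I have not checked where exactly hclust2.py calls the old function.

```python
import numpy as np

# np.iterable reports whether an object can be iterated over
list_is_iterable = np.iterable([1, 2, 3])
scalar_is_iterable = np.iterable(5)

print(list_is_iterable, scalar_is_iterable)
```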

regards

Eric
