Spurious and indirect correlations in pairwise HAllA?

marc0314 · March 11, 2021, 7:48am

Dear all

I have performed multiple pairwise HAllA runs between my taxonomic (e.g. kraken2 data), functional (humann3 data), metabolite, and environmental measurement datasets. After residualizing my datasets, I obtained a large number of significant correlations.

However, I suspect that some of these correlations are spurious or indirect.

For example, suppose my hypothesis is that exposure to the environmental variable is associated with a microbial function, and that this function in turn drives the synthesis of a metabolite. In that case, a direct significant association between the environmental variable and the metabolite should not appear, because their link would be indirect, running through the function. Yet in my dataset there are quite a few cases where the edges between the environmental variable, the function, and the metabolite form a triangle in a network plot.
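One way to list these triangles is to pool the significant feature pairs from every pairwise run into a single edge list and enumerate every set of three features that are all connected to each other. This is a minimal plain-Python sketch, not HAllA functionality; the node labels (`env:pH`, `func:F1`, `met:M1`) and the function name `find_triangles` are hypothetical, and extracting the significant pairs from each run's output is assumed to have been done already.

```python
from itertools import combinations


def find_triangles(edges):
    """Return every triple of nodes that are pairwise connected.

    edges: iterable of (node_a, node_b) pairs, e.g. significant feature
    pairs pooled from several pairwise runs (hypothetical input format).
    """
    # Build an undirected adjacency map.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    triangles = set()
    for node, nbrs in adj.items():
        # Two neighbours of the same node that are also linked to each
        # other close a triangle; sort so each triangle is stored once.
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                triangles.add(tuple(sorted((node, u, v))))
    return sorted(triangles)


edges = [
    ("env:pH", "func:F1"),
    ("func:F1", "met:M1"),
    ("env:pH", "met:M1"),
    ("env:pH", "func:F2"),
]
print(find_triangles(edges))
```

Prefixing each feature with its dataset keeps identically named features from different tables apart and makes it easy to pick out the environment–function–metabolite triangles afterwards.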

Is there a way to identify such spurious correlations in the output and remove them from the results? Thank you very much.
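HAllA does not do this itself, but a common heuristic outside HAllA is a first-order partial correlation: if the environment–metabolite correlation largely disappears once the function is held fixed, that edge is consistent with an indirect path through the function (consistent with, not proof of). A minimal sketch, assuming Pearson correlation on already residualized, roughly linear data; for rank-based associations you could apply it to ranks instead. The helper `flag_indirect_edge` and its `drop_threshold` are hypothetical choices, not an established cutoff.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def flag_indirect_edge(env, func, metab, drop_threshold=0.5):
    """Hypothetical helper: compare the env-metabolite correlation with
    and without conditioning on the function.

    Returns (marginal, partial, flagged); flagged is True when the partial
    correlation is smaller than drop_threshold times the marginal one.
    """
    marginal = pearson(env, metab)
    partial = partial_corr(env, metab, func)
    return marginal, partial, abs(partial) < drop_threshold * abs(marginal)


# Toy chain env -> func -> metab with small deterministic perturbations.
env = [float(i) for i in range(20)]
func = [e + ((i * 7) % 5 - 2) for i, e in enumerate(env)]
metab = [f + ((i * 3) % 7 - 3) for i, f in enumerate(func)]
marginal, partial, flagged = flag_indirect_edge(env, func, metab)
print(f"marginal={marginal:.2f} partial={partial:.2f} flagged={flagged}")
```

In this toy chain the marginal correlation is strong while the partial correlation is close to zero, which is the pattern you would expect for an edge that only exists through the function. With real data, confounding, measurement noise, and nonlinearity can all blur this, so treat a flag as a prompt for a closer look rather than a rule for deleting edges.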

Kind regards

Marcus

andrewGhazi · March 17, 2021, 2:43pm

Hi Marcus. HAllA is aimed at identifying associations between a pair of high-dimensional datasets. Looking at the network structure between more than two datasets and accounting for some known or unknown influence structure between them is unfortunately beyond HAllA’s scope.

marc0314 · March 22, 2021, 8:49am

Hi Andrew thanks for the response.

In that case, how did the authors of the manuscript below account for potential spurious or indirect correlations in their pairwise HAllA analyses?

Thanks
Marcus

Lloyd-Price et al. 2019. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases
https://www.nature.com/articles/s41586-019-1237-9

andrewGhazi · March 22, 2021, 7:30pm

I believe that in that study they accounted for additional covariates by first regressing them out using a mixed-effects model. HAllA was then run on the matrices of residuals instead of the matrices of raw data.

So in your example, you’d fit a mixed-effects model to regress out the effect of the environmental variable that you think might be confounding.
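The regress-out-then-run-on-residuals workflow can be sketched as follows. This is a simplification and an assumption: it fits an ordinary least-squares (fixed-effects only) model per feature with NumPy rather than the mixed-effects model described above, so it has no per-subject random effect. For a true mixed-effects fit per feature, statsmodels' `mixedlm` or R's lme4 are common tools. The function name `residualize` is hypothetical.

```python
import numpy as np


def residualize(features, covariates):
    """Regress every feature column on the covariates (plus an intercept)
    by ordinary least squares and return the matrix of residuals.

    features:   array of shape (n_samples, n_features)
    covariates: array of shape (n_samples, n_covariates)
    NOTE: fixed effects only; a simplified stand-in for a mixed-effects model.
    """
    n_samples = features.shape[0]
    # Design matrix: intercept column followed by the covariates.
    design = np.column_stack([np.ones(n_samples), covariates])
    # lstsq solves all feature columns at once when given a 2-D right-hand side.
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ coef


# Toy example: one covariate and two features, the first fully explained by it.
covariate = np.arange(10, dtype=float).reshape(-1, 1)
features = np.column_stack([2 * covariate[:, 0] + 3, np.sin(np.arange(10.0))])
residuals = residualize(features, covariate)
print(residuals.round(3))
```

Each dataset's residual matrix would then replace its raw table as HAllA input, reshaped to whatever orientation HAllA expects. By construction the residuals are uncorrelated with the covariates that were regressed out.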

Given how dataset-specific this modelling would need to be, HAllA doesn’t include the functionality to run the model and extract the residuals. You can see more discussion about this type of process in this thread:
