I’m running into a problem similar to the one in this post, but the Value column of the output is being filled in with the metadata variable name instead of a number. I installed MaAsLin2 from the GitHub release as mentioned there and I’m still having the problem.
Here’s a snippet of the output table:
The variable how_born only has two options (vaginal and caesarean) and I set vaginal to be the reference. The fed variable has four options with breast_milk being the reference.
The issue also translates to the heatmap. In addition to the how_born and fed variables, it just says Site on the heatmap, but the significant results table correctly shows that as Site/Milwaukee.
First off, thanks for using our tool. I’m looking at your data and trying to figure out what might be going wrong but it’s tricky without an example of how the metadata is encoded.
In general the “value” column will show the level within the covariate that the model coefficient is associated with. So it shouldn’t ever be a number unless you have purposely encoded numbers as levels within a covariate.
That being said, I’m not sure how you are getting both “how_born” and “caesarean” in the value column if there are indeed only two levels within the how_born column. Generally the variable name and the value are only the same if you are testing a continuous variable.
If you could provide an example of how the metadata is encoded (the format would be enough) that would help with trying to figure out what’s going on. Perhaps running str(df_meta) might give some hints into what’s happening at the very least.
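As a rough illustration of what to look for in the `str()` output, here is a minimal sketch in base R. The data frame below is made up (only the column names `how_born` and `fed` come from this thread), and it assumes blank cells were read in as empty strings, which is not confirmed for the real data.

```r
# Hypothetical metadata: blank cells read in as empty strings ""
df_meta <- data.frame(
  how_born = c("vaginal", "caesarean", "", "vaginal"),
  fed      = c("breast_milk", "formula", "mixed", ""),
  stringsAsFactors = TRUE
)

# Overview of column types and the number of levels per factor
str(df_meta)

# List the levels of every factor column; an empty string "" is a red flag
factor_levels <- lapply(Filter(is.factor, df_meta), levels)
print(factor_levels)

# Names of the factor columns that contain a blank level
blank_cols <- names(Filter(function(lv) any(lv == ""), factor_levels))
print(blank_cols)
```

If a two-option variable such as `how_born` reports three levels here, the extra level is most likely the blank.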
The only thing potentially weird in the str command is that blank/empty cells are being counted as a factor in how_born and fed. It seems the software correctly excludes those samples that are missing metadata (the results file has N as 233, which is less than the 240 samples in the dataset), but could the inclusion of a missing/blank value as a factor level be the issue? If yes, should I not explicitly make those variables factors?
These are the commands I ran to load/format the metadata:
It’s certainly possible that leaving blanks in as a factor level is what is causing the issue (when the level is blank, the output then defaults to the factor’s name rather than the name of the level). Could you try either giving the blanks explicit labels or filtering them out, and see if that fixes your issue?
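One way to try this in base R is sketched below. The data frame is hypothetical, and the approach (turning blanks into true `NA` values before building the factors) is just one option, not necessarily how the original metadata was loaded.

```r
# Hypothetical metadata with blank cells stored as empty strings
df_meta <- data.frame(
  how_born = c("vaginal", "caesarean", "", "vaginal"),
  fed      = c("breast_milk", "formula", "", "mixed"),
  stringsAsFactors = FALSE
)

# Turn blank strings into true missing values before building factors
df_meta[df_meta == ""] <- NA

# factor() ignores NA by default, so no blank level is created;
# relevel() puts the chosen reference level first
df_meta$how_born <- relevel(factor(df_meta$how_born), ref = "vaginal")
df_meta$fed      <- relevel(factor(df_meta$fed), ref = "breast_milk")

print(levels(df_meta$how_born))
print(levels(df_meta$fed))
```

When the metadata comes from a file, passing `na.strings = c("", "NA")` to `read.csv()` or `read.table()` should have the same effect at load time.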
I ended up giving the blank cells a value of “NA” and that appears to have fixed the problem! The only curious thing is that “NA” isn’t a significant value in the table, which I thought would be the case since the blank cells were apparently coming back as significant? Is this weird/wrong?
Also, is replacing blank cells with “NA” an acceptable workaround as far as the statistics/math behind the model are concerned? The missing values aren’t the same across all variables, so I don’t necessarily want to throw those samples out entirely.
I’m going to link you to this explanation by @himel.mallick (one of the original authors).
In essence, during model fitting the models will not use any samples that are missing values in the covariates. If you want to avoid this, you could try imputing the missing values in a sensible manner or modeling the lack of data directly (although this could be messy depending on the data, etc.).
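To make the trade-off concrete, here is a minimal base R sketch with made-up data. It counts how many samples remain when rows with any missing covariate are dropped, then shows one possible way of modeling the lack of data directly by giving missing values their own level. The helper `add_missing_level` and the label `"missing"` are hypothetical names for illustration, and whether an explicit missing level is statistically sensible depends on why the values are missing.

```r
# Hypothetical metadata where different samples miss different variables
df_meta <- data.frame(
  how_born = factor(c("vaginal", "caesarean", NA, "vaginal")),
  fed      = factor(c("breast_milk", NA, "formula", "mixed"))
)

# Samples left if every row with a missing covariate is dropped
n_complete <- sum(complete.cases(df_meta))

# Hypothetical helper: recode NA as its own explicit level
add_missing_level <- function(x, label = "missing") {
  x <- as.character(x)
  x[is.na(x)] <- label
  factor(x)
}

df_explicit <- df_meta
df_explicit$how_born <- relevel(add_missing_level(df_meta$how_born), ref = "vaginal")
df_explicit$fed      <- relevel(add_missing_level(df_meta$fed), ref = "breast_milk")

# With explicit levels, no sample has a missing covariate any more
n_explicit <- sum(complete.cases(df_explicit))

print(c(complete = n_complete, explicit = n_explicit))
```

The explicit level keeps all samples in the model, at the cost of an extra coefficient per variable that has to be interpreted with care.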