Results have variable name as value

Samantha · July 25, 2024, 4:36pm

Hi MaAsLin2 team,

I’m running into a problem similar to the one in this post, but the Value column of the output is being filled in with the metadata variable name instead of a number. I installed MaAsLin2 from the GitHub release as mentioned there and I’m still having the problem.

Here’s a snippet of the output table:

The variable how_born only has two options (vaginal and caesarean) and I set vaginal to be the reference. The fed variable has four options with breast_milk being the reference.

The issue also translates to the heatmap. In addition to the how_born and fed variables, it just says Site on the heatmap, but the significant results table correctly shows that as Site/Milwaukee.

Any information to fix this would be great!
Thank you!
~Samantha

Here is my version and code information:

packageVersion("Maaslin2")
[1] ‘1.18.0’

fit_data <- Maaslin2(
  input_data = df_counts,
  input_metadata = df_meta,
  min_abundance = 10,
  min_prevalence = 0.1,
  normalization = "TSS",
  output = "out_covars_ST_minAbund10_meanPrev0.1",
  fixed_effects = c("Site","Age_Years", "Sex","how_born",
                    "fed","adi_2013_natl_rank","Subject_Type"),
  random_effects = "Family",
  reference = "Subject_Type,Unrelated_Control;how_born,vaginal;fed,breast_milk"
)

nearinj · July 29, 2024, 1:34am

Hi @Samantha,

First off, thanks for using our tool. I’m looking at your data and trying to figure out what might be going wrong, but it’s tricky without an example of how the metadata is encoded.

In general the “value” column will show the level within the covariate that the model coefficient is associated with. So it shouldn’t ever be a number unless you have purposely encoded numbers as levels within a covariate.

That being said, I’m not sure how you are getting both “how_born” and “caesarean” in the value column if there are indeed only two levels within the how_born column. Generally the variable name and the value are only the same if you are testing a continuous variable.

If you could provide an example of how the metadata is encoded (the format would be enough) that would help with trying to figure out what’s going on. Perhaps running str(df_meta) might give some hints into what’s happening at the very least.
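
A minimal sketch of that kind of check, assuming a small made-up data frame (`toy_meta` and its values are hypothetical stand-ins, not the poster's data). It lists the levels of each factor column so that unexpected levels, such as an empty string, stand out:

```r
# Toy metadata standing in for df_meta (hypothetical values)
toy_meta <- data.frame(
  how_born  = factor(c("caesarean", "vaginal", "", "vaginal")),
  Age_Years = c(4.42, 12.25, 15, 4.67)
)

# Overall structure: column types and the first few values
str(toy_meta)

# Levels of every factor column; an empty string "" in here is a red flag
factor_cols <- vapply(toy_meta, is.factor, logical(1))
level_list  <- lapply(toy_meta[factor_cols], levels)
print(level_list)
```

In this toy case `how_born` reports three levels even though only two real categories exist, because the blank cell is kept as its own level.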

Thanks,
Jacob

Samantha · August 2, 2024, 3:05pm

Hi @nearinj!

Thanks for your response and sorry for my delay. Here is a snippet of the mapping file and the output of str(df_meta).

Metadata file:

Sample_Name	PatientID	Timepoint_Type	Site	Subject_Type	Subject_Type_2	Age_Years	Sex	adi_2013_natl_rank	Family	how_born	weeks33	fed
580-108-BL	580-108	Baseline	Milwaukee	Unrelated_Control	Control	4.42	Male	89	KK	caesarean	After
580-118-BL	580-118	Baseline	Milwaukee	case	case	12.25	Female	33	PP	vaginal	After	both
580-119-BL	580-119	Baseline	Milwaukee	case	case	15	Female	33	PP	vaginal	After	both
580-11-BL	580-11	Baseline	Milwaukee	case	case	4.67	Female	95	D	vaginal	Before	breast_milk

Output of str(df_meta):

str(df_meta)
'data.frame':	240 obs. of  15 variables:
 $ SampleID.1        : chr  "580-108-BL_S20" "580-118-BL_S155" "580-119-BL_S75" "580-11-BL_S1" ...
 $ Sample_Name       : chr  "580-108-BL" "580-118-BL" "580-119-BL" "580-11-BL" ...
 $ PatientID         : chr  "580-108" "580-118" "580-119" "580-11" ...
 $ Timepoint_Type    : chr  "Baseline" "Baseline" "Baseline" "Baseline" ...
 $ Site              : chr  "Milwaukee" "Milwaukee" "Milwaukee" "Milwaukee" ...
 $ Subject_Type      : Factor w/ 3 levels "Unrelated_Control",..: 1 3 3 3 3 3 3 2 3 2 ...
 $ Subject_Type_2    : chr  "Control" "case" "case" "case" ...
 $ Age_Years         : num  4.42 12.25 15 4.67 3.25 ...
 $ Sex               : Factor w/ 2 levels "Female","Male": 2 1 1 1 1 2 1 1 2 1 ...
 $ adi_2013_natl_rank: int  89 33 33 95 94 94 60 66 34 54 ...
 $ Family            : Factor w/ 149 levels "A","AAAAA","AAAAAAA",..: 66 96 96 20 103 103 109 75 27 70 ...
 $ how_born          : Factor w/ 3 levels "","caesarean",..: 2 3 3 3 2 2 3 3 3 2 ...
 $ weeks33           : chr  "After" "After" "After" "Before" ...
 $ fed               : Factor w/ 5 levels "","both","breast_milk",..: 1 2 2 3 2 2 4 4 3 4 ...

The only potentially weird thing in the str output is that blank/empty cells are being counted as a factor level in how_born and fed. It seems the software correctly excludes the samples that are missing metadata (the results file has N as 233, which is less than the 240 samples in the dataset), but could the inclusion of a missing/blank value as a factor level be the issue? If yes, should I not explicitly make those variables factors?

These are the commands I ran to load/format the metadata:

df_meta <- read.csv("mapping_file_maaslin.txt", header = TRUE, sep = "\t", row.names = 1,
                    stringsAsFactors = FALSE)
rownames(df_meta) <- gsub("-",".", rownames(df_meta), fixed = TRUE)
df_meta[1:5,1:5]

df_meta$Subject_Type <- factor(df_meta$Subject_Type, 
                            levels = c("Unrelated_Control", "Related_Control", "case"))
df_meta$Sex <- as.factor(df_meta$Sex)
df_meta$how_born <- as.factor(df_meta$how_born)
df_meta$fed <- as.factor(df_meta$fed)
df_meta$Family <- as.factor(df_meta$Family)
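
As a hedged illustration of why the blanks show up as levels: with `stringsAsFactors = FALSE`, empty cells are read as the empty string `""`, and `as.factor()` then treats `""` like any other value. The sketch below uses a hypothetical `toy_meta` to count blank cells per column before conversion:

```r
# Toy character metadata with blank cells (hypothetical values)
toy_meta <- data.frame(
  how_born = c("caesarean", "vaginal", "", "vaginal"),
  fed      = c("", "both", "both", "breast_milk"),
  stringsAsFactors = FALSE
)

# Number of empty-string cells in each column
blank_counts <- colSums(toy_meta == "")
print(blank_counts)

# as.factor() keeps "" as a real level alongside the true categories
how_born_factor <- as.factor(toy_meta$how_born)
levels(how_born_factor)
```

Running a count like this before the `as.factor()` calls shows which columns would pick up an extra blank level.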

Thanks!
~Samantha

nearinj · August 2, 2024, 5:37pm

Hello,

It’s certainly possible that leaving blanks in as a factor level is what is causing the issue (when the level is blank, the output then defaults to the factor’s name rather than the name of the level). Could you try either giving the blanks explicit labels or filtering them out, and see if that fixes your issue?
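
Both suggestions can be sketched in base R as follows. This is only an illustration on a hypothetical `toy_meta`; the label `"unknown"` is an arbitrary choice, not something MaAsLin2 requires:

```r
# Toy metadata where blank cells became a "" factor level (hypothetical values)
toy_meta <- data.frame(
  how_born = factor(c("caesarean", "vaginal", "", "vaginal")),
  fed      = factor(c("", "both", "both", "breast_milk"))
)

# Option 1: give the blank level an explicit label
labelled <- toy_meta
levels(labelled$how_born)[levels(labelled$how_born) == ""] <- "unknown"
levels(labelled$fed)[levels(labelled$fed) == ""] <- "unknown"

# Option 2: filter out rows with a blank in either column,
# then drop the unused "" level from the factors
keep     <- toy_meta$how_born != "" & toy_meta$fed != ""
filtered <- droplevels(toy_meta[keep, , drop = FALSE])

levels(labelled$how_born)
levels(filtered$fed)
nrow(filtered)
```

Option 1 keeps every sample but adds a category to the model; option 2 keeps the categories clean but discards samples that are blank in either variable.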

Jacob

Samantha · August 7, 2024, 8:09pm

Hi Jacob,

I ended up giving the blank cells a value of “NA” and that appears to have fixed the problem! The only curious thing is that “NA” isn’t a significant value in the table, which I thought would be the case since the blank cells were apparently coming back as significant? Is this weird/wrong?

Also, is replacing blank cells with “NA” an acceptable workaround given the statistics/math behind the model? The missing values aren’t the same across all variables, so I don’t necessarily want to throw those samples out entirely.
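
For reference, `read.csv()` treats the literal string `NA` as a missing value by default, so an `NA` typed into the file would be read as a true missing value rather than as a category named "NA". Whether that is what happened here is an assumption, since the thread does not say how the cells were edited. A sketch of getting the same effect at read time, using inline hypothetical data instead of the real mapping file:

```r
# Inline tab-separated text standing in for the mapping file (hypothetical values)
tsv_text <- "Sample\thow_born\tfed\nS1\tcaesarean\t\nS2\tvaginal\tboth\nS3\t\tbreast_milk\n"

# Treat both empty cells and the string "NA" as missing when reading
toy_meta <- read.csv(text = tsv_text, sep = "\t", header = TRUE, row.names = 1,
                     na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Missing values never become factor levels
toy_meta$how_born <- factor(toy_meta$how_born)
toy_meta$fed      <- factor(toy_meta$fed)

levels(toy_meta$how_born)
sum(is.na(toy_meta$fed))
```

Because missing values are not factor levels, no "NA" row would be expected in the results table under this reading.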

Thanks!
Samantha

nearinj · August 7, 2024, 9:18pm

Hi Samantha,

I’m going to link you to this explanation by @himel.mallick (one of the original authors).

In essence, during model fitting the models will not use any samples that have missing values in the covariates. If you want to avoid this, you could try imputing the missing values in a sensible manner or modeling the lack of data directly (although this could be messy depending on the data, etc.).
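
The same behaviour can be illustrated with a plain `lm()` fit in base R. This is a sketch of complete-case handling in general, not of MaAsLin2's internals, and the data frame `toy` is made up:

```r
# Toy data with missing covariate values (hypothetical values)
toy <- data.frame(
  abundance = c(0.10, 0.25, 0.15, 0.30, 0.20, 0.35),
  how_born  = factor(c("vaginal", "caesarean", NA, "vaginal", "caesarean", "vaginal")),
  Age_Years = c(4.4, 12.3, 15.0, NA, 3.3, 8.1)
)

# Samples with no missing value in any covariate of the model
n_complete <- sum(complete.cases(toy[, c("how_born", "Age_Years")]))

# With R's default na.action (na.omit), incomplete rows are dropped before fitting
fit    <- lm(abundance ~ how_born + Age_Years, data = toy)
n_used <- nobs(fit)

c(total = nrow(toy), complete = n_complete, used = n_used)
```

A sample missing any one covariate is dropped from the whole fit, which is consistent with the N of 233 out of 240 samples reported earlier in the thread.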

thanks,
Jacob

Samantha · August 12, 2024, 3:22pm

Hi Jacob,

Thank you so much for the link, that was very helpful!

I think everything is running properly now, so thank you again for the troubleshooting help as well.

Best,
Samantha
