Reference values for a categorical fixed effect variable

Hello,

I am trying to use the maaslin2 package to model the association between a categorical intervention variable (3 levels, character variable) and OTUs. My data is longitudinal with repeated measures so I am using a mixed model with fixed effects for the intervention and random effects for my participant ID.

I’d like to set the reference category for my intervention variable and have tried to install the latest version which has the reference option (as suggested here: Which factor level to compare with) but I am getting an unused argument error when I try to run my code. Example:

fit_data ← Maaslin2(
frmeTaxa, frmeMetadata,
“file”,
fixed_effects = c(‘intervention’), random_effects = c(‘ID’),
min_abundance = 0.0001, min_prevalence = 0.2,
reference=“int1”)

Error in Maaslin2(frmeTaxa, frmeMetadata, … :
unused argument (reference = Calcium - 2)

Any idea what my issue might be? Is there a way to confirm that I have installed the version of maaslin2 with the reference option?

(The code works nicely without the reference option added.)

Thanks!

Hi @Lyoon6 - this needs to be provided as a string of variable,reference (semi-colon delimited for multiple variables) which in your case translates to reference = intervention,int1 .

Hi Himel,

Sorry, I am still getting an unused argument error with my reference group.

My code:

fit_data <- Maaslin2(
frmeTaxa, frmeMetadata,
“file”,
fixed_effects = c(‘key’),
random_effects = c(‘ID’),
min_abundance = 0.0001,
min_prevalence = 0.2,
transform = “NONE”,
normalization = ‘TMM’,
analysis_method=‘ZINB’,
reference=key,Post-Combined
)

The variable ‘key’ is a categorical variable with character values (e.g., “Post-Combined”, “Pre-Combined”, etc.). Is this a problem with how I am specifying the name of the category? Or you do think there is something else going on? I’ve also tried putting quotes around the reference as shown under Examples here: https://bioconductor.org/packages/devel/bioc/manuals/Maaslin2/man/Maaslin2.pdf (e.g., reference=‘key,Post-Combined’)

Thanks in advance.

Hi @Lyoon6 - if your key variable has only 2 levels, this option will not work. Assuming that it has at least three levels, you might want to specify the reference as follows: reference = c('key,Post-Combined'). Let me know if that resolves the error.

1 Like

My variable has 6 levels, so it shouldn’t be an issue and unfortunately c('key,Post-Combined') also did not work.

I’d like to specify the reference variable so that I can better understand my output. As it stands, when I view the all_results file, I have trouble understanding the comparisons being made when the output looks like this. Hence my trying to specify a reference group.

metadata feature value coef stderr N N.not.0 pval qval
key OTU1 Pre-Inulin 0.16585027 0.089133456 60 60 0.068338653 0.90003414
key OTU2 Pre-Calcium -0.037791923 0.018726267 60 58 0.049543834 0.90003414
key OTU3 Post-Inulin 0.045776485 0.019071219 60 60 0.020575383 0.90003414
key OTU5 Post-Combined -0.024608982 0.01110134 60 58 0.031719596 0.90003414

Do you have any other suggestions on an alternative way to do this?

@Lyoon6 I am not sure how you are getting the all_results file if you have a fixed effect with more than 2 levels as you are expected to see something like this:

Please provide the reference for the variable 'key' which includes more than 2 levels: ...

Could you please clarify how you got the output file?

Hello, sure.
I have a .tsv file which contains metadata and OTU RA. Samples are rows. The key variable is a 6-level categorical variable with character values. The ID variable is the participant ID. This is a longitudinal study, so there are multiple samples per participant.

When I run my code, files are placed into the “maaslin” folder specified in the code. The files I see are: all_results.tsv, maaslin2, residuals.rds, significant_results.tsv, and a figures folder. The code appears to at the very least run (according to the log), though there are no significant associations. I can open the all_results file to see the table that I posted an excerpt from above.

Here is the full code.

#Read in data file
frmeData = read.csv( file = “file.tsv”,
sep = “\t”, header = T, row.names = 1 )

#Break up the data into two data frames (metadata and taxa)
frmeMetadata = data.frame(frmeData[1:16])
frmeTaxa = data.frame(t( frmeData[17:ncol( frmeData )] ))

#Run Maaslin2
fit_data <- Maaslin2(
frmeTaxa, frmeMetadata,
“C:/Users…/maaslin”,
fixed_effects = c(‘key’),
random_effects = c(‘ID’),
min_abundance = 0.0001,
min_prevalence = 0.2,
transform = “NONE”)

OK - that makes sense, which means the code is working without the reference option which is weird as it should throw an error for fixed effects with >2 levels. Does the following return TRUE?

is.factor(frmeMetadata$key)

No, it reuters FALSE. If I convert the variable to a factor (frmeData$key <- as_factor(frmeData$key)), it returns TRUE but still runs into the same unused argument issue.

Thanks, @Lyoon6. In order to debug this issue, I need to reproduce the error on my end. Would you be able to share a subset of your de-identified data?

Sure, I can do that. Should I send you an email?

Sure. Email works or if you are fine with sharing the feature and metadata files here that can work as well.

Hi @Lyoon6 - thanks for sharing the data. I have run the following which seems to be generating the correct output. Could you please verify if this chunk of code works on your end as well?

##################
# Load libraries #
##################

library(data.table)
library(tibble)
library(Maaslin2)

################
# Load dataset #
################

df<-data.table::fread('./RA_OTUs_sub.tsv')
frmeDataSub<-column_to_rownames(df, 'V1')
frmeMetadataSub = data.frame(frmeDataSub[1:3])
frmeTaxaSub = data.frame(frmeDataSub[4:ncol(frmeDataSub)])

################
# Run MaAsLin2 #
################

fit_data <- Maaslin2(input_data = frmeTaxaSub, 
                     input_metadata = frmeMetadataSub,
                     output = getwd(),
                     fixed_effects = 'key',
                     random_effects = 'ID',
                     min_abundance = 0.0001,
                     min_prevalence = 0.2,
                     transform = 'NONE',
                     reference = 'key,Post-Te')

Hi @himel.mallick-

Unfortunately, I’m still getting the same error message:

Error in Maaslin2(input_data = frmeTaxaSub, input_metadata = frmeMetadataSub, :
unused argument (reference = “key,Post-Te”)

I went and uninstalled-reinstalled Maaslin2 and still got the error with the exact code you provided:

library(data.table)
library(tibble)
library(Maaslin2)

################

Load dataset

################

df<-data.table::fread(“C:/Users/…/RA_OTUs_sub.tsv”)
frmeDataSub<-column_to_rownames(df, ‘V1’)
frmeMetadataSub = data.frame(frmeDataSub[1:3])
frmeTaxaSub = data.frame(frmeDataSub[4:nrow(frmeDataSub)])

################

Run MaAsLin2

################

fit_data <- Maaslin2(input_data = frmeTaxaSub,
input_metadata = frmeMetadataSub,
output = getwd(),
fixed_effects = ‘key’,
random_effects = ‘ID’,
min_abundance = 0.0001,
min_prevalence = 0.2,
transform = ‘NONE’,
reference = ‘key,Post-Te’)

Hi @Lyoon6 - have you tried installing directly from GitHub? Also, after installing and loading, could you please verify if you see the option reference in the function arguments when you type ?Maaslin2 in R? It looks like your function call is not recognizing the argument which can only happen with an older version.

Ah- that’s the problem! No reference option in the function arguments. It must be an older version of Maaslin2.

This is what I was using to download originally:

if(!requireNamespace(“BiocManager”, quietly = TRUE))
install.packages(“BiocManager”)
BiocManager::install(“Maaslin2”)

Are there instructions to download from GitHub?

You can simply do the following:

library(devtools)
devtools::install_github("biobakery/maaslin2")

I am glad we have identified the issue. Re-installing should resolve the issue on your end (let me know if not).

Reference argument now available & code running.

Thanks for your help working through this. Can’t believe the solution was as simple as updating the version.

1 Like

Hi, I am trying to use Maaslin2 and I have a significant results file generated which has many output results but only 10 figures in the figures folder. This is way less than the number of significant results written in the results file. Can you help?
Thank you

Best
DP

Hi @Dhrati_Patangia - the figures generally are multi-page PDFs one for each metadata combined across significant features. In addition, for categorical variables, the p-values are usually pairwise whereas the figures are across all categories - hence more rows in the results table than the corresponding PDFs.

Does it make sense?

Best,
Himel