Reference values for a categorical fixed effect variable

Lyoon6 · October 7, 2020, 3:28am

Hello,

I am trying to use the maaslin2 package to model the association between a categorical intervention variable (3 levels, character variable) and OTUs. My data is longitudinal with repeated measures so I am using a mixed model with fixed effects for the intervention and random effects for my participant ID.

I’d like to set the reference category for my intervention variable and have tried to install the latest version which has the reference option (as suggested here: Which factor level to compare with) but I am getting an unused argument error when I try to run my code. Example:

fit_data ← Maaslin2(
frmeTaxa, frmeMetadata,
“file”,
fixed_effects = c(‘intervention’), random_effects = c(‘ID’),
min_abundance = 0.0001, min_prevalence = 0.2,
reference=“int1”)

Error in Maaslin2(frmeTaxa, frmeMetadata, … :
unused argument (reference = Calcium - 2)

Any idea what my issue might be? Is there a way to confirm that I have installed the version of maaslin2 with the reference option?

(The code works nicely without the reference option added.)

Thanks!

himel.mallick · October 7, 2020, 1:05pm

Hi @Lyoon6 - this needs to be provided as a string of variable,reference (semi-colon delimited for multiple variables) which in your case translates to reference = intervention,int1 .

Lyoon6 · December 1, 2020, 6:57am

Hi Himel,

Sorry, I am still getting an unused argument error with my reference group.

My code:

fit_data <- Maaslin2(
frmeTaxa, frmeMetadata,
“file”,
fixed_effects = c(‘key’),
random_effects = c(‘ID’),
min_abundance = 0.0001,
min_prevalence = 0.2,
transform = “NONE”,
normalization = ‘TMM’,
analysis_method=‘ZINB’,
reference=key,Post-Combined
)

The variable ‘key’ is a categorical variable with character values (e.g., “Post-Combined”, “Pre-Combined”, etc.). Is this a problem with how I am specifying the name of the category? Or you do think there is something else going on? I’ve also tried putting quotes around the reference as shown under Examples here: https://bioconductor.org/packages/devel/bioc/manuals/Maaslin2/man/Maaslin2.pdf (e.g., reference=‘key,Post-Combined’)

Thanks in advance.

himel.mallick · December 1, 2020, 12:56pm

Hi @Lyoon6 - if your key variable has only 2 levels, this option will not work. Assuming that it has at least three levels, you might want to specify the reference as follows: reference = c('key,Post-Combined'). Let me know if that resolves the error.

Lyoon6 · December 1, 2020, 4:52pm

My variable has 6 levels, so it shouldn’t be an issue and unfortunately c('key,Post-Combined') also did not work.

I’d like to specify the reference variable so that I can better understand my output. As it stands, when I view the all_results file, I have trouble understanding the comparisons being made when the output looks like this. Hence my trying to specify a reference group.

metadata	feature	value	coef	stderr	N	N.not.0	pval	qval
key	OTU1	Pre-Inulin	0.16585027	0.089133456	60	60	0.068338653	0.90003414
key	OTU2	Pre-Calcium	-0.037791923	0.018726267	60	58	0.049543834	0.90003414
key	OTU3	Post-Inulin	0.045776485	0.019071219	60	60	0.020575383	0.90003414
key	OTU5	Post-Combined	-0.024608982	0.01110134	60	58	0.031719596	0.90003414

Do you have any other suggestions on an alternative way to do this?

himel.mallick · December 1, 2020, 5:29pm

@Lyoon6 I am not sure how you are getting the all_results file if you have a fixed effect with more than 2 levels as you are expected to see something like this:

Please provide the reference for the variable 'key' which includes more than 2 levels: ...

Could you please clarify how you got the output file?

Lyoon6 · December 1, 2020, 6:03pm

Hello, sure.
I have a .tsv file which contains metadata and OTU RA. Samples are rows. The key variable is a 6-level categorical variable with character values. The ID variable is the participant ID. This is a longitudinal study, so there are multiple samples per participant.

When I run my code, files are placed into the “maaslin” folder specified in the code. The files I see are: all_results.tsv, maaslin2, residuals.rds, significant_results.tsv, and a figures folder. The code appears to at the very least run (according to the log), though there are no significant associations. I can open the all_results file to see the table that I posted an excerpt from above.

Here is the full code.

#Read in data file
frmeData = read.csv( file = “file.tsv”,
sep = “\t”, header = T, row.names = 1 )

#Break up the data into two data frames (metadata and taxa)
frmeMetadata = data.frame(frmeData[1:16])
frmeTaxa = data.frame(t( frmeData[17:ncol( frmeData )] ))

#Run Maaslin2
fit_data <- Maaslin2(
frmeTaxa, frmeMetadata,
“C:/Users…/maaslin”,
fixed_effects = c(‘key’),
random_effects = c(‘ID’),
min_abundance = 0.0001,
min_prevalence = 0.2,
transform = “NONE”)

himel.mallick · December 1, 2020, 7:08pm

OK - that makes sense, which means the code is working without the reference option which is weird as it should throw an error for fixed effects with >2 levels. Does the following return TRUE?

is.factor(frmeMetadata$key)

Lyoon6 · December 1, 2020, 7:29pm

No, it reuters FALSE. If I convert the variable to a factor (frmeData$key <- as_factor(frmeData$key)), it returns TRUE but still runs into the same unused argument issue.

himel.mallick · December 1, 2020, 7:52pm

Thanks, @Lyoon6. In order to debug this issue, I need to reproduce the error on my end. Would you be able to share a subset of your de-identified data?

Lyoon6 · December 1, 2020, 7:55pm

Sure, I can do that. Should I send you an email?

himel.mallick · December 1, 2020, 8:09pm

Sure. Email works or if you are fine with sharing the feature and metadata files here that can work as well.

himel.mallick · December 1, 2020, 10:28pm

Hi @Lyoon6 - thanks for sharing the data. I have run the following which seems to be generating the correct output. Could you please verify if this chunk of code works on your end as well?

##################
# Load libraries #
##################

library(data.table)
library(tibble)
library(Maaslin2)

################
# Load dataset #
################

df<-data.table::fread('./RA_OTUs_sub.tsv')
frmeDataSub<-column_to_rownames(df, 'V1')
frmeMetadataSub = data.frame(frmeDataSub[1:3])
frmeTaxaSub = data.frame(frmeDataSub[4:ncol(frmeDataSub)])

################
# Run MaAsLin2 #
################

fit_data <- Maaslin2(input_data = frmeTaxaSub, 
                     input_metadata = frmeMetadataSub,
                     output = getwd(),
                     fixed_effects = 'key',
                     random_effects = 'ID',
                     min_abundance = 0.0001,
                     min_prevalence = 0.2,
                     transform = 'NONE',
                     reference = 'key,Post-Te')

Lyoon6 · December 2, 2020, 1:02am

Hi @himel.mallick-

Unfortunately, I’m still getting the same error message:

Error in Maaslin2(input_data = frmeTaxaSub, input_metadata = frmeMetadataSub, :
unused argument (reference = “key,Post-Te”)

I went and uninstalled-reinstalled Maaslin2 and still got the error with the exact code you provided:

library(data.table)
library(tibble)
library(Maaslin2)

################

Load dataset

################

df<-data.table::fread(“C:/Users/…/RA_OTUs_sub.tsv”)
frmeDataSub<-column_to_rownames(df, ‘V1’)
frmeMetadataSub = data.frame(frmeDataSub[1:3])
frmeTaxaSub = data.frame(frmeDataSub[4:nrow(frmeDataSub)])

################

Run MaAsLin2

################

fit_data <- Maaslin2(input_data = frmeTaxaSub,
input_metadata = frmeMetadataSub,
output = getwd(),
fixed_effects = ‘key’,
random_effects = ‘ID’,
min_abundance = 0.0001,
min_prevalence = 0.2,
transform = ‘NONE’,
reference = ‘key,Post-Te’)

himel.mallick · December 2, 2020, 2:07am

Hi @Lyoon6 - have you tried installing directly from GitHub? Also, after installing and loading, could you please verify if you see the option reference in the function arguments when you type ?Maaslin2 in R? It looks like your function call is not recognizing the argument which can only happen with an older version.

Lyoon6 · December 2, 2020, 2:42am

Ah- that’s the problem! No reference option in the function arguments. It must be an older version of Maaslin2.

This is what I was using to download originally:

if(!requireNamespace(“BiocManager”, quietly = TRUE))
install.packages(“BiocManager”)
BiocManager::install(“Maaslin2”)

Are there instructions to download from GitHub?

himel.mallick · December 2, 2020, 3:08am

You can simply do the following:

library(devtools)
devtools::install_github("biobakery/maaslin2")

I am glad we have identified the issue. Re-installing should resolve the issue on your end (let me know if not).

Lyoon6 · December 2, 2020, 4:12am

Reference argument now available & code running.

Thanks for your help working through this. Can’t believe the solution was as simple as updating the version.

Dhrati_Patangia · July 26, 2021, 10:37am

Hi, I am trying to use Maaslin2 and I have a significant results file generated which has many output results but only 10 figures in the figures folder. This is way less than the number of significant results written in the results file. Can you help?
Thank you

Best
DP

himel.mallick · July 26, 2021, 10:56am

Hi @Dhrati_Patangia - the figures generally are multi-page PDFs one for each metadata combined across significant features. In addition, for categorical variables, the p-values are usually pairwise whereas the figures are across all categories - hence more rows in the results table than the corresponding PDFs.

Does it make sense?

Best,
Himel

Topic		Replies	Views
Setting reference levels with 2 factors MaAsLin	10	2178	November 16, 2021
Maaslin2 for univariable analysis (one fixed-effect) MaAsLin	4	790	December 13, 2021
"value" column in the result only shows one level for fixed effect variable MaAsLin	3	267	September 21, 2023
Multiple variables used as the references MaAsLin	5	1260	April 19, 2022
How to select fixed_effects MaAsLin	2	425	August 15, 2023

Reference values for a categorical fixed effect variable

Load dataset

Run MaAsLin2

Related topics