I am trying to use the maaslin2 package to model the association between a categorical intervention variable (3 levels, character variable) and OTUs. My data is longitudinal with repeated measures so I am using a mixed model with fixed effects for the intervention and random effects for my participant ID.
I’d like to set the reference category for my intervention variable and have tried to install the latest version which has the reference option (as suggested here: Which factor level to compare with) but I am getting an unused argument error when I try to run my code. Example:
Hi @Lyoon6 - this needs to be provided as a string of variable,reference (semi-colon delimited for multiple variables) which in your case translates to reference = intervention,int1 .
The variable ‘key’ is a categorical variable with character values (e.g., “Post-Combined”, “Pre-Combined”, etc.). Is this a problem with how I am specifying the name of the category? Or you do think there is something else going on? I’ve also tried putting quotes around the reference as shown under Examples here: https://bioconductor.org/packages/devel/bioc/manuals/Maaslin2/man/Maaslin2.pdf (e.g., reference=‘key,Post-Combined’)
Hi @Lyoon6 - if your key variable has only 2 levels, this option will not work. Assuming that it has at least three levels, you might want to specify the reference as follows: reference = c('key,Post-Combined'). Let me know if that resolves the error.
My variable has 6 levels, so it shouldn’t be an issue and unfortunately c('key,Post-Combined') also did not work.
I’d like to specify the reference variable so that I can better understand my output. As it stands, when I view the all_results file, I have trouble understanding the comparisons being made when the output looks like this. Hence my trying to specify a reference group.
metadata
feature
value
coef
stderr
N
N.not.0
pval
qval
key
OTU1
Pre-Inulin
0.16585027
0.089133456
60
60
0.068338653
0.90003414
key
OTU2
Pre-Calcium
-0.037791923
0.018726267
60
58
0.049543834
0.90003414
key
OTU3
Post-Inulin
0.045776485
0.019071219
60
60
0.020575383
0.90003414
key
OTU5
Post-Combined
-0.024608982
0.01110134
60
58
0.031719596
0.90003414
Do you have any other suggestions on an alternative way to do this?
@Lyoon6 I am not sure how you are getting the all_results file if you have a fixed effect with more than 2 levels as you are expected to see something like this:
Please provide the reference for the variable 'key' which includes more than 2 levels: ...
Could you please clarify how you got the output file?
Hello, sure.
I have a .tsv file which contains metadata and OTU RA. Samples are rows. The key variable is a 6-level categorical variable with character values. The ID variable is the participant ID. This is a longitudinal study, so there are multiple samples per participant.
When I run my code, files are placed into the “maaslin” folder specified in the code. The files I see are: all_results.tsv, maaslin2, residuals.rds, significant_results.tsv, and a figures folder. The code appears to at the very least run (according to the log), though there are no significant associations. I can open the all_results file to see the table that I posted an excerpt from above.
Here is the full code.
#Read in data file
frmeData = read.csv( file = “file.tsv”,
sep = “\t”, header = T, row.names = 1 )
#Break up the data into two data frames (metadata and taxa)
frmeMetadata = data.frame(frmeData[1:16])
frmeTaxa = data.frame(t( frmeData[17:ncol( frmeData )] ))
OK - that makes sense, which means the code is working without the reference option which is weird as it should throw an error for fixed effects with >2 levels. Does the following return TRUE?
No, it reuters FALSE. If I convert the variable to a factor (frmeData$key <- as_factor(frmeData$key)), it returns TRUE but still runs into the same unused argument issue.
Thanks, @Lyoon6. In order to debug this issue, I need to reproduce the error on my end. Would you be able to share a subset of your de-identified data?
Hi @Lyoon6 - thanks for sharing the data. I have run the following which seems to be generating the correct output. Could you please verify if this chunk of code works on your end as well?
Hi @Lyoon6 - have you tried installing directly from GitHub? Also, after installing and loading, could you please verify if you see the option reference in the function arguments when you type ?Maaslin2 in R? It looks like your function call is not recognizing the argument which can only happen with an older version.
Hi, I am trying to use Maaslin2 and I have a significant results file generated which has many output results but only 10 figures in the figures folder. This is way less than the number of significant results written in the results file. Can you help?
Thank you
Hi @Dhrati_Patangia - the figures generally are multi-page PDFs one for each metadata combined across significant features. In addition, for categorical variables, the p-values are usually pairwise whereas the figures are across all categories - hence more rows in the results table than the corresponding PDFs.