Abundance values as input data


We are trying out MaAsLin2 for an exploratory analysis of how our microbiome data relates to social behavior. Currently, our code looks like the following:

fit_data <- Maaslin2(
  input_data, input_metadata, 'maaslin2_attempt/degree',
  fixed_effects = c('degree_log10_std', 'group_size', 'age', 'valley_position', 'sex'),
  random_effects = c('uid', 'year'),
  standardize = FALSE)

Here, input_data is our raw sequence reads table (by feature ID), which we converted to relative abundance and then CLR transformed. We are wondering which of two files we should use as input. One option is our raw data table, or OTU table, which lists each feature ID and whether or not it contains reads (I've attached a subset as final_merged_table_subset.csv, since the full file is too big to upload). The other option is to calculate abundance for each feature ID first, as in the attached final_data_28Feb22.csv (this file is also CLR transformed, for reference).

Essentially, our question is whether input_data should already be converted to relative abundance, or whether MaAsLin2 will do that for us.

final_data_28Feb22.csv (2.7 MB)
final_merged_table_subset.csv (3.9 MB)

Thank you!

This section of the tutorial has guidance on this. Let me know if that doesn't answer your question.
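To make the two options concrete, here is a minimal sketch. It is not taken from the tutorial. It assumes that Maaslin2() applies its own normalization and transformation through the `normalization` and `transform` arguments (defaults "TSS" and "LOG", as far as I know), so a table that is already CLR transformed would need both set to "NONE" to avoid being processed twice. The toy counts matrix and the helper names `clr`, `fit_from_counts`, and `fit_from_clr` are hypothetical and only for illustration.

```r
# Toy counts table (hypothetical): rows are samples, columns are features.
counts <- matrix(
  c(10, 0, 5,
     3, 8, 2,
     0, 4, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("f1", "f2", "f3"))
)

# Relative abundance (total-sum scaling): each sample sums to 1.
rel_abund <- counts / rowSums(counts)

# CLR: log of each value minus the mean log of its sample.
# The pseudocount (an assumption here) avoids log(0).
clr <- function(x, pseudocount = 0.5) {
  lx <- log(x + pseudocount)
  lx - rowMeans(lx)
}
clr_data <- clr(counts)

# CLR values are centered on zero, so some are negative; a log
# transform on top of them would not make sense.
any(clr_data < 0)

# Option A (hypothetical helper): pass counts and let Maaslin2
# do the normalization and transformation itself.
fit_from_counts <- function(counts, metadata, out_dir) {
  Maaslin2::Maaslin2(
    input_data = as.data.frame(counts),
    input_metadata = metadata,
    output = out_dir,
    fixed_effects = c("degree_log10_std", "group_size", "age",
                      "valley_position", "sex"),
    random_effects = c("uid", "year"),
    normalization = "TSS",
    transform = "LOG",
    standardize = FALSE
  )
}

# Option B (hypothetical helper): pass data that is already CLR
# transformed and switch off Maaslin2's own processing.
fit_from_clr <- function(clr_data, metadata, out_dir) {
  Maaslin2::Maaslin2(
    input_data = as.data.frame(clr_data),
    input_metadata = metadata,
    output = out_dir,
    fixed_effects = c("degree_log10_std", "group_size", "age",
                      "valley_position", "sex"),
    random_effects = c("uid", "year"),
    normalization = "NONE",
    transform = "NONE",
    standardize = FALSE
  )
}
```

With the original call, which leaves `normalization` and `transform` at their defaults, passing the CLR-transformed final_data_28Feb22.csv would likely have Maaslin2 normalize and transform the data a second time, so either Option A with the raw table or Option B with the CLR table seems the safer choice.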