Hello!
I am trying to run Maaslin2 with the following code:
input_data = read.table(file = "4Masslin2_input.data_kos.taxonomy.archaea.mt.2group.tsv",
                        header = TRUE, sep = "\t")
rownames(input_data) <- input_data$Geneid_ord
input_data$Geneid_ord = NULL
metadata = read.table(file = "4Masslin2_metadata_kos.taxonomy.archaea.mt.2group.tsv",
                      header = TRUE, sep = "\t")
rownames(metadata) <- metadata$Geneid_ord
metadata$Geneid_ord = NULL
# Create the 'Ctrl' column
metadata$Ctrl <- ifelse(metadata$Diagnosis == "Ctrl", "Yes", "No")
# Create the 'PD' column
metadata$PD <- ifelse(metadata$Diagnosis == "PD", "Yes", "No")
# Create the 'iRBD' column
metadata$iRBD <- ifelse(metadata$Diagnosis == "iRBD", "Yes", "No")
reference <- unique(metadata$S)
reference <- c("Methanobrevibacter_A smithii",
               "Methanobrevibacter_A smithii_A",
               "Methanosphaera stadtmanae",
               "Methanomethylophilus alvus",
               "DTU008 sp001421185",
               "Methanomassiliicoccus luminyensis",
               "MX-02 sp006954405",
               "Coprobacillus cateniformis",
               "Methanobrevibacter_C arboriphilus_A",
               "Methanosphaera cuniculi")
Maaslin2(input_data = input_data,
         input_metadata = metadata,
         fixed_effects = c("Ctrl", "PD", "iRBD", "S"),
         reference = reference,
         min_prevalence = 0,
         output = "test",
         transform = "LOG",
         plot_heatmap = TRUE,
         plot_scatter = TRUE,
         heatmap_first_n = 50,
         max_significance = 1)
Examples of my metadata and input data are below:
metadata:
Diagnosis D P C O F G
K00053_1 Ctrl Archaea Methanobacteriota Methanobacteria Methanobacteriales Methanobacteriaceae Methanobrevibacter_A
K00053_2 Ctrl Archaea Methanobacteriota Methanobacteria Methanobacteriales Methanobacteriaceae Methanobrevibacter_A
K00053_3 Ctrl Archaea Methanobacteriota Methanobacteria Methanobacteriales Methanobacteriaceae Methanosphaera
K00053_4 Ctrl Archaea Thermoplasmatota Thermoplasmata Methanomassiliicoccales Methanomethylophilaceae Methanomethylophilus
K00053_5 PD Archaea Methanobacteriota Methanobacteria Methanobacteriales Methanobacteriaceae Methanobrevibacter_A
K00053_6 PD Archaea Methanobacteriota Methanobacteria Methanobacteriales Methanobacteriaceae Methanobrevibacter_A
S Ctrl PD iRBD
K00053_1 Methanobrevibacter_A smithii Yes No No
K00053_2 Methanobrevibacter_A smithii_A Yes No No
K00053_3 Methanosphaera stadtmanae Yes No No
K00053_4 Methanomethylophilus alvus Yes No No
K00053_5 Methanobrevibacter_A smithii No Yes No
K00053_6 Methanobrevibacter_A smithii_A No Yes No
input_data:
tpm
K00053_1 166.502489
K00053_2 188.409788
K00053_3 69.970092
K00053_4 2.219452
K00053_5 642.522944
K00053_6 136.308126
As a result, I receive the following error:
2023-05-11 17:25:04 INFO::Writing function arguments to log file
2023-05-11 17:25:04 INFO::Verifying options selected are valid
2023-05-11 17:25:04 INFO::Determining format of input files
2023-05-11 17:25:04 INFO::Input format is data samples as rows and metadata samples as rows
2023-05-11 17:25:04 INFO::Formula for fixed effects: expr ~ Ctrl + PD + iRBD + S
Error in Maaslin2(input_data = input_data, input_metadata = metadata, :
Please provide the reference for the variable 'S' which includes more than 2 levels: Methanobrevibacter_A smithii, Methanobrevibacter_A smithii_A, Methanosphaera stadtmanae, Methanomethylophilus alvus, Methanomassiliicoccus_A intestinalis, UBA71 sp905187815, DTU008 sp001421185, Methanomassiliicoccus luminyensis, MX-02 sp006954405, Coprobacillus cateniformis, Methanobrevibacter_C arboriphilus_A, Methanosphaera cuniculi, Methanobrevibacter ruminantium_A.
Could you please suggest the likely source of this error and how to fix it?
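
One possible cause, offered as a sketch rather than a confirmed diagnosis: as far as I can tell from the Maaslin2 documentation, `reference` is not a vector of levels. It is a single string of `variable,reference_level` pairs, separated by semicolons when there is more than one variable. The code above passes a vector of ten species names, so Maaslin2 finds no reference level for `S` and stops. The snippet below builds the string in base R. The helper `make_reference` is hypothetical (it is not part of Maaslin2), the toy `metadata` only mirrors the six example rows shown above, and using `Methanobrevibacter_A smithii` as the baseline for `S` is an assumption made for illustration.

```r
# Toy metadata with the same columns as in the post (six example rows only).
metadata <- data.frame(
  Diagnosis = c("Ctrl", "Ctrl", "Ctrl", "Ctrl", "PD", "PD"),
  S = c("Methanobrevibacter_A smithii",
        "Methanobrevibacter_A smithii_A",
        "Methanosphaera stadtmanae",
        "Methanomethylophilus alvus",
        "Methanobrevibacter_A smithii",
        "Methanobrevibacter_A smithii_A"),
  row.names = paste0("K00053_", 1:6),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a Maaslin2 function): turns a named character
# vector c(variable = "level") into "variable,level;variable,level".
make_reference <- function(refs, metadata) {
  for (v in names(refs)) {
    # Each variable must be a metadata column and each level must occur in it.
    stopifnot(v %in% colnames(metadata), refs[[v]] %in% metadata[[v]])
  }
  paste(names(refs), refs, sep = ",", collapse = ";")
}

# Assumed baselines: "Ctrl" for Diagnosis, one species for S.
reference <- make_reference(
  c(Diagnosis = "Ctrl", S = "Methanobrevibacter_A smithii"),
  metadata
)
print(reference)
# [1] "Diagnosis,Ctrl;S,Methanobrevibacter_A smithii"

# The call would then look roughly like this (not run here):
# library(Maaslin2)
# Maaslin2(input_data = input_data,
#          input_metadata = metadata,
#          fixed_effects = c("Diagnosis", "S"),
#          reference = reference,
#          min_prevalence = 0,
#          output = "test",
#          transform = "LOG")
```

The sketch uses the original `Diagnosis` column instead of the three Yes/No columns `Ctrl`, `PD` and `iRBD`. Those three indicators always sum to one, so together with the intercept they are perfectly collinear. A single three-level factor with `Ctrl` as its reference should carry the same information without that redundancy.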