MaAsLin2 pathway data variable names too large

Dear Sir or Madam at the bioBakery Team,

I am using MaAsLin2 to analyse HUMAnN 3 pathway relative abundances (normalized and merged) together with metadata. Below is my code:

path_lactoseASA_LNP <- Maaslin2(input_data = df_input_path_reformat, 
                        input_metadata = df_reformat, 
                        min_prevalence = 0,
                        normalization  = "NONE",
                        output = "path_lactoseASA_LNP_all",
                        fixed_effects = c("LNP", "lacs.asa"))

I keep running into this error message:

Error in do.call(cbind, lapply(x, is.na)) : 
  variable names are limited to 10000 bytes

I thought it was because the row names of the pathway abundance file (which are pathway names) are too long. However, I still received the same error message after shortening them.

Any help or tips would be appreciated!

Thank you for your time and consideration.
vicky291

Hi there @vicky291,

We believe this could actually be caused by issues in your column names. Try replacing your code with the following and see what happens:

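# Note: fread() reads from a file, so the first argument should be the path to
# your pathway abundance file rather than a data frame already loaded in R.
parsed_input_data <- data.frame(data.table::fread(df_input_path_reformat, header = TRUE, sep = "\t"), row.names = 1)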

path_lactoseASA_LNP <- Maaslin2(input_data = parsed_input_data, 
                        input_metadata = df_reformat, 
                        min_prevalence = 0,
                        normalization  = "NONE",
                        output = "path_lactoseASA_LNP_all",
                        fixed_effects = c("LNP", "lacs.asa"))

If not, I would also suggest truncating the pathway names to just their initial identifier before the ":". This would still allow you to map them back to the MetaCyc pathways of interest and would cut down on clutter.
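
A minimal sketch of that truncation, assuming an unstratified table whose row names are the pathway names. The data frame `abund` and its values are made up for illustration, and `pathway_lookup` is a hypothetical helper table for mapping the short identifiers back to the full names:

```r
# Made-up example table; the row names mimic HUMAnN-style pathway names.
abund <- data.frame(
  sample_A = c(0.12, 0.30, 0.05),
  sample_B = c(0.10, 0.25, 0.08),
  row.names = c(
    "PWY-7221: guanosine ribonucleotides de novo biosynthesis",
    "PWY-6123: inosine-5'-phosphate biosynthesis I",
    "UNMAPPED"
  )
)

# Keep only the text before the first ":"; names without ":" stay unchanged.
short_ids <- sub(":.*$", "", rownames(abund))

# Row names must be unique, so stop if the truncation created duplicates.
stopifnot(anyDuplicated(short_ids) == 0)

# Keep a lookup table so identifiers can be mapped back to the full names.
pathway_lookup <- data.frame(id = short_ids, full_name = rownames(abund))
rownames(abund) <- short_ids
```

With a stratified table (names that also carry a "|" plus taxon suffix) this pattern would strip the taxon as well and produce duplicates, which is why the uniqueness check is there.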

Cheers,
Jacob Nearing

Dear @nearinj,

Thank you very much for the suggested solution! It works.

What I did before was:

df_input_path = read.table(file = 'microbiome/humann_pathabundance_norm_merged_1.tsv', header = TRUE, sep = "\t",
                            row.names = NULL,
                            stringsAsFactors = FALSE)
df_input_path_reformat <- data.frame(df_input_path[,-1], row.names = df_input_path[,1])

path_lactoseASA_LNP <- Maaslin2(input_data = df_input_path_reformat, 
                        input_metadata = df_reformat, 
                        min_prevalence = 0,
                        normalization  = "NONE",
                        output = "path_lactoseASA_LNP_all",
                        fixed_effects = c("LNP", "lacs.asa"))

Do you happen to know why your way of reading the input file works but not mine?

Thank you for your time and help,
Vicky
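
One possible explanation, offered as an assumption rather than a confirmed diagnosis: by default `read.table()` treats both `"` and `'` as quote characters and `#` as a comment character. Pathway names can contain apostrophes (for example `5'-phosphate`), and an unmatched quote can cause several lines to be read as a single, very long field. The sketch below uses a made-up file to show how to switch off both behaviours and then check the length of the longest row or column name; the file contents and the object name `tab` are hypothetical.

```r
# Write a small made-up table that mimics a HUMAnN-style file.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "# Pathway\tsample_A\tsample_B",
  "PWY-6123: inosine-5'-phosphate biosynthesis I\t0.30\t0.25",
  "PWY-7221: guanosine ribonucleotides de novo biosynthesis\t0.12\t0.10"
), tmp)

# Disable quote and comment handling so apostrophes and "#" are read literally.
tab <- read.table(tmp, header = TRUE, sep = "\t",
                  quote = "", comment.char = "",
                  row.names = 1, check.names = FALSE,
                  stringsAsFactors = FALSE)

# A value anywhere near 10000 bytes here would point to merged lines.
longest_name <- max(nchar(c(rownames(tab), colnames(tab)), type = "bytes"))
```

Running the same length check on the data frame produced by the original `read.table()` call would show whether any name is close to the 10000-byte limit mentioned in the error.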