Potential Bug in `maaslin3` with `input_metadata` Parameter

hulfred · January 2, 2025, 4:09pm

Hi,

Thank you for creating this excellent tool!

While experimenting with the maaslin3 package from Bioconductor, I encountered a potential bug related to the input_metadata parameter. It appears that the object passed to input_metadata must itself be named metadata; any other variable name triggers an error. Below is a minimal reproducible example:

library(maaslin3)
library(microbiome)
library(tidyverse)

data(atlas1006, package = "microbiome")
taxa_table = microbiome::abundances(atlas1006)
meta_data = microbiome::meta(atlas1006)
meta_data = meta_data %>%
  dplyr::mutate(
    # Get actual sequencing depth (total reads) from the example data
    reads = microbiome::readcount(atlas1006)
    )

set.seed(123)
out_maaslin3 = maaslin3(input_data = t(taxa_table),
                        input_metadata = meta_data,
                        output = "maaslin3_output",
                        formula = '~ age + reads',
                        normalization = 'TSS',
                        transform = 'LOG',
                        correction = 'BH',
                        augment = TRUE,
                        standardize = TRUE,
                        max_significance = 0.05,
                        median_comparison_abundance = TRUE,
                        median_comparison_prevalence = FALSE,
                        cores = 1,
                        plot_summary_plot = FALSE,
                        verbosity = 'WARN')

This resulted in the following error:
Error in maaslin_read_data(input_data, input_metadata, feature_specific_covariate, : object 'metadata' not found

However, when I renamed meta_data to metadata, the issue was resolved.
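
One plausible mechanism for this behavior, offered as an assumption rather than a reading of the maaslin3 source, is a function body that refers to a free variable called metadata instead of its own input_metadata argument. The sketch below uses a hypothetical function, read_data_buggy, to show how such a slip would fail for any variable name except metadata:

```r
# Hypothetical stand-in for a package function; NOT the actual maaslin3 source.
# The body mistakenly uses the free variable `metadata` instead of its
# argument `input_metadata`.
read_data_buggy <- function(input_data, input_metadata) {
  nrow(metadata)  # bug: should be nrow(input_metadata)
}
# Resolve free variables through the global environment, as a package
# function ultimately would.
environment(read_data_buggy) <- globalenv()

meta_data <- data.frame(age = c(30, 40, 50))

# Make sure no global `metadata` exists before the first call.
if (exists("metadata", envir = globalenv(), inherits = FALSE)) {
  rm("metadata", envir = globalenv())
}

# With no global object called `metadata`, the lookup fails.
first_try <- tryCatch(
  read_data_buggy(input_data = NULL, input_metadata = meta_data),
  error = function(e) conditionMessage(e)
)
print(first_try)

# Once a global object happens to be called `metadata`, the lookup succeeds,
# which hides the bug.
assign("metadata", meta_data, envir = globalenv())
second_try <- read_data_buggy(input_data = NULL, input_metadata = meta_data)
print(second_try)

# Clean up the global object created for the demonstration.
rm("metadata", envir = globalenv())
```

If this is what happened, renaming the variable only masks the problem: the function would be reading whatever global object is called metadata, not the value actually passed in.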

Additionally, I encountered a problem when using parallel computing. Even with the metadata object named metadata, the same error occurred in the following example:

library(doParallel)
library(doRNG)

cl = makeCluster(2)
registerDoParallel(cl)
seed_list = 1:2
res_sim = foreach(i = seed_list, .combine = rbind, .verbose = TRUE, 
                  .packages = c("tidyverse", "maaslin3", "microbiome", "phyloseq")) %dorng% 
  {
    set.seed(i)
    data(atlas1006, package = "microbiome")
    taxa_table = microbiome::abundances(atlas1006)
    metadata = microbiome::meta(atlas1006)
    metadata = metadata %>%
      dplyr::mutate(
        # Get actual sequencing depth (total reads) from the example data
        reads = microbiome::readcount(atlas1006)
      )
    
    set.seed(123)
    out_maaslin3 = maaslin3(input_data = t(taxa_table),
                            input_metadata = metadata,
                            output = "maaslin3_output",
                            formula = '~ age + reads',
                            normalization = 'TSS',
                            transform = 'LOG',
                            correction = 'BH',
                            augment = TRUE,
                            standardize = TRUE,
                            max_significance = 0.05,
                            median_comparison_abundance = TRUE,
                            median_comparison_prevalence = FALSE,
                            cores = 1,
                            plot_summary_plot = FALSE,
                            verbosity = 'WARN')
  }
stopCluster(cl)

The error message in this case was:
Error in { : task 1 failed - "object 'metadata' not found"
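
The parallel failure would be consistent with the same kind of scoping slip, assuming the parallel backend evaluates the loop body in an environment other than the worker's global environment (this is an assumption about the backend, not something confirmed in this thread). In that case, a metadata object created inside the loop body is local, so a function looking for a global metadata would not find it. The sketch below imitates this with local() and the hypothetical read_data_buggy function; it does not use foreach or maaslin3:

```r
# Hypothetical buggy function (not maaslin3 code): uses the free variable
# `metadata` instead of the `input_metadata` argument.
read_data_buggy <- function(input_metadata) {
  nrow(metadata)  # bug: should be nrow(input_metadata)
}
environment(read_data_buggy) <- globalenv()

# Ensure there is no global `metadata` left over from earlier code.
if (exists("metadata", envir = globalenv(), inherits = FALSE)) {
  rm("metadata", envir = globalenv())
}

# local() evaluates its body in a fresh environment, standing in for the
# environment in which a parallel backend might evaluate a loop body.
worker_result <- local({
  metadata <- data.frame(age = c(30, 40, 50))  # local, not global
  tryCatch(
    read_data_buggy(input_metadata = metadata),
    error = function(e) conditionMessage(e)
  )
})
print(worker_result)
```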

For your reference, here is my session info:

R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.0

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doRNG_1.8.6       rngtools_1.5.2    foreach_1.5.2     DT_0.33           lubridate_1.9.4  
 [6] forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4       purrr_1.0.2       readr_2.1.5      
[11] tidyr_1.3.1       tibble_3.2.1      tidyverse_2.0.0   microbiome_1.28.0 ggplot2_3.5.1    
[16] phyloseq_1.50.0   maaslin3_0.99.2   ANCOMBC_2.8.0    

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3              rstudioapi_0.17.1               jsonlite_1.8.9                 
  [4] magrittr_2.0.3                  TH.data_1.1-2                   farver_2.1.2                   
  [7] nloptr_2.1.1                    rmarkdown_2.29                  ragg_1.3.3                     
 [10] fs_1.6.5                        zlibbioc_1.52.0                 vctrs_0.6.5                    
 [13] multtest_2.62.0                 minqa_1.2.8                     base64enc_0.1-3                
 [16] htmltools_0.5.8.1               S4Arrays_1.6.0                  energy_1.7-12                  
 [19] haven_2.5.4                     Rhdf5lib_1.28.0                 cellranger_1.1.0               
 [22] rhdf5_2.50.1                    SparseArray_1.6.0               Formula_1.2-5                  
 [25] htmlwidgets_1.6.4               plyr_1.8.9                      sandwich_3.1-1                 
 [28] rootSolve_1.8.2.4               zoo_1.8-12                      igraph_2.1.2                   
 [31] lifecycle_1.0.4                 iterators_1.0.14                pkgconfig_2.0.3                
 [34] Matrix_1.7-1                    R6_2.5.1                        fastmap_1.2.0                  
 [37] GenomeInfoDbData_1.2.13         rbibutils_2.3                   MatrixGenerics_1.18.0          
 [40] digest_0.6.37                   Exact_3.3                       numDeriv_2016.8-1.1            
 [43] colorspace_2.1-1                S4Vectors_0.44.0                textshaping_0.4.1              
 [46] Hmisc_5.2-1                     GenomicRanges_1.58.0            vegan_2.6-8                    
 [49] labeling_0.4.3                  timechange_0.3.0                mgcv_1.9-1                     
 [52] httr_1.4.7                      TreeSummarizedExperiment_2.14.0 abind_1.4-8                    
 [55] compiler_4.4.2                  proxy_0.4-27                    bit64_4.5.2                    
 [58] withr_3.0.2                     doParallel_1.0.17               gsl_2.1-8                      
 [61] htmlTable_2.4.3                 backports_1.5.0                 BiocParallel_1.40.0            
 [64] MASS_7.3-61                     DelayedArray_0.32.0             biomformat_1.34.0              
 [67] permute_0.9-7                   gtools_3.9.5                    CVXR_1.0-15                    
 [70] gld_2.6.6                       optparse_1.7.5                  tools_4.4.2                    
 [73] foreign_0.8-87                  ape_5.8-1                       nnet_7.3-19                    
 [76] glue_1.8.0                      rhdf5filters_1.18.0             nlme_3.1-166                   
 [79] grid_4.4.2                      Rtsne_0.17                      checkmate_2.3.2                
 [82] ade4_1.7-22                     cluster_2.1.8                   reshape2_1.4.4                 
 [85] generics_0.1.3                  gtable_0.3.6                    tzdb_0.4.0                     
 [88] class_7.3-22                    data.table_1.16.4               lmom_3.2                       
 [91] hms_1.1.3                       XVector_0.46.0                  BiocGenerics_0.52.0            
 [94] pillar_1.10.0                   yulab.utils_0.1.8               logging_0.10-108               
 [97] splines_4.4.2                   getopt_1.20.4                   treeio_1.30.0                  
[100] lattice_0.22-6                  survival_3.8-3                  gmp_0.7-5                      
[103] bit_4.5.0.1                     tidyselect_1.2.1                SingleCellExperiment_1.28.1    
[106] pbapply_1.7-2                   Biostrings_2.74.1               knitr_1.49                     
[109] gridExtra_2.3                   IRanges_2.40.1                  SummarizedExperiment_1.36.0    
[112] stats4_4.4.2                    xfun_0.49                       expm_1.0-0                     
[115] Biobase_2.66.0                  matrixStats_1.4.1               stringi_1.8.4                  
[118] UCSC.utils_1.2.0                lazyeval_0.2.2                  boot_1.3-31                    
[121] evaluate_1.0.1                  codetools_0.2-20                cli_3.6.3                      
[124] rpart_4.1.23                    systemfonts_1.1.0               DescTools_0.99.58              
[127] Rdpack_2.6.2                    munsell_0.5.1                   Rcpp_1.0.13-1                  
[130] GenomeInfoDb_1.42.1             readxl_1.4.3                    parallel_4.4.2                 
[133] lme4_1.1-35.5                   Rmpfr_1.0-0                     mvtnorm_1.3-2                  
[136] tidytree_0.4.6                  lmerTest_3.1-3                  scales_1.3.0                   
[139] e1071_1.7-16                    crayon_1.5.3                    rlang_1.1.4                    
[142] multcomp_1.4-26

I hope this information helps identify and address the issue. Please let me know if you need further details!

Best regards,

WillNickols · January 6, 2025, 9:09pm

Hi hulfred,

Thanks for the explanation and reproducible example! I introduced a bug when updating the code the other day, but it should be fixed now. I’ve confirmed that those two examples work on my machine with the new MaAsLin 3 version (0.99.3). You can install the new version with:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("biobakery/maaslin3")

and then make sure there aren’t conflicts with an old MaAsLin 3 installation by restarting the R session, which unloads all packages (in RStudio, you can do this with .rs.restartR()).
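
After restarting, it may help to confirm which version R actually loads. The following is a minimal sketch using base R's version utilities; the helper name is_fixed_version is hypothetical, and 0.99.3 is the fixed version mentioned above:

```r
# Hypothetical helper: TRUE when a version string is at least the release
# reported as fixed in this thread (0.99.3).
is_fixed_version <- function(version) {
  package_version(version) >= package_version("0.99.3")
}

if (requireNamespace("maaslin3", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("maaslin3"))
  status <- if (is_fixed_version(installed)) "includes" else "predates"
  message("maaslin3 ", installed, " ", status, " the fix")
} else {
  message("maaslin3 is not installed")
}
```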

Let me know if that works!
Will
