Hi,
Thank you for creating this excellent tool!
While experimenting with the maaslin3 package from Bioconductor, I encountered a potential bug related to the input_metadata parameter: the object passed to it apparently must be named metadata in the calling environment. Below is a minimal reproducible example:
library(maaslin3)
library(microbiome)
library(tidyverse)
data(atlas1006, package = "microbiome")
taxa_table = microbiome::abundances(atlas1006)
meta_data = microbiome::meta(atlas1006)
meta_data = meta_data %>%
    dplyr::mutate(
        # Get actual sequencing depth (total reads) from the example data
        reads = microbiome::readcount(atlas1006)
    )
set.seed(123)
out_maaslin3 = maaslin3(input_data = t(taxa_table),
                        input_metadata = meta_data,
                        output = "maaslin3_output",
                        formula = '~ age + reads',
                        normalization = 'TSS',
                        transform = 'LOG',
                        correction = 'BH',
                        augment = TRUE,
                        standardize = TRUE,
                        max_significance = 0.05,
                        median_comparison_abundance = TRUE,
                        median_comparison_prevalence = FALSE,
                        cores = 1,
                        plot_summary_plot = FALSE,
                        verbosity = 'WARN')
This resulted in the following error:
Error in maaslin_read_data(input_data, input_metadata, feature_specific_covariate, : object 'metadata' not found
However, when I renamed meta_data to metadata, the issue was resolved.
Additionally, I encountered a problem when using parallel computing. Even though the metadata object was named metadata, the error persisted in the following example:
library(doParallel)
library(doRNG)
cl = makeCluster(2)
registerDoParallel(cl)
seed_list = 1:2
res_sim = foreach(i = seed_list, .combine = rbind, .verbose = TRUE,
                  .packages = c("tidyverse", "maaslin3", "microbiome", "phyloseq")) %dorng%
    {
        set.seed(i)
        data(atlas1006, package = "microbiome")
        taxa_table = microbiome::abundances(atlas1006)
        metadata = microbiome::meta(atlas1006)
        metadata = metadata %>%
            dplyr::mutate(
                # Get actual sequencing depth (total reads) from the example data
                reads = microbiome::readcount(atlas1006)
            )
        set.seed(123)
        out_maaslin3 = maaslin3(input_data = t(taxa_table),
                                input_metadata = metadata,
                                output = "maaslin3_output",
                                formula = '~ age + reads',
                                normalization = 'TSS',
                                transform = 'LOG',
                                correction = 'BH',
                                augment = TRUE,
                                standardize = TRUE,
                                max_significance = 0.05,
                                median_comparison_abundance = TRUE,
                                median_comparison_prevalence = FALSE,
                                cores = 1,
                                plot_summary_plot = FALSE,
                                verbosity = 'WARN')
    }
stopCluster(cl)
The error message in this case was:
Error in { : task 1 failed - "object 'metadata' not found"
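One possible explanation for both failures, offered as a guess and not checked against the maaslin3 source: the body of maaslin_read_data may refer to a free variable metadata instead of its input_metadata argument. R would then resolve that name through the function's enclosing environments and eventually the global environment, so a global object named metadata masks the problem, whereas an object created inside the foreach body presumably never reaches the worker's global environment. The sketch below reproduces that pattern in base R only; read_data_sketch is a hypothetical stand-in, not maaslin3 code, and local() stands in for the foreach body.

```r
# Hypothetical stand-in for a function that mistakenly uses the free variable
# `metadata` instead of its argument `input_metadata` (an assumption, not
# actual maaslin3 code).
read_data_sketch <- function(input_metadata) {
    nrow(metadata)
}
# Make free variables fall through to the global environment, roughly as they
# would for a function defined in a package namespace.
environment(read_data_sketch) <- globalenv()

df <- data.frame(age = c(30, 40))

# Case 1: an object named `metadata` exists in the global environment,
# so the buggy lookup happens to succeed.
assign("metadata", df, envir = globalenv())
case_global <- read_data_sketch(input_metadata = df)
rm(list = "metadata", envir = globalenv())

# Case 2: `metadata` exists only in a local environment (similar in spirit to
# a variable created inside a foreach body), so the lookup fails.
case_local <- local({
    metadata <- df
    tryCatch(
        read_data_sketch(input_metadata = metadata),
        error = function(e) conditionMessage(e)
    )
})

print(case_global)
print(case_local)
```

In this sketch the first call returns the row count, while the second fails with an "object 'metadata' not found" error even though the argument is passed correctly, which matches the behavior in both examples above.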
For your reference, here is my session info:
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.0
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doRNG_1.8.6 rngtools_1.5.2 foreach_1.5.2 DT_0.33 lubridate_1.9.4
[6] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[11] tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0 microbiome_1.28.0 ggplot2_3.5.1
[16] phyloseq_1.50.0 maaslin3_0.99.2 ANCOMBC_2.8.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_1.8.9
[4] magrittr_2.0.3 TH.data_1.1-2 farver_2.1.2
[7] nloptr_2.1.1 rmarkdown_2.29 ragg_1.3.3
[10] fs_1.6.5 zlibbioc_1.52.0 vctrs_0.6.5
[13] multtest_2.62.0 minqa_1.2.8 base64enc_0.1-3
[16] htmltools_0.5.8.1 S4Arrays_1.6.0 energy_1.7-12
[19] haven_2.5.4 Rhdf5lib_1.28.0 cellranger_1.1.0
[22] rhdf5_2.50.1 SparseArray_1.6.0 Formula_1.2-5
[25] htmlwidgets_1.6.4 plyr_1.8.9 sandwich_3.1-1
[28] rootSolve_1.8.2.4 zoo_1.8-12 igraph_2.1.2
[31] lifecycle_1.0.4 iterators_1.0.14 pkgconfig_2.0.3
[34] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0
[37] GenomeInfoDbData_1.2.13 rbibutils_2.3 MatrixGenerics_1.18.0
[40] digest_0.6.37 Exact_3.3 numDeriv_2016.8-1.1
[43] colorspace_2.1-1 S4Vectors_0.44.0 textshaping_0.4.1
[46] Hmisc_5.2-1 GenomicRanges_1.58.0 vegan_2.6-8
[49] labeling_0.4.3 timechange_0.3.0 mgcv_1.9-1
[52] httr_1.4.7 TreeSummarizedExperiment_2.14.0 abind_1.4-8
[55] compiler_4.4.2 proxy_0.4-27 bit64_4.5.2
[58] withr_3.0.2 doParallel_1.0.17 gsl_2.1-8
[61] htmlTable_2.4.3 backports_1.5.0 BiocParallel_1.40.0
[64] MASS_7.3-61 DelayedArray_0.32.0 biomformat_1.34.0
[67] permute_0.9-7 gtools_3.9.5 CVXR_1.0-15
[70] gld_2.6.6 optparse_1.7.5 tools_4.4.2
[73] foreign_0.8-87 ape_5.8-1 nnet_7.3-19
[76] glue_1.8.0 rhdf5filters_1.18.0 nlme_3.1-166
[79] grid_4.4.2 Rtsne_0.17 checkmate_2.3.2
[82] ade4_1.7-22 cluster_2.1.8 reshape2_1.4.4
[85] generics_0.1.3 gtable_0.3.6 tzdb_0.4.0
[88] class_7.3-22 data.table_1.16.4 lmom_3.2
[91] hms_1.1.3 XVector_0.46.0 BiocGenerics_0.52.0
[94] pillar_1.10.0 yulab.utils_0.1.8 logging_0.10-108
[97] splines_4.4.2 getopt_1.20.4 treeio_1.30.0
[100] lattice_0.22-6 survival_3.8-3 gmp_0.7-5
[103] bit_4.5.0.1 tidyselect_1.2.1 SingleCellExperiment_1.28.1
[106] pbapply_1.7-2 Biostrings_2.74.1 knitr_1.49
[109] gridExtra_2.3 IRanges_2.40.1 SummarizedExperiment_1.36.0
[112] stats4_4.4.2 xfun_0.49 expm_1.0-0
[115] Biobase_2.66.0 matrixStats_1.4.1 stringi_1.8.4
[118] UCSC.utils_1.2.0 lazyeval_0.2.2 boot_1.3-31
[121] evaluate_1.0.1 codetools_0.2-20 cli_3.6.3
[124] rpart_4.1.23 systemfonts_1.1.0 DescTools_0.99.58
[127] Rdpack_2.6.2 munsell_0.5.1 Rcpp_1.0.13-1
[130] GenomeInfoDb_1.42.1 readxl_1.4.3 parallel_4.4.2
[133] lme4_1.1-35.5 Rmpfr_1.0-0 mvtnorm_1.3-2
[136] tidytree_0.4.6 lmerTest_3.1-3 scales_1.3.0
[139] e1071_1.7-16 crayon_1.5.3 rlang_1.1.4
[142] multcomp_1.4-26
I hope this information helps identify and address the issue. Please let me know if you need further details!
Best regards,