The bioBakery help forum

Adjust_batch 'Feature table does not appear to be either proportions or counts!'

Dear all,

I performed the demo with the CRC data with no hassle but now while loading my data I have a problem with the dimensions of the data

MMUPHinTax <- read_tsv("./METADATA/MMUPHin_TAXA.csv")
MMUPHinmeta <- read_tsv("./METADATA/MMUPHin_METAa.csv")
MMUPHinTaxa <- MMUPHinTax %>% column_to_rownames("ID") %>%  as.matrix()
MMUPHinMETA <- MMUPHinmeta %>% column_to_rownames("ID")

fit_adjust_batch <- adjust_batch(feature_abd = Taxa,
                                 batch = "study",
                                 covariates = NULL,
                                 data = META,
                                 control = list(verbose = FALSE))

The dimensions seem to match at least with this checking. The labels (sample or ID) match so I don’t know what can be causing trouble here.

> class(MMUPHinMETA)
[1] "data.frame"
> class(MMUPHinTaxa)
[1] "matrix" "array" 
> dim(MMUPHinTaxa)
[1] 807 344
> dim(MMUPHinMETA)
[1] 344   4

Loaded Packages
[1] vegan_2.5-7 lattice_0.20-41 permute_0.9-5
[4] magrittr_2.0.1 MMUPHin_1.4.2 lpsymphony_1.17.0
[7] compositions_2.0-2 forcats_0.5.1 stringr_1.4.0
[10] dplyr_1.0.7 purrr_0.3.4 readr_2.0.1
[13] tidyr_1.1.3 tibble_3.1.3 ggplot2_3.3.5
[16] tidyverse_1.3.1 nlme_3.1-152

I already checked this previous issue but maybe I am missing something obvious. Thanks in advance for any help on this issue.

I am attaching the files
MMUPHin_TAXA.csv (746.3 KB)
MMUPHin_META.csv (7.0 KB)



Sorry for the double post, I updated R and reinstalled MMUPHin but the problem persists.
Here is the sessionInfo() output:

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Generic 34 (Generic)

Matrix products: default
BLAS/LAPACK: /usr/lib64/


attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] vegan_2.5-7 lattice_0.20-44 permute_0.9-5
[4] forcats_0.5.1 stringr_1.4.0 purrr_0.3.4
[7] tidyr_1.1.3 tibble_3.1.3 tidyverse_1.3.1
[10] readr_2.0.1 ggplot2_3.3.5 dplyr_1.0.7
[13] magrittr_2.0.1 MMUPHin_1.6.1 BiocManager_1.30.16

loaded via a namespace (and not attached):
[1] nlme_3.1-152 fs_1.5.0 usethis_2.0.1 lubridate_1.7.10
[5] devtools_2.4.2 bit64_4.0.5 httr_1.4.2 rprojroot_2.0.2
[9] tools_4.1.1 backports_1.2.1 metafor_3.0-2 utf8_1.2.2
[13] R6_2.5.0 DBI_1.1.1 mgcv_1.8-36 colorspace_2.0-2
[17] withr_2.4.2 tidyselect_1.1.1 prettyunits_1.1.1 processx_3.5.2
[21] bit_4.0.4 curl_4.3.2 compiler_4.1.1 cli_3.0.1
[25] rvest_1.0.1 logging_0.10-108 biglm_0.9-2.1 xml2_1.3.2
[29] desc_1.3.0 labeling_0.4.2 scales_1.1.1 mvtnorm_1.1-2
[33] DEoptimR_1.0-9 robustbase_0.93-8 pbapply_1.4-3 callr_3.7.0
[37] digest_0.6.27 pkgconfig_2.0.3 sessioninfo_1.1.1 lpsymphony_1.17.0
[41] dbplyr_2.1.1 fastmap_1.1.0 rlang_0.4.11 readxl_1.3.1
[45] rstudioapi_0.13 farver_2.1.0 generics_0.1.0 jsonlite_1.7.2
[49] vroom_1.5.4 Matrix_1.3-4 Rcpp_1.0.7 munsell_0.5.0
[53] fansi_0.5.0 lifecycle_1.0.0 stringi_1.7.3 Maaslin2_1.6.0
[57] mathjaxr_1.4-0 MASS_7.3-54 pkgbuild_1.2.0 grid_4.1.1
[61] parallel_4.1.1 crayon_1.4.1 haven_2.4.3 cowplot_1.1.1
[65] splines_4.1.1 hash_2.2.6.1 hms_1.1.0 ps_1.6.0
[69] pillar_1.6.2 optparse_1.6.6 igraph_1.2.6 pkgload_1.2.1
[73] reprex_2.0.1 glue_1.4.2 remotes_2.4.0 modelr_0.1.8
[77] vctrs_0.3.8 tzdb_0.1.2 testthat_3.0.4 cellranger_1.1.0
[81] getopt_1.20.3 gtable_0.3.0 assertthat_0.2.1 cachem_1.0.5
[85] broom_0.7.9 pcaPP_1.9-74 memoise_2.0.0 cluster_2.1.2
[89] ellipsis_0.3.2

  1. You have your abundance table formatted as what I think are supposed to be percentages. You can convert them to proportions in [0,1] by dividing out the sum of each column like so:

Taxa = Taxa %>% apply(2, function(x) x / sum(x))

  1. You also seem to have duplicate study identifiers. Below I deduplicate the identifiers in the metadata the same way read_tsv does it when it reads in the abundances, but you should make sure that this is appropriate.
META$ID[duplicated(META$ID)] = paste0(META$ID[duplicated(META$ID)], '_1')
META = META %>% %>% column_to_rownames("ID")

fit_adjust_batch runs after these two steps for me.

Dear Andrew, thank you very much for your response. I used the correction and it worked!
A follow up question, I used this code since I want to see differences between response, considering the study and controlling for age & sex.

fit_adjust_batch <- adjust_batch(feature_abd = TaXa,
                                 batch = "study",
                                 covariates = c("age", "sex"), 
                                 data = MeTa,
                                 control = list(verbose = FALSE))
TaXa <- TaXa %>% column_to_rownames("ID") %>% as.matrix()
MeTa <- MeTa %>% column_to_rownames("ID")
TaXa = TaXa %>% apply(2, function(x) x / sum(x))
CRC_abd_adj <- fit_adjust_batch$feature_abd_adj # I just recycled this 

fit_lm_meta <- lm_meta(feature_abd = TaXa,
                       batch = "study",
                       exposure = "response",
                       data = MeTa,
                       control = list(verbose = FALSE))
meta_fits <- fit_lm_meta$meta_fits

Could you expand on why the results in the response_R_forest.pdf (23 species) differ from the bar plot (with just 2 features).
Inspecting the meta_fits object I found 40 features/species with p-value <0.05 (and just 2 with FDR<0.05). Please find attached these results.

meta_fits_FORUM.tsv (160.6 KB)
I replaced the sample by a sequence of numbers in both files
META.csv (5.5 KB)
TAXA.csv (744.8 KB)

Thanks for your help!

For a bug to get an entry in the forest plot it needs to show up in at least two studies. So you have 40 bugs with a nominal p-value below 0.05, 23 of which show up in at least two studies, and two of which have a q-value below 0.05.

1 Like

Thank you very much for the explanation!