How to use anpan

Hello, dear developers!

I ran into a small problem when using anpan. While following the example on the anpan website (biobakery/anpan), I found that the sample files in anpan's extdata folder do not seem to work with the example script:

library(anpan)

anpan_batch(bug_dir           = "/path/to/functional_profiles/",
            meta_file         = "/path/to/metadata.tsv",
            out_dir           = "/path/to/output",
            annotation_file   = "/path/to/annotation.tsv", #optional, used for plots
            filtering_method  = "kmeans",
            model_type        = "fastglm",
            covariates        = c("age", "gender"),
            outcome           = "crc",
            plot_ext          = "pdf",
            save_filter_stats = TRUE)

Then I converted my own data into the sample file format. For example, the gene family abundance table for all samples is split into one small file per species, as shown in the figure:

And my metadata:

[image: metadata table]

I then modified the script as follows:

anpan_batch(bug_dir           = mybug_dir,    # path to my per-species input files
            meta_file         = mymeta_file,  # path to my metadata file
            out_dir           = myout_dir,    # path to my output directory
            # annotation_file = "/path/to/annotation.tsv", # optional, used for plots
            filtering_method  = "none",
            model_type        = "fastglm",
            covariates        = c("treatment", "part", "diarrhea_score", "ADG",
                                  "cardiac_index", "liver_index", "lung_index", "kidney_index"),
            outcome           = "diarrhea_score",
            plot_ext          = "pdf",
            save_filter_stats = TRUE)

But it only returns:

was skipped because no samples passed the filter criteria
Error in `dplyr::relocate()`:
! Can't select columns that don't exist.
✖ Column `bug_name` doesn't exist.

Without a correct, working example to compare against, I am very confused. Is something wrong with my data format, or are my parameter settings incorrect?

I’m very sorry to disturb you. I hope you can take the time to reply to my question.

Thank you very much!

Best wishes!

anpan_batch() is for multiple bugs; that’s why it doesn’t work with the single-bug data in extdata. For a single bug, use anpan(). Here’s an example:

library(anpan)

anpan_dir = system.file("extdata",
                        package = "anpan")

meta = file.path(anpan_dir,
                 "fake_metadata.tsv")

bug_file = file.path(anpan_dir,
                     "g__Madeuppy.s__Madeuppy_Fakerii.genefamilies.tsv.gz")

td = tempdir()

out_dir = file.path(td, "output")

anpan(bug_file          = bug_file,
      meta_file         = meta,
      out_dir           = out_dir,
      filtering_method  = "kmeans",
      model_type        = "fastglm",
      covariates        = NULL,
      outcome           = "is_case",
      plot_ext          = "pdf",
      save_filter_stats = TRUE)

unlink(out_dir, recursive = TRUE)  # recursive = TRUE is needed to delete a directory

I don’t think I have enough information to debug why your anpan_batch() command isn’t working. A couple thoughts/questions:

  • It may have to do with the t__SGB… component of your file names, though I doubt it. Maybe try deleting that component.
  • Does anything get written to the output directory?
  • Does it error out immediately or does it run the models (you’ll be able to tell from the console output), print a message about plotting, then error out?
  • I assume it’s erroring out at this line, the question is why… Could you try to run read_and_filter() on one of your input files and check that it looks okay?
  • It may be that setting filtering_method = "none" causes problems if all bugs are detected at non-zero levels in all samples. Try setting that to "kmeans" and check what happens.
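Some of these checks can be done in base R before rerunning anything. The following is only a rough sketch: check_anpan_inputs() is a hypothetical helper, not an anpan function, and it assumes the metadata is a tab-separated file with a header row. It reports how many files sit in the input directory, which covariate or outcome columns are missing from the metadata, and whether the outcome also appears among the covariates (as diarrhea_score does in the call above; whether that contributes to the error is not confirmed here). The demo files are made up purely for illustration.

```r
# Hypothetical helper (not part of anpan): sanity-check inputs before anpan_batch().
check_anpan_inputs <- function(bug_dir, meta_file, covariates, outcome) {
  # Read the metadata as a plain tab-separated table with a header row.
  meta <- read.delim(meta_file, check.names = FALSE, stringsAsFactors = FALSE)

  # Columns the model call refers to but the metadata does not contain.
  missing_cols <- setdiff(unique(c(covariates, outcome)), names(meta))

  # Files found in the input directory (no assumption about their naming).
  bug_files <- list.files(bug_dir, full.names = TRUE)

  list(n_bug_files           = length(bug_files),
       missing_meta_cols     = missing_cols,
       outcome_in_covariates = outcome %in% covariates)
}

# Demo on throwaway files; all names and values below are invented.
demo_bug_dir <- tempfile("bug_dir_")
dir.create(demo_bug_dir)
file.create(file.path(demo_bug_dir, c("bug_a.tsv", "bug_b.tsv")))

demo_meta <- tempfile(fileext = ".tsv")
write.table(data.frame(sample_id = c("s1", "s2"), treatment = c("a", "b")),
            demo_meta, sep = "\t", quote = FALSE, row.names = FALSE)

str(check_anpan_inputs(demo_bug_dir, demo_meta,
                       covariates = c("treatment", "weight"),
                       outcome    = "treatment"))
```

If missing_meta_cols is non-empty or n_bug_files is zero, that is worth fixing first; the read_and_filter() check on a single input file is still the more direct test of the file format itself.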

You can verify that anpan_batch() works like so:

input_dir = file.path(td, "bug_dir")

dir.create(input_dir)

file.copy(bug_file, input_dir)

file.copy(bug_file,
          file.path(input_dir,
                    "g__madeup.s__madeup_bug.genefamilies.tsv.gz"))

anpan_batch(input_dir,
            meta,
            out_dir,
            covariates = NULL,
            outcome = "is_case")

Thank you so much for your thoughtful and considerate response! Your reply has been of great help to me. I will follow your instructions and try running anpan again.
Once again, I would like to express my respect and gratitude to you!
Best wishes!