The bioBakery help forum

MelonnPan crash with simple input file

Good Day!

I am experiencing constant crashes with MelonnPan Predict on my input files. The processing ends with the following result:

Error in ztransform(formula, data, family) :
formula argument must be a formula or one of (numeric, integer, double)
Calls: → apply → FUN → ztransform
Execution halted

I am attaching a fairly simple example, but similar behavior happens on larger files as well.
melonnpan_test1.txt (184 Bytes)

When I run MelonnPan with the demo file from its tutorial web site
(https://raw.githubusercontent.com/biobakery/melonnpan/master/data/melonnpan.test.data.txt)
everything works okay, so I assume that my MelonnPan installation is also okay.

I cannot see any structural differences between my files and the demo one. I have checked with the hexdump utility that both my files and the demo file use tab and newline characters in the same way, and that they are all strictly ASCII encoded.

MelonnPan is installed in a separate Conda environment. I am uploading a file with package list in the environment, as well as R packages.

conda_package_list.txt (12.3 KB)

R_packages.txt (186.7 KB)

MelonnPan installation was done by the following commands:

conda create -y --name melonnpan_env r-base=3.6.3 r-devtools=2.3.2

conda run --name melonnpan_env Rscript -e 'devtools::install_version("GenABEL.data", version = "1.0.0", repos = "http://cran.us.r-project.org")'

conda run --name melonnpan_env Rscript -e 'devtools::install_version("GenABEL", version = "1.8-0", repos = "http://cran.us.r-project.org")'

conda run --name melonnpan_env Rscript -e 'devtools::install_github("biobakery/melonnpan")'

After these steps MelonnPan successfully processes the demo input file. However, with some of my files the above mentioned crash occurs. With some other files the processing ends with the message that there is no overlap between IDs.

I am trying to input results of Humann2 and Humann3 to MelonnPan. I am restructuring the HumannX output files with my programs to be suitable for MelonnPan, since I was unable to find any utility or instructions for making the conversion. However, many MelonnPan users do the same, and I suspect that there is some utility for the task. I would appreciate any information about it.

Thank you and best regards,
Bostjan Murovec

Hi @BMurovec - it looks like you are getting this error during the z-transformation step, which strictly requires more than two unique values per feature to be executable. I would suggest filtering out both metabolite and metagenomic features that have only a few unique values so that (i) the 10-fold CV runs smoothly, and (ii) the preprocessing of the predictor matrix runs without an error. Let me know if the above resolves the issue on your end.
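The filtering suggested above could be sketched in R roughly as follows. This is a minimal sketch, not part of MelonnPan: the helper name `drop_low_variety_features`, the `min_unique = 3` threshold (chosen to match "more than two unique values"), and the toy data are all assumptions made for illustration. It also assumes samples are in rows and features are in columns.

```r
# Hypothetical helper (not a MelonnPan function): keep only feature columns
# that have at least `min_unique` distinct non-missing values.
# Assumes a data frame with samples in rows and features in columns.
drop_low_variety_features <- function(features, min_unique = 3) {
  n_unique <- vapply(
    features,
    function(col) length(unique(col[!is.na(col)])),
    integer(1)
  )
  features[, n_unique >= min_unique, drop = FALSE]
}

# Toy data, invented for illustration only.
toy <- data.frame(
  f1 = c(0, 0, 0.2, 0.2),      # 2 unique values, dropped
  f2 = c(0.1, 0.3, 0.5, 0.7),  # 4 unique values, kept
  f3 = c(0, 0, 0, 0)           # 1 unique value, dropped
)

filtered <- drop_low_variety_features(toy)
print(colnames(filtered))
```

On a real input, the data frame would typically come from something like `read.table(path, sep = "\t", header = TRUE, row.names = 1)` before filtering, assuming the first column holds the sample IDs.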

Dear himel.mallick,

thank you for your extremely prompt answer. Unfortunately, this does not solve the problem. The previously posted input file was just a minimal example, but the problem persists with much larger input files. Here I am attaching another input file, where rows are filtered to those that include at least two non-zero values.

Melonnpan_test2.txt (494 Bytes)

Thank you again and best regards,
Bostjan Murovec

Hi @BMurovec - I see that there are only 2 unique values per feature in the updated input file. This is going to be problematic for the rank-based inverse normal transformation as it operates on a per-feature basis (i.e. column). This was also discussed in a previous thread which you may find useful. Of course, let me know if removing those features with only 2 unique values resolves the problem.
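To illustrate why a feature with only two unique values carries so little information for a rank-based inverse normal transformation, here is a textbook-style sketch in R. It is not GenABEL's or MelonnPan's implementation and does not reproduce the error above; the function name `rank_int` and the example vectors are assumptions made for illustration.

```r
# Illustrative rank-based inverse normal transform for a single feature.
# Textbook form only; NOT the code used by GenABEL or MelonnPan.
rank_int <- function(x) {
  qnorm((rank(x, ties.method = "average") - 0.5) / length(x))
}

# Invented example vectors.
two_valued  <- c(0, 0, 0, 1, 1, 1)
many_valued <- c(0.1, 0.4, 0.2, 0.9, 0.6, 0.3)

# Tied observations share one rank, so a two-valued feature maps to
# only two distinct transformed values.
n_two  <- length(unique(rank_int(two_valued)))
n_many <- length(unique(rank_int(many_valued)))
print(c(two_valued = n_two, many_valued = n_many))
```

With heavy ties the transformed feature is still essentially binary, which is one reason to remove such columns before prediction.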

Dear himel.mallick,

thank you again. Of course, an input with plenty of entries per row and a sufficient number of rows works.

By the way, would you be so kind as to let me know whether there are any official instructions on how to prepare Humann output for MelonnPan?

Thank you and best regards,
Bostjan Murovec

Hi @BMurovec - here is the link to the tutorial: melonnpan · biobakery/biobakery Wiki · GitHub. This tutorial focuses on UniRef90 as an example but the input structure should be similar for other HUMAnN output.

Thank you very much!

You have been extremely prompt in answering my questions.

Kindest regards,

Bostjan Murovec
