MelonnPan training -- no common IDs

rkamales · July 29, 2020, 2:08pm

Hello,

I’m interested in running MelonnPan on the HUMAnN3 output from microbiome experiments in mouse models. I now have three files: genefamilies, pathabundance, and pathcoverage.

I tried melonnpan.predict, but I get an error:

metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) :
No common IDs found between training and test data. Execution halted!

I have attached my gene families file below for reference: all_genefamilies.tsv (31.3 KB)

Rishi

himel.mallick · July 29, 2020, 10:22pm

Hi @rkamales - three things need to be modified before you run the analysis. First, the input data frame should have samples in rows and features in columns. Second, the input data should be TSS-normalized to proportions, as described in the tutorial. Finally, you may want to remove the UNMAPPED category, which is not used in the prediction process. Let me know if this resolves the issue on your end!
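A minimal base-R sketch of the three preprocessing steps above, assuming the gene-family table has already been read into a data frame with feature IDs (e.g. UniRef90 IDs and "UNMAPPED") as row names and one numeric column per sample. The function name preprocess_genefamilies is hypothetical and is not part of MelonnPan; the toy counts are made up.

```r
# Hypothetical helper (base R only, not MelonnPan code).
# Assumes `raw` has feature IDs as row names and one numeric column per sample.
preprocess_genefamilies <- function(raw) {
  # Drop the UNMAPPED category, which is not used for prediction
  raw <- raw[rownames(raw) != "UNMAPPED", , drop = FALSE]
  # Transpose so that samples are rows and features are columns
  m <- t(as.matrix(raw))
  # Total-sum scaling (TSS): divide each sample (row) by its total,
  # so every row sums to 1
  as.data.frame(m / rowSums(m))
}

# Toy example with made-up counts
raw <- data.frame(S1 = c(2, 6, 2), S2 = c(1, 1, 8),
                  row.names = c("UNMAPPED", "UniRef90_A", "UniRef90_B"))
data_norm <- preprocess_genefamilies(raw)
rowSums(data_norm)  # each sample sums to 1
```

Doing the division on a matrix rather than a data frame keeps the row-wise recycling explicit: a vector of row totals with one entry per row divides each row by its own total.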

rkamales · July 30, 2020, 2:26am

Thank you so much for the reply. I have made all three changes and attached the new file, normalized_data.tsv (40.8 KB). However, I am still getting the same error:

> apply(data_norm, 1, sum)
    X0   X0.1   X0.2   X0.3   X0.4   X0.5   X0.6   X1.0   X0.7   X0.8   X0.9 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.10  X0.11  X0.12  X0.13   X0.0  X0.14  X0.15  X0.16  X0.17  X0.18  X0.19 
     1      1      1      1      1      1      1      1      1      1      1 
X1.0.1  X0.20  X0.21 X0.0.1  X0.22  X0.23  X0.24  X0.25  X0.26  X0.27  X0.28 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.29  X0.30  X0.31  X0.32  X0.33  X0.34  X0.35  X0.36   X2.0  X0.37 X1.0.2 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.38  X0.39  X0.40  X0.41  X0.42  X0.43  X0.44 X0.0.2 X2.0.1  X0.45  X0.46 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.47  X0.48  X0.49  X0.50  X0.51  X0.52  X0.53  X0.54  X0.55  X0.56  X0.57 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.58  X0.59 X1.0.3 X1.0.4  X0.60  X0.61  X0.62  X0.63  X0.64 X0.0.3  X0.65 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.66  X0.67  X0.68  X0.69  X0.70  X0.71  X0.72  X0.73  X0.74  X0.75  X0.76 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.77  X0.78  X0.79  X0.80  X0.81  X0.82  X0.83 X1.0.5  X0.84  X0.85  X0.86 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.87  X0.88  X0.89  X0.90 X0.0.4  X0.91  X0.92  X0.93  X0.94  X0.95  X0.96 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.97 X0.0.5 
     1      1 
> metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) : 
  No common IDs found between training and test data. Execution halted!

himel.mallick · July 30, 2020, 5:26pm

Hi @rkamales - based on the error message, it looks like there are no UniRef90 IDs in common between the pre-trained model and your data. Since we filtered out a fraction of UniRef90 IDs with low prevalence/abundance, this is not surprising. I am not sure whether this is an after-effect of your own filtering (if you are doing any) or simply a property of your dataset, i.e. it does not share features with our pre-trained model. If the former, you might want to adjust the filtering threshold/criteria to see if that increases the number of shared features. If the latter, there is not much we can do unless you have paired training data (metabolites and gene families) and can build your own model with melonnpan.train to get the predictions.
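To see how a filtering choice affects the overlap, one option is to count shared IDs before and after filtering. This is a minimal sketch with hypothetical helper names and illustrative thresholds (they are not the values used to build the pre-trained model), and it assumes the model's feature IDs are available as a character vector.

```r
# Hypothetical helpers; names and default thresholds are illustrative only.

# Keep features (columns) whose relative abundance exceeds `min_abund`
# in at least a fraction `min_prev` of samples (rows).
filter_features <- function(x, min_abund = 1e-4, min_prev = 0.1) {
  keep <- colMeans(as.matrix(x) > min_abund) >= min_prev
  x[, keep, drop = FALSE]
}

# Count how many feature IDs in `x` also appear in `model_features`
# (assumed: a character vector of the model's feature IDs).
count_shared <- function(x, model_features) {
  length(intersect(colnames(x), model_features))
}

# Toy example: a stricter filter can only shrink the overlap
x <- data.frame(UniRef90_A = c(0.50, 0.60), UniRef90_B = c(0.00, 0.001),
                UniRef90_C = c(0.50, 0.399), row.names = c("S1", "S2"))
model_features <- c("UniRef90_A", "UniRef90_B", "UniRef90_Z")
count_shared(x, model_features)
count_shared(filter_features(x, min_abund = 0.01, min_prev = 0.5), model_features)
```

If the count is zero even for the unfiltered table, loosening the filter will not help, which corresponds to the second case described above.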

saif · September 14, 2020, 4:09pm

Hi @himel.mallick, I ran melonnpan.train() and got three output files, including a weight matrix. I then tried to use it to run melonnpan.predict(). I have attached snapshots of my weight.matrix and metag files, taken after loading them and viewing them with View() in R.

IMDM_MelonnPan_Trained_Weights:

my_metag:

My code looks like this:

melonnpan.predict(metag = my_metag, weight.matrix = IMDM_MelonnPan_Trained_Weights, output = getwd())

However, I get this error message:
Error in melonnpan.predict(metag = my_metag, weight.matrix = IMDM_MelonnPan_Trained_Weights, : No common IDs found between training and test data. Execution halted!

Could you please tell me what is wrong with my code and how to run it correctly? Thank you.

himel.mallick · September 16, 2020, 1:52pm

Hi @saif, when using this function with a non-UniRef90 input and a non-default weight matrix, you need to specify both weight.matrix and train.metag to get the predictions. Let me know if that resolves the error.

saif · September 16, 2020, 2:52pm

Hi @himel.mallick, I specified both train.metag and weight.matrix. My code looks like this:
all(colnames(my_metagenome) %in% colnames(my_metag)) #TRUE
all(rownames(my_metagenome) %in% rownames(my_metag)) #TRUE
all(colnames(my_metagenome) %in% rownames(IMDM_MelonnPan_Trained_Weights)) #TRUE

Running melonnpan.predict():

melonnpan.predict(metag = my_metagenome, weight.matrix = IMDM_MelonnPan_Trained_Weights, train.metag = my_metag, output = getwd())

I used your readTable function to load the weight matrix file. However, I still see this error message:

Error in data.table::fread(Input, header = FALSE) :
  input= must be a single character string containing a file name, a system command containing at least one space, a URL starting ‘http[s]://’, ‘ftp[s]://’ or ‘file://’, or, the input data itself containing at least one \n or \r

Could you please suggest how to solve this?

himel.mallick · September 16, 2020, 4:01pm

Hi @saif - I am afraid I need to reproduce the error on my end to see what’s going on. Would you be able to share the input tables for me to debug?

saif · September 16, 2020, 4:06pm

Hi @himel.mallick, sure, I can share the input tables. How should I share them: by email or here?

saif · September 16, 2020, 4:19pm

Hi @himel.mallick, here is the link to the input files I used:

Thank you

himel.mallick · September 16, 2020, 6:08pm

The link above did not work. Would you be able to attach a subset of the data that’s small in size and quick to run? Many thanks!

saif · September 17, 2020, 4:46am

Hi @himel.mallick, I uploaded smaller input files here.

I still see the same error message with these files. This is the code I ran:
melonnpan.predict(metag = my_metagenome, weight.matrix = Weight, train.metag = my_metag, output = getwd())

If this link does not work, can you please share your email address or any link where I can upload files again?

Thank you.

himel.mallick · September 17, 2020, 2:13pm

Hi @saif - I think the above link only allows sharing within your organization, and depending on the size of the input files, email may not be feasible either. Can you reshare using a non-business link (e.g. Dropbox or Google Drive)? Sorry for any inconvenience!

saif · September 17, 2020, 4:43pm

Hi @himel.mallick, here is a Dropbox link to my data. Can you please check whether it works?

himel.mallick · September 17, 2020, 7:22pm

Hi @saif - thanks for sharing the data. Using your files, I ran the following, which worked fine on my end:
melonnpan.predict(metag = "my_metagenome.csv", weight.matrix = "weights.csv", train.metag = "my_metag.csv", output = getwd())
Would you mind reporting back after you re-run the above command? Many thanks!

saif · September 18, 2020, 2:01am

Hi @himel.mallick, it works now. Thank you very much.

I think the main issue was with the weight.matrix and train.metag arguments. I don’t know why passing objects loaded from a file path in the R environment does not work for these two arguments; they only seem to accept file names read from the working directory. The metag argument, however, accepts either an object preloaded in the R environment or a file name in the working directory.

Specifying the file paths:

my_metagenome <- readTable("path_my_metagenome.csv")
weight <- readTable("path_weight.csv")
my_metag <- readTable("path_my_metag.csv")

This one did not work:
melonnpan.predict(metag = my_metagenome, weight.matrix = weight, train.metag = my_metag, output = getwd())

These two work:
melonnpan.predict(metag = "my_metagenome.csv", weight.matrix = "weight.csv", train.metag = "my_metag.csv", output = getwd())
or,
melonnpan.predict(metag = my_metagenome, weight.matrix = "weight.csv", train.metag = "my_metag.csv", output = getwd())

I hope this will help others as well.

Thank you!
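For readers who prefer to keep their tables in R objects, the behavior reported above (weight.matrix and train.metag working with file names but not with loaded objects) suggests a workaround: write each object to a file and pass the file name. This is a minimal sketch; the helper table_to_file is hypothetical, and the assumption that MelonnPan reads a CSV with IDs in the first column (matching the .csv files in the working commands) has not been verified against its readTable function.

```r
# Hypothetical helper (not part of MelonnPan): write an in-memory table to a
# comma-separated file and return the file name.
table_to_file <- function(df, dir = tempdir()) {
  path <- tempfile(pattern = "melonnpan_input_", tmpdir = dir, fileext = ".csv")
  # col.names = NA leaves the first header cell empty, so the first column
  # holds the row names (sample or feature IDs)
  write.table(df, path, sep = ",", quote = FALSE, col.names = NA)
  path
}

# Toy round trip to check that the IDs and values survive
weight <- data.frame(M1 = c(0.1, -0.2), M2 = c(0.3, 0.0),
                     row.names = c("UniRef90_A", "UniRef90_B"))
weight_file <- table_to_file(weight)
back <- read.csv(weight_file, row.names = 1, check.names = FALSE)

# Assumed usage (untested against MelonnPan; dir = getwd() because the
# arguments were reported to read from the working directory):
# melonnpan.predict(metag = my_metagenome,
#                   weight.matrix = table_to_file(weight, dir = getwd()),
#                   train.metag = table_to_file(my_metag, dir = getwd()),
#                   output = getwd())
```

Writing with quote = FALSE keeps the file close to a plain CSV; this is safe for UniRef90 IDs, which contain no commas.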

Topic	Replies	Views	Activity
Error while trying to use melonnpan for predicting metabolites from metatranscriptome data MelonnPan	1	135	March 21, 2024
Input for MelonnPan MelonnPan	1	773	April 7, 2021
Variability in melonnpan result MelonnPan	3	326	February 28, 2023
Melonnpan predict result interpretation MelonnPan	2	370	February 15, 2023
melonnPan questions MelonnPan	3	824	September 16, 2020
