MelonnPan training -- no common IDs

Hello,

I’m interested in running MelonnPan on my HUMAnN 3 output, generated from microbiome experiments in mouse models. I now have three files: genefamilies, pathabundance, and pathcoverage.

I tried melonnpan predict, however I get an error:

metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) :
No common IDs found between training and test data. Execution halted!

I am attaching my gene_families file below for your consideration. all_genefamilies.tsv (31.3 KB)

Rishi

Hi @rkamales - three things need to be modified before you run the analysis. First, the input data frame should be features (in columns) by samples (in rows). Second, the input data should be TSS-normalized to proportions, as mentioned in the tutorial. Finally, you may want to remove the UNMAPPED category which will not be used in the prediction process. Let me know if the above resolves the issue on your end!
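The three steps above could be sketched in base R roughly as follows. This is a minimal illustration on a made-up table, assuming the gene-family table was read with features in rows and samples in columns; the object names (`genefam`, `data_norm`) and the toy feature IDs are placeholders, not taken from the attached file.

```r
# Toy HUMAnN-style table: features in rows, samples in columns (made-up values)
genefam <- data.frame(
  sample_A = c(10, 30, 60),
  sample_B = c(5, 0, 15),
  row.names = c("UNMAPPED", "UniRef90_X1", "UniRef90_X2")
)

# 1. Drop the UNMAPPED row, which is not used in the prediction
genefam <- genefam[rownames(genefam) != "UNMAPPED", , drop = FALSE]

# 2. Transpose so that samples are rows and features are columns
data_t <- t(as.matrix(genefam))

# 3. TSS-normalize: divide each sample (row) by its total so every row sums to 1
data_norm <- as.data.frame(data_t / rowSums(data_t))

rowSums(data_norm)
```

Removing UNMAPPED before normalizing means the proportions are computed over the remaining features only; whether that order suits a given analysis is a judgment call.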

Thank you so much for the reply. I have done all three and attached the new file below: normalized_data.tsv (40.8 KB). However, I am still getting the same error:

> apply(data_norm, 1, sum)
    X0   X0.1   X0.2   X0.3   X0.4   X0.5   X0.6   X1.0   X0.7   X0.8   X0.9 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.10  X0.11  X0.12  X0.13   X0.0  X0.14  X0.15  X0.16  X0.17  X0.18  X0.19 
     1      1      1      1      1      1      1      1      1      1      1 
X1.0.1  X0.20  X0.21 X0.0.1  X0.22  X0.23  X0.24  X0.25  X0.26  X0.27  X0.28 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.29  X0.30  X0.31  X0.32  X0.33  X0.34  X0.35  X0.36   X2.0  X0.37 X1.0.2 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.38  X0.39  X0.40  X0.41  X0.42  X0.43  X0.44 X0.0.2 X2.0.1  X0.45  X0.46 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.47  X0.48  X0.49  X0.50  X0.51  X0.52  X0.53  X0.54  X0.55  X0.56  X0.57 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.58  X0.59 X1.0.3 X1.0.4  X0.60  X0.61  X0.62  X0.63  X0.64 X0.0.3  X0.65 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.66  X0.67  X0.68  X0.69  X0.70  X0.71  X0.72  X0.73  X0.74  X0.75  X0.76 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.77  X0.78  X0.79  X0.80  X0.81  X0.82  X0.83 X1.0.5  X0.84  X0.85  X0.86 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.87  X0.88  X0.89  X0.90 X0.0.4  X0.91  X0.92  X0.93  X0.94  X0.95  X0.96 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.97 X0.0.5 
     1      1 
> metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) : 
  No common IDs found between training and test data. Execution halted!

Hi @rkamales - based on the error message, it looks like there are no common UniRef90 IDs between the pre-trained model and your data. Since we did some filtering to remove a fraction of UniRef90 IDs with low prevalence/abundance, this is not surprising. I am not sure if this is an after-effect of your own filtering (if you are doing any) or just a property of your dataset, namely that it does not contain features shared with our pre-trained model. If the former, you might want to adjust the filtering threshold/criteria to see if that increases the number of shared features. If the latter, there is not much we can do unless you have paired training data (metabolites and gene families) and are able to build your own model with melonnpan.train to get the predictions.
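One way to see how many features are shared is to intersect the column names of the input with the feature IDs the model was trained on. The sketch below uses made-up IDs; `model_ids` is a hypothetical character vector standing in for the model's feature IDs, and how to obtain it from the pre-trained model is not shown here.

```r
# Hypothetical feature IDs used by a trained model (placeholder values)
model_ids <- c("UniRef90_A", "UniRef90_B", "UniRef90_C")

# Toy normalized input: samples in rows, features in columns
data_norm <- data.frame(
  UniRef90_B = c(0.4, 0.1),
  UniRef90_D = c(0.6, 0.9),
  row.names = c("sample_1", "sample_2")
)

# Features present in both the input and the model;
# an empty result would correspond to the "No common IDs" situation
shared <- intersect(colnames(data_norm), model_ids)
length(shared)

# Input features that the model does not know about
setdiff(colnames(data_norm), model_ids)
```

Looking at the unmatched names as well as the count can help tell a filtering problem apart from column names that are not feature IDs at all.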

Hi @himel.mallick, I ran melonnpan.train() and got three output files, including a weight matrix. Then I tried to use this information to run melonnpan.predict(). I attach snapshots of my weight.matrix and metag files here, taken after loading them and inspecting them with View() in R.

IMDM_MelonnPan_Trained_Weights:

my_metag:

My code looks like this:

melonnpan.predict(metag = my_metag, weight.matrix = IMDM_MelonnPan_Trained_Weights, output = getwd())

However, I get this error message,
**Error in melonnpan.predict(metag = my_metag, weight.matrix = IMDM_MelonnPan_Trained_Weights, : No common IDs found between training and test data. Execution halted!**

Could you please suggest what is wrong with my code and how to run it correctly? Thank you.

Hi @saif, when using this function with a non-UniRef90 input and non-default weight matrix, you need to specify both weight.matrix and train.metag to get the predictions. Let me know if that resolves the error.

Hi @himel.mallick, I specified both train.metag and weight.matrix. My code looks like this:
all(colnames(my_metagenome) %in% colnames(my_metag)) #TRUE
all(rownames(my_metagenome) %in% rownames(my_metag)) #TRUE
all(colnames(my_metagenome) %in% rownames(IMDM_MelonnPan_Trained_Weights)) #TRUE

running melonnpan.predict()

melonnpan.predict(metag = my_metagenome, weight.matrix = IMDM_MelonnPan_Trained_Weights, train.metag = my_metag, output = getwd())

I used your readTable function to load the weight matrix file. However, I still see an error message like this:

Error in data.table::fread(Input, header = FALSE) :
  input= must be a single character string containing a file name, a system command containing at least one space, a URL starting ‘http[s]://’, ‘ftp[s]://’ or ‘file://’, or, the input data itself containing at least one \n or \r

Could you please suggest how to solve this?

Hi @saif - I am afraid I need to reproduce the error on my end to see what’s going on. Would you be able to share the input tables for me to debug?

Hi @himel.mallick, sure, I can share the input tables. How should I share them: by email or somewhere here?

Hi @himel.mallick, here is the link to the input files I used:

Thank you

The link above did not work. Would you be able to attach a subset of the data that’s small in size and quick to run? Many thanks!

Hi @himel.mallick, I uploaded smaller input files here.

I still see the same error message for these files. This is the code I ran:
melonnpan.predict(metag = my_metagenome, weight.matrix = Weight, train.metag = my_metag, output = getwd())

If this link does not work, can you please share your email address or any link where I can upload files again?

Thank you.

Hi @saif - I think the above link is only for sharing within your organization. Depending on the size of the input files, email may not be a feasible option either. Can you reshare using a non-business shared link (e.g. Dropbox or Google Drive)? Sorry for any inconvenience!

Hi @himel.mallick, Here is a dropbox link of my data. Can you please see if it works?

Hi @saif - thanks for sharing the data. Using your files, I ran the following, which worked fine on my end:
melonnpan.predict(metag = "my_metagenome.csv", weight.matrix = "weights.csv", train.metag = "my_metag.csv", output = getwd())
Would you mind reporting back after you re-run with the above command? Many thanks!

Hi @himel.mallick , It worked now. Thank you very much.

I think the main issue was in the weight.matrix and train.metag arguments. I don’t know why passing tables that were already loaded into the R environment does not work for these two arguments. They only seem to read files given by name from the working directory. The metag argument, however, accepts either a table preloaded in the R environment or a file name in the working directory.

Loading the tables by specifying the file path:

my_metagenome <- readTable("path_my_metagenome.csv")
weight <- readTable("path_weight.csv")
my_metag <- readTable("path_my_metag.csv")

This one did not work:
melonnpan.predict(metag = my_metagenome, weight.matrix = weight, train.metag = my_metag, output = getwd())

These two work:
melonnpan.predict(metag = "my_metagenome.csv", weight.matrix = "weight.csv", train.metag = "my_metag.csv", output = getwd())
or,
melonnpan.predict(metag = my_metagenome, weight.matrix = "weight.csv", train.metag = "my_metag.csv", output = getwd())
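A quick way to tell the two cases apart before calling the function is to check whether an argument is a single file name or a table already in memory. The sketch below is only an illustration: `is_file_path` is a hypothetical helper, not part of MelonnPan, and the toy table and temporary file are placeholders.

```r
# Hypothetical helper: TRUE only for a single string naming an existing file
is_file_path <- function(x) {
  is.character(x) && length(x) == 1 && file.exists(x)
}

# Toy stand-ins: a table loaded in memory and the same table written to disk
weight_loaded <- data.frame(metabolite_1 = c(0.2, 0.8))
weight_file <- tempfile(fileext = ".csv")
write.csv(weight_loaded, weight_file)

is_file_path(weight_loaded)  # FALSE: a loaded table, the form that failed above
is_file_path(weight_file)    # TRUE: a file name, the form that worked above

# With real files, the call that worked in this thread had this form:
# melonnpan.predict(metag = "my_metagenome.csv", weight.matrix = "weight.csv",
#                   train.metag = "my_metag.csv", output = getwd())
```

This matches the fread error earlier in the thread, which asks for a single character string as input.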

I hope this will help others as well.

Thank you!