I’m interested in running MelonnPan on my HUMAnN3 output from microbiome experiments in mouse models. I now have three files: genefamilies, pathabundance, and pathcoverage.
I tried melonnpan.predict, but I get an error:
metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) :
No common IDs found between training and test data. Execution halted!
I have attached my gene families file below for reference: all_genefamilies.tsv (31.3 KB)
Hi @rkamales - three things need to be changed before you run the analysis. First, the input data frame should have samples in rows and features in columns. Second, the input data should be TSS-normalized to proportions, as mentioned in the tutorial. Finally, you may want to remove the UNMAPPED category, which is not used in the prediction process. Let me know if the above resolves the issue on your end!
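A minimal sketch of these three preparation steps in base R, assuming the HUMAnN gene families table has already been read into a data frame whose first column holds the feature IDs and whose remaining columns are samples. The helper name `prepare_genefamilies` is hypothetical, and dropping stratified rows (IDs containing `|`) is an extra assumption, not something stated above.

```r
# Hypothetical helper: prepare a HUMAnN gene families table for prediction.
# Assumes column 1 holds feature IDs and the remaining columns are samples.
prepare_genefamilies <- function(gf, drop_stratified = TRUE) {
  ids <- as.character(gf[[1]])
  abund <- as.matrix(gf[, -1, drop = FALSE])
  rownames(abund) <- ids
  # Remove the UNMAPPED category, which is not used for prediction
  keep <- ids != "UNMAPPED"
  # Assumption: keep only community-level rows; stratified rows contain "|"
  if (drop_stratified) keep <- keep & !grepl("|", ids, fixed = TRUE)
  abund <- abund[keep, , drop = FALSE]
  # Transpose so that samples are rows and features are columns
  x <- t(abund)
  # TSS normalization: divide each sample (row) by its total so rows sum to 1
  # (a sample with a zero total would produce NaN)
  x <- x / rowSums(x)
  as.data.frame(x)
}
```

For example, `data_norm <- prepare_genefamilies(read.delim("all_genefamilies.tsv", check.names = FALSE))` would give a samples-by-features table of proportions.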
Thank you so much for the reply. I have made all three changes and attached the new file below: normalized_data.tsv (40.8 KB). However, I am still getting the same error.
Hi @rkamales - based on the error message, it looks like there are no common UniRef90 IDs between the pre-trained model and your data. Since we did some filtering to remove a fraction of UniRef90 IDs with low prevalence/abundance, this is not surprising. I am not sure whether this is an after-effect of your own filtering (if you are doing any) or simply a property of your dataset, i.e., that it shares no features with our pre-trained model. If the former, you might want to adjust the filtering thresholds/criteria to see if that increases the number of shared features. If the latter, there is not much we can do unless you have paired training data (metabolites and gene families) and can build your own model with melonnpan.train to get the predictions.
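One way to check whether filtering is the culprit is to count the shared IDs directly. A minimal sketch in base R, where `train_ids` stands for the feature IDs the model was trained on (for a custom model, the row names of the weight matrix; how to extract them from the pre-trained model is not shown here). The helper names and the abundance/prevalence filter with its default thresholds are hypothetical illustrations, not the filtering MelonnPan itself used.

```r
# Hypothetical helpers (names and default thresholds are illustrative only).

# Feature IDs present in both your samples-by-features table and the
# model's training features (`train_ids`).
shared_ids <- function(test_data, train_ids) {
  intersect(colnames(test_data), train_ids)
}

# Keep features whose relative abundance exceeds `min_abundance` in at least
# a fraction `min_prevalence` of samples.
filter_features <- function(x, min_abundance = 1e-4, min_prevalence = 0.1) {
  present <- as.matrix(x) > min_abundance
  prevalence <- colMeans(present)
  x[, prevalence >= min_prevalence, drop = FALSE]
}
```

Re-running `length(shared_ids(filter_features(data_norm, 1e-5, 0.05), train_ids))` with looser thresholds shows whether relaxing the filter recovers shared features.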
Hi @himel.mallick, I ran melonnpan.train() and got three output files, including a weight matrix. I then tried to use this output to run melonnpan.predict(). I have attached snapshots of my weight.matrix and metag files, taken after loading them in R and inspecting them with View().
However, I get this error message: **Error in melonnpan.predict(metag = my_metag, weight.matrix = IMDM_MelonnPan_Trained_Weights, : No common IDs found between training and test data. Execution halted!**
Could you please tell me what is wrong with my code and how to run it correctly? Thank you.
Hi @saif, when using this function with a non-UniRef90 input and a non-default weight matrix, you need to specify both weight.matrix and train.metag to get the predictions. Let me know if that resolves the error.
Hi @himel.mallick, I specified both train.metag and weight.matrix. My checks look like this:
`all(colnames(my_metagenome) %in% colnames(my_metag)) # TRUE`
`all(rownames(my_metagenome) %in% rownames(my_metag)) # TRUE`
`all(colnames(my_metagenome) %in% rownames(IMDM_MelonnPan_Trained_Weights)) # TRUE`
I used your readTable function to load the weight matrix file. However, I still get this error message:
**Error in data.table::fread(Input, header = FALSE) : input= must be a single character string containing a file name, a system command containing at least one space, a URL starting ‘http[s]://’, ‘ftp[s]://’ or ‘file://’, or, the input data itself containing at least one \n or \r**
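This message comes from data.table::fread receiving something other than a single character string, which suggests that one of the arguments was given an in-memory data frame where a file name was expected. A small base R check, using the hypothetical helper name `is_file_input`; it is deliberately narrower than fread, which also accepts system commands, URLs, and literal text.

```r
# Hypothetical helper: TRUE only for a single, non-missing character string
# naming an existing file (narrower than what data.table::fread accepts).
is_file_input <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) && file.exists(x)
}

# A preloaded data frame fails the check; an existing file name passes it.
# is_file_input(IMDM_MelonnPan_Trained_Weights)  # FALSE for a data frame
# is_file_input("weights.csv")                   # TRUE if the file exists
```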
I still get the same error message with these files. This is the code I ran:
`melonnpan.predict(metag = my_metagenome, weight.matrix = Weight, train.metag = my_metag, output = getwd())`
If this link does not work, could you please share your email address or another link where I can upload the files?
Hi @saif - I think the above link is only for sharing within your organization - depending on the size of the input files, email may not be a feasible option either. Can you reshare using a non-business shared link (e.g. DropBox or Google Drive)? Sorry for any inconvenience!
Hi @saif - thanks for sharing the data. Using your files, I ran the following, which worked fine on my end: `melonnpan.predict(metag = "my_metagenome.csv", weight.matrix = "weights.csv", train.metag = "my_metag.csv", output = getwd())`. Would you mind letting me know how it goes after you re-run the above command? Many thanks!
I think the main issue was with the weight.matrix and train.metag arguments. I don’t know why supplying these two arguments from within the R environment does not work; they only seem to read files from the working directory. The metag argument, however, works with a preloaded object, a path specified in the R environment, or a file in the working directory.
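If the weight matrix and training metagenome are already loaded as R objects, one workaround consistent with the above is to write them to the working directory and pass the bare file names. A minimal sketch, using the hypothetical helper name `write_inputs` and the file names from the working command above, and assuming (not verified here) that CSV files with row names in the first column match what MelonnPan expects.

```r
# Hypothetical helper: write in-memory tables to CSV files in the current
# working directory and return the bare file names.
write_inputs <- function(weights, train_metag,
                         weight_file = "weights.csv",
                         train_file = "my_metag.csv") {
  # Assumption: row names (feature IDs for the weights, sample IDs for the
  # training metagenome) are written as the first column
  write.csv(weights, weight_file, row.names = TRUE)
  write.csv(train_metag, train_file, row.names = TRUE)
  c(weight.matrix = weight_file, train.metag = train_file)
}

# Example call (requires the melonnpan package; not run here):
# paths <- write_inputs(IMDM_MelonnPan_Trained_Weights, my_metag)
# melonnpan.predict(metag = my_metagenome,
#                   weight.matrix = paths[["weight.matrix"]],
#                   train.metag = paths[["train.metag"]],
#                   output = getwd())
```

Writing to the working directory and returning bare names mirrors the observation that only files there were picked up by these two arguments.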