The bioBakery help forum

MelonnPan training -- no common IDs

Hello,

I’m interested in running MelonnPan on HUMAnN3 output from microbiome experiments in mouse models. I now have three files: genefamilies, pathabundance, and pathcoverage.

I tried melonnpan.predict, but I get an error:

metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) :
No common IDs found between training and test data. Execution halted!

I have attached my gene families file below: all_genefamilies.tsv (31.3 KB)

Rishi

Hi @rkamales, three things need to change before you run the analysis. First, the input data frame should have samples in rows and features in columns. Second, the input data should be TSS-normalized (total-sum scaled) to proportions, as described in the tutorial. Finally, you may want to remove the UNMAPPED category, which is not used in the prediction process. Let me know if this resolves the issue on your end!
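A minimal base-R sketch of those three steps, assuming the gene-family table has feature IDs in its first column and one abundance column per sample. `prepare_genefamilies` is a hypothetical helper, not part of MelonnPan, and dropping HUMAnN's taxon-stratified rows ("FEATURE|taxon") is an added assumption so that each sample is normalized over community-level features only.

```r
# Hypothetical helper: reshape a HUMAnN gene-family table into the layout
# described above (samples in rows, features in columns, TSS-normalized,
# UNMAPPED removed). Not part of MelonnPan.
prepare_genefamilies <- function(gf) {
  # First column: feature IDs (e.g. "UniRef90_..."); remaining columns: samples
  ids <- as.character(gf[[1]])
  abund <- as.matrix(gf[, -1, drop = FALSE])
  rownames(abund) <- ids

  # Drop UNMAPPED and (assumption) taxon-stratified rows written as "FEATURE|taxon"
  keep <- ids != "UNMAPPED" & !grepl("|", ids, fixed = TRUE)
  abund <- abund[keep, , drop = FALSE]

  # Transpose: samples become rows, features become columns
  x <- t(abund)

  # Total-sum scaling: divide each row by its total so every sample sums to 1
  # (a sample whose remaining total is 0 would produce NaN values)
  x <- x / rowSums(x)
  as.data.frame(x)
}

# Toy input standing in for a real file, e.g.
# gf <- read.delim("all_genefamilies.tsv", check.names = FALSE)
gf <- data.frame(
  feature = c("UNMAPPED", "UniRef90_A", "UniRef90_A|g__Bacteroides", "UniRef90_B"),
  S1 = c(10, 3, 3, 1),
  S2 = c(5, 2, 2, 2)
)
data_norm <- prepare_genefamilies(gf)
print(rowSums(data_norm))
```

Checking that the column names of the result are still UniRef90 IDs (not numbers or generic labels) is worthwhile, since prediction depends on matching those IDs.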

Thank you so much for the reply. I have done all three and attached the new file below (normalized_data.tsv, 40.8 KB), but I am still getting the same error:

> apply(data_norm, 1, sum)
    X0   X0.1   X0.2   X0.3   X0.4   X0.5   X0.6   X1.0   X0.7   X0.8   X0.9 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.10  X0.11  X0.12  X0.13   X0.0  X0.14  X0.15  X0.16  X0.17  X0.18  X0.19 
     1      1      1      1      1      1      1      1      1      1      1 
X1.0.1  X0.20  X0.21 X0.0.1  X0.22  X0.23  X0.24  X0.25  X0.26  X0.27  X0.28 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.29  X0.30  X0.31  X0.32  X0.33  X0.34  X0.35  X0.36   X2.0  X0.37 X1.0.2 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.38  X0.39  X0.40  X0.41  X0.42  X0.43  X0.44 X0.0.2 X2.0.1  X0.45  X0.46 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.47  X0.48  X0.49  X0.50  X0.51  X0.52  X0.53  X0.54  X0.55  X0.56  X0.57 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.58  X0.59 X1.0.3 X1.0.4  X0.60  X0.61  X0.62  X0.63  X0.64 X0.0.3  X0.65 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.66  X0.67  X0.68  X0.69  X0.70  X0.71  X0.72  X0.73  X0.74  X0.75  X0.76 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.77  X0.78  X0.79  X0.80  X0.81  X0.82  X0.83 X1.0.5  X0.84  X0.85  X0.86 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.87  X0.88  X0.89  X0.90 X0.0.4  X0.91  X0.92  X0.93  X0.94  X0.95  X0.96 
     1      1      1      1      1      1      1      1      1      1      1 
 X0.97 X0.0.5 
     1      1 
> metabolites <- melonnpan.predict(data_norm, weight.matrix = NULL)
Error in melonnpan.predict(data_norm, weight.matrix = NULL) : 
  No common IDs found between training and test data. Execution halted!

Hi @rkamales, based on the error message, it looks like there are no UniRef90 IDs in common between the pre-trained model and your data. Since we filtered out a fraction of UniRef90 IDs with low prevalence/abundance when building the model, this is not surprising. I am not sure whether this is an after-effect of your own filtering (if you are doing any) or simply a property of your dataset, i.e. that it does not share features with our pre-trained model. If the former, you might want to adjust the filtering thresholds/criteria to see if that increases the number of shared features. If the latter, there is not much we can do unless you have paired training data (metabolites and gene families), in which case you could build your own model with melonnpan.train and get predictions from it.
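As a rough diagnostic for the "No common IDs" error, the sketch below counts how many feature columns look like UniRef90 IDs and how many match a given set of model IDs, and applies a simple prevalence/abundance filter whose thresholds can be varied. `count_shared_ids`, `filter_features`, `model_ids`, and the default thresholds are all hypothetical; how to obtain the pre-trained model's ID list from MelonnPan is not shown here.

```r
# Hypothetical diagnostic helpers (not part of MelonnPan).
# `model_ids` is assumed to be a character vector of the UniRef90 IDs used by
# the pre-trained model.
count_shared_ids <- function(data_norm, model_ids) {
  ids <- colnames(data_norm)
  list(
    n_features = length(ids),                        # feature columns in your data
    n_uniref90 = sum(startsWith(ids, "UniRef90_")),  # columns that look like UniRef90 IDs
    n_shared   = length(intersect(ids, model_ids))   # IDs also present in the model
  )
}

# Simple prevalence/abundance filter (hypothetical thresholds): keep a feature
# if its relative abundance is at least `min_abund` in at least a `min_prev`
# fraction of samples. Relaxing either threshold keeps more features.
filter_features <- function(data_norm, min_abund = 1e-4, min_prev = 0.1) {
  prev <- colMeans(as.matrix(data_norm) >= min_abund)
  data_norm[, prev >= min_prev, drop = FALSE]
}

# Toy example: three samples, four features, two made-up "model" IDs
d <- data.frame(
  UniRef90_A = c(0.5, 0.6, 0.7),
  UniRef90_B = c(0.5, 0.4, 0.3),
  UniRef90_C = c(0, 0, 0),
  X0.1       = c(0, 0, 0)
)
model_ids <- c("UniRef90_A", "UniRef90_Z")
print(count_shared_ids(d, model_ids))

# Compare how many shared features survive at different prevalence thresholds
for (p in c(0.1, 0.5, 0.9)) {
  kept <- filter_features(d, min_prev = p)
  cat("min_prev =", p, "shared:", count_shared_ids(kept, model_ids)$n_shared, "\n")
}
```

If `n_uniref90` comes out as 0, the column names of the input are not UniRef90 IDs at all, so no overlap with the model would be possible regardless of filtering.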