Missing values in metabolomics data


I am trying to understand what causes the missing data points in the metabolomics data.

  • Is it because the value was below the detection threshold of the instrument?
  • Are there other known reasons?

Here is a plot of the number of missing values for every metabolite in the 4 metabolomics data matrices.
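A minimal sketch of how such per-metabolite counts can be computed, assuming each matrix is loaded as a pandas DataFrame with samples as rows, metabolites as columns, and missing values stored as NaN. The toy matrix and the names `abundance`, `met_A`, etc. are hypothetical, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy matrix: rows = samples, columns = metabolites.
abundance = pd.DataFrame(
    {
        "met_A": [1.2, np.nan, 0.8, 2.0],
        "met_B": [np.nan, np.nan, 3.1, 2.7],
        "met_C": [5.0, 4.2, 4.8, 5.5],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Number of missing values per metabolite (one count per column).
missing_per_metabolite = abundance.isna().sum(axis=0)

# Fraction of samples missing for each metabolite, useful for comparing matrices
# that do not have the same number of samples.
missing_fraction = abundance.isna().mean(axis=0)

print(missing_per_metabolite)
print(missing_fraction)
```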

The authors of the publication
Application of Artificial Intelligence Modeling Technology Based on Multi-Omics in Noninvasive Diagnosis of Inflammatory Bowel Disease (Huang et al. 2021)
decided to replace each missing value with half the minimum observed value of that feature (min/2), but they do not justify this choice.
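To make the min/2 rule concrete, here is a minimal sketch of my reading of it, not the authors' actual code. It assumes samples are rows, metabolites (features) are columns, and missing values are NaN; the function name `impute_half_min` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_half_min(abundance: pd.DataFrame) -> pd.DataFrame:
    """Replace each NaN with half the minimum observed value of its column."""
    # Column-wise minimum over the observed (non-missing) values.
    half_min = abundance.min(axis=0, skipna=True) / 2
    # fillna with a Series fills each column with the value under the matching label.
    return abundance.fillna(half_min)


# Hypothetical toy matrix: rows = samples, columns = metabolites.
toy = pd.DataFrame(
    {
        "met_A": [4.0, np.nan, 2.0],
        "met_B": [np.nan, 6.0, 10.0],
    },
    index=["s1", "s2", "s3"],
)

imputed = impute_half_min(toy)
print(imputed)
```

This rule implicitly treats a missing value as a low abundance, so it is only sensible if the gaps really come from signals below the detection threshold, which is exactly what I am unsure about.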

A recent publication using the same dataset takes a different approach to missing metabolite abundance data: it removes every metabolite that contains missing data points.

Integration of multiview microbiome data for deciphering microbiome-metabolome-disease pathways (Fang et al. 2024)

This choice is described in section 4.2 of the paper, page 17. It is not explicitly justified, and the paper gives no information on why the empty data points exist.
As a result, the data go from 81,868 metabolites to 143 (because they also restrict the analysis to metabolites that have an HMDB ID).
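For comparison, here is a minimal sketch of this kind of filtering as I understand it, not the paper's actual pipeline. It assumes samples are rows, metabolites are columns, and missing values are NaN; the `hmdb_id` annotation and its placeholder identifiers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy matrix: rows = samples, columns = metabolites.
abundance = pd.DataFrame(
    {
        "met_A": [1.0, np.nan, 3.0],
        "met_B": [2.0, 2.5, 3.5],
        "met_C": [np.nan, 1.0, 1.5],
        "met_D": [7.0, 8.0, 9.0],
    },
    index=["s1", "s2", "s3"],
)

# Hypothetical annotation: metabolite -> HMDB ID (None when unannotated).
# The identifiers are placeholders, not real HMDB entries.
hmdb_id = pd.Series(
    {"met_A": "HMDB_placeholder_1", "met_B": None,
     "met_C": None, "met_D": "HMDB_placeholder_2"}
)

# Step 1: drop every metabolite (column) with at least one missing value.
complete = abundance.dropna(axis=1, how="any")

# Step 2: keep only the metabolites that have an HMDB ID.
annotated = hmdb_id[hmdb_id.notna()].index
kept = complete.loc[:, complete.columns.intersection(annotated)]

print(f"{abundance.shape[1]} metabolites -> {kept.shape[1]} after filtering")
```

Even in this toy case the two filters together discard most of the columns, which is why the reason behind the missing values matters before choosing between imputation and removal.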

So I am still wondering why some data points are empty, since knowing the reason would help choose an appropriate way to process the metabolomics dataset.