HUMAnN 3.9: Low percentage in “Selected species explain X% of predicted community composition”

Hi all,
We are performing a metagenomic analysis using HUMAnN v3.9 after pre-processing our FASTQ files with KneadData. We initially encountered issues related to the ChocoPhlAn and UniRef databases, but we were able to resolve them, and HUMAnN now runs without any apparent errors.

However, we’ve noticed a recurring message in the output for each sample that reads:

“Selected species explain X% of predicted community composition”

In our case, the percentage values are quite low, with a mean around 11%, and never approaching 100%. We’re unsure if this is normal. Could someone help clarify:

  1. Is it normal to get such low percentages?
  2. What are the potential causes of this?
  3. Are there any recommended steps to improve these values or troubleshoot the issue?

Thanks in advance for your help!

Best regards,
Roberto

What kind of communities are you working with? That number IS low. The “X% of predicted community composition” figure is based on the species that HUMAnN selected out of the species that MetaPhlAn found. A couple of possibilities: [1] If you’re using MetaPhlAn 4, it will find species that HUMAnN 3.x isn’t aware of (switching to HUMAnN 4 can help with that). [2] If you’re using MetaPhlAn’s “% unknown” estimation, that unknown fraction might be counted as part of the “predicted composition” referenced in the log message, which would tend to depress the number.
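To make the arithmetic behind those two possibilities concrete, here is a minimal sketch. It is not HUMAnN's actual code: the function name `percent_explained`, the species labels, and the abundance values are all made up for illustration, and it simply assumes the percentage is the abundance of the selected species divided by the total abundance in the MetaPhlAn profile.

```python
# Illustrative only: NOT HUMAnN's implementation. All names and numbers
# below are hypothetical.

def percent_explained(profile, known_species):
    """Share of the profiled abundance that belongs to species the
    downstream tool knows about (assumed definition, see lead-in)."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    selected = sum(abund for sp, abund in profile.items() if sp in known_species)
    return 100.0 * selected / total


# Hypothetical MetaPhlAn-style profile (relative abundances in %).
profile = {
    "s__Species_A": 40.0,
    "s__Species_B": 10.0,
    "s__Species_C": 30.0,  # detected by the profiler, absent from the pangenome database
    "UNKNOWN": 20.0,       # "% unknown" estimate included in the profile
}

# Species assumed to be present in the pangenome database.
known = {"s__Species_A", "s__Species_B"}

with_unknown = percent_explained(profile, known)

# Same profile with the unknown fraction left out of the denominator.
classified_only = {sp: a for sp, a in profile.items() if sp != "UNKNOWN"}
without_unknown = percent_explained(classified_only, known)

print(f"with unknown fraction:    {with_unknown:.1f}%")
print(f"without unknown fraction: {without_unknown:.1f}%")
```

In this toy profile, the species missing from the database (case [1]) and the unknown fraction (case [2]) both lower the result, and removing either one raises it.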

Thanks for your response! You’re correct—the issue was resolved by using HUMAnN 3.9 with MetaPhlAn 4, which fixed the low “predicted composition” warning.

We’re analyzing human gut microbiomes, and the mismatch was likely due to MetaPhlAn 4 detecting species not yet in HUMAnN 3.x’s databases. Updating HUMAnN (or switching to v4) helped align the taxonomy. We also checked MetaPhlAn’s “% unknown,” but the version adjustment was the key fix.

Your point about database compatibility was exactly the problem—appreciate the help!