Additional species show up in HUMAnN output

Hi All,

I used HUMAnN 3.7 and MetaPhlAn 4.0.6.

This is all from the HUMAnN log file:
Running metaphlan …
Found g__Clostridium.s__Clostridium_butyricum : 1.40% of mapped reads ( g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_2_2_44A,g__Clostridioides.s__Clostridioides_difficile,g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_I46,g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_6_1_45,g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_21_3,g__Clostridia_unclassified.s__Clostridia_bacterium_UC5_1_2G4 )

Total bugs from nucleotide alignment: 41

g__Clostridioides.s__Clostridioides_difficile: 1435 hits

g__Clostridium.s__Clostridium_butyricum: 7 hits

I get so many more hits to C. difficile, so why is the main species listed as C. butyricum?

Essentially, my MetaPhlAn data lists C. butyricum and no C. difficile (the additional species get dropped when you merge the MetaPhlAn table), but C. difficile comes up in the HUMAnN species plots. If this truly is C. difficile, abundance data would be great, but I’m wary of just assuming the C. butyricum abundance is the C. difficile abundance. Any suggestions?

What the log is telling you there is how HUMAnN pairs the SGB-based profiles from MetaPhlAn 4 with the older HUMAnN 3 pangenomes for compatibility. The line you quoted shows that an SGB belonging to:

g__Clostridium.s__Clostridium_butyricum

contained genomes that were previously scattered across a number of different HUMAnN 3 pangenomes, including:

g__Clostridioides.s__Clostridioides_difficile

g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_I46

Hence HUMAnN is including all of these pangenomes in its downstream search, and at least two of them (i.e. the ones you called out with 1435 and 7 hits) recruited reads, while the others apparently did not.
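If you want to check this for every species in a log rather than by eye, something like the following Python sketch could pull out the pairing and the hit counts. The regular expressions are inferred only from the log lines quoted in this thread, not from HUMAnN's source, so treat the patterns and the function names (`parse_humann_log`, `hits_per_group`) as assumptions that may need adjusting for your log.

```python
import re

# Patterns inferred from the log excerpt quoted in this thread (an assumption,
# not taken from HUMAnN's source code).
FOUND_RE = re.compile(r"Found (\S+) : [\d.]+% of mapped reads \(([^)]*)\)")
HITS_RE = re.compile(r"(g__\S+\.s__\S+): (\d+) hits")


def parse_humann_log(text):
    """Return (groups, hits): SGB species -> paired pangenomes, pangenome -> hits."""
    groups, hits = {}, {}
    for line in text.splitlines():
        found = FOUND_RE.search(line)
        if found:
            extras = [s.strip() for s in found.group(2).split(",") if s.strip()]
            groups[found.group(1)] = extras
            continue
        hit = HITS_RE.search(line)
        if hit:
            hits[hit.group(1)] = int(hit.group(2))
    return groups, hits


def hits_per_group(groups, hits):
    """For each SGB species, list which of its paired pangenomes recruited reads."""
    result = {}
    for primary, extras in groups.items():
        members = [primary] + extras
        result[primary] = {name: hits[name] for name in members if name in hits}
    return result


# The lines quoted in the question above.
LOG_EXCERPT = (
    "Found g__Clostridium.s__Clostridium_butyricum : 1.40% of mapped reads ( "
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_2_2_44A,"
    "g__Clostridioides.s__Clostridioides_difficile,"
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_I46,"
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_6_1_45,"
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_21_3,"
    "g__Clostridia_unclassified.s__Clostridia_bacterium_UC5_1_2G4 )\n"
    "Total bugs from nucleotide alignment: 41\n"
    "g__Clostridioides.s__Clostridioides_difficile: 1435 hits\n"
    "g__Clostridium.s__Clostridium_butyricum: 7 hits\n"
)

groups, hits = parse_humann_log(LOG_EXCERPT)
per_group = hits_per_group(groups, hits)
for primary, members in per_group.items():
    print(primary, members, "total:", sum(members.values()))
```

On the excerpt above this groups the 1435 and 7 hits under the one SGB species, which is the pairing the log line describes.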

So any hits to those pangenomes would likely be more accurately assigned to g__Clostridium.s__Clostridium_butyricum based on our current understanding of species boundaries / taxonomy, but HUMAnN 3 doesn’t make this adjustment and instead continues to list the names of the pangenomes it actually aligned to.
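If you did want to make that adjustment yourself after the fact, a rough Python sketch is below. It is manual post-processing, not something HUMAnN 3 does. It assumes stratified feature IDs of the form `FEATURE|g__Genus.s__Species`, and the feature names and abundance values in `rows` are made up for illustration. Whether pooling is appropriate for your analysis is a judgment call.

```python
from collections import defaultdict

# SGB species name and the HUMAnN 3 pangenomes paired with it, copied from the
# log line quoted in the question.
SGB_SPECIES = "g__Clostridium.s__Clostridium_butyricum"
PAIRED_PANGENOMES = {
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_2_2_44A",
    "g__Clostridioides.s__Clostridioides_difficile",
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_I46",
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_6_1_45",
    "g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_21_3",
    "g__Clostridia_unclassified.s__Clostridia_bacterium_UC5_1_2G4",
}


def relabel_stratified(rows):
    """Relabel paired pangenomes to the SGB species and sum rows that collide.

    rows: iterable of (feature_id, abundance) pairs.
    """
    totals = defaultdict(float)
    for feature_id, abundance in rows:
        feature, sep, species = feature_id.partition("|")
        if sep and species in PAIRED_PANGENOMES:
            feature_id = f"{feature}|{SGB_SPECIES}"
        totals[feature_id] += abundance
    return dict(totals)


# Hypothetical rows for illustration only (not real HUMAnN output).
rows = [
    ("UniRef90_EXAMPLE1", 13.5),
    ("UniRef90_EXAMPLE1|g__Clostridioides.s__Clostridioides_difficile", 10.0),
    ("UniRef90_EXAMPLE1|g__Clostridium.s__Clostridium_butyricum", 2.5),
    ("UniRef90_EXAMPLE1|unclassified", 1.0),
]

relabelled = relabel_stratified(rows)
for feature_id, abundance in relabelled.items():
    print(feature_id, abundance)
```

Unstratified and unclassified rows pass through unchanged; only rows stratified to one of the paired pangenomes are renamed and summed.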

The upcoming HUMAnN 4 will resolve this issue by directly mapping to SGB pangenomes.