Additional species show up in HUMAnN output

Hi All,

I used HUMAnN 3.7 and MetaPhlAn 4.0.6.

This is all from the humann logfile:
Running metaphlan …
Found g__Clostridium.s__Clostridium_butyricum : 1.40% of mapped reads ( g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_2_2_44A,g__Clostridioides.s__Clostridioides_difficile,g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_I46,g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_6_1_45,g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_21_3,g__Clostridia_unclassified.s__Clostridia_bacterium_UC5_1_2G4 )

Total bugs from nucleotide alignment: 41

g__Clostridioides.s__Clostridioides_difficile: 1435 hits

g__Clostridium.s__Clostridium_butyricum: 7 hits

Questions:
I get far more hits to C. difficile, so why is the main species listed as C. butyricum?

Essentially, what’s happened is that my MetaPhlAn data lists C. butyricum and no C. difficile (the additional species get dropped when you merge the MetaPhlAn tables), but the HUMAnN species plots show C. difficile. If this truly is C. difficile, abundance data would be great, but I’m wary of just assuming the C. butyricum abundance is the C. difficile abundance. Any suggestions?

The log is telling you how HUMAnN pairs off the SGB-based profiles from MetaPhlAn 4 with the older HUMAnN 3 pangenomes for compatibility. The line you quoted shows that an SGB belonging to:

g__Clostridium.s__Clostridium_butyricum

contained genomes that were previously scattered across a number of different HUMAnN 3 pangenomes:

g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_2_2_44A
g__Clostridioides.s__Clostridioides_difficile
g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_I46
g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_6_1_45
g__Erysipelotrichaceae_unclassified.s__Erysipelotrichaceae_bacterium_21_3
g__Clostridia_unclassified.s__Clostridia_bacterium_UC5_1_2G4

HUMAnN therefore includes all of these pangenomes in its downstream search, and at least two of them (i.e. the ones you called out with 1435 and 7 hits) recruited reads, while the others apparently did not.

Any hits to those pangenomes would likely be more accurately assigned to g__Clostridium.s__Clostridium_butyricum based on our current understanding of species boundaries / taxonomy, but HUMAnN 3 doesn’t make this adjustment and instead continues to list the names of the pangenomes it actually aligned to.
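
If it helps to see which pangenome names belong to which SGB species across a whole log, here is a minimal Python sketch that pulls that mapping out of the log lines. It assumes the log lines look exactly like the ones quoted in this thread ("Found ... : N% of mapped reads ( ... )" and "name: N hits"); the function names are made up for illustration and are not part of HUMAnN. The summed hit counts are only a bookkeeping aid, not an abundance estimate.

```python
import re

# Patterns assumed from the log lines quoted in this thread; other HUMAnN
# versions may format these lines differently.
FOUND_RE = re.compile(
    r"Found\s+(?P<sgb>\S+)\s*:\s*[\d.]+% of mapped reads\s*\(\s*(?P<pangenomes>[^)]*)\)"
)
HITS_RE = re.compile(r"(?P<name>g__\S+?):\s*(?P<hits>\d+) hits")


def parse_humann_log(lines):
    """Return (pangenome -> SGB species, pangenome -> hit count) from log lines."""
    pangenome_to_sgb = {}
    hits = {}
    for line in lines:
        found = FOUND_RE.search(line)
        if found:
            # Names inside the parentheses are the extra v3 pangenomes
            # paired with this SGB species.
            for name in found.group("pangenomes").split(","):
                name = name.strip()
                if name:
                    pangenome_to_sgb[name] = found.group("sgb")
            continue
        hit = HITS_RE.search(line)
        if hit:
            hits[hit.group("name")] = int(hit.group("hits"))
    return pangenome_to_sgb, hits


def hits_by_sgb(pangenome_to_sgb, hits):
    """Sum hit counts under the SGB species each pangenome was paired with."""
    totals = {}
    for name, count in hits.items():
        sgb = pangenome_to_sgb.get(name, name)  # unpaired names keep their own label
        totals[sgb] = totals.get(sgb, 0) + count
    return totals
```

Calling `parse_humann_log(open("sample.log"))` (the file name is a placeholder) and passing both results to `hits_by_sgb` would group the C. difficile and C. butyricum hits from the example above under g__Clostridium.s__Clostridium_butyricum.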

The upcoming HUMAnN 4 will resolve this issue by directly mapping to SGB pangenomes.

Thank you again for your assistance. Does HUMAnN v3.9 solve this issue? If not, would you recommend running MetaPhlAn v3 alongside HUMAnN v3 for optimal species assignments, or is MetaPhlAn v4 with HUMAnN v3 best?

This behavior will be the same under HUMAnN 3.9, which has been updated to maintain compatibility with the latest MetaPhlAn (v4.1). If you want your taxonomic and functional profiles to match exactly, without having to reference the SGB-to-v3-pangenome mapping, then you could use MetaPhlAn 3 with HUMAnN 3. However, the coverage/resolution of MetaPhlAn 4 is considerably improved, such that most people here are using the MetaPhlAn 4 + HUMAnN 3 compatibility approach.
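
For anyone who stays with MetaPhlAn 4 + HUMAnN 3 and wants the stratified functional rows to carry the SGB species name, one possible post-processing step is sketched below in Python. It assumes stratified feature IDs have the form `FEATURE|g__Genus.s__Species` and that you already have a pangenome-to-SGB dictionary (for example, one built from the log); the function names and the example feature ID are hypothetical, and this is not something HUMAnN does for you.

```python
from collections import defaultdict


def relabel_stratified(feature_id, pangenome_to_sgb):
    """Swap the pangenome name after the last '|' for its paired SGB species."""
    feature, sep, taxon = feature_id.rpartition("|")
    if not sep:
        return feature_id  # unstratified (community total) row, leave as-is
    return feature + "|" + pangenome_to_sgb.get(taxon, taxon)


def regroup_rows(rows, pangenome_to_sgb):
    """Sum (feature_id, value) rows that share a label after relabeling."""
    totals = defaultdict(float)
    for feature_id, value in rows:
        totals[relabel_stratified(feature_id, pangenome_to_sgb)] += value
    return dict(totals)


# Illustrative mapping taken from the log line quoted earlier in this thread.
mapping = {
    "g__Clostridioides.s__Clostridioides_difficile":
        "g__Clostridium.s__Clostridium_butyricum",
}
# "UniRef90_EXAMPLE" is a made-up feature ID for demonstration only.
print(relabel_stratified(
    "UniRef90_EXAMPLE|g__Clostridioides.s__Clostridioides_difficile", mapping
))
```

Rows that end up with the same feature and species label are summed, which only makes sense for additive units; treat the result as a convenience view rather than a corrected profile.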

Thank you, that is very helpful.