Humann v3.8 misinterprets metaphlan 4.1 results from non-default analysis types, i.e. "-t rel_ab_w_read_stats"

Humann (v3.8) behaves differently depending on the metaphlan (v4.1) analysis type (set via -t). Specifically, the change affects the pre-screening that creates the custom chocophlan database and the subsequent read alignment, and thus the taxonomic profile that humann works from.

Likely related issues:

[no run info to work from, so a bit of an assumption]

[again no run info, but the suggestion of combining humann v3.8 and metaphlan 4.1 with the parameter “-t rel_ab_w_read_stats”]

Here was my process for a dataset from mammalian stool:

[installations via conda and databases prepared successfully this week/morning, following the available docs as closely as possible and deviating from defaults minimally, noting that (1) I had to intentionally downgrade metaphlan from the default install of 4.2, and (2) humann v4 is not available via conda and a fresh install of humann v3.9 was completely non-functional, hence v3.8]

humann --version

humann v3.8

metaphlan --version

MetaPhlAn version 4.1.1 (11 Mar 2024)

humann --input reads.fq.gz -o output --metaphlan-options "--bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 -t rel_ab_w_read_stats --nproc 4"

Output files will be written to: output
Decompressing gzipped file …

Removing spaces from identifiers in input file …

Running metaphlan …

Total species selected from prescreen: 0

Selected species explain 0.00% of predicted community composition

No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.

Running diamond …

Aligning to reference database: uniref90_201901b_full.dmnd

[continues through rest of humann workflow successfully]

humann --input reads.fq.gz -o output2 --metaphlan-options "--bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 --nproc 4"

Output files will be written to: output2
Decompressing gzipped file …

Removing spaces from identifiers in input file …

Running metaphlan …

[list of taxa and read abundances appears]

Total species selected from prescreen: 45

Selected species explain 100.00% of predicted community composition

Creating custom ChocoPhlAn database …

Running bowtie2-build …

[continues through rest of humann workflow successfully]

metaphlan --bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 -t rel_ab_w_read_stats --input_type fastq reads.fq.gz output3.tsv

[produces expected metaphlan results and can be used by humann as a taxonomic profile, which gives the same “total species selected from prescreen: 0” result as above and continues to complete successfully]

metaphlan --bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 --input_type fastq reads.fq.gz output4.tsv

[produces expected metaphlan results and can be used by humann as a taxonomic profile, which performs prescreen and re-alignment to a newly generated bowtie database and continues to complete successfully]

The results look as expected given the differences in metaphlan’s integration into humann:

  • Pre-computed metaphlan results are the same regardless of analysis type (output3 vs output4), excluding of course the additional read stats (output3)
  • Metaphlan results are the same regardless of whether metaphlan was run on its own or managed by humann (output1 intermediate vs output3 and output2 intermediate vs output4).
  • Humann resulting genes/pathways have taxonomic stratification when the taxonomic information is available (output2 and humann’s processing of output4)
  • Humann results are identical regardless of whether metaphlan was managed directly by humann or precomputed (output1 vs output3 processed by humann and output2 vs output4 processed by humann).

In summary, while metaphlan itself behaves as expected, it seems that humann does not correctly interpret the structure of the metaphlan results generated with "-t rel_ab_w_read_stats". I have not tested other analysis types.

Let me know what else I might be able to do to help or if anything is unclear.

I believe this is the expected behavior? HUMAnN 3 expects/requires the default MetaPhlAn output format (equivalent to -t rel_ab) whereas HUMAnN 4 expects/requires the alternate -t rel_ab_w_read_stats format.
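A quick way to tell which layout a given profile is in, before handing it to HUMAnN, is to look at its column header. The sketch below assumes the header line starts with "#clade_name" and that the read-stats layout includes a column named "estimated_number_of_reads_from_the_clade"; both are assumptions about MetaPhlAn 4 output worth checking against your own files, and the function name is hypothetical.

```shell
# Print which column layout a MetaPhlAn profile appears to use,
# judged only by its '#clade_name' header line.
profile_layout() {
    # awk exits 0 even when no header is found, leaving $header empty.
    header=$(awk '/^#clade_name/ { print; exit }' "$1")
    case "$header" in
        *estimated_number_of_reads_from_the_clade*) echo "rel_ab_w_read_stats" ;;
        *relative_abundance*)                       echo "rel_ab" ;;
        *)                                          echo "unknown" ;;
    esac
}

# Example guard before a HUMAnN 3 run with a precomputed profile:
# [ "$(profile_layout output3.tsv)" = "rel_ab" ] || echo "unexpected layout" >&2
```

This does not make HUMAnN accept a different layout; it only flags a mismatch early.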

If you want to interconvert between the MetaPhlAn formats (say, you’re running MetaPhlAn inside of HUMAnN 3 but want to look at -t rel_ab_w_read_stats-style output), you can always rerun MetaPhlAn outside of HUMAnN, starting from its intermediate mapping output instead of the raw reads and specifying other analysis options. It will very quickly regenerate a modified abundance profile.
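As a rough sketch of that rerun, assuming MetaPhlAn 4.1 and its bowtie2out input type, something like the wrapper below should work. The function name and argument order are made up for illustration, and the mapping file in the usage line is an assumed location inside HUMAnN's temp folder, so check your own output directory.

```shell
# Re-profile from an existing MetaPhlAn mapping file instead of raw reads.
# Arguments: <mapping_file> <metaphlan_db_dir> <index_name> <output_profile>
reprofile_with_read_stats() {
    metaphlan "$1" \
        --input_type bowtie2out \
        --bowtie2db "$2" \
        --index "$3" \
        -t rel_ab_w_read_stats \
        -o "$4"
}

# Example call (the mapping file path is an assumption):
# reprofile_with_read_stats output2/reads_humann_temp/reads_metaphlan_bowtie2.txt \
#     /path/to/metaphlan_db mpa_vOct22_CHOCOPhlAnSGB_202212 output2_read_stats.tsv
```

If MetaPhlAn reports that it cannot determine the total number of reads from the mapping file, its --nreads option may be needed as well.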

I understand.

Is this documented somewhere?
Here (the biobakery/humann README on GitHub) it just says that -t rel_ab is the default, but not that certain metaphlan outputs are incompatible with specific versions of humann, and it even uses -t as an example of a metaphlan option to change.
If it was not, now it is in a way.

This is not something we’ve documented since we expect most people to be running MetaPhlAn within HUMAnN or outside of HUMAnN using default options. I stressed the change to -t rel_ab_w_read_stats in the HUMAnN 4 docs since we now depend on a non-default option for coverage estimation. Hence if someone had a workflow where they were running MetaPhlAn outside of HUMAnN (e.g. for paired MGX + MTX analysis) it would need to be updated.

I think what the -t note in the docs was trying to say was that if you wanted to tune the MetaPhlAn options you might need to start from the default call and then add to it (I agree that is not clear as written).

More generally, although we provide flags to tune the metaphlan, bowtie2, and diamond calls, it would be nigh impossible to ensure that HUMAnN would be robust to all such changes downstream. I personally only use these features to change the various tools’ mapping stringency settings (which doesn’t affect output formatting).

Gotcha, I understand.