Humann v3.8 misinterprets metaphlan 4.1 results from non-default analysis types, e.g. "-t rel_ab_w_read_stats"

Humann (v3.8) behaves differently when the metaphlan (v4.1) analysis type is changed (via -t). The affected steps are the prescreen used to create the custom chocophlan database and the subsequent read alignment, and therefore also the taxonomic profile that humann produces.

Likely related issues:

[no run info to work from, so a bit of an assumption]

[again no run info, but the suggestion of combining humann v3.8 and metaphlan 4.1 with the parameter “-t rel_ab_w_read_stats”]

Here was my process for a dataset from mammalian stool:

[installations via conda and databases prepared successfully this week/morning, following the available docs as closely as possible and deviating from defaults minimally. Note that (1) I had to intentionally downgrade metaphlan from the default install of 4.2, and (2) humann v4 is not available via conda and a fresh install of humann v3.9 was completely non-functional, hence v3.8]

humann --version

humann v3.8

metaphlan --version

MetaPhlAn version 4.1.1 (11 Mar 2024)

humann --input reads.fq.gz -o output --metaphlan-options "--bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 -t rel_ab_w_read_stats --nproc 4"

Output files will be written to: output
Decompressing gzipped file …

Removing spaces from identifiers in input file …

Running metaphlan …

Total species selected from prescreen: 0

Selected species explain 0.00% of predicted community composition

No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.

Running diamond …

Aligning to reference database: uniref90_201901b_full.dmnd

[continues through rest of humann workflow successfully]

humann --input reads.fq.gz -o output2 --metaphlan-options "--bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 --nproc 4"

Output files will be written to: output2
Decompressing gzipped file …

Removing spaces from identifiers in input file …

Running metaphlan …

[list of taxa and read abundances appears]

Total species selected from prescreen: 45

Selected species explain 100.00% of predicted community composition

Creating custom ChocoPhlAn database …

Running bowtie2-build …

[continues through rest of humann workflow successfully]

metaphlan --bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 -t rel_ab_w_read_stats --input_type fastq reads.fq.gz output3.tsv

[produces expected metaphlan results and can be used by humann as a taxonomic profile, which gives the same “total species selected from prescreen: 0” result as above and continues to complete successfully]

metaphlan --bowtie2db /path/to/metaphlan_db --index mpa_vOct22_CHOCOPhlAnSGB_202212 --input_type fastq reads.fq.gz output4.tsv

[produces expected metaphlan results and can be used by humann as a taxonomic profile, which performs prescreen and re-alignment to a newly generated bowtie database and continues to complete successfully]

The results look as expected given the differences in metaphlan’s integration into humann:

  • Pre-computed metaphlan results are the same regardless of analysis type (output3 vs output4), excluding of course the additional read stats (output3).
  • Metaphlan results are the same regardless of whether metaphlan was run on its own or managed by humann (output1 intermediate vs output3 and output2 intermediate vs output4).
  • Humann resulting genes/pathways have taxonomic stratification when the taxonomic information is available (output2 and humann’s processing of output4)
  • Humann results are identical regardless of whether metaphlan was managed directly by humann or precomputed (output1 vs output3 processed by humann and output2 vs output4 processed by humann).
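The profile comparisons above (e.g. output3.tsv vs output4.tsv) can be reproduced with a short script. This is a minimal sketch and not part of humann or metaphlan: the function names are hypothetical, and it assumes that the last "#" line of each profile is the tab-separated column header and that it contains columns named clade_name and relative_abundance. Columns are looked up by name rather than by position, so extra read-stats columns do not affect the comparison.

```python
def load_relative_abundance(profile_text):
    """Map clade name -> relative abundance from the text of one profile.

    Assumption: the last '#' line is the tab-separated column header and
    it contains 'clade_name' and 'relative_abundance'.
    """
    header = None
    rows = []
    for line in profile_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Each comment line overwrites the previous one, so the last
            # '#' line ends up being treated as the column header.
            header = line.lstrip("#").split("\t")
        else:
            rows.append(line.split("\t"))
    if header is None:
        raise ValueError("no '#' header line found")
    clade_col = header.index("clade_name")
    abundance_col = header.index("relative_abundance")
    return {row[clade_col]: float(row[abundance_col]) for row in rows}


def profiles_match(text_a, text_b, tolerance=1e-6):
    """True if both profiles list the same clades with the same abundances."""
    a = load_relative_abundance(text_a)
    b = load_relative_abundance(text_b)
    return a.keys() == b.keys() and all(
        abs(a[clade] - b[clade]) <= tolerance for clade in a
    )
```

Reading each file into a string and calling profiles_match on the two strings returns True when the clades and relative abundances agree, regardless of any additional columns.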

In summary, while metaphlan itself behaves as expected, it seems that humann does not correctly interpret the structure of the metaphlan results generated with "-t rel_ab_w_read_stats". I have not tested other analysis types.
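To make the structural difference between the two profile types easy to post here, a small helper like the following could summarise each file. It is a hypothetical sketch under the assumption that the last "#" line of a profile is the tab-separated column header. It only reports what is in the file and does not inspect or reproduce humann's parser.

```python
def profile_layout(profile_text):
    """Return (column_names, field_counts) for the text of one profile.

    column_names: fields of the last '#' line (assumed to be the header).
    field_counts: sorted distinct numbers of tab-separated fields seen
    in the data rows.
    """
    columns = []
    field_counts = set()
    for line in profile_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            columns = line.lstrip("#").split("\t")
        else:
            field_counts.add(len(line.split("\t")))
    return columns, sorted(field_counts)
```

Running this on the text of output3.tsv and output4.tsv and comparing the two results shows which columns were added or renamed by the non-default analysis type.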

Let me know what else I might be able to do to help or if anything is unclear.