HUMAnN 3 gives only unclassified results, but HUMAnN 2 does not

Dear HUMAnN 3 team,

I would like to run the pipeline called ASaiM-MT, described on the "ASaiM - Galaxy Community Hub" page, but with MetaPhlAn 4 and HUMAnN 3. Unfortunately, at the end of the HUMAnN 3 run I only get unclassified results:

UniRef50_A0A3B9P6A7 17928.3447331564
UniRef50_A0A3B9P6A7|unclassified
UniRef50_W4V7T4 17254.9255236297
UniRef50_W4V7T4|unclassified
UniRef50_A3DBR3 10575.1057587066
UniRef50_A3DBR3|unclassified
UniRef50_A3DEF5: Fimbrial assembly family protein 371.1157718711
UniRef50_A3DEF5: Fimbrial assembly family protein|unclassified

With HUMAnN 2 on the same data, I get results like these instead, with the species assigned:

UniRef50_P62593: Beta-lactamase TEM 51842.43044
UniRef50_P62593: Beta-lactamase TEM|g__Clostridium.s__Clostridium_thermocellum
UniRef50_R5FV61 46966.29231
UniRef50_R5FV61|g__Clostridium.s__Clostridium_thermocellum

What I find strange is that the species do appear in the HUMAnN 3 temporary files:

SRR6820516.37548293754829/1 1515__A3DEF5__AD2_01123|k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Hungateiclostridiaceae.g__Hungateiclostridium.s__Hungateiclostridium_thermocellum|UniRef90_A3DEF5|UniRef50_A3DEF5|552 96.68874172185431 151.0 0 150.0 404 553 0

Is this normal?
Or could the problem be the way I specified the mpa_vJan21 database?

For your reference, I ran the following commands:

metaphlan "$output_dir/cutadapt_interlacer/pairs.fq" --input_type fastq -o "$output_dir/metaphlan/results.dat" --no_map -t 'rel_ab' --tax_lev 'a' --min_cu_len '2000' --min_alignment_len '0' --stat_q '0.1' -s "$output_dir/metaphlan/sam_output_file.dat" --biom "$output_dir/metaphlan/biom_output_file.dat" --index mpa_vJan21_CHOCOPhlAnSGB_202103

humann --input $output_dir/sortmerna_interlacer/pairs.fq --output $output_dir/humann --taxonomic-profile "$output_dir/metaphlan/results.dat" --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 -t rel_ab" --nucleotide-database db/chocophlan --translated-alignment 'diamond' --protein-database 'db/uniref/uniref50' --search-mode 'uniref50' --pathways 'metacyc' --annotation-gene-index 8 --evalue '1.0' --memory-use minimum --prescreen-threshold '0.01' --nucleotide-identity-threshold '50.0' --translated-identity-threshold '50.0' --translated-subject-coverage-threshold '50.0' --translated-query-coverage-threshold '50.0' --xipe 'off' --minpath 'on' --gap-fill 'on' --output-format 'tsv' --output-max-decimals '10' --output-basename 'humann' --threads 4

I used the sample data from the ASaiM-MT training, "Metatranscriptomics analysis using microbiome RNA-seq data (short)", available at https://zenodo.org/record/4776250#.ZDAdrnZByUk

Thank you for your time and assistance.

Best regards,
Jérémy Tournayre

Hello,

Here is an additional example, using the demo data shipped with HUMAnN 3, which also gives unexpected unclassified results:

#metaphlan Jan21_CHOCOPhlAnSGB
metaphlan test/humann-3.6.1/examples/demo.fastq.gz --input_type fastq -o test/test_JT_metaphlan_Jan21/profiled_metagenome.txt --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103

#humann with metaphlan
humann --input test/humann-3.6.1/examples/demo.fastq.gz --output test/test_JT_humann_wit_meta --taxonomic-profile "test/test_JT_metaphlan_Jan21/profiled_metagenome.txt" --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 -t rel_ab --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103" --nucleotide-database db/chocophlan --translated-alignment 'diamond' --protein-database 'db/uniref/uniref50' --search-mode 'uniref50' --pathways 'metacyc' --annotation-gene-index 8 --evalue '1.0' --memory-use minimum --prescreen-threshold '0.01' --nucleotide-identity-threshold '50.0' --translated-identity-threshold '50.0' --translated-subject-coverage-threshold '50.0' --translated-query-coverage-threshold '50.0' --xipe 'off' --minpath 'on' --gap-fill 'on' --output-format 'tsv' --output-max-decimals '10' --output-basename 'humann' --threads 4

Here are the first rows of "humann_genefamilies.tsv" produced by the HUMAnN 3 command:

# Gene Family humann_Abundance-RPKs
UNMAPPED 18798.0000000000
UniRef50_A0A2X3K432 40.3731500652
UniRef50_A0A2X3K432|unclassified
UniRef50_Q45148: B.fragilis nimD gene and IS-1169 39.2006500255
UniRef50_Q45148: B.fragilis nimD gene and IS-1169|unclassified
UniRef50_A0A078RDY6 36.2950058072
UniRef50_A0A078RDY6|unclassified
UniRef50_C3RGR6: Transposase, IS4 family 25.1676941861
UniRef50_C3RGR6: Transposase, IS4 family|unclassified

If you need more details, please let me know.

Best regards,
Jérémy Tournayre

Hello,
The option "--annotation-gene-index 8" was what caused the issue in my analysis: removing it eliminates the "unclassified" results. In HUMAnN 2, this option defaults to 8 and specifies the field used to retrieve the species information. In HUMAnN 3, however, that field has moved to the third position, and the option defaults to 3.
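To make the field positions concrete, here is a minimal Python sketch that splits the reference name from the HUMAnN 3 temp alignment line in the first post on "|" and lists each field with a 0-based position. It is only an illustration: it is not HUMAnN's own parser, whether HUMAnN counts --annotation-gene-index from 0 or 1 is not confirmed here, and the helper names annotation_fields and field_at are hypothetical.

```python
from typing import List, Optional

# Reference name copied verbatim from the HUMAnN 3 temp alignment line quoted in
# the first post (second whitespace-separated column of that line).
REFERENCE = (
    "1515__A3DEF5__AD2_01123"
    "|k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales"
    ".f__Hungateiclostridiaceae.g__Hungateiclostridium"
    ".s__Hungateiclostridium_thermocellum"
    "|UniRef90_A3DEF5|UniRef50_A3DEF5|552"
)


def annotation_fields(reference: str) -> List[str]:
    """Split a ChocoPhlAn-style reference name on '|' into its fields."""
    return reference.split("|")


def field_at(reference: str, index: int) -> Optional[str]:
    """Return the field at a 0-based index, or None if there is no such field."""
    fields = annotation_fields(reference)
    return fields[index] if 0 <= index < len(fields) else None


if __name__ == "__main__":
    # Print every field with its 0-based position.
    for position, field in enumerate(annotation_fields(REFERENCE)):
        print(position, field)
    # This reference name has only 5 fields, so index 8 points at nothing.
    print("index 8 ->", field_at(REFERENCE, 8))
    print("index 3 ->", field_at(REFERENCE, 3))
```

For this reference name the sketch reports None at index 8 and UniRef50_A3DEF5 at index 3, which is consistent with an index taken from the older, longer HUMAnN 2 annotation layout no longer lining up with the HUMAnN 3 one.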

So, it’s solved :).

Best regards,
Jérémy Tournayre

Sorry for being slow to work through this thread. Glad you managed to figure out the issue and that things are presumably working now with MetaPhlAn 4 + HUMAnN 3. 🙂

Dear @Jeremy_Tournayre, I work for Galaxy Freiburg and recently updated this tutorial. Thanks for the information; we will provide an updated version using HUMAnN 3.8 and MetaPhlAn 4.0.6 soon, probably in October.
I have a follow-up question, also related to this tutorial. We found that MetaPhlAn mainly detects Acetivibrio thermocellus, whereas the most abundant gene families found by HUMAnN are assigned to Hungateiclostridium_thermocellum. As a result, the "Combine MetaPhlAn and HUMAnN outputs" tool cannot merge the data correctly. Acetivibrio thermocellus seems to be the updated name of Hungateiclostridium_thermocellum, so we assume there is a discrepancy between the databases used by HUMAnN (uniref90_annotated_v201901b_full.tar.gz) and MetaPhlAn (mpa_vOct22_CHOCOPhlAnSGB_202212). Any idea how to solve this? Could the UniRef90 database be updated?
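Until the databases agree, one possible stopgap (not an official HUMAnN or Galaxy fix) is to rename the taxon part of the HUMAnN strata with a hand-maintained synonym table before combining the outputs. The Python sketch below assumes a single-sample genefamilies table with two tab-separated columns, and assumes the MetaPhlAn-side label to match is g__Acetivibrio.s__Acetivibrio_thermocellus; SYNONYMS, rename_feature and rename_rows are hypothetical names.

```python
import csv
import sys
from typing import Dict, Iterable, List, Tuple

# Hypothetical, hand-maintained synonym table: HUMAnN stratum label -> label
# assumed to match MetaPhlAn. The single entry reflects the renaming suggested
# in this thread and should be checked against a taxonomy source before use.
SYNONYMS: Dict[str, str] = {
    "g__Hungateiclostridium.s__Hungateiclostridium_thermocellum":
        "g__Acetivibrio.s__Acetivibrio_thermocellus",
}


def rename_feature(feature: str, synonyms: Dict[str, str]) -> str:
    """Rename the taxon part of a stratified 'family|taxon' feature, if listed."""
    family, sep, taxon = feature.partition("|")
    if not sep:
        return feature  # unstratified row such as UNMAPPED or a family total
    return family + sep + synonyms.get(taxon, taxon)


def rename_rows(
    rows: Iterable[List[str]], synonyms: Dict[str, str]
) -> Tuple[List[str], Dict[str, float]]:
    """Rename strata in a single-sample table, summing rows that become identical."""
    header: List[str] = []
    merged: Dict[str, float] = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            header = row  # keep the header line as-is
            continue
        feature = rename_feature(row[0], synonyms)
        merged[feature] = merged.get(feature, 0.0) + float(row[1])
    return header, merged


if __name__ == "__main__":
    # Usage: python rename_strata.py humann_genefamilies.tsv > renamed.tsv
    with open(sys.argv[1], newline="") as handle:
        header, merged = rename_rows(csv.reader(handle, delimiter="\t"), SYNONYMS)
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    if header:
        writer.writerow(header)
    for feature, value in merged.items():
        writer.writerow([feature, value])
```

If both the old and the new name occur for the same family, their rows are summed, so per-family totals are unchanged; rows keep the order in which they are first seen.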

Hi!
I'm new to HUMAnN and new to posting in the forum, so I'm sorry if I've left out any information.
I am using a university server, running HUMAnN 3.7 on an assembled metagenome of a nematode microbiome.

Here is the code I run:
humann --input /workdir/eag252/cystmetagenome_all/S3_Midassembly/S3_midprok.fasta --output Test2_S3mid_Results --nucleotide-database /workdir/eag252/humann/chocophlan/ --protein-database /workdir/eag252/humann/uniref/ --metaphlan-options="--bowtie2db /workdir/eag252/humann/chocophlan" --threads 30 --output-max-decimals 2

And the output:
Output files will be written to: /local/workdir/eag252/humann/Test2_S3mid_Results
Removing spaces from identifiers in input file …
Running metaphlan …
Total species selected from prescreen: 0
Selected species explain 0.00% of predicted community composition
No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.
Running diamond …
Aligning to reference database: uniref90_201901b_ec_filtered.dmnd
Total bugs after translated alignment: 1
unclassified: 17563 hits
Total gene families after translated alignment: 13432
Unaligned reads after translated alignment: 58.57 %
Computing gene families …
Computing pathways abundance and coverage …
Output files created:
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_genefamilies.tsv
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_pathabundance.tsv
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_pathcoverage.tsv

The first few lines of the genefamilies.tsv:
# Gene Family S3_midprok_Abundance-RPKs
UNMAPPED 21192.00
UniRef90_A0A0D5XRU8 13.89
UniRef90_A0A0D5XRU8|unclassified 13.89
UniRef90_K9NH70 12.24
UniRef90_K9NH70|unclassified 12.24
UniRef90_A0A083Z9L3 11.46

As in Jeremy's case, the output is unclassified, but I also don't get any taxonomic information at all. Is there something I am missing? Is this because my input isn't a metagenome-assembled genome (MAG)?

I also don't understand where I can find the classified information. Am I right in assuming that "unclassified" means no taxonomy and "classified" means there is a taxonomy label?
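In a HUMAnN gene family table, each family appears once as an unstratified total and then once per stratum, written as family|taxon. A stratum labelled unclassified is the part of the family's abundance that could not be assigned to a species, while a stratum such as g__Clostridium.s__Clostridium_thermocellum (as in the HUMAnN 2 output in the first post) is a "classified" one. The minimal Python sketch below assumes a tab-separated genefamilies file and counts the three kinds of rows, so you can check whether any species-level strata exist at all; tally_strata is a hypothetical helper, not part of HUMAnN.

```python
import csv
import sys
from collections import Counter
from typing import Iterable, List


def tally_strata(rows: Iterable[List[str]]) -> Counter:
    """Count genefamilies rows: unstratified, 'unclassified' stratum, or named taxon."""
    counts: Counter = Counter()
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        _, sep, taxon = row[0].partition("|")
        if not sep:
            counts["unstratified"] += 1  # e.g. UNMAPPED or a family's total
        elif taxon == "unclassified":
            counts["unclassified"] += 1  # family found, but not attributed to a species
        else:
            counts["taxon_labelled"] += 1  # e.g. ...|g__Clostridium.s__Clostridium_thermocellum
    return counts


if __name__ == "__main__":
    # Usage: python tally_strata.py S3_midprok_genefamilies.tsv
    with open(sys.argv[1], newline="") as handle:
        print(dict(tally_strata(csv.reader(handle, delimiter="\t"))))
```

If the count of taxon-labelled rows is zero, every stratified row is "unclassified", which is what the outputs shown in this thread look like.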

If I can provide more information please let me know! Thank you!
Emily

Hi Emily - HUMAnN is designed for functional profiling from unassembled metagenomes/metatranscriptomes. If you can try running with the unassembled reads as input you ought to get a more useful profile out!
