Humann3 => only unclassified results | but not with humann2

Jeremy_Tournayre · April 7, 2023, 1:46pm

Dear Humann3 team,

I would like to run the ASaiM-MT pipeline (ASaiM - Galaxy Community Hub), but with MetaPhlAn 4 and HUMAnN 3. Unfortunately, at the end of the humann3 command I only get unclassified results:

UniRef50_A0A3B9P6A7	17928.3447331564
UniRef50_A0A3B9P6A7|unclassified
UniRef50_W4V7T4	17254.9255236297
UniRef50_W4V7T4|unclassified
UniRef50_A3DBR3	10575.1057587066
UniRef50_A3DBR3|unclassified
UniRef50_A3DEF5: Fimbrial assembly family protein	371.1157718711
UniRef50_A3DEF5: Fimbrial assembly family protein|unclassified

I expected results like these, with the species (obtained with Humann2 on the same data):

UniRef50_P62593: Beta-lactamase TEM	51842.43044
UniRef50_P62593: Beta-lactamase TEM|g__Clostridium.s__Clostridium_thermocellum
UniRef50_R5FV61	46966.29231
UniRef50_R5FV61|g__Clostridium.s__Clostridium_thermocellum

I find it strange that the species appear in the temp data of humann3:

SRR6820516.37548293754829/1 1515__A3DEF5__AD2_01123|k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Hungateiclostridiaceae.g__Hungateiclostridium.s__Hungateiclostridium_thermocellum|UniRef90_A3DEF5|UniRef50_A3DEF5|552 96.68874172185431 151.0 0 150.0 404 553 0

Is this normal?
Maybe I have a problem with the specification of the mpa_vJan21 database?

For your reference, I ran the following commands:

metaphlan "$output_dir/cutadapt_interlacer/pairs.fq" --input_type fastq -o "$output_dir/metaphlan/results.dat" --no_map -t 'rel_ab' --tax_lev 'a' --min_cu_len '2000' --min_alignment_len '0' --stat_q '0.1' -s "$output_dir/metaphlan/sam_output_file.dat" --biom "$output_dir/metaphlan/biom_output_file.dat" --index mpa_vJan21_CHOCOPhlAnSGB_202103

humann --input $output_dir/sortmerna_interlacer/pairs.fq --output $output_dir/humann --taxonomic-profile "$output_dir/metaphlan/results.dat" --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 -t rel_ab " --nucleotide-database db/chocophlan --translated-alignment 'diamond' --protein-database 'db/uniref/uniref50' --search-mode 'uniref50' --pathways 'metacyc' --annotation-gene-index 8 --evalue '1.0' --memory-use minimum --prescreen-threshold '0.01' --nucleotide-identity-threshold '50.0' --translated-identity-threshold '50.0' --translated-subject-coverage-threshold '50.0' --translated-query-coverage-threshold '50.0' --xipe 'off' --minpath 'on' --gap-fill 'on' --output-format 'tsv' --output-max-decimals '10' --output-basename 'humann' --threads 4

And I use the sample data of the ASAIM MT training:
Metatranscriptomics analysis using microbiome RNA-seq data (short) available at https://zenodo.org/record/4776250#.ZDAdrnZByUk

Thank you for your time and assistance.

Best regards,
Jérémy Tournayre

Jeremy_Tournayre · April 14, 2023, 7:39am

Hello,

Here is an example that uses the demo data shipped with the Humann 3 tool and also gives unexpected unclassified results:

#metaphlan Jan21_CHOCOPhlAnSGB
metaphlan test/humann-3.6.1/examples/demo.fastq.gz --input_type fastq -o test/test_JT_metaphlan_Jan21/profiled_metagenome.txt --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103

#humann with metaphlan
humann --input test/humann-3.6.1/examples/demo.fastq.gz --output test/test_JT_humann_wit_meta --taxonomic-profile "test/test_JT_metaphlan_Jan21/profiled_metagenome.txt" --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 -t rel_ab --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103 " --nucleotide-database db/chocophlan --translated-alignment 'diamond' --protein-database 'db/uniref/uniref50' --search-mode 'uniref50' --pathways 'metacyc' --annotation-gene-index 8 --evalue '1.0' --memory-use minimum --prescreen-threshold '0.01' --nucleotide-identity-threshold '50.0' --translated-identity-threshold '50.0' --translated-subject-coverage-threshold '50.0' --translated-query-coverage-threshold '50.0' --xipe 'off' --minpath 'on' --gap-fill 'on' --output-format 'tsv' --output-max-decimals '10' --output-basename 'humann' --threads 4

Here are the first rows of “humann_genefamilies.tsv” produced by the humann 3 command:

# Gene Family	humann_Abundance-RPKs
UNMAPPED	18798.0000000000
UniRef50_A0A2X3K432	40.3731500652
UniRef50_A0A2X3K432|unclassified
UniRef50_Q45148: B.fragilis nimD gene and IS-1169	39.2006500255
UniRef50_Q45148: B.fragilis nimD gene and IS-1169|unclassified
UniRef50_A0A078RDY6	36.2950058072
UniRef50_A0A078RDY6|unclassified
UniRef50_C3RGR6: Transposase, IS4 family	25.1676941861
UniRef50_C3RGR6: Transposase, IS4 family|unclassified
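A quick way to check whether a gene families table contains anything other than unclassified strata is to tally the stratum labels. The sketch below assumes the usual HUMAnN layout, where a stratified row is written as `gene_family|stratum`, a tab, then the abundance. The sample table and the `count_strata` helper are made up for illustration; they are not part of HUMAnN.

```python
import io
from collections import Counter

# Made-up miniature table in the assumed HUMAnN layout (values are invented).
sample = (
    "# Gene Family\tsample_Abundance-RPKs\n"
    "UNMAPPED\t10.0\n"
    "UniRef50_X\t5.0\n"
    "UniRef50_X|unclassified\t2.0\n"
    "UniRef50_X|g__Genus.s__Genus_species\t3.0\n"
)

def count_strata(handle):
    """Count stratified rows per stratum label (the text after the first '|')."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        feature = line.split("\t")[0]
        if "|" in feature:
            stratum = feature.split("|", 1)[1]
            counts[stratum] += 1
    return counts

counts = count_strata(io.StringIO(sample))
classified = sum(n for label, n in counts.items() if label != "unclassified")
print("unclassified rows:", counts["unclassified"])
print("rows with a species label:", classified)
```

To use it on a real file, pass `open("humann_genefamilies.tsv")` instead of the `io.StringIO` sample. If the second number is zero, every stratified row is unclassified.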

If you need more details, please let me know.

Best regards,
Jérémy Tournayre

Jeremy_Tournayre · April 25, 2023, 9:05am

Hello,
The option “--annotation-gene-index 8” was causing the issue in my analysis; removing it eliminates the “unclassified” results. In Humann2 this option defaults to 8, the position of the field that is read from the reference sequence annotation. In Humann3 the annotation layout changed: that field moved to the third position, and the option now defaults to 3.
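To see why an index of 8 cannot work with the newer annotation layout, it helps to split the reference header from the temp-file line quoted earlier in this thread. This is only an illustrative sketch, not HUMAnN's own parser: the `field_at` helper is hypothetical, and treating the index as 1-based is an assumption. The point holds either way, since the header has only five fields.

```python
# Illustrative only: this is not HUMAnN's parser.
# Reference header copied from the temp-file line quoted earlier in the thread.
header = (
    "1515__A3DEF5__AD2_01123"
    "|k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales"
    ".f__Hungateiclostridiaceae.g__Hungateiclostridium"
    ".s__Hungateiclostridium_thermocellum"
    "|UniRef90_A3DEF5|UniRef50_A3DEF5|552"
)

def field_at(annotation, index, delimiter="|"):
    """Return the field at a 1-based index (assumed convention), or None if absent."""
    fields = annotation.split(delimiter)
    if 1 <= index <= len(fields):
        return fields[index - 1]
    return None

fields = header.split("|")
print(len(fields))          # 5
print(field_at(header, 3))  # UniRef90_A3DEF5
print(field_at(header, 8))  # None: there is no eighth field in this layout
```

With only five `|`-separated fields, a setting carried over from a Humann2 command line points past the end of the header.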

So, it’s solved :).

Best regards,
Jérémy Tournayre

franzosa · May 4, 2023, 5:57pm

Sorry for being slow to work through this thread. Glad you managed to figure out the issue and that things are presumably working now with MetaPhlAn 4 + HUMAnN 3.

Paul_Zierep · August 24, 2023, 6:57am

Dear @Jeremy_Tournayre, I work for Galaxy Freiburg and recently updated this tutorial. Thanks for the information; we will provide an updated version using Humann 3.8 and MetaPhlAn 4.0.6 soonish, probably in October.
I do have a follow-up question, which is also related to this tutorial. We found that MetaPhlAn mainly detects Acetivibrio thermocellus, whereas the most abundant gene families found by Humann are associated with Hungateiclostridium_thermocellum. As a result, the Combine MetaPhlAn and HUMAnN outputs tool cannot merge the data correctly. It seems that Acetivibrio thermocellus is the updated name of Hungateiclostridium_thermocellum, so we assume there is a discrepancy between the DBs used by Humann (uniref90_annotated_v201901b_full.tar.gz) and MetaPhlAn (mpa_vOct22_CHOCOPhlAnSGB_202212). Any idea how to solve it? Maybe the uniref90 database could be updated?
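One possible stopgap, until the databases agree on names, would be to rename the strata in the gene families table before merging. The sketch below is hypothetical: the `SYNONYMS` map, the `rename_stratum` helper, and the exact label on the right-hand side are assumptions, and the target label would have to match whatever the MetaPhlAn profile actually contains. It is not something the HUMAnN or MetaPhlAn teams have recommended in this thread.

```python
# Hypothetical synonym map: stratum label found in the HUMAnN output mapped to
# the label to use when merging. The right-hand label is an assumption and must
# be checked against the actual MetaPhlAn profile.
SYNONYMS = {
    "g__Hungateiclostridium.s__Hungateiclostridium_thermocellum":
        "g__Acetivibrio.s__Acetivibrio_thermocellus",
}

def rename_stratum(feature, synonyms):
    """Rewrite the stratum part of 'gene_family|stratum'; leave other rows alone."""
    if "|" not in feature:
        return feature  # unstratified total, nothing to rename
    gene, stratum = feature.split("|", 1)
    return gene + "|" + synonyms.get(stratum, stratum)

rows = [
    "UniRef90_A3DEF5",
    "UniRef90_A3DEF5|g__Hungateiclostridium.s__Hungateiclostridium_thermocellum",
    "UniRef90_A3DEF5|unclassified",
]
for row in rows:
    print(rename_stratum(row, SYNONYMS))
```

If the old and new names both occur for the same gene family, renaming would produce duplicate rows whose abundances would need to be summed; this sketch does not handle that case.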

eag252 · September 12, 2023, 1:29pm

Hi!
I’m new to using Humann and new to posting in the forum. I’m sorry if I miss including any information.
I am using a university server, running humann 3.7 on an assembled metagenome of a nematode microbiome.

Here is the code I run:
humann --input /workdir/eag252/cystmetagenome_all/S3_Midassembly/S3_midprok.fasta --output Test2_S3mid_Results --nucleotide-database /workdir/eag252/humann/chocophlan/ --protein-database /workdir/eag252/humann/uniref/ --metaphlan-options="--bowtie2db /workdir/eag252/humann/chocophlan" --threads 30 --output-max-decimals 2

And the output:
Output files will be written to: /local/workdir/eag252/humann/Test2_S3mid_Results
Removing spaces from identifiers in input file …
Running metaphlan …
Total species selected from prescreen: 0
Selected species explain 0.00% of predicted community composition
No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.
Running diamond …
Aligning to reference database: uniref90_201901b_ec_filtered.dmnd
Total bugs after translated alignment: 1
unclassified: 17563 hits
Total gene families after translated alignment: 13432
Unaligned reads after translated alignment: 58.57 %
Computing gene families …
Computing pathways abundance and coverage …
Output files created:
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_genefamilies.tsv
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_pathabundance.tsv
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_pathcoverage.tsv

Like Jeremy said, the output is unclassified, but I also do not have any taxonomic information. Is there something I am missing? Is this because it isn’t a metagenome-assembled genome (MAG)?

I also don’t understand where I can find classified information. I assume “unclassified” means no taxonomy and “classified” means there is a taxonomy label?

If I can provide more information please let me know! Thank you!
Emily

franzosa · September 21, 2023, 10:19pm

Hi Emily - HUMAnN is designed for functional profiling from unassembled metagenomes/metatranscriptomes. If you can try running with the unassembled reads as input you ought to get a more useful profile out!
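The “Total species selected from prescreen: 0” line in the log above means that no species in the taxonomic profile passed the prescreen cutoff, so no species-specific reference database was built. The sketch below shows roughly how such a relative-abundance cutoff behaves. It is not HUMAnN's code: the `select_species` helper, the made-up profile, the assumed column order (clade name, taxonomy id, relative abundance), and the use of `>=` are all assumptions for illustration.

```python
# Made-up profile in an assumed MetaPhlAn-like layout:
# clade_name <TAB> NCBI_tax_id <TAB> relative_abundance (percent).
profile = (
    "#clade_name\tNCBI_tax_id\trelative_abundance\n"
    "k__Bacteria\t0\t100.0\n"
    "k__Bacteria|g__GenusA|s__GenusA_one\t0\t99.995\n"
    "k__Bacteria|g__GenusB|s__GenusB_two\t0\t0.005\n"
)

def select_species(lines, threshold):
    """Return species-level clades whose relative abundance meets the threshold."""
    selected = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        parts = line.rstrip("\n").split("\t")
        last_level = parts[0].split("|")[-1]
        if last_level.startswith("s__") and float(parts[2]) >= threshold:
            selected.append(last_level)
    return selected

print(select_species(profile.splitlines(), 0.01))
# A profile with no species rows selects nothing, as in the log above.
print(select_species(["#clade_name\tNCBI_tax_id\trelative_abundance"], 0.01))
```

When nothing is selected, every hit can only come from the translated search, which is why all strata end up as unclassified.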
