HUMAnN 3 gives only unclassified results, but HUMAnN 2 does not

Dear Humann3 team,

I would like to run the ASAIM MT pipeline (ASaiM - Galaxy Community Hub), but with MetaPhlAn 4 and HUMAnN 3. Unfortunately, the humann command produces only unclassified results:

UniRef50_A0A3B9P6A7 17928.3447331564
UniRef50_A0A3B9P6A7 unclassified
UniRef50_W4V7T4 17254.9255236297
UniRef50_W4V7T4 unclassified
UniRef50_A3DBR3 10575.1057587066
UniRef50_A3DBR3 unclassified
UniRef50_A3DEF5: Fimbrial assembly family protein 371.1157718711
UniRef50_A3DEF5: Fimbrial assembly family protein unclassified

I expected results like the following, with species assignments (obtained with HUMAnN 2 on the same data):

UniRef50_P62593: Beta-lactamase TEM 51842.43044
UniRef50_P62593: Beta-lactamase TEM g__Clostridium.s__Clostridium_thermocellum
UniRef50_R5FV61 46966.29231
UniRef50_R5FV61 g__Clostridium.s__Clostridium_thermocellum

What I find strange is that the species do appear in the temporary files written by HUMAnN 3:

SRR6820516.37548293754829/1 1515__A3DEF5__AD2_01123|k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Hungateiclostridiaceae.g__Hungateiclostridium.s__Hungateiclostridium_thermocellum|UniRef90_A3DEF5|UniRef50_A3DEF5|552 96.68874172185431 151.0 0 150.0 404 553 0

Is this normal?
Maybe I have a problem with the specification of the mpa_vJan21 database?

For your reference, I ran the following commands:

metaphlan "$output_dir/cutadapt_interlacer/pairs.fq" --input_type fastq -o "$output_dir/metaphlan/results.dat" --no_map -t 'rel_ab' --tax_lev 'a' --min_cu_len '2000' --min_alignment_len '0' --stat_q '0.1' -s "$output_dir/metaphlan/sam_output_file.dat" --biom "$output_dir/metaphlan/biom_output_file.dat" --index mpa_vJan21_CHOCOPhlAnSGB_202103

humann --input $output_dir/sortmerna_interlacer/pairs.fq --output $output_dir/humann --taxonomic-profile "$output_dir/metaphlan/results.dat" --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 -t rel_ab" --nucleotide-database db/chocophlan --translated-alignment 'diamond' --protein-database 'db/uniref/uniref50' --search-mode 'uniref50' --pathways 'metacyc' --annotation-gene-index 8 --evalue '1.0' --memory-use minimum --prescreen-threshold '0.01' --nucleotide-identity-threshold '50.0' --translated-identity-threshold '50.0' --translated-subject-coverage-threshold '50.0' --translated-query-coverage-threshold '50.0' --xipe 'off' --minpath 'on' --gap-fill 'on' --output-format 'tsv' --output-max-decimals '10' --output-basename 'humann' --threads 4

I used the sample data from the ASAIM MT training, "Metatranscriptomics analysis using microbiome RNA-seq data (short)", available at: Training Data for "Metatranscriptomics analysis using microbiome RNASeq data" | Zenodo

Thank you for your time and assistance.

Best regards,
Jérémy Tournayre

Hello,

Here is a second example, using the demo data shipped with HUMAnN 3, which also gives unexpectedly unclassified results:

#metaphlan Jan21_CHOCOPhlAnSGB
metaphlan test/humann-3.6.1/examples/demo.fastq.gz --input_type fastq -o test/test_JT_metaphlan_Jan21/profiled_metagenome.txt --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103

#humann with metaphlan
humann --input test/humann-3.6.1/examples/demo.fastq.gz --output test/test_JT_humann_wit_meta --taxonomic-profile "test/test_JT_metaphlan_Jan21/profiled_metagenome.txt" --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 -t rel_ab --bowtie2db mpa_vJan21_CHOCOPhlAnSGB_202103" --nucleotide-database db/chocophlan --translated-alignment 'diamond' --protein-database 'db/uniref/uniref50' --search-mode 'uniref50' --pathways 'metacyc' --annotation-gene-index 8 --evalue '1.0' --memory-use minimum --prescreen-threshold '0.01' --nucleotide-identity-threshold '50.0' --translated-identity-threshold '50.0' --translated-subject-coverage-threshold '50.0' --translated-query-coverage-threshold '50.0' --xipe 'off' --minpath 'on' --gap-fill 'on' --output-format 'tsv' --output-max-decimals '10' --output-basename 'humann' --threads 4

Here are the first rows of "humann_genefamilies.tsv" produced by the HUMAnN 3 command:

# Gene Family humann_Abundance-RPKs
UNMAPPED 18798.0000000000
UniRef50_A0A2X3K432 40.3731500652
UniRef50_A0A2X3K432 unclassified
UniRef50_Q45148: B.fragilis nimD gene and IS-1169 39.2006500255
UniRef50_Q45148: B.fragilis nimD gene and IS-1169 unclassified
UniRef50_A0A078RDY6 36.2950058072
UniRef50_A0A078RDY6 unclassified
UniRef50_C3RGR6: Transposase, IS4 family 25.1676941861
UniRef50_C3RGR6: Transposase, IS4 family unclassified
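
For anyone wanting to check their own output quickly, here is a minimal sketch that counts how many rows of a gene families table carry a species label (an "s__" token) and how many are marked "unclassified". The helper name count_rows and the temporary sample file are only for illustration; the sample rows are copied from this thread, and on real data you would point the helper at your own humann_genefamilies.tsv.

```shell
# Hypothetical helper (not part of HUMAnN): count the lines in a file
# that match a pattern. grep -c prints the number of matching lines.
count_rows() {
  # $1 = pattern to look for, $2 = file to search
  grep -c -- "$1" "$2"
}

# Sample rows copied from this thread: two from the HUMAnN 2 output,
# two from the HUMAnN 3 output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
UniRef50_P62593: Beta-lactamase TEM 51842.43044
UniRef50_P62593: Beta-lactamase TEM g__Clostridium.s__Clostridium_thermocellum
UniRef50_A0A2X3K432 40.3731500652
UniRef50_A0A2X3K432 unclassified
EOF

species_rows=$(count_rows 's__' "$sample")
unclassified_rows=$(count_rows 'unclassified' "$sample")

echo "species-labelled rows: $species_rows"
echo "unclassified rows: $unclassified_rows"

rm -f "$sample"
```

If the species-labelled count is zero for a whole file while the unclassified count is large, the run shows the same symptom described here.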

If you need more details, please let me know.

Best regards,
Jérémy Tournayre

Hello,
The option "--annotation-gene-index 8" was causing the issue in my analysis; removing it eliminates the "unclassified" results. In HUMAnN 2, this option defaults to 8, which specifies the column used to retrieve the species information. In HUMAnN 3, however, this column has moved to the third position and the option defaults to 3.
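
To illustrate why index 8 cannot work with the new reference headers, here is a small sketch that splits the header from the temporary file shown earlier in this thread on "|" and looks up fields by position. This is not HUMAnN's own code: the helper field_at is hypothetical, and counting positions from zero is an assumption made for the illustration.

```shell
# Reference header copied from the HUMAnN 3 temporary alignment file above.
header='1515__A3DEF5__AD2_01123|k__Bacteria.p__Firmicutes.c__Clostridia.o__Clostridiales.f__Hungateiclostridiaceae.g__Hungateiclostridium.s__Hungateiclostridium_thermocellum|UniRef90_A3DEF5|UniRef50_A3DEF5|552'

# Hypothetical helper: print the "|"-delimited field at a zero-based index.
# awk returns an empty string for a field beyond the last one.
field_at() {
  # $1 = header string, $2 = zero-based field index (assumed convention)
  printf '%s\n' "$1" | awk -F'|' -v i="$2" '{ print $(i + 1) }'
}

# Number of "|"-delimited fields in the header.
n_fields=$(printf '%s\n' "$header" | awk -F'|' '{ print NF }')

echo "number of fields: $n_fields"
echo "index 3: $(field_at "$header" 3)"
echo "index 8: $(field_at "$header" 8)"
```

The header has only five fields, so a lookup at position 8 comes back empty whichever way the positions are counted, whereas position 3 (counting from zero) is the UniRef50 identifier.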

So, it’s solved :).

Best regards,
Jérémy Tournayre

Sorry for being slow to work through this thread. Glad you managed to figure out the issue and that things are presumably working now with MetaPhlAn 4 + HUMAnN 3. :)