I would like to run the ASaiM-MT pipeline (ASaiM - Galaxy Community Hub), but using MetaPhlAn 4 and HUMAnN 3. Unfortunately, at the end of the HUMAnN 3 command, I only get unclassified results:
|UniRef50_A0A3B9P6A7|17928.3447331564|
|UniRef50_A0A3B9P6A7|unclassified|
|UniRef50_W4V7T4|17254.9255236297|
|UniRef50_W4V7T4|unclassified|
|UniRef50_A3DBR3|10575.1057587066|
|UniRef50_A3DBR3|unclassified|
|UniRef50_A3DEF5: Fimbrial assembly family protein|371.1157718711|
|UniRef50_A3DEF5: Fimbrial assembly family protein|unclassified|
Instead, I expected results like these (obtained with HUMAnN 2 on the same data), which include the species:
|UniRef50_P62593: Beta-lactamase TEM|51842.43044|
|UniRef50_P62593: Beta-lactamase TEM|g__Clostridium.s__Clostridium_thermocellum|
|UniRef50_R5FV61|46966.29231|
|UniRef50_R5FV61|g__Clostridium.s__Clostridium_thermocellum|
I find it strange that the species appear in the temp data of humann3:
Hello,
The option “annotation-gene-index 8” was causing the issue in my analysis, and removing it eliminates the “unclassified” results. In HUMAnN 2, this option defaults to 8, which specifies the column used to retrieve the species information. In HUMAnN 3, however, that column has moved to the third position, and the option defaults to 3.
Sorry for being slow to work through this thread. Glad you managed to figure out the issue and that things are presumably working now with MetaPhlAn 4 + HUMAnN 3.
Dear @Jeremy_Tournayre, I work for Galaxy Freiburg and updated this tutorial recently. Thanks for the information; we will provide an updated version using HUMAnN 3.8 and MetaPhlAn 4.0.6 soon, probably in October.
I do have a follow-up question, which is also related to this tutorial. We found that MetaPhlAn mainly detects Acetivibrio thermocellus, whereas the most abundant gene families found by HUMAnN are associated with the species Hungateiclostridium_thermocellum. As a result, the Combine MetaPhlAn and HUMAnN outputs tool cannot merge the data correctly. It seems that Acetivibrio thermocellus is the updated name of Hungateiclostridium_thermocellum, so we assume there is a discrepancy between the databases used by HUMAnN (uniref90_annotated_v201901b_full.tar.gz) and MetaPhlAn (mpa_vOct22_CHOCOPhlAnSGB_202212). Any idea how to solve this? Could the uniref90 database perhaps be updated?
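Until the databases agree, one possible stopgap is to relabel the species in the HUMAnN stratified rows before merging. The following is only a minimal sketch of that idea, not a tool from HUMAnN or Galaxy: the function `rename_strata` and the `RENAME` mapping are hypothetical, and the exact label strings (including the `g__Acetivibrio.s__Acetivibrio_thermocellus` spelling) are assumptions that should be checked against your own output files.

```python
def rename_strata(lines, rename_map):
    """Yield HUMAnN-style rows with the taxon in 'feature|taxon' relabelled.

    Header lines and unstratified rows are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        feature, rest = line.split("\t", 1)
        if "|" in feature:
            # The taxon is the part after the last '|' in the first column.
            name, taxon = feature.rsplit("|", 1)
            feature = name + "|" + rename_map.get(taxon, taxon)
        yield feature + "\t" + rest


# Hypothetical mapping from the old label to the new one (assumed spellings).
RENAME = {
    "g__Hungateiclostridium.s__Hungateiclostridium_thermocellum":
        "g__Acetivibrio.s__Acetivibrio_thermocellus",
}

# Illustrative rows only; identifiers and values are made up.
example = [
    "# Gene Family\tsample_Abundance-RPKs",
    "UniRef90_X\t10.0",
    "UniRef90_X|g__Hungateiclostridium.s__Hungateiclostridium_thermocellum\t8.0",
    "UniRef90_X|unclassified\t2.0",
]
renamed = list(rename_strata(example, RENAME))
```

Note that this sketch does not sum rows: if both the old and the new name occur for the same gene family, the duplicated rows would need to be combined separately.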
Hi!
I’m new to using Humann and new to posting in the forum. I’m sorry if I miss including any information.
I am using a university server, running humann 3.7 on an assembled metagenome of a nematode microbiome.
Here is the code I run:
humann --input /workdir/eag252/cystmetagenome_all/S3_Midassembly/S3_midprok.fasta --output Test2_S3mid_Results --nucleotide-database /workdir/eag252/humann/chocophlan/ --protein-database /workdir/eag252/humann/uniref/ --metaphlan-options="--bowtie2db /workdir/eag252/humann/chocophlan" --threads 30 --output-max-decimals 2
And the output:
Output files will be written to: /local/workdir/eag252/humann/Test2_S3mid_Results
Removing spaces from identifiers in input file …
Running metaphlan …
Total species selected from prescreen: 0
Selected species explain 0.00% of predicted community composition
No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.
Running diamond …
Aligning to reference database: uniref90_201901b_ec_filtered.dmnd
Total bugs after translated alignment: 1
unclassified: 17563 hits
Total gene families after translated alignment: 13432
Unaligned reads after translated alignment: 58.57 %
Computing gene families …
Computing pathways abundance and coverage …
Output files created:
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_genefamilies.tsv
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_pathabundance.tsv
/local/workdir/eag252/humann/Test2_S3mid_Results/S3_midprok_pathcoverage.tsv
The first few lines of the genefamilies.tsv:
|# Gene Family|S3_midprok_Abundance-RPKs|
|UNMAPPED|21192.00|
|UniRef90_A0A0D5XRU8|13.89|
|UniRef90_A0A0D5XRU8|unclassified|13.89|
|UniRef90_K9NH70|12.24|
|UniRef90_K9NH70|unclassified|12.24|
|UniRef90_A0A083Z9L3|11.46|
As Jeremy described, the output is unclassified, and I also do not have any taxonomic information. Is there something I am missing? Is this because the input isn’t a metagenome-assembled genome (MAG)?
I also don’t understand where I can find the classified information. I am assuming that “unclassified” means there is no taxonomy and “classified” means there is a taxonomy label. Is that right?
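In a HUMAnN gene families table, the stratified rows carry a `|taxon` suffix in the first column, and `unclassified` is the stratum for abundance that was not attributed to a particular species. A quick way to see which strata a file contains is to total the abundance per stratum. The snippet below is a minimal sketch under the assumption of a well-formed two-column, tab-separated file; the function name `abundance_by_taxon` is hypothetical, and the rows are copied from the excerpt above.

```python
from collections import defaultdict


def abundance_by_taxon(lines):
    """Sum abundance per stratum (the part after the last '|') in a
    HUMAnN-style genefamilies table with one sample column."""
    totals = defaultdict(float)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        feature, _, value = line.partition("\t")
        if "|" not in feature:
            continue  # community-level total or UNMAPPED, not a stratum
        taxon = feature.rsplit("|", 1)[1]
        totals[taxon] += float(value)
    return dict(totals)


rows = [
    "# Gene Family\tS3_midprok_Abundance-RPKs",
    "UNMAPPED\t21192.00",
    "UniRef90_A0A0D5XRU8\t13.89",
    "UniRef90_A0A0D5XRU8|unclassified\t13.89",
    "UniRef90_K9NH70\t12.24",
    "UniRef90_K9NH70|unclassified\t12.24",
]
summary = abundance_by_taxon(rows)
print(sorted(summary))  # ['unclassified']
```

If `unclassified` is the only key, as here, no gene family was assigned to a species, which is consistent with the prescreen selecting zero species in the log above.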
If I can provide more information please let me know! Thank you!
Emily
Hi Emily - HUMAnN is designed for functional profiling from unassembled metagenomes/metatranscriptomes. If you can try running with the unassembled reads as input you ought to get a more useful profile out!