HUMAnN v3 joint taxonomic profile

There is a problem with the output format of humann_reduce_table. To reproduce:

$ metaphlan --version; humann --version
MetaPhlAn version 3.0 (20 Mar 2020)
humann v3.0.0.alpha.1

$ metaphlan --input_type fasta -t rel_ab --nproc 12 SRS014459-Stool.fasta.gz profiled_metagenome.tsv

$ humann_join_tables -i . --file_name profiled_metagenome.tsv -o joined_profiled_metagenome.tsv

$ humann_reduce_table -i joined_profiled_metagenome.tsv -o max_profiled_metagenome.tsv --function max --sort-by level

$ humann --input SRS014459-Stool.fasta.gz --output test.out --taxonomic-profile max_profiled_metagenome.tsv

ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn2 to v3.0.

MetaPhlAn 3.0 output has an additional field, “additional_species”, so append a tab to each line of the max profile:

$ sed -i 's/$/\t/' max_profiled_metagenome.tsv
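For readers without GNU sed, the tab-appending step can be sketched in Python. This is a minimal illustration, assuming the profile fits in memory; `append_empty_field` and `append_tab_in_place` are hypothetical helper names, not HUMAnN functions.

```python
def append_empty_field(lines):
    """Append a tab (an empty trailing field) to every line.

    Like the sed substitution, the tab goes before the newline, if there is one.
    """
    result = []
    for line in lines:
        if line.endswith("\n"):
            result.append(line[:-1] + "\t\n")
        else:
            result.append(line + "\t")  # last line without a trailing newline
    return result


def append_tab_in_place(path):
    """Rewrite the file at `path` with a tab appended to every line."""
    with open(path) as handle:
        lines = handle.readlines()
    with open(path, "w") as handle:
        handle.writelines(append_empty_field(lines))
```

Usage: `append_tab_in_place("max_profiled_metagenome.tsv")`. Like `sed -i`, it overwrites the file, so run it only once per file.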

$ humann --input SRS014459-Stool.fasta.gz --output test.out --taxonomic-profile max_profiled_metagenome.tsv

ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the database version v30 . Please update your version of MetaPhlAn2 to v3.0.

Then prepend the first line of profiled_metagenome.tsv to max_profiled_metagenome.tsv:

#mpa_v30_CHOCOPhlAn_201901
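The prepend step above can also be sketched in Python. This is a hedged illustration: `prepend_header` and `prepend_header_in_place` are hypothetical names, and the only assumption is that the original profile's first line is its `#mpa_...` database tag, as shown above.

```python
def prepend_header(original_lines, reduced_lines):
    """Put the first line of an original MetaPhlAn profile (its database tag,
    e.g. '#mpa_v30_CHOCOPhlAn_201901') at the top of a reduced table."""
    if not original_lines or not original_lines[0].startswith("#"):
        raise ValueError("original profile does not start with a '#' header line")
    header = original_lines[0].rstrip("\n") + "\n"
    reduced = list(reduced_lines)
    if reduced and reduced[0].rstrip("\n") == header.rstrip("\n"):
        return reduced  # header already present: do not add it twice
    return [header] + reduced


def prepend_header_in_place(original_path, reduced_path):
    """Apply prepend_header to files on disk, overwriting `reduced_path`."""
    with open(original_path) as handle:
        original = handle.readlines()
    with open(reduced_path) as handle:
        reduced = handle.readlines()
    with open(reduced_path, "w") as handle:
        handle.writelines(prepend_header(original, reduced))
```

Usage: `prepend_header_in_place("profiled_metagenome.tsv", "max_profiled_metagenome.tsv")`. The duplicate check makes it safe to run twice.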

Now it works!


Thanks for pointing this out. Now that multiple database versions are in circulation, we built a database version check into the main software, but these changes have not necessarily made it into the suite of utility scripts yet. Assuming your MetaPhlAn profiles were generated with the v30 database, this is a good short-term hack. :slight_smile:
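A minimal, hypothetical sketch of the kind of check described above follows. It is not HUMAnN's code: the function names are made up, and keying on the leading `#mpa_...` comment line is an assumption inferred from the first post, where prepending `#mpa_v30_CHOCOPhlAn_201901` cleared the error.

```python
def database_version_tag(lines):
    """Return the first 'mpa_...' tag in the leading '#' comment lines, or None."""
    for line in lines:
        if not line.startswith("#"):
            break  # the comment block ended without a database tag
        tag = line[1:].split("\t")[0].strip()
        if tag.startswith("mpa_"):
            return tag
    return None


def check_profile_version(lines, expected_prefix="mpa_v30"):
    """Hypothetical check: return the tag, or raise if it is missing or unexpected."""
    tag = database_version_tag(lines)
    if tag is None or not tag.startswith(expected_prefix):
        raise ValueError(
            f"profile database tag {tag!r} does not match expected {expected_prefix}*"
        )
    return tag
```

A table produced by a utility script that drops the tag line would fail such a check even if its abundances were correct, which is consistent with the behavior reported in this thread.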

This comment was very helpful to me!
I had the same error,
“ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn2 to v3.0.”
while building a custom pipeline with the KEGG DB.
I fixed it with the same method you used (appending a tab to each line and prepending the first line to the taxonomic profile).
I think the ‘custom taxonomic profile’ section of the HUMAnN 3 manual should be updated.
Thanks a lot for your help!

I’m having a similar issue, but I haven’t been able to fix it yet.
$ metaphlan --version; humann --version
MetaPhlAn version 3.0.13 (27 Jul 2021)
humann v3.0.0

The script “humann_join_tables” works properly for joining my gene tables, but it fails to merge the “metaphlan_bugs_list.tsv” files into a joint taxonomic profile for multiple samples, as suggested in the user manual.

$ humann_join_tables -i ./tax_profile/ -o joint.tsv
Traceback (most recent call last):
  File "/home/eortiz/miniconda3/bin/humann_join_tables", line 10, in <module>
    sys.exit(main())
  File "/home/eortiz/miniconda3/lib/python3.8/site-packages/humann/tools/join_tables.py", line 238, in main
    join_gene_tables(gene_tables,args.output,verbose=args.verbose)
  File "/home/eortiz/miniconda3/lib/python3.8/site-packages/humann/tools/join_tables.py", line 98, in join_gene_tables
    sorted_gene_list=util.fsort(list(gene_table_data))
  File "/home/eortiz/miniconda3/lib/python3.8/site-packages/humann/tools/util.py", line 403, in fsort
    features = sorted( features, key=lambda f: c_topsort.get( fsplit( f )[0], default ) )
  File "/home/eortiz/miniconda3/lib/python3.8/site-packages/humann/tools/util.py", line 403, in <lambda>
    features = sorted( features, key=lambda f: c_topsort.get( fsplit( f )[0], default ) )
  File "/home/eortiz/miniconda3/lib/python3.8/site-packages/humann/tools/util.py", line 377, in fsplit
    sys.exit( "LETHAL ERROR: bad feature name: {}".format( f ) )
NameError: name 'f' is not defined

I’m working on a cluster, but I also installed HUMAnN 3 on my Ubuntu machine and get the same error there.

As an alternative, I merged all the taxonomic profiles with MetaPhlAn 3’s “merge_metaphlan_tables.py” script and then generated max_taxonomic_profile.tsv with HUMAnN 3’s humann_reduce_table script.

The max_taxonomic_profile.tsv looks like this:
#mpa_v30_CHOCOPhlAn_201901 max
clade_name 0
k__Archaea 2157.0
k__Bacteria 100.0
k__Eukaryota 2759.0
k__Archaea|p__Candidatus_Bathyarchaeota 0.36522
k__Archaea|p__Candidatus_Heimdallarchaeota 16.78566
k__Archaea|p__Candidatus_Marsarchaeota 13.69396

When I try to use this file with the --taxonomic-profile flag, I’m getting the following error:
“ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn2 to v3.0.”

So, I have 2 questions:

  1. Could you please let me know if I’m missing something when I try to merge the tax profiles (bugs.tsv) with the humann_join_tables script?

  2. Where can I check the proper format for ‘max_taxonomic_profile.tsv’? I have no idea what this table should look like, so I’m not sure what the right format is. I tried the suggestions posted above, but I’m still getting the same error (“The MetaPhlAn2 taxonomic profile provided was not generated with the expected database”).

Any help is really welcome.
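As a hedged illustration for the second question, the sketch below builds a joint "max" profile directly from per-sample MetaPhlAn 3 profiles, in the layout the first post in this thread reports HUMAnN accepted: the `#mpa_v30_...` tag line first, then one clade per line with its value and a trailing empty field. It assumes the standard MetaPhlAn 3 column order (`clade_name`, `NCBI_tax_id`, `relative_abundance`, `additional_species`) and reads only the relative abundance column. Note that the values shown above for k__Archaea (2157.0) and k__Eukaryota (2759.0) exceed 100, so they cannot be relative abundances; they match the NCBI taxonomy IDs of those kingdoms, which suggests (not confirmed in this thread) that the `NCBI_tax_id` column was reduced along with the sample columns. All function names are hypothetical, and the exact file HUMAnN accepts has not been verified here.

```python
def read_profile(lines):
    """Parse one MetaPhlAn 3 profile into (version_line, {clade_name: abundance}).

    Assumes columns: clade_name, NCBI_tax_id, relative_abundance, additional_species.
    """
    version_line = None
    abundances = {}
    for line in lines:
        if line.startswith("#"):
            # Keep the database tag (e.g. '#mpa_v30_CHOCOPhlAn_201901'); skip other comments.
            if version_line is None and line.startswith("#mpa_"):
                version_line = line.rstrip("\n")
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        # Index 2 is relative_abundance; index 1 (NCBI_tax_id) is deliberately ignored.
        abundances[fields[0]] = float(fields[2])
    return version_line, abundances


def joint_max_profile(profiles):
    """Reduce parsed profiles to one table holding each clade's maximum abundance."""
    tags = {tag for tag, _ in profiles}
    if len(tags) != 1 or None in tags:
        raise ValueError(f"profiles lack or disagree on a database tag: {tags}")
    joint = {}
    for _, abundances in profiles:
        for clade, value in abundances.items():
            joint[clade] = max(value, joint.get(clade, 0.0))
    # Output layout follows the first post: tag line, then clade<TAB>value<TAB>.
    lines = [tags.pop() + "\n"]
    lines.extend(f"{clade}\t{joint[clade]}\t\n" for clade in sorted(joint))
    return lines


def write_joint_max_profile(profile_paths, output_path):
    """File-based wrapper: read per-sample profiles, write the joint max profile."""
    parsed = []
    for path in profile_paths:
        with open(path) as handle:
            parsed.append(read_profile(handle))
    with open(output_path, "w") as handle:
        handle.writelines(joint_max_profile(parsed))
```

Usage, with hypothetical file names: `write_joint_max_profile(["sample1_profile.tsv", "sample2_profile.tsv"], "max_taxonomic_profile.tsv")`. Refusing to mix database tags keeps profiles from different MetaPhlAn databases from being combined silently.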

Apologies for following up again, but I’m still getting the same error after trying many different things. Any suggestions or help would be really welcome.

Hi Max, did you solve the problem you ran into?

I am actually facing the same error as you.

There is a separate script bundled with MetaPhlAn for merging taxonomic profiles (merge_metaphlan_tables.py). The two scripts look for different properties of their respective parent tools’ output files. Sorry for the confusion!

Hi Eric, I used the MetaPhlAn script to merge the taxonomic profiles and then used HUMAnN 3 to make the max_taxonomic_profile.

Thanks

Jun

Confirmed - glad that worked!

Hi! I found another related post (Is Humann3 comparable with Metaphlan4.0.6? - #6 by strainliu) where you say that, because the max_taxonomic_profile approach is not widely used, it has not been updated. That post concerns HUMAnN 3.7 and MetaPhlAn 4.0.6 compatibility; has this been updated for HUMAnN 3.8? From what I’ve seen, the biggest bottleneck is usually indexing the database with bowtie2-build, so this option is very attractive when I have many samples to process.

I also wanted to ask whether this option could be risky. For example, could it reduce the precision of the species-stratified results? That is, is there a higher risk of false positives (detecting species that are not in the sample) when the index is built from the whole dataset rather than from each sample?

Thanks in advance. Sam

I don’t believe we’ve prioritized any updates to this approach, as it isn’t widely used (by us, at least). And indeed, I agree with your concern: including in a sample’s database pangenomes that are not believed to be represented in that sample increases the risk of false-positive mapping and also decreases per-sample computational efficiency. The HUMAnN philosophy is that it is better to start with a sample-specific database precisely because it helps avoid these issues.
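To make the false-positive concern concrete: a per-clade maximum across samples is non-zero whenever any sample detected that clade, so a joint "max" database is effectively the union of all samples' detections, and each sample's search also covers pangenomes never detected in it. The toy sketch below uses made-up species names and a hypothetical helper `extra_species_per_sample` to list those extra species per sample. It illustrates only the set arithmetic, not actual mapping error rates.

```python
def extra_species_per_sample(sample_profiles):
    """For each sample, list species in the joint (max/union) profile that the
    sample's own profile did not detect.

    sample_profiles: {sample_name: {species_name: relative_abundance}}
    """
    detected = {
        sample: {sp for sp, ab in profile.items() if ab > 0}
        for sample, profile in sample_profiles.items()
    }
    # A per-species max across samples is > 0 exactly when any sample detected it.
    joint = set().union(*detected.values())
    return {sample: sorted(joint - species) for sample, species in detected.items()}


if __name__ == "__main__":
    toy = {
        "sample_A": {"s__Species_1": 60.0, "s__Species_2": 40.0},
        "sample_B": {"s__Species_2": 70.0, "s__Species_3": 30.0},
    }
    for sample, extras in extra_species_per_sample(toy).items():
        print(sample, extras)
```

Running it prints `sample_A ['s__Species_3']` and `sample_B ['s__Species_1']`. The number of extra pangenomes per sample grows as the cohort becomes more heterogeneous.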