MetaPhlAn v4.0.2 and HUMAnN 3.6: MetaPhlAn taxonomic profile provided was not generated with the expected database

nickp60 · October 25, 2022, 11:56pm

Continuing the discussion from Metaphlan4 and the new version humann:

I am having difficulty with the new DBs, metaphlan 4, and humann 3.6.

$ metaphlan --version
MetaPhlAn version 4.0.2 (22 Sep 2022)

$ humann --version
humann v3.6

I ran metaphlan successfully; here is part of the resulting file

head 1064K_IGO_11592_240_metaphlan3_profile.txt
#mpa_vJan21_CHOCOPhlAnSGB_202103
#/usr/local/bin/metaphlan kneaddata/1064K_IGO_11592_240_knead_cat.fastq.gz --bowtie2db /resources/biobakery_workflows_dbs/metaphlanV4 --index mpa_vJan21_CHOCOPhlAnSGB_202103 --input_type fastq --sample_id 1064K_IGO_11592_240 -s metaphlan/1064K_IGO_11592_240.sam.bz2 --add_viruses --unclassified_estimation --nproc 16 -t rel_ab_w_read_stats -o metaphlan/1064K_IGO_11592_240_metaphlan3_profile.txt
#45176646 reads processed
#SampleID       1064K_IGO_11592_240
#estimated_reads_mapped_to_known_clades:249400536
#clade_name     clade_taxid     relative_abundance      coverage        estimated_number_of_reads_from_the_clade
UNCLASSIFIED    -1      26.74675        -       0
k__Bacteria     2       73.2333 7.03189 249212104
k__Eukaryota    2759    0.01994 0.00191 188432
k__Bacteria|p__Firmicutes       2|1239  58.3959 5.60719 172487448

But when I try to run Humann, I get an error about needing results from metaphlan v3 or higher:

$ humann      \
   --input kneaddata/1064K_IGO_11592_240_knead_cat.fastq.gz          \
   --output humann     --output-basename 1064K_IGO_11592_240_humann3     \
   --o-log humann/1064K_IGO_11592_240_humann.log       \
    --search-mode uniref90     \
    --remove-column-description-output   \
   --protein-database /resources/biobakery_workflows_dbs/uniref90_201901b \
   --nucleotide-database /resources/biobakery_workflows_dbs/v31/choco_v201901_v31   \
   --taxonomic-profile metaphlan/1064K_IGO_11592_240_metaphlan3_profile.txt \
   --threads 4

Output files will be written to: /data/brinkvd/watersn/apps/humann
Decompressing gzipped file ...



ERROR: The MetaPhlAn taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn to at least v3.0.

What is the source of the issue? It looks like this is the method that might raise the error, but because the actual exception is not raised, I don’t know what needs to change.

Larry · October 26, 2022, 5:18am

I get the same problem. I’ve tried indexing the metaphlan database by hand, but there is something in the script which identifies inconsistencies even when I use the older database V29 or V31 (the script downloads and installs metaphlan4 from what I’ve seen). I haven’t been able to install metaphlan3 to test this as I can’t find this version to download. Does anyone know of an archive of metaphlan3?

Larry · October 26, 2022, 5:23am

Conda and pip installs both have this problem. I’m pretty sure metaphlan isn’t updating when running:
humann -i demo.fastq -o sample_results

humann_databases --download chocophlan full
seems to download the 202103 version of chocophlan.
Using --index in metaphlan, it searches in the metaphlan4 directory which doesn’t contain the older 2019 versions. But downloading the older versions and manually indexing doesn’t seem to work either…

franzosa · October 26, 2022, 3:02pm

(EDIT: My explanation here is not correct; please see my subsequent reply below for the correct explanation.)

Hi All - It looks like MetaPhlAn 4.0.2 adds a new row to the header indicating the number of reads that were mapped:

#45176646 reads processed

This is throwing off the test in HUMAnN that checks for compatibility between your MetaPhlAn profile and pangenome database. We can patch this in HUMAnN. As a temp solution you could roll back your MetaPhlAn version OR remove the “reads processed” line from your profile and that ought to restore the expected behavior.

nickp60 · October 26, 2022, 3:03pm

Yeah, it’s weird. My metaphlan file should pass the database version check on line 157 here, but I don’t know for sure, because for some reason it reads through and attempts to parse the entire file before actually raising an error about the db version and exiting on line 213 here. So even if it detects an invalid version, it’s still going to try to parse every line in the file before it quits with the proper error: The MetaPhlAn taxonomic profile provided was not generated with the database version <x> or <y>.

Also, it looks like there is a bug (an undefined name rather than a syntax error) on line 215, where

                config.metaphlan_v3_db_version+" or "+metaphlan_v4_db_version+" . Please update your version of MetaPhlAn to at least v3.0."

should probably be

                config.metaphlan_v3_db_version+" or "+config.metaphlan_v4_db_version+" . Please update your version of MetaPhlAn to at least v3.0."

Otherwise you get this traceback:


Decompressing gzipped file ...

Traceback (most recent call last):
  File "/home/watersn/miniconda3/bin/humann", line 33, in <module>
    sys.exit(load_entry_point('humann', 'console_scripts', 'humann')())
  File "/lila/home/watersn/GitHub/humann/humann/humann.py", line 979, in main
    custom_database = prescreen.create_custom_database(config.nucleotide_database, bug_file)
  File "/lila/home/watersn/GitHub/humann/humann/search/prescreen.py", line 215, in create_custom_database
    config.metaphlan_v3_db_version+" or "+metaphlan_v4_db_version+" . Please update your version of MetaPhlAn to at least v3.0."
NameError: name 'metaphlan_v4_db_version' is not defined
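
The NameError can be reproduced in isolation. This is only a minimal sketch: `config` here is a stand-in `SimpleNamespace`, not HUMAnN's real config module, and the two version strings are placeholder values I made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for humann.config; both values are hypothetical placeholders.
config = SimpleNamespace(metaphlan_v3_db_version="v3",
                         metaphlan_v4_db_version="vJan21")

def build_message_buggy():
    # The bare name is looked up as a local/global variable, which does not exist.
    return (config.metaphlan_v3_db_version + " or "
            + metaphlan_v4_db_version)

def build_message_fixed():
    # Qualifying the name with `config.` resolves it as an attribute.
    return (config.metaphlan_v3_db_version + " or "
            + config.metaphlan_v4_db_version)

try:
    build_message_buggy()
except NameError as err:
    print("buggy:", err)

print("fixed:", build_message_fixed())
```

Because Python only resolves the name when the line runs, the bug stays hidden until the error branch is actually reached.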

nickp60 · October 26, 2022, 3:05pm

Thanks for the reply @franzosa. I am happy to help write a validator if that would be helpful. Then we could have some tests in the repo to ensure that the code handles all the versions of metaphlan that should be supported.

franzosa · October 26, 2022, 3:42pm

My explanation above is not correct (I will flag it after posting this). @lauren.j.mciver pointed out that your MetaPhlAn profile has extra output columns from running in a non-default mode. This is the source of the error (not the added read count). We discussed this recently here as well:

If the alternate MetaPhlAn output is becoming more popular with users we can add some robustness to the extra columns in a future HUMAnN release. Sorry for any confusion caused by my earlier reply!

nickp60 · October 26, 2022, 6:08pm

Thanks @franzosa, that’s only part of the issue. I selected the first three of the columns from the same file and ran again, but there is an issue with line endings breaking the isdigit().

if you change:

data=line.split("\t")
if data[-1].replace(".","").replace("e-","").isdigit():

to

data=line.strip().split("\t")
if data[-1].replace(".","").replace("e-","").isdigit():

it works fine. Not sure why it isn’t an issue with the original files.
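
To show the effect in isolation, here is a small sketch that applies the same `isdigit()` test to the last field with and without stripping. The helper name `looks_numeric` and the shortened sample line are mine, not from the HUMAnN source.

```python
def looks_numeric(field):
    # Same test as above: drop "." and "e-", then check for digits only.
    return field.replace(".", "").replace("e-", "").isdigit()

# Shortened, illustrative three-column line ending in a newline.
line = "s__Flavonifractor_plautii\t292800\t34.39319\n"

raw = line.split("\t")
stripped = line.strip().split("\t")

# The trailing "\n" is not a digit, so the unstripped field fails the test.
print(repr(raw[-1]), looks_numeric(raw[-1]))
print(repr(stripped[-1]), looks_numeric(stripped[-1]))
```

Note that `float()` itself tolerates surrounding whitespace; it is only the `isdigit()` guard that trips on the newline.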

For instance, changing the get_abundance() function to:

def get_abundance(line):
    """
    Read in the abundance value from the taxonomy file
    """
    try:
        data=line.split("\t")
        if data[-1].replace(".","").replace("e-","").isdigit():
            read_percent=float(data[-1])
        else:
            read_percent=float(data[-2])
    except Exception as e:
        print(repr(data[-1].replace(".","").replace("e-","")) )
        print(e)
        print(repr(line))
        message="The MetaPhlAn taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn to at least v3.0."
        logger.error(message)
        sys.exit("\n\nERROR: "+message)

I get

Output files will be written to: /lila/home/watersn/GitHub/humann/humann
Decompressing gzipped file ...

'3439319\n'

could not convert string to float: '2|1239|186801|186802|541000|946234|292800'
'k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Flavonifractor|s__Flavonifractor_plautii\t2|1239|186801|186802|541000|946234|292800\t34.39319\n'


ERROR: The MetaPhlAn taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn to at least v3.0.

but adding the change above removes the error.

NOTE: this means the existing code for checking against the config.prescreen_threshold is broken
if you run the current version (f147b28ea9 ) on the test data in ./humann/tests/data/demo_metaphlan_bugs_list.tsv, the read_percent for all of those entries is reported as 1:
data[-1].replace(".","").replace("e-","").isdigit() is never true, so it falls back to float(data[-2]) , which is the taxid.
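
One way to avoid guessing between the last and second-to-last column would be to look the column up by name in the `#clade_name` header line. The following is a rough sketch of that idea, not the HUMAnN implementation; the function names are hypothetical, and it assumes the header always contains a field called `relative_abundance`, as in the profile shown at the top of this thread.

```python
def abundance_column(header_line):
    """Return the index of relative_abundance in a '#clade_name ...' header."""
    names = header_line.lstrip("#").rstrip("\n").split("\t")
    return names.index("relative_abundance")

def read_abundance(line, column):
    """Read the abundance from a data line at a known column index."""
    return float(line.rstrip("\n").split("\t")[column])

header = ("#clade_name\tclade_taxid\trelative_abundance\tcoverage\t"
          "estimated_number_of_reads_from_the_clade\n")
row = "k__Bacteria\t2\t73.2333\t7.03189\t249212104\n"

column = abundance_column(header)
print(column, read_abundance(row, column))
```

With this approach the extra columns written by `-t rel_ab_w_read_stats` would not shift the value being read, and a header without the expected field raises a `ValueError` up front.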

lauren.j.mciver · October 26, 2022, 8:05pm

Hi @nickp60 , If you run without the MetaPhlAn option -t rel_ab_w_read_stats you should be set! By default MetaPhlAn writes the relative abundances to the second to the last column.

Thank you,
Lauren

nickp60 · October 26, 2022, 8:51pm

If that’s the case, then the test data and the documentation should reflect that; currently both show the relative abundances to be in the last column.

For my purposes I can do cut -f 1-4 [myfile] > myfile_cut.txt and run it on the trimmed file.
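
For anyone who would rather do the trimming in Python, a rough equivalent of the `cut` call is sketched below. The function name `trim_columns` is mine; like `cut`, it passes through header lines that contain no tab unchanged.

```python
def trim_columns(lines, keep=4):
    """Keep the first `keep` tab-separated fields of every line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[:keep]) + "\n"

profile = [
    "#mpa_vJan21_CHOCOPhlAnSGB_202103\n",
    "#clade_name\tclade_taxid\trelative_abundance\tcoverage\t"
    "estimated_number_of_reads_from_the_clade\n",
    "k__Bacteria\t2\t73.2333\t7.03189\t249212104\n",
]

trimmed = list(trim_columns(profile))
for line in trimmed:
    print(line, end="")
```

After trimming, the relative abundance sits in the second-to-last column, which is where it is expected according to the reply above.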

Either way, the bug I showed above should probably be rectified. Under normal circumstances I would submit a PR with a fix; let me know if you would be receptive to that.

lauren.j.mciver · October 26, 2022, 9:17pm

Hi @nickp60 , Thanks for the info! It is possible we have some examples (tests and docs) that reflect an older version of MetaPhlAn. I will look into those that you noted and make sure to get them updated to the latest version of MetaPhlAn. I think the code itself is okay as I have confirmed it works with the latest MetaPhlAn version (default output).

Thanks!
Lauren

nickp60 · October 26, 2022, 9:42pm

No problem!

Please feel free to check:

        if data[-1].replace(".","").replace("e-","").isdigit():
            read_percent=float(data[-1])

is never going to evaluate to True unless you strip the whitespace. If that functionality is not needed due to the input requirements, it would clarify things to get rid of it. Also see my comments above: line 215 has a bug that will cause a NameError instead of warning the user that the wrong DB is being used. You can verify it as follows:

# pretend we have metaphlan from an invalid db
curl https://raw.githubusercontent.com/biobakery/humann/master/humann/tests/data/demo_metaphlan_bugs_list.tsv | sed "s|#v30_CHOCOPhlAn|#v29_CHOCOPhlAn|" > demo_metaphlan_bad_bugs_list.tsv

$ humann --input $PWD/tmp.fastq.gz  --output humann  --taxonomic-profile demo_metaphlan_bad_bugs_list.tsv 

Output files will be written to: /lila/home/watersn/GitHub/humann/humann
Decompressing gzipped file ...

Traceback (most recent call last):
  File "/home/watersn/miniconda3/bin/humann", line 33, in <module>
    sys.exit(load_entry_point('humann', 'console_scripts', 'humann')())
  File "/lila/home/watersn/GitHub/humann/humann/humann.py", line 979, in main
    custom_database = prescreen.create_custom_database(config.nucleotide_database, bug_file)
  File "/lila/home/watersn/GitHub/humann/humann/search/prescreen.py", line 215, in create_custom_database
    config.metaphlan_v3_db_version+" or "+metaphlan_v4_db_version+" . Please update your version of MetaPhlAn to at least v3.0."
NameError: name 'metaphlan_v4_db_version' is not defined

See also the note above: that check only gets executed after the file has been (successfully) parsed, so the user has no way of knowing whether the error is due to an invalid line in the bugs list or due to an incompatible database used to generate the bugs list.
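
As a sketch of what an up-front check could look like, the function below inspects only the first header line and fails before any abundance rows are parsed. The function name, the `EXPECTED_MARKERS` tuple, and the substring matching are all assumptions for illustration; they are not how HUMAnN currently does it.

```python
# Hypothetical markers for acceptable databases; not taken from humann.config.
EXPECTED_MARKERS = ("v3", "vJan21")

def check_database_version(first_line, expected=EXPECTED_MARKERS):
    """Validate the database header line before parsing the rest of the file."""
    header = first_line.strip()
    if not header.startswith("#"):
        raise ValueError("profile has no database header line")
    if not any(marker in header for marker in expected):
        raise ValueError(
            f"profile database {header!r} matches none of {expected}")
    return header

print(check_database_version("#mpa_vJan21_CHOCOPhlAnSGB_202103\n"))

try:
    check_database_version("#mpa_v29_CHOCOPhlAn\n")
except ValueError as err:
    print("rejected:", err)
```

Separating this check from row parsing would give two distinct error messages, one for a wrong database and one for a malformed line.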

We use biobakery tools a lot in our lab, and we appreciate all the work that has gone into them. I am more than happy to help write tests, refactor etc to keep these sorts of things from popping up.

Thanks!

Nick

RohanJ · February 5, 2023, 6:46pm

(rohan) jagarlamudirohan@bio-bakery:~/RohanFiles/Healthy$ humann --input SRS1041031.denovo_duplicates_marked.trimmed.singleton.fastq --output /home/jagarlamudirohan/RohanFiles/HumannFiles --threads 14
Output files will be written to: /home/jagarlamudirohan/RohanFiles/HumannFiles

Running metaphlan …

Found g__Gemmiger.s__Gemmiger_formicilis : 16.12% of mapped reads
Found g__GGB9635.s__GGB9635_SGB15106 : 15.70% of mapped reads
Found g__Bacteroides.s__Bacteroides_uniformis : 15.04% of mapped reads
Found g__Bacteroides.s__Bacteroides_caccae : 10.32% of mapped reads
Found g__Akkermansia.s__Akkermansia_muciniphila : 6.86% of mapped reads
Found g__Alistipes.s__Alistipes_inops : 6.68% of mapped reads
Found g__Bacteroides.s__Bacteroides_cellulosilyticus : 5.87% of mapped reads
Found g__Phocaeicola.s__Phocaeicola_vulgatus : 4.65% of mapped reads
Found g__Ruminococcaceae_unclassified.s__Ruminococcaceae_bacterium : 3.25% of mapped reads
Found g__Paraprevotella.s__Paraprevotella_clara : 2.50% of mapped reads
Found g__Bifidobacterium.s__Bifidobacterium_adolescentis : 1.56% of mapped reads
Found g__GGB9758.s__GGB9758_SGB15368 : 1.50% of mapped reads
Found g__Barnesiella.s__Barnesiella_intestinihominis : 1.43% of mapped reads
Found g__Parabacteroides.s__Parabacteroides_merdae : 1.14% of mapped reads
Found g__Prevotella.s__Prevotella_copri_clade_A : 1.03% of mapped reads
Found g__Phascolarctobacterium.s__Phascolarctobacterium_faecium : 0.86% of mapped reads
Found g__Alistipes.s__Alistipes_communis : 0.83% of mapped reads
Found g__Alistipes.s__Alistipes_shahii : 0.71% of mapped reads
Found g__Roseburia.s__Roseburia_hominis : 0.61% of mapped reads
Found g__GGB9347.s__GGB9347_SGB14313 : 0.51% of mapped reads
Found g__Dialister.s__Dialister_invisus : 0.46% of mapped reads
Found g__Candidatus_Cibiobacter.s__Candidatus_Cibiobacter_qucibialis : 0.34% of mapped reads
Found g__GGB9712.s__GGB9712_SGB15244 : 0.27% of mapped reads
Found g__Escherichia.s__Escherichia_coli : 0.27% of mapped reads
Found g__Monoglobus.s__Monoglobus_pectinilyticus : 0.27% of mapped reads
Found g__Parabacteroides.s__Parabacteroides_distasonis : 0.22% of mapped reads
Found g__GGB3256.s__GGB3256_SGB4303 : 0.18% of mapped reads
Found g__Clostridia_unclassified.s__Clostridia_bacterium : 0.17% of mapped reads
Found g__Bacteroides.s__Bacteroides_thetaiotaomicron : 0.16% of mapped reads
Found g__Blautia.s__Blautia_massiliensis : 0.13% of mapped reads
Found g__Ruminococcaceae_unclassified.s__Eubacterium_siraeum : 0.11% of mapped reads
Found g__Bacteroides.s__Bacteroides_ovatus : 0.10% of mapped reads
Found g__Blautia.s__Blautia_wexlerae : 0.09% of mapped reads
Found g__Clostridia_unclassified.s__Clostridia_unclassified_SGB4373 : 0.05% of mapped reads

ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the database version v30 . Please update your version of MetaPhlAn2 to v3.0.
(rohan) jagarlamudirohan@bio-bakery:~/RohanFiles/Healthy$ metaphlan --version
MetaPhlAn version 4.0.4 (17 Jan 2023)
(rohan) jagarlamudirohan@bio-bakery:~/RohanFiles/Healthy$ python --version
Python 3.7.12
(rohan) jagarlamudirohan@bio-bakery:~/RohanFiles/Healthy$ humann --version
humann v3.0.1

(rohan) jagarlamudirohan@bio-bakery:~/RohanFiles/Healthy$ humann_config --print
HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /home/jagarlamudirohan/DatabaseM/Choco/chocophlan
database_folders : protein = /home/jagarlamudirohan/DatabaseM/Uni50/uniref
database_folders : utility_mapping = /home/jagarlamudirohan/anaconda3/envs/rohan/lib/python3.7/site-packages/humann/data/misc
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

RohanJ · February 5, 2023, 6:50pm

Hello,
I get the above errors. Looking at the config file, humann is pointing at the most recent databases I downloaded. I have metaphlan4 running just fine with the mpa_vJan21_CHOCOPhlAnSGB database. To be safe, I downloaded chocophlan and uniref again.
Thanks in advance to everyone on this forum for your valuable time.

franzosa · February 6, 2023, 7:47pm

In order to use HUMAnN with MetaPhlAn 4 you need to upgrade to HUMAnN v3.6. You are running v3.0.1, which expects a taxonomic profile from MetaPhlAn 3.
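
A quick way to catch this mismatch before a long run is to compare the two `--version` strings. The sketch below parses the version text as it is printed in this thread; the function names are hypothetical, and the rule it encodes is only the one stated above (MetaPhlAn 4 needs HUMAnN v3.6 or later).

```python
import re

def parse_version(text):
    """Extract the first dotted version number from a --version string."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def needs_humann_upgrade(humann_text, metaphlan_text):
    """True when MetaPhlAn is v4 or later but HUMAnN is older than v3.6."""
    uses_metaphlan4 = parse_version(metaphlan_text)[0] >= 4
    return uses_metaphlan4 and parse_version(humann_text) < (3, 6)

print(needs_humann_upgrade("humann v3.0.1",
                           "MetaPhlAn version 4.0.4 (17 Jan 2023)"))
print(needs_humann_upgrade("humann v3.6",
                           "MetaPhlAn version 4.0.2 (22 Sep 2022)"))
```

The strings could be captured with `subprocess.run([...], capture_output=True, text=True)` on `humann --version` and `metaphlan --version`.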

RohanJ · February 9, 2023, 4:14am

Upgrading to HUMAnN 3.6 and updating the config files still gave me a critical error asking me to download the latest version of the database. I reluctantly did, and it worked just fine. Thank you so much for your help.

winatony · March 9, 2023, 12:25am

Hello, I installed humann on a workstation and installed the nucleotide database using the following command:

humann_databases --download chocophlan full /home/anth445/humann_db/chocophlan

And I have the following configuration settings:

humann --version
humann v3.6.1

metaphlan --version
MetaPhlAn version 4.0.6 (1 Mar 2023)

humann_config --print
HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /home/anth445/humann_db/chocophlan
database_folders : protein = /home/anth445/humann_db/uniref
database_folders : utility_mapping = /home/anth445/anaconda3/envs/mamba/envs/humann/lib/python3.10/site-packages/humann/data/misc
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

When I try and run the following command:

humann -i dataset/merged_reads/LG_merged.fastq_fixed.gz -o dataset/merged_reads/humann2_output/ --threads 3

Humann fails and I get the following log:

03/08/2023 02:05:25 PM - humann.search.prescreen - INFO: Running metaphlan ........
03/08/2023 02:05:25 PM - humann.utilities - DEBUG: Using software: /home/anth445/anaconda3/envs/mamba/envs/humann/bin/metaphlan
03/08/2023 02:05:25 PM - humann.utilities - INFO: Execute command: /home/anth445/anaconda3/envs/mamba/envs/humann/bin/metaphlan /home/anth445/dataset/merged_reads/humann2_output/1-E_merged_humann_temp/tmpujun740b/tmpc6tyd1bn -t rel_ab -o /home/anth445/dataset/merged_reads/humann2_output/1-E_merged_humann_temp/1-E_merged_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /home/anth445/dataset/merged_reads/humann2_output/1-E_merged_humann_temp/1-E_merged_metaphlan_bowtie2.txt --nproc 5
03/08/2023 03:34:03 PM - humann.utilities - DEBUG: b'WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.\nAn additional column listing the merged species is added to the MetaPhlAn output.\n'
03/08/2023 03:34:03 PM - humann.humann - INFO: TIMESTAMP: Completed     prescreen       :        5318    seconds
03/08/2023 03:34:03 PM - humann.search.prescreen - DEBUG: Taxon not in mapping file: k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Sphingomonadales|f__Sphingomonadaceae|g__GGB73774|s__GGB73774_SGB56310|t__SGB56310 2|1224|28211|204457|41297|||    54.82873        

03/08/2023 03:34:03 PM - humann.search.prescreen - DEBUG: Taxon not in mapping file: k__Bacteria|p__Proteobacteria|c__CFGB76227|o__OFGB76227|f__FGB76227|g__GGB79622|s__GGB79622_SGB56399|t__SGB56399   2|1224||||||    1.53298 

03/08/2023 03:34:03 PM - humann.search.prescreen - ERROR: The MetaPhlAn taxonomic profile provided was not generated with the database version v3 or vJan21 . Please update your version of MetaPhlAn to at least v3.0 or if you are using MetaPhlAn v4 please use the database vJan21.

How do I change the database to be vJan21? I tried to use the command:

metaphlan --install --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db <database folder>

which results in these files:

mpa_latest
mpa_vJan21_CHOCOPhlAnSGB_202103.1.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.2.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.3.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.4.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.pkl
mpa_vJan21_CHOCOPhlAnSGB_202103.rev.1.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.rev.2.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103_VINFO.csv
mpa_vJan21_CHOCOPhlAnSGB_202103_VSG.fna
mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.md5
mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar

But when I ran humann with the metaphlan option:

humann -i dataset/merged_reads/LG_merged.fastq_fixed.gz -o dataset/merged_reads/humann2_output/ --threads 3 --metaphlan-options '--bowtie2db /home/anth445/chocophlan_update'

It again broke, and I got this log file:

03/08/2023 03:41:35 PM - humann.utilities - INFO: Execute command: /home/anth445/anaconda3/envs/mamba/envs/humann/bin/metaphlan /home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/tmpfxks24v5/tmpabt3cvlz --bowtie2db /home/anth445/chocophlan_update -o /home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/LG_merged_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/LG_merged_metaphlan_bowtie2.txt --nproc 3
03/08/2023 03:41:54 PM - humann.utilities - CRITICAL: Error executing: /home/anth445/anaconda3/envs/mamba/envs/humann/bin/metaphlan /home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/tmpfxks24v5/tmpabt3cvlz --bowtie2db /home/anth445/chocophlan_update -o /home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/LG_merged_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/LG_merged_metaphlan_bowtie2.txt --nproc 3

Error message returned from metaphlan :

Downloading MetaPhlAn database
Please note due to the size this might take a few minutes

\Downloading and uncompressing indexes

File /home/anth445/chocophlan_update/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar already present!

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.md5
Downloading file of size: 0.00 MB
0.01 MB 11070.27 %  55.26 MB/sec  0 min -0 sec
MD5 checksums do not correspond! If this happens again, you should remove the database files and rerun MetaPhlAn so they are re-downloaded

03/08/2023 03:41:54 PM - humann.utilities - CRITICAL: TRACEBACK: 
Traceback (most recent call last):
  File "/home/anth445/anaconda3/envs/mamba/envs/humann/lib/python3.10/site-packages/humann/utilities.py", line 761, in execute_command
    p_out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/home/anth445/anaconda3/envs/mamba/envs/humann/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/anth445/anaconda3/envs/mamba/envs/humann/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/anth445/anaconda3/envs/mamba/envs/humann/bin/metaphlan', '/home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/tmpfxks24v5/tmpabt3cvlz', '--bowtie2db', '/home/anth445/chocophlan_update', '-o', '/home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/LG_merged_metaphlan_bugs_list.tsv', '--input_type', 'fastq', '--bowtie2out', '/home/anth445/dataset/merged_reads/humann2_output/LG_merged_humann_temp/LG_merged_metaphlan_bowtie2.txt', '--nproc', '3']' returned non-zero exit status 1.

Could you please advise what I should do next?

Thank you!

Katharina_Kujala · March 9, 2023, 5:54am

Hei,
I have been trying for days to get humann3.6 to run, but it keeps giving me headaches. I could not get metaphlan to run inside humann, but I can generate a metaphlan output using metaphlan on its own. So I thought I might just feed the metaphlan output into humann with --taxonomic-profile and let it go from there, but I am then getting the same error as described in this thread:

Traceback (most recent call last):
  File "/home/katharina/miniconda3/envs/humann3.6/bin/humann", line 10, in <module>
    sys.exit(main())
  File "/home/katharina/miniconda3/envs/humann3.6/lib/python3.7/site-packages/humann/humann.py", line 979, in main
    custom_database = prescreen.create_custom_database(config.nucleotide_database, bug_file)
  File "/home/katharina/miniconda3/envs/humann3.6/lib/python3.7/site-packages/humann/search/prescreen.py", line 215, in create_custom_database
    config.metaphlan_v3_db_version+" or "+metaphlan_v4_db_version+" . Please update your version of MetaPhlAn to at least v3.0."
NameError: name 'metaphlan_v4_db_version' is not defined

My versions are:
humann --version
humann v3.6
metaphlan --version
MetaPhlAn version 4.0.0 (22 Aug 2022)

Do you have any idea what the problem might be and how to fix it?

Thanks!

Katharina_Kujala · March 10, 2023, 5:48am

Update:
I have since updated Metaphlan to version 4.0.6.
The humann pipeline can now run metaphlan, but it still exits with the same error message as above.
Metaphlan output has changed slightly (does now contain a line about the number of reads processed).

nbat64 · March 10, 2023, 4:09pm

Hello,
I have the same issue. I am using a conda env with HUMAnN 3.6.1 and MetaPhlAn 4.0.5.

Has anyone found a way to fix it? I have redownloaded the database to get the most recent one (mpa_vOct22_CHOCOPhlAnSGB_202212), but without success.

thanks for the help
