Auto-updating database seems to break MetaPhlAn4

Hi, I’m running MetaPhlAn4 v4.0.4 as follows:

metaphlan "${read1}","${read2}" --input_type fastq -o "${samplename}"_metaphlan --nproc 12 --bowtie2db /databases/metaphlan4_20230119/ --bowtie2out "${samplename}"_bowtie2out

The database in the specified folder is the contents of: mpa_vOct22_CHOCOPhlAnSGB_202212_bt2

I’m running it on a cluster with sbatch, so I can’t provide user input in real time.

The goal is to reproduce prior work that used that database. However, MetaPhlAn asks a Y/N question about whether I want to download a new db, which breaks the script. For this use case, I do not want to use the most recent database, because I am trying to match a previously created dataset.

I have attempted to add the --offline flag, but that yields this error:

“Database cannot be downloaded with the --offline option activated”

I would really like to just be able to align to the mpa_vOct22_CHOCOPhlAnSGB_202212_bt2 database and, ideally, turn off any kind of auto-updating behavior permanently. Can you advise what the error is here and how to move forward?

Hi @btt4001,
How did you install the database? It seems it was not properly installed in the first place.

I just ran into the same problem. I unfortunately cannot really reproduce the error. However, on the HPC, I ran this command for a few hundred samples in parallel using a snakemake pipeline:

conda activate metaphlan_4.0.4 ; metaphlan results/33B/07_metaphlan/bt2Files/33B.bt2out.txt   --input_type bowtie2out   --bowtie2db /data/databases/metaphlan4   --nproc 24   --add_viruses   -o results/33B/07_metaphlan/33B.profiled_metagenome.txt --offline ;

When I ran this command, I got the error:

“Database cannot be downloaded with the --offline option activated”

This is pretty confusing, since I added --offline precisely because I did NOT want to download the database. I then ran the command for a single sample in an interactive session and was asked at the prompt whether I wanted to update the database. I declined. After that, I could run the pipeline as expected with --offline on our Slurm cluster for ~100 samples in parallel.

There seems to be a bug with the --offline flag.

I am also having this issue on HPC. It has never been an issue before, though the last time I ran MetaPhlAn was in early March, prior to the latest database release. I get the same errors as above: the database download prompt breaks the script, and the --offline flag gives the error “Database cannot be downloaded with the --offline option activated”. I am using MetaPhlAn 4.0.6 and database mpa_vJun23_CHOCOPhlAnSGB_202307.

I didn’t find a proper fix for the issue. However, I found a hack to circumvent the check:

I just replaced the name of the database in the mpa_latest file, changing mpa_vJun23_CHOCOPhlAnSGB_202307 to mpa_vOct22_CHOCOPhlAnSGB_202212. Now I can run it on the HPC with the older database version.
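
For anyone who wants to script the mpa_latest workaround, here is a minimal sketch. It assumes that mpa_latest is a plain-text file in the --bowtie2db directory holding only the database name, as the description above suggests. The function name pin_mpa_latest is my own and not part of MetaPhlAn.

```shell
#!/usr/bin/env bash
# Sketch of the mpa_latest workaround described in this thread.
# Assumption: mpa_latest is a plain-text file in the --bowtie2db directory
# that contains only the name of the database version.
# pin_mpa_latest is a hypothetical helper, not a MetaPhlAn command.
pin_mpa_latest() {
    local db_dir="$1"
    local db_name="$2"
    # Back up the existing file so the change can be undone.
    if [ -f "${db_dir}/mpa_latest" ]; then
        cp "${db_dir}/mpa_latest" "${db_dir}/mpa_latest.bak"
    fi
    # Overwrite mpa_latest with the database name to pin.
    printf '%s\n' "${db_name}" > "${db_dir}/mpa_latest"
}

# Example call, using the directory and database from the first post:
# pin_mpa_latest /databases/metaphlan4_20230119 mpa_vOct22_CHOCOPhlAnSGB_202212
```

Run it once before submitting the sbatch jobs. MetaPhlAn also has an --index (-x) option for naming the database version explicitly (for example --index mpa_vOct22_CHOCOPhlAnSGB_202212). Whether that alone avoids the prompt was not tested in this thread.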