Describe the bug
MetaPhlAn totally ignores the installed Bowtie2 database and tries (and fails) to re-download one. I have reproduced this behavior using multiple databases, both on a cluster running Alpine Linux and with the official MetaPhlAn Docker images.
To Reproduce
- Make a folder called my-awesome-data and in it, place corresponding R1 and R2 fastq.gz files, e.g., AP0F_ST2_S451_L004_R1_001.trimmed.fastq.gz and AP0F_ST2_S451_L004_R2_001.trimmed.fastq.gz.
- Install the docker image:
docker pull biobakery/metaphlan
- Run the docker image with the provided files, like so (replacing /path/to/me with the result of pwd):
sudo docker run -it --rm \
-v /path/to/me/my-awesome-data:/data \
biobakery/metaphlan metaphlan \
/data/AP0F_ST2_MetaAir_S451_L004_R1_001.trimmed.fastq.gz, \
/data/AP0F_ST2_MetaAir_S451_L004_R2_001.trimmed.fastq.gz \
--input_type fastq \
--bowtie2db /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103 \
--bowtie2out /data/bowtie-output \
--nproc 25 \
-o /data/arbitrary-output.txt
Then observe that it totally ignores the installed database in /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103, downloads mpa_vOct22_CHOCOPhlAnSGB_202212 into that folder instead, and then fails to find it:
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest
Downloading file of size: 0.00 MB
0.01 MB 25600.00 % 31.36 MB/sec 0 min -0 sec
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.tar
Downloading file of size: 2884.91 MB
2884.91 MB 100.00 % 8.55 MB/sec 0 min -0 sec
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.md5
Downloading file of size: 0.00 MB
0.01 MB 11702.86 % 45.57 MB/sec 0 min -0 sec
Decompressing /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103/mpa_vOct22_CHOCOPhlAnSGB_202212_VSG.fna.bz2 into /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103/mpa_vOct22_CHOCOPhlAnSGB_202212_VSG.fna
Decompressing /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103/mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2 into /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103/mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna
Joining FASTA databases
Building Bowtie2 indexes
Removing uncompressed databases
Download complete
No MetaPhlAn BowTie2 database found (--index option)!
Expecting location /usr/local/lib/python3.6/dist-packages/metaphlan/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103/mpa_vOct22_CHOCOPhlAnSGB_202212
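To see what is actually on disk at the "Expecting location" path above, a small check along these lines might help. This is only a sketch: check_bt2_index is a hypothetical helper, not part of MetaPhlAn. It assumes the standard Bowtie2 index layout of six files per index (.1, .2, .3, .4, .rev.1 and .rev.2, ending in .bt2 or, for large indexes, .bt2l).

```shell
# Hypothetical helper (not part of MetaPhlAn): report whether all six
# Bowtie2 index files exist for a given database folder and index name.
check_bt2_index() {
    db_dir=$1
    index=$2
    found=0
    for part in 1 2 3 4 rev.1 rev.2; do
        # Bowtie2 writes .bt2 for small indexes and .bt2l for large ones.
        if [ -f "$db_dir/$index.$part.bt2" ] || [ -f "$db_dir/$index.$part.bt2l" ]; then
            found=$((found + 1))
        else
            echo "missing: $db_dir/$index.$part.bt2 (or .bt2l)"
        fi
    done
    echo "$found of 6 index files present for $index"
    # Succeed only when the index is complete.
    [ "$found" -eq 6 ]
}
```

Inside the container, it could be called with the folder passed to --bowtie2db as the first argument and an index name as the second, for example mpa_vOct22_CHOCOPhlAnSGB_202212 (the one the error message expects) or mpa_vJan21_CHOCOPhlAnSGB_202103 (the one that should already be installed).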
Screenshots
Not applicable.
Platform (please complete the following information):
- Version is the latest image on Docker Hub. But I also replicated this behavior installing via conda and pip (I tried both!) on Alpine Linux. In the Docker image, I get:
(base) max@max-XPS-13-9310:~/projects/emily/get-phyloflash-to-work$ sudo docker run -it --rm biobakery/metaphlan metaphlan --version
MetaPhlAn version 4.0.2 (22 Sep 2022)
- Download source is Docker Hub, but I also tried both pip and conda.
Additional context
Happy to provide additional context if needed, but seeing as how the error can be directly reproduced using the publicly available Docker image, I think this should be sufficient.