The bioBakery help forum

Humann3.0.0 docker not accepting metaphlan database 201901b

Hi there,

I am using the Humann3.0.0 docker image to run humann. I seem to be running into inconsistencies with databases.

I downloaded the full database with humann_databases:

Singularity> humann_databases 
HUMANnN2 Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/DEMO_chocophlan.v296_201901b.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref50_annotated_v201901b_full.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901b_subset.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901b.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz

I downloaded the chocophlan full, uniref90_diamond and utility_mapping full databases and ran humann with the following command:

humann \
         --input ${fastq} \
         --output . \
         --threads ${params.cpus} \
         --output-basename ${baseName} \
         --verbose \
         --nucleotide-database ${params.nucleotideDb} \
         --protein-database ${params.proteinDb} \
         --bowtie-options="--threads ${params.cpus}" \
         --metaphlan-options="--bowtie2db ${params.nucleotideDb}"

I get the following error:

Downloading MetaPhlAn database
  Please note due to the size this might take a few minutes
  
  File /data/scratch/DCS/UDDU/CANBIO/lgallagher/ngs/4.resources/databases/microbiome//humann/humann_3.0.0//chocophlan/mpa_v30_CHOCOPhlAn_201901.tar already present!
  
  File /data/scratch/DCS/UDDU/CANBIO/lgallagher/ngs/4.resources/databases/microbiome//humann/humann_3.0.0//chocophlan/mpa_v30_CHOCOPhlAn_201901.md5 already present!
  MD5 checksums do not correspond! If this happens again, you should remove the database files and rerun MetaPhlAn so they are re-downloaded

This happens no matter how many times I delete the database files and rerun the humann command.
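
For reference, the comparison that MetaPhlAn complains about can be reproduced by hand to see whether the .tar and .md5 files really disagree. This is only a minimal sketch: it assumes the .md5 file stores the hash as its first whitespace-separated field (I have not confirmed the exact format) and that md5sum is available in the container. check_md5 is a hypothetical helper name, not part of humann or metaphlan.

```shell
# Compare a file's MD5 hash with the hash stored in a companion .md5 file.
# Assumption: the hash is the first whitespace-separated field on line 1.
check_md5() {
    local data_file="$1" md5_file="$2"
    local expected actual
    expected=$(awk 'NR==1 {print $1}' "$md5_file")
    actual=$(md5sum "$data_file" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "OK: $data_file"
    else
        echo "MISMATCH: $data_file (expected $expected, got $actual)"
        return 1
    fi
}

# Example call (DB_DIR is a placeholder for the database directory):
# check_md5 "$DB_DIR/mpa_v30_CHOCOPhlAn_201901.tar" "$DB_DIR/mpa_v30_CHOCOPhlAn_201901.md5"
```

A mismatch here would point to a truncated or corrupted download rather than to humann itself.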

A few people on here recommend running metaphlan --install $DBLOCATION, which I tried. Running the same humann command afterwards, I get the following error:

CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_v30_CHOCOPhlAn_201901.1.bt2 ) that are not of the expected version. Please install the latest version of the database: 201901b.  

In the first step using humann_databases --download I downloaded the full_chocophlan.v296_201901b.tar.gz database, so I’m not sure why it thinks I don’t have that version.
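
To see which files might be tripping the check, the directory can be scanned for names that lack the expected version tag. This is only a rough sketch: it assumes the check goes by file name, which the wording of the error message suggests but which I have not confirmed in the humann source. list_unexpected_files is a hypothetical helper name.

```shell
# List files in a directory whose names do not contain the expected version tag.
# Assumption: the version check is based on file names only.
list_unexpected_files() {
    local db_dir="$1" version_tag="$2"
    local path name
    for path in "$db_dir"/*; do
        [ -e "$path" ] || continue      # skip the literal pattern if the directory is empty
        name=$(basename "$path")
        case "$name" in
            *"$version_tag"*) ;;        # name carries the expected tag, nothing to report
            *) echo "$name" ;;          # name lacks the tag, report it
        esac
    done
}

# Example call (the path is a placeholder for the --nucleotide-database directory):
# list_unexpected_files "$NUCLEOTIDE_DB" 201901b
```

In my case this would flag the mpa_v30_CHOCOPhlAn_201901 files, since metaphlan --install and --bowtie2db were pointed at the same directory as the ChocoPhlAn database.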

When I add --index mpa_v30_CHOCOPhlAn_201901 to the --metaphlan-options, I get a slightly different error, but it still asks me to download the version of the database I already have.

  CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_v30_CHOCOPhlAn_201901.tar ) that are not of the expected version. Please install the latest version of the database: 201901b

It was my understanding that --index is supposed to suppress the version checking function of metaphlan, so is this not working correctly?

The only way I’ve found to get humann working properly is to use the v3.0.0.a.4 Humann Docker container and install the v269_201901 databases. As far as I’ve tried, with all the different combinations of database versions and humann versions, I cannot get humann 3.0.0 to work.

Perhaps this is a Docker-specific issue, as I haven’t seen anyone on the forum facing the same problem. Granted, I haven’t tried running it via conda, but my workflow is specifically designed with Docker in mind, and moving it to conda would be troublesome.

Any assistance would be great!

Thanks,

Lewis

P.S

Also, while I’m here: when you run humann_databases, why does the first line say HUMANnN2 Databases ( database : build = location ) when I’m running humann3? It’s a little misleading. There’s also a typo in there: the name humann doesn’t have three N’s.

Same here. I installed simply with pip install, so this has nothing to do with containers. Judging by the file name, the database is over two years old, which is a long time in genomics. I am also hoping for a 2021 update.