Hi there,
I am using the Humann3.0.0 docker image to run humann. I seem to be running into inconsistencies with databases.
I downloaded the full databases with humann_databases:
Singularity> humann_databases
HUMANnN2 Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/full_chocophlan.v296_201901b.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann_data/chocophlan/DEMO_chocophlan.v296_201901b.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref50_annotated_v201901b_full.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_annotated_v201901b_full.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901b_subset.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901b.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
I downloaded the chocophlan full, uniref90_diamond and utility_mapping full databases and ran humann using the following command:
humann \
--input ${fastq} \
--output . \
--threads ${params.cpus} \
--output-basename ${baseName} \
--verbose \
--nucleotide-database ${params.nucleotideDb} \
--protein-database ${params.proteinDb} \
--bowtie-options="--threads ${params.cpus}" \
--metaphlan-options="--bowtie2db ${params.nucleotideDb}"
I get the following error:
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes
File /data/scratch/DCS/UDDU/CANBIO/lgallagher/ngs/4.resources/databases/microbiome//humann/humann_3.0.0//chocophlan/mpa_v30_CHOCOPhlAn_201901.tar already present!
File /data/scratch/DCS/UDDU/CANBIO/lgallagher/ngs/4.resources/databases/microbiome//humann/humann_3.0.0//chocophlan/mpa_v30_CHOCOPhlAn_201901.md5 already present!
MD5 checksums do not correspond! If this happens again, you should remove the database files and rerun MetaPhlAn so they are re-downloaded
This happens no matter how many times I delete the database files and rerun the humann command.
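For anyone wanting to reproduce the checksum comparison outside MetaPhlAn, here is a minimal sketch of checking the tar against its .md5 file by hand. It assumes the .md5 file begins with the hex digest, as in md5sum output, which I have not confirmed; check_md5 is a name made up for illustration and is not part of MetaPhlAn or HUMAnN.

```shell
#!/usr/bin/env bash
# Compare a file's MD5 digest against the first field of a companion .md5 file.
# Assumption: the .md5 file starts with the hex digest (md5sum-style layout).
check_md5() {
    local data_file="$1" md5_file="$2"
    local expected actual
    # First whitespace-separated field of the first line is taken as the digest.
    expected=$(awk 'NR==1 {print $1}' "$md5_file")
    actual=$(md5sum "$data_file" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "OK"
    else
        echo "MISMATCH: expected $expected, got $actual"
        return 1
    fi
}
```

It would be called as check_md5 path/to/mpa_v30_CHOCOPhlAn_201901.tar path/to/mpa_v30_CHOCOPhlAn_201901.md5, with the paths pointing at the database directory shown in the error above.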
I’ve read a few people on here recommend running metaphlan --install $DBLOCATION
which I tried. Running the same humann command I get the following error:
CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_v30_CHOCOPhlAn_201901.1.bt2 ) that are not of the expected version. Please install the latest version of the database: 201901b.
In the first step using humann_databases --download
I downloaded the full_chocophlan.v296_201901b.tar.gz database, so I’m not sure why it thinks I don’t have that version.
Even when adding --index mpa_v30_CHOCOPhlAn_201901
to the --metaphlan-options, I get a slightly different error, but it still asks me to download the version of the database I already have:
CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_v30_CHOCOPhlAn_201901.tar ) that are not of the expected version. Please install the latest version of the database: 201901b
It was my understanding that --index
is supposed to suppress the version checking function of metaphlan, so is this not working correctly?
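A quick way to see which files might be tripping the version check is sketched below. It assumes the check is a plain filename match against the version tag named in the error (201901b), which is a guess on my part and not something I have verified in the HUMAnN source; list_unversioned is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List files directly inside a directory whose names do NOT contain a version tag.
# Assumption: the failing check is a simple filename match on the tag.
list_unversioned() {
    local db_dir="$1" tag="$2"
    find "$db_dir" -maxdepth 1 -type f ! -name "*${tag}*" -exec basename {} \; | sort
}
```

Running list_unversioned "${params.nucleotideDb}" 201901b on the directory passed to --nucleotide-database should print any files, such as the mpa_v30_CHOCOPhlAn_201901 ones named in the errors, that sit alongside the ChocoPhlAn files without carrying the tag.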
The only way I’ve found to get humann working properly is to use the v3.0.0.a.4 Humann Docker container and install the v269_201901 databases. As far as I’ve tried, with all the different combinations of database versions and humann versions, I cannot get humann 3.0.0 to work.
Perhaps this is a Docker-specific issue, as I haven’t seen anyone else on the forum facing the same problem. Granted, I haven’t tried running it via conda, but my workflow is specifically designed with Docker in mind and it would be troublesome to move to conda.
Any assistance would be great!
Thanks,
Lewis
P.S.
Also while I’m here, when you run humann_databases
why does the first line say HUMANnN2 Databases ( database : build = location )
when I’m running humann3? It’s a little misleading. There’s also a typo in there: the name humann doesn’t have three N’s.