'update your version of MetaPhlAn2 to v3.0' error while running HUMAnN 3 on an HPC

Dear all,

I am running HUMAnN 3 on a cluster, since the dataset contains many metagenomes and we see no way to run it locally. (Using humann v3.0.0.alpha.4 and MetaPhlAn version 3.0 on that cluster.)

Using this command

humann --input ./IN/SRR88998813_paired_1.fastq --output ./OUT/SRR88998813 --input-format fastq --nucleotide-database ../DB/chocophlan/ --protein-database ../DB/uniref --threads 20 --metaphlan-options " --bowtie2db ../metaphlan/database296/ --index mpa_v296_CHOCOPhlAn_201901"

I got the error below. (I had added those --metaphlan-options to fix a previous error.)

ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the database version v30 .  Please update your version of MetaPhlAn2 to v3.0.

I had to diagnose this backwards: after a failed HUMAnN run, I checked again which versions of the databases the tools were using, and ended up trying to match the HUMAnN database with the one MetaPhlAn uses when it is called by the HUMAnN process.
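A minimal sketch of the checks (humann_config --print shows HUMAnN's configured database folders, metaphlan --version reports the MetaPhlAn version, and humann_databases --available prints the listing reproduced below):

humann_config --print
metaphlan --version
humann_databases --available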

HUMAnN2 Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann2_data/chocophlan/full_chocophlan.v296_201901.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann2_data/chocophlan/DEMO_chocophlan.v296_201901.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref50_annotated_v201901.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref90_annotated_v201901.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann2_data/full_mapping_v201901.tar.gz
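(Each entry in that listing can be fetched into a custom folder with humann_databases --download; a sketch, using the DB folder from my commands above:)

humann_databases --download chocophlan full ../DB
humann_databases --download uniref uniref90_diamond ../DB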

I had already downloaded the chocophlan and uniref databases for HUMAnN into a custom folder.
I downloaded v296_201901 again for MetaPhlAn.
As suggested in previous issues, I tried bowtie2-build and also MetaPhlAn's --install option to solve this, as sketched below.
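A minimal sketch of the --install attempt, assuming the same index and folder as in my humann call:

metaphlan --install --index mpa_v296_CHOCOPhlAn_201901 --bowtie2db ../metaphlan/database296/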

Alternatively, I tried running MetaPhlAn directly (in another folder) so I could use the SAM files as input later:

metaphlan /IN/SRR88998813_paired_1.fastq --input_type fastq -s samsII/SRR88998813.sam --bowtie2out bowtieII/SRR88998813.bowtie2.bz2 -o profilesII/SRR88998813_profiled.tsv --unknown_estimation --add_viruses  --bowtie2db ../database296/   -x mpa_v296_CHOCOPhlAn_201901 --nproc 30

And then I tried

humann --input ./IN/SRR88998813.sam --output ./OUT/SRR88998813 --input-format genetable --nucleotide-database ../DB/chocophlan/ --protein-database ../DB/uniref --threads 20 --metaphlan-options " --bowtie2db /lustre/scratch118/infgen/team162/gi1/lab/nadia/cancer/metaphlan/database296/ --index mpa_v296_CHOCOPhlAn_201901"

But I got

# Gene Family   SRR13068813_Abundance-RPKs
UNMAPPED        244590.0000000000

I searched previous issues and this seems to come from a mismatch between databases.

Is there a way to match the databases so everything runs consistently? Which databases do I need to match for this analysis? Within the restrictions of the HPC, I can try downloading all the databases again if that helps.
I can either run MetaPhlAn first, or just learn how to run HUMAnN directly and make it pick the correct database.

I am attaching the logs from running HUMAnN and MetaPhlAn on this cluster:

trySAM.txt (5.2 KB)
tryHUMAnN.txt (16.0 KB)

Thanks for your help!

There are a couple of things going on here. In the last example (starting from the SAM file), you are telling HUMAnN that the input is a gene table. A gene table is a TSV file like the one HUMAnN outputs for UniRef abundances, so I think that SAM/TSV formatting mismatch is what's causing the weird output there.
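If the goal is to reuse that MetaPhlAn run, a cleaner route is to give HUMAnN the reads plus the profile TSV via --taxonomic-profile; a sketch, using the file names from your commands:

humann --input ./IN/SRR88998813_paired_1.fastq --output ./OUT/SRR88998813 --taxonomic-profile profilesII/SRR88998813_profiled.tsv --threads 20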

For the rest of the comment, you shouldn’t need to be specifying all those database paths if MetaPhlAn and HUMAnN are properly configured - the HUMAnN call can be as simple as humann -i sample123.fastq -o sample123_folder. It’s possible that by specifying the paths manually you are combining MetaPhlAn and HUMAnN databases that are not compatible (consistent with the first ERROR message you provided).
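One way to get to that simple call is to register the database folders once in HUMAnN's configuration; a sketch, assuming the folders from your earlier commands:

humann_config --update database_folders nucleotide ../DB/chocophlan
humann_config --update database_folders protein ../DB/uniref

After that, humann finds them without --nucleotide-database/--protein-database on every call.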


Thanks for your reply! I solved this!

If I passed only the fastq, like this, I got the database-mismatch error:

humann -i SRR88998813.fastq -o ./OUTIE/SRR88998813 --threads 20

After this I tried using v296, but I got the errors I mentioned before. I finally solved it locally by rebuilding with DIAMOND, as suggested here, and pointing every database to the v30 one (instead of the v296 that was set up on the cluster).
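For anyone hitting the same mismatch, the shape of the working combination was roughly as follows (a sketch: mpa_v30_CHOCOPhlAn_201901 is MetaPhlAn 3.0's default index, and the database30 folder name is a placeholder for wherever the v30 index lives):

metaphlan --install --index mpa_v30_CHOCOPhlAn_201901 --bowtie2db ../metaphlan/database30/
humann --input ./IN/SRR88998813_paired_1.fastq --output ./OUT/SRR88998813 --threads 20 --metaphlan-options "--bowtie2db ../metaphlan/database30/ --index mpa_v30_CHOCOPhlAn_201901"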