Dear all,
I am running HUMAnN 3 on a cluster because this dataset contains many metagenomes and we don't think it can be run locally. (The cluster has humann v3.0.0.alpha.4 and MetaPhlAn version 3.0.)
I ran this command:
humann --input ./IN/SRR88998813_paired_1.fastq --output ./OUT/SRR88998813 --input-format fastq --nucleotide-database ../DB/chocophlan/ --protein-database ../DB/uniref --threads 20 --metaphlan-options " --bowtie2db ../metaphlan/database296/ --index mpa_v296_CHOCOPhlAn_201901"
I got the following error (I had added the --metaphlan-options to work around a previous error):
ERROR: The MetaPhlAn2 taxonomic profile provided was not generated with the database version v30 . Please update your version of MetaPhlAn2 to v3.0.
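Since the error refers to the database version behind the taxonomic profile, one way to see what a given profile was built with is to read its header. The following is only a minimal sketch: it assumes the database name is recorded in a leading `#` comment line of the profile (for example `#mpa_v30_CHOCOPhlAn_201901`), and the function name `profile_db_tag` and the file path in the usage line are hypothetical.

```shell
# Print the database tag found in the '#' header lines of a MetaPhlAn profile.
# Assumption: the tag has the form mpa_v<digits>_CHOCOPhlAn_<date>.
profile_db_tag() {
    grep '^#' "$1" | grep -oE 'mpa_v[0-9]+_CHOCOPhlAn_[0-9]+' | head -n 1
}

# Usage (hypothetical path):
#   profile_db_tag profilesII/SRR88998813_profiled.tsv
```

If the printed tag is not a v30 one, that would be consistent with the error message above, although this sketch does not reproduce HUMAnN's own check.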
I had to diagnose this backwards: after the failed HUMAnN run, I went back to check which database versions are being used. I ended up trying to match the HUMAnN databases with the one MetaPhlAn tries to use when HUMAnN calls it.
HUMANnN2 Databases ( database : build = location )
chocophlan : full = http://huttenhower.sph.harvard.edu/humann2_data/chocophlan/full_chocophlan.v296_201901.tar.gz
chocophlan : DEMO = http://huttenhower.sph.harvard.edu/humann2_data/chocophlan/DEMO_chocophlan.v296_201901.tar.gz
uniref : uniref50_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref50_annotated_v201901.tar.gz
uniref : uniref90_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref90_annotated_v201901.tar.gz
uniref : uniref50_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/uniref50_ec_filtered_201901.tar.gz
uniref : uniref90_ec_filtered_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901.tar.gz
uniref : DEMO_diamond = http://huttenhower.sph.harvard.edu/humann2_data/uniprot/uniref_annotated/uniref90_DEMO_diamond_v201901.tar.gz
utility_mapping : full = http://huttenhower.sph.harvard.edu/humann2_data/full_mapping_v201901.tar.gz
I had already downloaded the chocophlan and uniref databases for HUMAnN into a custom folder.
I downloaded v296_201901 again for MetaPhlAn.
As suggested in previous issues, I tried both bowtie2-build and the --install option to solve this.
Alternatively, I tried running MetaPhlAn directly (in another folder) so that I could use the SAM files as input later:
metaphlan /IN/SRR88998813_paired_1.fastq --input_type fastq -s samsII/SRR88998813.sam --bowtie2out bowtieII/SRR88998813.bowtie2.bz2 -o profilesII/SRR88998813_profiled.tsv --unknown_estimation --add_viruses --bowtie2db ../database296/ -x mpa_v296_CHOCOPhlAn_201901 --nproc 30
Then I tried:
humann --input ./IN/SRR88998813.sam --output ./OUT/SRR88998813 --input-format genetable --nucleotide-database ../DB/chocophlan/ --protein-database ../DB/uniref --threads 20 --metaphlan-options " --bowtie2db /lustre/scratch118/infgen/team162/gi1/lab/nadia/cancer/metaphlan/database296/ --index mpa_v296_CHOCOPhlAn_201901"
But I got
# Gene Family SRR13068813_Abundance-RPKs
UNMAPPED 244590.0000000000
I searched previous issues, and this seems to come from a mismatch between databases.
Is there a way to match the databases so that everything runs consistently? Which databases need to match for this analysis? Within the restrictions of the HPC, I can download all the databases again if that helps.
I can either run MetaPhlAn first or learn how to run HUMAnN directly and make it pick the correct database.
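To make it easier to compare installations, here is a small sketch that reports which versions of the relevant tools are on the PATH. It assumes each tool accepts `--version`; the function name `report_versions` and the default tool list are my own choices for illustration.

```shell
# Print the first line of each tool's --version output, or note that it is missing.
report_versions() {
    # Default tool list; pass other names as arguments to override.
    [ "$#" -gt 0 ] || set -- humann metaphlan bowtie2 diamond
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            ver=$("$tool" --version 2>&1 || true)
            printf '%s: %s\n' "$tool" "$(printf '%s\n' "$ver" | head -n 1)"
        else
            printf '%s: not found on PATH\n' "$tool"
        fi
    done
}

report_versions
```

Running this in the same environment used for the HUMAnN job shows whether the MetaPhlAn that HUMAnN calls is the one I expect.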
I am attaching the logs from both the HUMAnN and the MetaPhlAn attempts on this cluster.
trySAM.txt (5.2 KB)
tryHUMAnN.txt (16.0 KB)
Thanks for your help!