HUMAnN2 failing after temp files produced

Hello, I’m trying to use HUMAnN2 to profile some metatranscriptomic data; however, it seems to produce only a few .temp files and then stops altogether. Below are the commands I used to install and run HUMAnN2. Could you see if I am doing something wrong, please? Many thanks in advance.

Download Humann2

This method should include metaphlan

Preparation by adding the required channels

conda config --add channels defaults

conda config --add channels bioconda

conda config --add channels conda-forge

conda config --add channels biobakery
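Channel order matters here: each `conda config --add channels` call puts the new channel at the top of the list, so after these four commands biobakery has the highest priority. A quick way to confirm the final order (it skips harmlessly if conda is not on PATH):

```shell
# Print the resolved channel list; the most recently added channel appears first.
command -v conda >/dev/null 2>&1 && conda config --show channels \
    || echo "conda not found on PATH"
```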

Install bwa from the bioconda channel

conda install bwa

Build humann2 environment

conda create --name hmnn -c biobakery humann2

Type “y” to proceed with all packages

y

Activate hmnn environment to run humann2

conda activate hmnn

Download Databases

humann2_databases --download chocophlan full ~/hmnn_databases/

humann2_databases --download uniref uniref90_diamond ~/hmnn_databases/
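A failed or interrupted download can leave truncated files behind, which causes confusing errors much later. A quick sanity check on the database folder (the path below assumes the download location used above):

```shell
# Report database sizes and flag any zero-byte files (a sign of a bad download).
DB_DIR="$HOME/hmnn_databases"
if [ -d "$DB_DIR" ]; then
    du -sh "$DB_DIR"/*                      # rough size of each database folder
    find "$DB_DIR" -type f -size 0 -print   # any output here means trouble
else
    echo "database directory not found: $DB_DIR"
fi
```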

Move to where the files are

cd seqdata/Danish_ash/

Make your outputs directory

mkdir outputs

# Make a new screen to run it in

screen -S humann2_denmark

Activate Humann2:

conda activate hmnn

Run humann2

for f in *.fastq.gz; do humann2 -i "$f" -o outputs; done
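For reference, a slightly more defensive version of that loop: quote the filename (in case of spaces) and echo each command first as a dry run. The demo below creates placeholder files so it can be run anywhere; point it at the real data directory instead.

```shell
# Dry-run the per-sample humann2 commands before launching the real thing.
demo=$(mktemp -d)                                   # placeholder data dir
touch "$demo/Ash100.unmapped.fastq.gz" "$demo/Ash30.unmapped.fastq.gz"
cd "$demo"
for f in *.fastq.gz; do
    [ -e "$f" ] || continue                         # glob matched nothing
    echo humann2 -i "$f" -o outputs                 # drop "echo" to actually run
done
```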

Hello, Thank you for the detailed post. I believe HUMAnN is stopping at the diamond step possibly because it is running out of memory or disk space. Would you double check both to see if this might be the case? If this is not the case would you post the log from your run (or just the last few lines)?

Thank you,
Lauren

Hi Lauren, many thanks for your reply. We don’t think the memory is an issue as we’ve run lots of intensive programmes in the past without issues. The log file is pasted below - can you see anything odd there? Many thanks

07/03/2020 10:31:42 AM - humann2.humann2 - INFO: Running humann2 v2.8.2

07/03/2020 10:31:42 AM - humann2.humann2 - INFO: Output files will be written to: /home/rantwis/outputs2

07/03/2020 10:31:42 AM - humann2.humann2 - INFO: Writing temp files to directory: /home/rantwis/outputs2/Ash100.unmapped_humann2_temp

07/03/2020 10:31:42 AM - humann2.utilities - INFO: File ( /home/rantwis/seqdata/Danish_ash/Ash100.unmapped.fastq.gz ) is of format: fastq.gz

07/03/2020 10:31:42 AM - humann2.utilities - INFO: Decompressing gzipped file …

07/03/2020 10:33:40 AM - humann2.utilities - DEBUG: Check software, bowtie2, for required version, 2.2

07/03/2020 10:33:40 AM - humann2.utilities - INFO: Using bowtie2 version 2.3

07/03/2020 10:33:40 AM - humann2.humann2 - INFO: Search mode set to uniref90 because a uniref90 translated search database is selected

07/03/2020 10:33:40 AM - humann2.utilities - DEBUG: Check software, diamond, for required version, 0.8.22

07/03/2020 10:33:40 AM - humann2.utilities - INFO: Using diamond version 0.9.31

07/03/2020 10:33:40 AM - humann2.config - INFO:

Run config settings:

DATABASE SETTINGS

nucleotide database folder = /home/rantwis/hmnn_databases/chocophlan

protein database folder = /home/rantwis/hmnn_databases/uniref

pathways database file 1 = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

pathways database file 2 = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_pathways_structured_filtered

utility mapping database folder = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/misc

RUN MODES

resume = False

verbose = False

bypass prescreen = False

bypass nucleotide index = False

bypass nucleotide search = False

bypass translated search = False

translated search = diamond

pick frames = off

threads = 1

SEARCH MODE

search mode = uniref90

identity threshold = 90.0

ALIGNMENT SETTINGS

evalue threshold = 1.0

prescreen threshold = 0.01

translated subject coverage threshold = 50.0

translated query coverage threshold = 90.0

PATHWAYS SETTINGS

minpath = on

xipe = off

gap fill = on

INPUT AND OUTPUT FORMATS

input file format = fastq.gz

output file format = tsv

Hi, Thanks for the detailed info! I don’t see any issues/errors with that portion of the log. It looks like you are writing to your home directory. Are you possibly running out of disk space? If not, do you see an error at the end of the run? If so, can you post the error message and the last few lines of the log file?

Thank you,
Lauren

Thanks again Lauren! How would I go about getting the log file please?
Thanks

Hi, The log file is in the output folder and named $SAMPLE.log. It should have info about any errors that occurred during your HUMAnN run. Check out the last ~10-20 lines and see if you see anything with “error” and if so please post. This will help debug what might be going on.

Thanks,
Lauren

Ah ok - that’s what I posted previously, so it doesn’t look like there are any errors there?

Hi - Thanks for the follow up! Is there any additional info at the end of your log file? From what you posted it looks like humann only decompresses the input files and then stops before running the prescreen mode with MetaPhlAn. If that is the case then is it possible you are running out of disk space in your home directory (where the outputs are written)?
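If it helps, free disk space and free inodes for the home directory can both be checked like this (assuming a Linux system):

```shell
df -h "$HOME"   # free space on the filesystem holding the outputs
df -i "$HOME"   # free inodes; 100% use here also prevents creating new files
```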

Thanks!
Lauren

Hmm ok. Do you know how much bigger the files written by MetaPhlAn are than the original input file? The system says we still have 20% space left, so this seems odd but perhaps they are very big? Many thanks in advance

Hi - The MetaPhlAn output files are small, so they should not take up much space in your home directory. Would you try running again? This time do just one run on the command line, instead of in the bash loop, and look for any additional information printed to the screen. There should be an error printed to the screen when the run stops that will tell us what might be going on.

Thank you!
Lauren

Good idea. Here is the log:

07/08/2020 07:17:45 PM - humann2.config - INFO:

Run config settings:

DATABASE SETTINGS

nucleotide database folder = /home/rantwis/hmnn_databases/chocophlan

protein database folder = /home/rantwis/hmnn_databases/uniref

pathways database file 1 = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

pathways database file 2 = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_pathways_structured_filtered

utility mapping database folder = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/misc

RUN MODES

resume = False

verbose = False

bypass prescreen = False

bypass nucleotide index = False

bypass nucleotide search = False

bypass translated search = False

translated search = diamond

pick frames = off

threads = 1

SEARCH MODE

search mode = uniref90

identity threshold = 90.0

ALIGNMENT SETTINGS

evalue threshold = 1.0

prescreen threshold = 0.01

translated subject coverage threshold = 50.0

translated query coverage threshold = 90.0

PATHWAYS SETTINGS

minpath = on

xipe = off

gap fill = on

INPUT AND OUTPUT FORMATS

input file format = fastq.gz

output file format = tsv

output max decimals = 10

remove stratified output = False

remove column description output = False

log level = DEBUG

07/08/2020 07:17:45 PM - humann2.store - DEBUG: Initialize Alignments class instance to minimize memory use

07/08/2020 07:17:45 PM - humann2.store - DEBUG: Initialize Reads class instance to minimize memory use

07/08/2020 07:17:51 PM - humann2.humann2 - INFO: Load pathways database part 1: /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

07/08/2020 07:18:05 PM - humann2.humann2 - INFO: Load pathways database part 2: /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_pathways_structured_filtered

07/08/2020 07:18:05 PM - humann2.utilities - DEBUG: Check software, metaphlan2.py, for required version, 2.6

07/08/2020 07:18:06 PM - humann2.utilities - INFO: Using metaphlan2.py version 2.7

07/08/2020 07:18:06 PM - humann2.search.prescreen - INFO: Running metaphlan2.py …

07/08/2020 07:18:06 PM - humann2.utilities - DEBUG: Using software: /home/rantwis/miniconda3/envs/hmnn/bin/metaphlan2.py

07/08/2020 07:18:06 PM - humann2.utilities - INFO: Execute command: /home/rantwis/miniconda3/envs/hmnn/bin/metaphlan2.py /home/rantwis/seqdata/Danish_ash/outputs_single/Ash30.unmapped_humann2_temp/tmpkXLX40/tmpEbELPW -t rel_ab -o /home/rantwis/seqdata/Danish_ash/outputs_single/Ash30.unmapped_humann2_temp/Ash30.unmapped_metaphlan_bugs_list.tsv --input_type multifastq --bowtie2out /home/rantwis/seqdata/Danish_ash/outputs_single/Ash30.unmapped_humann2_temp/Ash30.unmapped_metaphlan_bowtie2.txt

Are there any clues there? Thanks so much for your help.

Sorry, I just realised I hadn’t scrolled down properly on the first log I sent you - brain meltdown! Here is the full log. It looks like it might be making a text file of bowtie2 data (which I think I can see in the temp folder for the fastq.gz file), but then the process stops?

(base) rantwis@uos-p-bioi-02:~/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp$ less Ash100.unmapped.log

07/17/2020 09:11:25 AM - humann2.utilities - DEBUG: Check software, diamond, for required version, 0.8.22

07/17/2020 09:11:25 AM - humann2.utilities - INFO: Using diamond version 0.9.31

07/17/2020 09:11:25 AM - humann2.config - INFO:

Run config settings:

DATABASE SETTINGS

nucleotide database folder = /home/rantwis/hmnn_databases/chocophlan

protein database folder = /home/rantwis/hmnn_databases/uniref

pathways database file 1 = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

pathways database file 2 = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_pathways_structured_filtered

utility mapping database folder = /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/misc

RUN MODES

resume = False

verbose = False

bypass prescreen = False

bypass nucleotide index = False

bypass nucleotide search = False

bypass translated search = False

translated search = diamond

pick frames = off

threads = 1

SEARCH MODE

search mode = uniref90

identity threshold = 90.0

ALIGNMENT SETTINGS

evalue threshold = 1.0

prescreen threshold = 0.01

translated subject coverage threshold = 50.0

translated query coverage threshold = 90.0

PATHWAYS SETTINGS

minpath = on

xipe = off

gap fill = on

INPUT AND OUTPUT FORMATS

input file format = fastq.gz

output file format = tsv

output max decimals = 10

remove stratified output = False

remove column description output = False

log level = DEBUG

07/17/2020 09:11:25 AM - humann2.store - DEBUG: Initialize Alignments class instance to minimize memory use

07/17/2020 09:11:25 AM - humann2.store - DEBUG: Initialize Reads class instance to minimize memory use

07/17/2020 09:11:31 AM - humann2.humann2 - INFO: Load pathways database part 1: /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

07/17/2020 09:11:45 AM - humann2.humann2 - INFO: Load pathways database part 2: /home/rantwis/.local/lib/python2.7/site-packages/humann2/data/pathways/metacyc_pathways_structured_filtered

07/17/2020 09:11:45 AM - humann2.utilities - DEBUG: Check software, metaphlan2.py, for required version, 2.6

07/17/2020 09:11:46 AM - humann2.utilities - INFO: Using metaphlan2.py version 2.7

07/17/2020 09:11:46 AM - humann2.search.prescreen - INFO: Running metaphlan2.py …

07/17/2020 09:11:46 AM - humann2.utilities - DEBUG: Using software: /home/rantwis/miniconda3/envs/hmnn/bin/metaphlan2.py

07/17/2020 09:11:46 AM - humann2.utilities - INFO: Execute command: /home/rantwis/miniconda3/envs/hmnn/bin/metaphlan2.py /home/rantwis/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp/tmpqOkSPC/tmpKejIuJ -t rel_ab -o /home/rantwis/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp/Ash100.unmapped_metaphlan_bugs_list.tsv --input_type multifastq --bowtie2out /home/rantwis/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp/Ash100.unmapped_metaphlan_bowtie2.txt

Thanks for trying the run again and posting the log! It looks like it might be getting stuck running MetaPhlAn. I don’t see any errors in the log but if you would try running just the MetaPhlAn command and check for errors that should help figure out what might be up! (Here is the command for your reference):

$ /home/rantwis/miniconda3/envs/hmnn/bin/metaphlan2.py /home/rantwis/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp/tmpqOkSPC/tmpKejIuJ -t rel_ab -o /home/rantwis/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp/Ash100.unmapped_metaphlan_bugs_list.tsv --input_type multifastq --bowtie2out /home/rantwis/seqdata/Danish_ash/outputs/Ash100.unmapped_humann2_temp/Ash100.unmapped_metaphlan_bowtie2.txt

Thanks,
Lauren

I have run into a similar situation, where the metaphlan2.py step ran for 48 hours without moving past the temp file creation. On a re-run I saw this error:

Downloading MetaPhlAn2 database
Please note due to the size this might take a few minutes

File /home/chris/anaconda3/envs/humann2/bin/databases/mpa_v20_m200.tar already present!

File /home/chris/anaconda3/envs/humann2/bin/databases/mpa_v20_m200.md5 already present!
MD5 checksums not found, something went wrong!

I looked into the directory and saw the database was a zero-byte file. Deleting the files just resulted in zero-byte files being downloaded again. I eventually found the correct link to the database files (a Dropbox link, which has since been deleted). After getting the database downloaded, everything works great now. I would suggest looking in the database directory to make sure you aren’t seeing the same thing I saw.
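One way to check for what Chris describes is to look for zero-byte files in the MetaPhlAn databases folder (the path below is taken from his error message; adjust it for your own install):

```shell
# List any zero-byte files in the MetaPhlAn database directory.
DB_DIR="$HOME/anaconda3/envs/humann2/bin/databases"
if [ -d "$DB_DIR" ]; then
    find "$DB_DIR" -maxdepth 1 -type f -size 0 -print
else
    echo "directory not present: $DB_DIR"
fi
```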

Chris

Hi Chris,
many thanks for this. In the databases folder I have chocophlan which is 847872 and uniref which is 4096, but perhaps I’m missing another database (do you know?).
Cheers
Rachael

Thanks Lauren! I ran the command and it’s saying:

Downloading MetaPhlAn2 database

Please note due to the size this might take a few minutes

File /home/rantwis/miniconda3/envs/hmnn/bin/databases/mpa_v20_m200.tar already present!

File /home/rantwis/miniconda3/envs/hmnn/bin/databases/mpa_v20_m200.md5 already present!

MD5 checksums not found, something went wrong!

Hi Rachael, I agree with Chris (thanks, Chris!): if you delete the database files in your MetaPhlAn folder and rerun, it should resolve the error you are seeing. To get all set up with MetaPhlAn, run it directly, and once you see it running without errors then try running HUMAnN.
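As a sketch (the database path is assumed from the logs earlier in this thread; double-check it before deleting anything):

```shell
# Remove the truncated MetaPhlAn database files so the next run re-downloads them.
DB_DIR="$HOME/miniconda3/envs/hmnn/bin/databases"
if [ -d "$DB_DIR" ]; then
    rm -f "$DB_DIR/mpa_v20_m200.tar" "$DB_DIR/mpa_v20_m200.md5"
    echo "removed stale files; rerun metaphlan2.py to trigger a fresh download"
else
    echo "directory not present: $DB_DIR"
fi
```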

Thanks,
Lauren

Hi both, many thanks for this. I have tried as Chris suggested and it’s very strange: even uploading the folder from my local Desktop results in files of size 0, even when I transfer them individually:

(base) rantwis@uos-p-bioi-02:~/miniconda3/envs/hmnn/bin/databases$ ll
total 2851932
drwxr-xr-x 2 rantwis rantwis 4096 Aug 4 09:10 ./
drwxrwxr-x 5 rantwis rantwis 12288 Aug 3 13:46 ../
-rw-r--r-- 1 rantwis rantwis 481755 Aug 3 14:01 bcftools
-rw-r--r-- 1 rantwis rantwis 3526 Aug 3 13:57 file_list.txt
-rw-r--r-- 1 rantwis rantwis 69 Aug 3 13:57 metaphlan2_homebrew_counter.txt
-rw-r--r-- 1 rantwis rantwis 26 Aug 3 14:12 mpa_latest
-rw-r--r-- 1 rantwis rantwis 33258288 Aug 4 09:09 mpa_v20_m200_marker_info.txt.bz2
-rw-r--r-- 1 rantwis rantwis 0 Aug 4 09:10 mpa_v20_m200.md5
-rw-r--r-- 1 rantwis rantwis 0 Aug 4 09:10 mpa_v20_m200.tar
-rw-r--r-- 1 rantwis rantwis 17031150 Aug 3 14:16 mpa_v292_CHOCOPhlAn_201901_marker_info.txt.bz2
-rw-r--r-- 1 rantwis rantwis 32 Aug 3 14:07 mpa_v292_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 385853440 Aug 3 14:06 mpa_v292_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 65 Aug 3 14:01 mpa_v293_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 373032960 Aug 3 14:16 mpa_v293_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 65 Aug 3 14:16 mpa_v294_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 357406720 Aug 3 14:01 mpa_v294_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 33 Aug 3 14:01 mpa_v295_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 389560320 Aug 3 14:12 mpa_v295_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 16178781 Aug 3 14:16 mpa_v296_CHOCOPhlAn_201901_marker_info.txt.bz2
-rw-r--r-- 1 rantwis rantwis 65 Aug 3 14:22 mpa_v296_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 384675840 Aug 3 13:50 mpa_v296_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 19938674 Aug 3 14:01 mpa_v29_CHOCOPhlAn_201901_marker_info.txt.bz2
-rw-r--r-- 1 rantwis rantwis 33 Aug 3 13:57 mpa_v29_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 537395200 Aug 3 14:22 mpa_v29_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 16171356 Aug 3 14:02 mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
-rw-r--r-- 1 rantwis rantwis 64 Aug 3 14:16 mpa_v30_CHOCOPhlAn_201901.md5
-rw-r--r-- 1 rantwis rantwis 384430080 Aug 3 13:57 mpa_v30_CHOCOPhlAn_201901.tar
-rw-r--r-- 1 rantwis rantwis 4855257 Aug 3 14:06 SRS019033.fastq
-rw-r--r-- 1 rantwis rantwis 69 Aug 3 13:46 strainphlan_homebrew_counter.txt

These files are the correct size on my local machine, so I’m not sure why they aren’t copying over properly (I’ve tried a few times). Do you have any ideas or workarounds?

Many thanks for your help!

Hello - an update on this: I ran it with HUMAnN 3 and it’s all working now! A quick question (perhaps better for me to start a new thread): I need to be sure the gene assignments etc. are microbial only (bacterial, fungal, etc.), rather than from the host (in this case a tree). Does HUMAnN only make downstream assignments on microbial taxa based on the MetaPhlAn output, or are the eggNOG/KEGG etc. assignments for any gene found in the sample, regardless of origin? Many thanks.

Hello - Glad to hear you have it all working! HUMAnN does not filter based on taxonomic assignment. Instead, we recommend running a quality-control step prior to HUMAnN that filters your reads based on quality and also removes host contaminant reads. We have a tool that we use for this step named KneadData. We currently do not have a tree database; however, there is information on the KneadData site on how to build a custom database if you have a tree genome available. I just double-checked NCBI Genome and it looks like some trees have been sequenced, so if one of these fits your tree you could use it for filtering host contamination.
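As a rough sketch (the genome and sample filenames below are placeholders, and this assumes bowtie2 and KneadData are installed; it skips cleanly otherwise):

```shell
# Build a custom host (tree) reference database, then quality-trim and
# host-filter the reads before running HUMAnN.
if command -v bowtie2-build >/dev/null 2>&1 && command -v kneaddata >/dev/null 2>&1; then
    bowtie2-build ash_genome.fna ash_db          # index a host genome from NCBI
    kneaddata --input sample.fastq \
              --reference-db ash_db \
              --output kneaddata_out             # filtered reads end up here
else
    echo "bowtie2/kneaddata not installed; skipping sketch"
fi
```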

Thank you,
Lauren