Humann3 metatranscriptome analysis stuck at nucleotide alignment post processing

Hi All,

I am having trouble getting through a metatranscriptomic analysis with humann3. I am running it on my university's computing cluster, which uses the Slurm job manager. My input is paired-end sequencing data that was previously cleaned with KneadData (without any errors). I then concatenated the cleaned forward and reverse FASTQ files for each sample into a single file per sample containing the F+R reads, per the instructions on your man page.
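
For reference, the concatenation step amounts to appending the reverse-read file to the forward-read file. The minimal Python sketch below is equivalent in spirit to `cat R1.fastq R2.fastq > R1R2.fastq`; the function name and file names are hypothetical, and the one addition over plain `cat` is a guard against an input that lacks a trailing newline, which would otherwise fuse two records.

```python
import os
import shutil


def concat_fastq(inputs, output):
    """Concatenate FASTQ files (e.g. forward then reverse) into one file.

    The result is treated as unpaired reads downstream (bowtie2 -U in the
    log below), so mate information is not preserved.
    """
    with open(output, "wb") as dst:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)
            # If an input lacks a trailing newline, its last record would be
            # fused with the first record of the next file; add one if needed.
            if os.path.getsize(path) > 0:
                with open(path, "rb") as src:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        dst.write(b"\n")


# Hypothetical file names for one sample's KneadData paired output:
# concat_fastq(["RNA_17_paired_1.fastq", "RNA_17_paired_2.fastq"], "RNA_17_cleanFR.fastq")
```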

For about 2/3 of my samples the analysis completed successfully; however, the other 1/3 never make it past the nucleotide alignment step (based on the log files; an example is included below). The nucleotide alignment itself seems to finish fine, usually within a few hours, but then the run sits for up to 10+ days (the longest I've tried) with apparently nothing happening. The log file is not updated, the size of the aligned reads file does not change, and no other files (such as the unaligned reads) are created. Eventually the jobs simply time out with no apparent progress after the nucleotide alignment step.
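
One way to record the "nothing is changing" observation more systematically is to sample a file's size at fixed intervals. The Python sketch below is a hypothetical helper, not part of HUMAnN; note that an unchanged size only shows that nothing is being written to that file, since a process can still be reading input or computing, so it is worth pairing with a look at CPU usage on the compute node (e.g. with `top`).

```python
import os
import time


def sample_sizes(path, interval_s=300.0, checks=4):
    """Record the size of `path` in bytes `checks` times, `interval_s` seconds apart.

    A missing file is recorded as -1.
    """
    sizes = []
    for i in range(checks):
        sizes.append(os.path.getsize(path) if os.path.exists(path) else -1)
        if i < checks - 1:
            time.sleep(interval_s)
    return sizes


def looks_stalled(sizes):
    """True if every sample is identical (no growth between checks)."""
    return len(sizes) > 1 and len(set(sizes)) == 1


# Hypothetical usage, pointing at a run's log file (path is illustrative):
# sizes = sample_sizes("RNA_17/RNA_17_cleanFR.log")
# print(sizes, looks_stalled(sizes))
```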

I have looked for issues with these particular input files (as mentioned, there were no errors in the KneadData clean-up step), and I have tried re-concatenating the F+R files, but so far I can't see anything wrong with the input files themselves. The issue does seem to be related to file size, as the failed samples tend to be among those with higher read counts; however, this is not a perfect correlation, since some samples with equivalent read counts completed successfully. My files are rather large: those on the higher end have ~80-100 million reads.
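
A quick structural check of a concatenated file can count reads, flag very short reads (the bowtie2 warnings in the log below skip reads "< 2 characters long"), and catch malformed or truncated four-line records. The sketch below is a hypothetical helper, assuming standard four-line FASTQ records, optionally gzipped.

```python
import gzip


def fastq_stats(path, min_len=2):
    """Count reads, reads shorter than `min_len`, and malformed 4-line records."""
    opener = gzip.open if path.endswith(".gz") else open
    reads = short = malformed = 0
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline()
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\r\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\r\n")
            reads += 1
            # A valid record has an '@' header, a '+' separator and
            # sequence/quality strings of equal length.
            if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
                malformed += 1
            if len(seq) < min_len:
                short += 1
    return {"reads": reads, "short_reads": short, "malformed": malformed}
```

A nonzero `malformed` count (for example from a truncated final record) would point to a problem in the file itself rather than in the downstream tools.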

Any ideas/help would be greatly appreciated!
Thanks,
Chapman

Log file:
‘’’
03/24/2022 04:20:54 PM - humann.humann - INFO: Running humann v3.0.1
03/24/2022 04:20:54 PM - humann.humann - INFO: Output files will be written to: /users/cbeekman/scratch/GImapping/metaRNA/humann3/humann3_outfiles/RNA_17
03/24/2022 04:20:54 PM - humann.humann - INFO: Writing temp files to directory: /users/cbeekman/scratch/GImapping/metaRNA/humann3/humann3_outfiles/RNA_17/RNA_17_cleanFR_humann_temp
03/24/2022 04:20:54 PM - humann.utilities - INFO: File ( /users/cbeekman/scratch/GImapping/metaRNA/humann3/concat_cleanreads/RNA_17_cleanFR.fastq ) is of format: fastq
03/24/2022 04:20:54 PM - humann.utilities - DEBUG: Check software, metaphlan, for required version, 3.0
03/24/2022 04:20:56 PM - humann.utilities - INFO: Using metaphlan version 3.0
03/24/2022 04:20:56 PM - humann.utilities - DEBUG: Check software, bowtie2, for required version, 2.2
03/24/2022 04:20:56 PM - humann.utilities - INFO: Using bowtie2 version 2.4
03/24/2022 04:20:56 PM - humann.humann - INFO: Search mode set to uniref90 because a uniref90 translated search database is selected
03/24/2022 04:20:56 PM - humann.utilities - DEBUG: Check software, diamond, for required version, 0.9.36
03/24/2022 04:20:56 PM - humann.utilities - INFO: Using diamond version 2.0.14
03/24/2022 04:20:56 PM - humann.config - INFO:
Run config settings:

DATABASE SETTINGS
nucleotide database folder = /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan
protein database folder = /users/cbeekman/scratch/GImapping/databases/humann3/uniref
pathways database file 1 = /users/cbeekman/miniconda2/envs/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
pathways database file 2 = /users/cbeekman/miniconda2/envs/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered_v24
utility mapping database folder = /users/cbeekman/miniconda2/envs/humann3/lib/python3.9/site-packages/humann/data/misc

RUN MODES
resume = False
verbose = False
bypass prescreen = False
bypass nucleotide index = False
bypass nucleotide search = False
bypass translated search = False
translated search = diamond
threads = 48

SEARCH MODE
search mode = uniref90
nucleotide identity threshold = 0.0
translated identity threshold = 80.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 50.0
translated query coverage threshold = 90.0
nucleotide subject coverage threshold = 50.0
nucleotide query coverage threshold = 90.0

PATHWAYS SETTINGS
minpath = on
xipe = off
gap fill = on

INPUT AND OUTPUT FORMATS
input file format = fastq
output file format = tsv
output max decimals = 10
remove stratified output = False
remove column description output = False
log level = DEBUG

03/24/2022 04:20:56 PM - humann.store - DEBUG: Initialize Alignments class instance to maximize memory use
03/24/2022 04:20:56 PM - humann.store - DEBUG: Initialize Reads class instance to maximize memory use
03/24/2022 04:21:13 PM - humann.humann - INFO: Load pathways database part 1: /users/cbeekman/miniconda2/envs/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
03/24/2022 04:21:13 PM - humann.humann - INFO: Load pathways database part 2: /users/cbeekman/miniconda2/envs/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered_v24
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Klebsiella.s__Klebsiella_pneumoniae : 54.76% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Parasutterella.s__Parasutterella_excrementihominis : 23.03% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Klebsiella.s__Klebsiella_variicola : 18.80% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Klebsiella.s__Klebsiella_quasipneumoniae : 1.29% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Proteobacteria_unclassified.s__Proteobacteria_bacterium_CAG_139 : 1.01% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Klebsiella.s__Klebsiella_michiganensis : 0.86% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Turicimonas.s__Turicimonas_muris : 0.23% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Found g__Lactobacillus.s__Lactobacillus_johnsonii : 0.02% of mapped reads
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Total species selected from prescreen: 8
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Parasutterella.s__Parasutterella_excrementihominis.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Klebsiella.s__Klebsiella_quasipneumoniae.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Klebsiella.s__Klebsiella_michiganensis.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Turicimonas.s__Turicimonas_muris.centroids.v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Klebsiella.s__Klebsiella_pneumoniae.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Proteobacteria_unclassified.s__Proteobacteria_bacterium_CAG_139.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Klebsiella.s__Klebsiella_variicola.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - DEBUG: Adding file to database: g__Lactobacillus.s__Lactobacillus_johnsonii.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:13 PM - humann.search.prescreen - INFO: Creating custom ChocoPhlAn database …
03/24/2022 04:21:13 PM - humann.utilities - DEBUG: Using software: /usr/bin/gunzip
03/24/2022 04:21:13 PM - humann.utilities - INFO: Execute command: /usr/bin/gunzip -c /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Parasutterella.s__Parasutterella_excrementihominis.centroids.v296_v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Klebsiella.s__Klebsiella_quasipneumoniae.centroids.v296_v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Klebsiella.s__Klebsiella_michiganensis.centroids.v296_v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Turicimonas.s__Turicimonas_muris.centroids.v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Klebsiella.s__Klebsiella_pneumoniae.centroids.v296_v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Proteobacteria_unclassified.s__Proteobacteria_bacterium_CAG_139.centroids.v296_v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Klebsiella.s__Klebsiella_variicola.centroids.v296_v201901b.ffn.gz /users/cbeekman/scratch/GImapping/databases/humann3/chocophlan/g__Lactobacillus.s__Lactobacillus_johnsonii.centroids.v296_v201901b.ffn.gz
03/24/2022 04:21:14 PM - humann.humann - INFO: TIMESTAMP: Completed custom database creation : 1 seconds
03/24/2022 04:21:14 PM - humann.search.nucleotide - INFO: Running bowtie2-build …
03/24/2022 04:21:14 PM - humann.utilities - DEBUG: Using software: /users/cbeekman/miniconda2/envs/humann3/bin/bowtie2-build
03/24/2022 04:21:14 PM - humann.utilities - INFO: Execute command: /users/cbeekman/miniconda2/envs/humann3/bin/bowtie2-build -f /users/cbeekman/scratch/GImapping/metaRNA/humann3/humann3_outfiles/RNA_17/RNA_17_cleanFR_humann_temp/RNA_17_cleanFR_custom_chocophlan_database.ffn /users/cbeekman/scratch/GImapping/metaRNA/humann3/humann3_outfiles/RNA_17/RNA_17_cleanFR_humann_temp/RNA_17_cleanFR_bowtie2_index
03/24/2022 04:24:26 PM - humann.humann - INFO: TIMESTAMP: Completed database index : 192 seconds
03/24/2022 04:24:26 PM - humann.search.nucleotide - DEBUG: Nucleotide input file is of type: fastq
03/24/2022 04:24:26 PM - humann.utilities - DEBUG: Using software: /users/cbeekman/miniconda2/envs/humann3/bin/bowtie2
03/24/2022 04:24:26 PM - humann.utilities - INFO: Execute command: /users/cbeekman/miniconda2/envs/humann3/bin/bowtie2 -q -x /users/cbeekman/scratch/GImapping/metaRNA/humann3/humann3_outfiles/RNA_17/RNA_17_cleanFR_humann_temp/RNA_17_cleanFR_bowtie2_index -U /users/cbeekman/scratch/GImapping/metaRNA/humann3/concat_cleanreads/RNA_17_cleanFR.fastq -S /users/cbeekman/scratch/GImapping/metaRNA/humann3/humann3_outfiles/RNA_17/RNA_17_cleanFR_humann_temp/RNA_17_cleanFR_bowtie2_aligned.sam -p 48 --very-sensitive
03/24/2022 05:04:18 PM - humann.utilities - DEBUG: b"Warning: skipping read ‘A00261:479:H7VMHDSX3:3:1152:28031:31501:N:0:CAAGGTAC+GAGATACG#0’ because length (1) <= # seed mismatches (0)\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:1152:28031:31501:N:0:CAAGGTAC+GAGATACG#0’ because it was < 2 characters long\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:2463:28230:8296:N:0:CAAGGTAC+GAGATACG#0’ because length (1) <= # seed mismatches (0)\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:2463:28230:8296:N:0:CAAGGTAC+GAGATACG#0’ because it was < 2 characters long\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:1152:28031:31501:N:0:CAAGGTAC+GAGATACG#0’ because length (1) <= # seed mismatches (0)\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:1152:28031:31501:N:0:CAAGGTAC+GAGATACG#0’ because it was < 2 characters long\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:2463:28230:8296:N:0:CAAGGTAC+GAGATACG#0’ because length (1) <= # seed mismatches (0)\nWarning: skipping read ‘A00261:479:H7VMHDSX3:3:2463:28230:8296:N:0:CAAGGTAC+GAGATACG#0’ because it was < 2 characters long\n185480202 reads; of these:\n 185480202 (100.00%) were unpaired; of these:\n 96382098 (51.96%) aligned 0 times\n 14695505 (7.92%) aligned exactly 1 time\n 74402599 (40.11%) aligned >1 times\n48.04% overall alignment rate\n"
03/24/2022 05:04:18 PM - humann.humann - INFO: TIMESTAMP: Completed nucleotide alignment : 2392 seconds
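
As a sanity check on the bowtie2 summary above: 14,695,505 + 74,402,599 = 89,098,104 aligned reads out of 185,480,202, i.e. about 48.04%, matching the reported overall rate. If the F and R files held equal counts, 185,480,202 unpaired reads correspond to roughly 92.7 million reads per direction, which fits the ~80-100 million figure if that figure counts read pairs. The hypothetical Python helper below recomputes these numbers from an unpaired (`-U`) summary; paired-end summaries use different wording and are not handled.

```python
import re


def parse_bowtie2_unpaired_summary(text):
    """Extract read counts from a bowtie2 unpaired (-U) alignment summary."""
    def grab(pattern):
        return int(re.search(pattern, text).group(1))

    total = grab(r"(\d+) reads; of these:")
    zero = grab(r"(\d+) \([\d.]+%\) aligned 0 times")
    once = grab(r"(\d+) \([\d.]+%\) aligned exactly 1 time")
    multi = grab(r"(\d+) \([\d.]+%\) aligned >1 times")
    aligned = once + multi
    return {
        "total": total,
        "unaligned": zero,
        "aligned": aligned,
        "alignment_rate_pct": 100.0 * aligned / total,
    }
```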

Hi, thank you for the detailed post. It sounds like you might be running out of memory for the jobs that failed. Can you try increasing the memory allocated to those jobs to see if it resolves the issue?

Thank you,
Lauren

Hi Lauren,

Thanks for the response. I am already requesting the maximum memory per job on our standard partition. However, our cluster does have a large-memory partition that allows a bit more, so I will try running there with more memory requested.
That said, the Slurm output usually mentions memory when a job runs out, whereas these simply report that the jobs timed out. Are there any other possibilities that come to mind?
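
One way to check what Slurm recorded for a finished job is `sacct`, e.g. `sacct -j <jobid> -P --format=JobID,State,Elapsed,MaxRSS`, which prints pipe-delimited fields with a header. Depending on the Slurm version and cluster configuration, a job killed for exceeding its memory limit is typically reported as OUT_OF_MEMORY, whereas TIMEOUT means the wall-clock limit was reached; MaxRSS shows how much memory each step actually used. The minimal Python sketch below parses that output; the helper names are hypothetical, and the job ID and values in the usage example are made up for illustration.

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_sacct_parsable(text):
    """Turn `sacct -P` (pipe-delimited, with header) output into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def rss_to_gib(value):
    """Convert a MaxRSS string such as '190000000K' to GiB; None if empty."""
    value = value.strip()
    if not value:
        return None
    unit = value[-1].upper()
    if unit in _UNITS:
        return float(value[:-1]) * _UNITS[unit] / 1024 ** 3
    return float(value) / 1024 ** 3  # no suffix: assume bytes


def summarize(rows):
    """Return the set of job/step states and the peak MaxRSS (GiB) across rows."""
    states = {row["State"].split()[0] for row in rows if row.get("State")}
    peaks = [g for g in (rss_to_gib(row.get("MaxRSS", "")) for row in rows) if g is not None]
    return states, (max(peaks) if peaks else None)


# Hypothetical sacct output for illustration only:
sample = "JobID|State|Elapsed|MaxRSS\n12345|TIMEOUT|10-00:00:14|\n12345.batch|CANCELLED|10-00:00:16|190000000K\n"
states, peak_gib = summarize(parse_sacct_parsable(sample))
```

Comparing the peak MaxRSS against the memory requested for the job gives a rough idea of whether a TIMEOUT job was also close to its memory limit.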