Bowtie2 Could not open bowtie2_aligned.sam

Greetings! I’m currently conducting a metatranscriptomics analysis using HUMAnN. Because my samples (derived from bovine milk) have low bacterial biomass and low-quality input RNA, MetaPhlAn leaves a high proportion of reads unclassified. For instance, in one sample, 100% of the reads were unclassified with MetaPhlAn4’s default settings, and even with adjusted settings (stat_q at 0.01 and min_mapq_val at 5), 99.93208% of the reads remained unclassified.

I opted to bypass the taxonomic profiling step using the --bypass-prescreen option in HUMAnN, aiming to directly obtain functional profiles. Here’s the command I used:
humann --input other_kneaddata.fastq --output output/dir --bypass-prescreen --search-mode uniref50

However, I encountered an error during the process. The details are as follows:

other_kneaddata_custom_chocophlan_database.ffn /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_index --large-index
04/09/2024 10:23:11 AM - humann.humann - INFO: TIMESTAMP: Completed database index : 163501 seconds
04/09/2024 10:23:12 AM - humann.search.nucleotide - DEBUG: Nucleotide input file is of type: fastq
04/09/2024 10:23:12 AM - humann.utilities - DEBUG: Using software: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/bowtie2/2.4.4/bin/bowtie2
04/09/2024 10:23:12 AM - humann.utilities - INFO: Execute command: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/bowtie2/2.4.4/bin/bowtie2 -q -x /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_index -U /scratch/zhangbin/metatranscriptomics/4_host_RNA_trimmed/kneadata/D71PD_cow_human_trimmed/other_kneaddata.fastq -S /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_aligned.sam --very-sensitive
04/09/2024 10:23:12 AM - humann.utilities - CRITICAL: Error executing: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/bowtie2/2.4.4/bin/bowtie2 -q -x /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_index -U /scratch/zhangbin/metatranscriptomics/4_host_RNA_trimmed/kneadata/D71PD_cow_human_trimmed/other_kneaddata.fastq -S /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_aligned.sam --very-sensitive

Error message returned from bowtie2 :
Error: Could not open alignment output file /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_aligned.sam
Error: Encountered internal Bowtie 2 exception (#1)
Command: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/bowtie2/2.4.4/bin/bowtie2-align-l --wrapper basic-0 -q -x /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_index -S /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_aligned.sam --very-sensitive -U /scratch/zhangbin/metatranscriptomics/4_host_RNA_trimmed/kneadata/D71PD_cow_human_trimmed/other_kneaddata.fastq
(ERR): bowtie2-align exited with value 1

04/09/2024 10:23:12 AM - humann.utilities - CRITICAL: TRACEBACK:
Traceback (most recent call last):
  File "/home/zhangbin/humann3/lib/python3.9/site-packages/humann/utilities.py", line 761, in execute_command
    p_out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.9.6/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.9.6/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/bowtie2/2.4.4/bin/bowtie2', '-q', '-x', '/scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_index', '-U', '/scratch/zhangbin/metatranscriptomics/4_host_RNA_trimmed/kneadata/D71PD_cow_human_trimmed/other_kneaddata.fastq', '-S', '/scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/based_on_metaphlan/D71PD_stat_q_0.01_min_mapq_val_-1/other_kneaddata_humann_temp/other_kneaddata_bowtie2_aligned.sam', '--very-sensitive']' returned non-zero exit status 1.

Any suggestions?

I think what you want here is --bypass-nucleotide-search, which will skip trying to identify species in your community (which is apparently quite uncharacterized) and instead forward your reads directly to translated search. You will likely want to use the UniRef50 protein database for this, to help detect more remote homologs.
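Separately, bowtie2's "Could not open alignment output file" message can occur when the target directory is missing or not writable (a full disk or exhausted quota is another possibility), so it may be worth ruling that out before rerunning. A minimal sketch, assuming a POSIX-like shell; the function name check_writable and the probe file name are hypothetical and not part of HUMAnN or bowtie2:

```shell
#!/usr/bin/env bash
# check_writable DIR: succeed only if DIR exists and a file can be created in it.
# (Hypothetical helper for illustration; not part of HUMAnN or bowtie2.)
check_writable() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Try to create a small probe file, then clean it up.
    probe="$dir/.write_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "cannot write to: $dir" >&2
    return 1
}
```

You would call it on the temp directory named in the log, for example check_writable /path/to/output/other_kneaddata_humann_temp (placeholder path), and only rerun humann once it reports the directory as writable.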

Thank you for the suggestions! Based on your advice, I ran this command:
humann --input /scratch/zhangbin/metatranscriptomics/4_host_RNA_trimmed/kneadata/D71PD_cow_human_trimmed/other_kneaddata.fastq --output /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD --bypass-nucleotide-search --search-mode uniref50
However, HUMAnN still failed. Here is the log:
04/12/2024 10:21:50 PM - humann.humann - INFO: Running humann v3.8
04/12/2024 10:21:50 PM - humann.humann - INFO: Output files will be written to: /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD
04/12/2024 10:21:50 PM - humann.humann - INFO: Writing temp files to directory: /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD/other_kneaddata_humann_temp
04/12/2024 10:21:50 PM - humann.utilities - INFO: File ( /scratch/zhangbin/metatranscriptomics/4_host_RNA_trimmed/kneadata/D71PD_cow_human_trimmed/other_kneaddata.fastq ) is of format: fastq
04/12/2024 10:21:50 PM - humann.utilities - DEBUG: Check software, diamond, for required version, 2.0.15
04/12/2024 10:21:50 PM - humann.utilities - INFO: Using diamond version 2.0.15
04/12/2024 10:21:50 PM - humann.config - INFO:
Run config settings:

DATABASE SETTINGS
nucleotide database folder = /scratch/zhangbin/db/humann/chocophlan
protein database folder = /scratch/zhangbin/db/humann/uniref
pathways database file 1 = /home/zhangbin/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
pathways database file 2 = /home/zhangbin/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered_v24_subreactions
utility mapping database folder = /scratch/zhangbin/db/humann/utility_mapping

RUN MODES
resume = False
verbose = False
bypass prescreen = True
bypass nucleotide index = True
bypass nucleotide search = True
bypass translated search = False
translated search = diamond
threads = 1

SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 50.0
translated query coverage threshold = 90.0
nucleotide subject coverage threshold = 50.0
nucleotide query coverage threshold = 90.0

PATHWAYS SETTINGS
minpath = on
xipe = off
gap fill = on

INPUT AND OUTPUT FORMATS
input file format = fastq
output file format = tsv
output max decimals = 10
remove stratified output = False
remove column description output = False
log level = DEBUG

04/12/2024 10:21:50 PM - humann.store - DEBUG: Initialize Alignments class instance to minimize memory use
04/12/2024 10:21:50 PM - humann.store - DEBUG: Initialize Reads class instance to minimize memory use
04/12/2024 10:22:20 PM - humann.humann - INFO: Load pathways database part 1: /home/zhangbin/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2
04/12/2024 10:22:20 PM - humann.humann - INFO: Load pathways database part 2: /home/zhangbin/humann3/lib/python3.9/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered_v24_subreactions
04/12/2024 10:22:20 PM - humann.humann - DEBUG: Custom database is empty
04/12/2024 10:22:20 PM - humann.store - DEBUG: Initialize Reads class instance to minimize memory use
04/12/2024 10:27:29 PM - humann.utilities - DEBUG: Remove file: /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD/other_kneaddata_humann_temp/tmp48fxpgck/tmpbtuvkv66
04/12/2024 10:27:29 PM - humann.search.translated - DEBUG: Convert unaligned reads fastq file to fasta
04/12/2024 10:31:27 PM - humann.search.translated - INFO: Running diamond …
04/12/2024 10:31:28 PM - humann.search.translated - INFO: Aligning to reference database: uniref50_201901b_full.dmnd
04/12/2024 10:31:28 PM - humann.utilities - DEBUG: Remove file: /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD/other_kneaddata_humann_temp/tmp48fxpgck/diamond_m8_uha44b8a
04/12/2024 10:31:28 PM - humann.utilities - DEBUG: Using software: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/diamond/2.0.15/bin/diamond
04/12/2024 10:31:28 PM - humann.utilities - INFO: Execute command: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/diamond/2.0.15/bin/diamond blastx --query /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD/other_kneaddata_humann_temp/tmp48fxpgck/tmp65hn1wc5 --evalue 1.0 --threads 1 --top 1 --sensitive --outfmt 6 --db /scratch/zhangbin/db/humann/uniref/uniref50_201901b_full --out /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD/other_kneaddata_humann_temp/tmp48fxpgck/diamond_m8_uha44b8a --tmpdir /scratch/zhangbin/metatranscriptomics/5_assembly_free_analysis/6_humann/D71PD/other_kneaddata_humann_temp/tmp48fxpgck
Any input is highly appreciated.