The bioBakery help forum

Error message returned from bowtie2 : Could not open index file all_genes_annot.rev.rev.1.bt2l; (ERR): bowtie2-align died with signal 11 (SEGV)

The error message returned when I try to run Humann3 is:

Error message returned from bowtie2 : Could not open index file /marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot.rev.rev.1.bt2l Could not open index file /marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot.rev.rev.2.bt2l (ERR): bowtie2-align died with signal 11 (SEGV)

The error has already been discussed elsewhere, on the GitHub page of the project I downloaded the database from. We concluded that it is likely not a database problem, but a Humann problem.

I downloaded this Humann3 database from Struo2, a very good project by Cuesta-Zuluaga, Ley and Youngblut. They provide a db based on GTDB taxonomy. The available files are:

  • Nucleotide
      • 12G - all_genes_annot.1.bt2l
      • 13G - all_genes_annot.2.bt2l
      • 500M - all_genes_annot.3.bt2l
      • 6.6G - all_genes_annot.4.bt2l
      • 12G - all_genes_annot.rev.1.bt2l (this one gives me the problem)
      • 13G - all_genes_annot.rev.2.bt2l (this one too)
      • 7.9G - genome_reps_filt_annot.fna.gz
  • Protein
      • 11G - uniref90_201901.dmnd

I tried running Humann3 with these databases, using the command:

humann \
--bypass-nucleotide-index \
--search-mode uniref90 \
--remove-temp-output \
--nucleotide-database /marialaura/databases/humann/struo_gtdb/nucleotide/ \
--protein-database /marialaura/databases/humann/struo_gtdb/protein/ \
--taxonomic-profile /marialaura/kraken_results/mpa_syle/S57.mpa.txt \
--threads 1 \
--input /marialaura/samples_joined/S57_R1_R2.fastq.gz \
--output /marialaura/humann_results/humann_23-06_test 

My version of Humann is v3.0.0.alpha.4. I am aware that it is not the latest, but as I am not the administrator of the system, I have not been able to get v3.0.0 yet.

Do you have any idea why I keep getting this error? I will update this thread as soon as I get the latest version of Humann.

Hi Maria, Thank you for the detailed post. It looks like HUMAnN might be having a hard time locating the large index files for the custom database you are using. Instead of providing just the folder, can you try changing your command to include the index basename in the path to the nucleotide database (see below)?

--nucleotide-database /marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot

Thank you,
Lauren
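Bowtie2 takes an index prefix and appends the suffixes itself, so a large index is expected to consist of six `.bt2l` files next to that prefix. As a minimal sketch, assuming a POSIX shell, the check below reports any file missing for a given prefix. The function name `check_bt2l_index` is hypothetical and not part of HUMAnN or Bowtie2.

```shell
# Hypothetical helper (not part of HUMAnN or Bowtie2): verify that the six
# large-index files Bowtie2 expects exist and are readable for a given prefix.
check_bt2l_index() {
    bt2_prefix="$1"
    bt2_missing=0
    for bt2_suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -r "${bt2_prefix}.${bt2_suffix}.bt2l" ]; then
            echo "missing: ${bt2_prefix}.${bt2_suffix}.bt2l"
            bt2_missing=1
        fi
    done
    # Return 0 when every file is present, 1 otherwise.
    return "$bt2_missing"
}

# Example call with the prefix from this thread:
# check_bt2l_index /marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot
```

If the function prints nothing and returns 0, the prefix points at a complete index, and the same prefix is what `--nucleotide-database` should receive here.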


Thank you so much, @lauren.j.mciver. I changed the command according to your suggestion and updated Humann following Nick Youngblut’s suggestion in the Struo2 forum. That led to a problem with the Diamond version, and the entire Humann environment had to be reinstalled. However, now I get this error:
/var/spool/pbs/mom_priv/jobs/28220.master.SC: line 17: 22931 Segmentation fault humann --bypass-nucleotide-index --search-mode uniref90 --remove-temp-output --nucleotide-database /marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot --protein-database /marialaura/databases/humann/struo_gtdb/protein/ --taxonomic-profile /marialaura/kraken_results/kraken_results_gtdb_26-05-2021-23/metaphlan_style/${i}.mpa.txt --threads 10 --input /marialaura/kneaddata/${i}_R1_001_kneaddata_joined.fastq --output /lmarialaura/humann_results/humann_28-06
Could the segmentation fault be a memory issue? I am currently working with 94GB of memory. The databases sum up to 66GB for nucleotide and 12GB for protein. My fastqs are not compressed, so the biggest of them is 20GB.

Hi Maria, Thank you for the follow up. I am glad to hear that fixed the first issue. Usually running out of memory produces a different error than a seg fault, so I think you probably have enough memory allocated. If you could check the HUMAnN log, it might tell us what was running when the seg fault occurred. It will likely be either the bowtie2 or diamond run. You might also find additional error information in the log. If not, try running the exact command (bowtie2 or diamond) directly to see if you can get more information about the error.

Thank you,
Lauren
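The HUMAnN log records each external call on a line containing `Execute command:`, so the last such line shows what was running when the job died. A minimal sketch for pulling it out follows, assuming a POSIX shell. The function name `last_humann_command` is hypothetical, and the log path in the example is only a placeholder.

```shell
# Hypothetical helper: print the last external command recorded in a HUMAnN log.
# It relies only on the "Execute command:" marker that appears in the log lines.
last_humann_command() {
    grep 'Execute command:' "$1" | tail -n 1 | sed 's/^.*Execute command: //'
}

# Example call (placeholder path):
# last_humann_command /path/to/humann_output/sample.log
```

The printed command can then be pasted into the shell and run on its own, which separates a bowtie2 or diamond failure from a HUMAnN failure.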

Thanks so much for such a fast reply. I will do it as soon as I can.

Hi again, @lauren.j.mciver. Sorry for taking so long to reply! (exams :disappointed_relieved:)

The end of the log reads:

07/01/2021 09:52:44 PM - humann.store - DEBUG: Initialize Alignments class instance to minimize memory use

07/01/2021 09:52:44 PM - humann.store - DEBUG: Initialize Reads class instance to minimize memory use

07/01/2021 09:53:08 PM - humann.humann - INFO: Load pathways database part 1: /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/miniconda3-4.7.12.1-slhupgyt5ir7mlt3bfac4pqk623zk4sj/envs/humann3/lib/python3.7/site-packages/humann/data/pathways/metacyc_reactions_level4ec_only.uniref.bz2

07/01/2021 09:53:08 PM - humann.humann - INFO: Load pathways database part 2: /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/miniconda3-4.7.12.1-slhupgyt5ir7mlt3bfac4pqk623zk4sj/envs/humann3/lib/python3.7/site-packages/humann/data/pathways/metacyc_pathways_structured_filtered 

07/01/2021 09:53:08 PM - humann.search.nucleotide - DEBUG: Nucleotide input file is of type: fastq

07/01/2021 09:53:08 PM - humann.utilities - DEBUG: Using software: /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/miniconda3-4.7.12.1-slhupgyt5ir7mlt3bfac4pqk623zk4sj/envs/humann3/bin/bowtie2

07/01/2021 09:53:08 PM - humann.utilities - INFO: Execute command: /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/miniconda3-4.7.12.1-slhupgyt5ir7mlt3bfac4pqk623zk4sj/envs/humann3/bin/bowtie2 -q -x /lustre/marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot -U /lustre/marialaura/kneaddata/S57_R1_001_kneaddata_joined.fastq -S /lustre/marialaura/humann_results/humann_28-06/S57_R1_001_kneaddata_joined_humann_temp_t0t8kji8/S57_R1_001_kneaddata_joined_bowtie2_aligned.sam -p 10 --very-sensitive

I tried this last command separately, using both a separate installation of bowtie2 and the one in the Humann environment.

The first one, an independent bowtie2 2.4.2 that is not in a conda environment, went well.

The second one, bowtie2 2.4.4 inside the Humann3 conda environment, failed with a segmentation fault.

What is happening? The Humann3 version of bowtie is even more recent.

Hello, Thank you for the follow up. You could try uninstalling and re-installing the bowtie2 in your conda environment. Maybe there was an issue with the install. If that does not help, you could try installing the non-conda bowtie2 in your conda environment to see if that resolves the issue. Maybe there is something different with the conda bowtie2 that does not quite sync up with your environment.

Thank you,
Lauren

I’ve tried circumventing the problem by pointing to the other Bowtie with the --bowtie2 option. But Humann still seems to be using the in-environment Bowtie. What did I get wrong with --bowtie2?

This is the tail of the log:

07/09/2021 06:04:49 AM - humann.utilities - DEBUG: Using software: /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/miniconda3-4.7.12.1-slhupgyt5ir7mlt3bfac4pqk623zk4sj/envs/humann3/bin/bowtie2
07/09/2021 06:04:49 AM - humann.utilities - INFO: Execute command: /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/miniconda3-4.7.12.1-slhupgyt5ir7mlt3bfac4pqk623zk4sj/envs/humann3/bin/bowtie2 -q -x /lustre/marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot -U /lustre/marialaura/kneaddata/S66_R1_001_kneaddata_joined.fastq -S /lustre/marialaura/humann_results/humann_28-06/S66_R1_001_kneaddata_joined_humann_temp__hhk76it/S66_R1_001_kneaddata_joined_bowtie2_aligned.sam -p 10 --very-sensitive
07/09/2021 07:05:35 AM - humann.utilities - DEBUG: b'72043558 reads; of these:\n  72043558 (100.00%) were unpaired; of these:\n    48911079 (67.89%) aligned 0 times\n    9126169 (12.67%) aligned exactly 1 time\n    14006310 (19.44%) aligned >1 times\n32.11% overall alignment rate\n'
07/09/2021 07:05:35 AM - humann.humann - INFO: TIMESTAMP: Completed     nucleotide alignment    :        3646    seconds

This is the command I used; I thought it was all right:

for i in S{57..66}
do
     humann \
     --bypass-nucleotide-index \
     --search-mode uniref90 \
     --bowtie2 /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/bowtie2-2.4.2-upu2ulm22vzqqnhp5dwcv3wi7alnmtkj/bin/bowtie2 \
     --remove-temp-output \
     --nucleotide-database /lustre/marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot \
     --protein-database /lustre/marialaura/databases/humann/struo_gtdb/protein/ \
     --taxonomic-profile /lustre/marialaura/kraken_results/kraken_results_gtdb_26-05-2021-23/metaphlan_style/${i}.mpa.txt \
     --threads 10 \
     --memory-use minimum \
     --input /lustre/marialaura/kneaddata/${i}_R1_001_kneaddata_joined.fastq \
     --output /lustre/marialaura/humann_results/humann_28-06 \
     --o-log /lustre/marialaura/humann_results/humann_28-06/log.txt
done

Hello Maria, If you change your bowtie2 option slightly, to remove the executable name from the path, HUMAnN should pick it up. Sorry for any confusion. The HUMAnN option needs the full path to the directory that contains the executable.

--bowtie2 /opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/bowtie2-2.4.2-upu2ulm22vzqqnhp5dwcv3wi7alnmtkj/bin/

If this bowtie2 also gives you trouble, and since you have issues with the bowtie2 installed with conda, you might try manually installing bowtie2 in your conda environment. Bowtie2 provides a v2.2.3 executable that you would just need to download and unzip.

Thank you,
Lauren
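Since `--bowtie2` expects the directory rather than the executable, the value can be derived from the executable path with `dirname`. This is a minimal sketch assuming a POSIX shell. The variable names `bowtie2_bin` and `bowtie2_dir` are illustrative, and the path is the one quoted in this thread.

```shell
# Path to the bowtie2 executable as quoted in this thread.
bowtie2_bin=/opt/spack/opt/spack/linux-centos7-westmere/gcc-8.2.0/bowtie2-2.4.2-upu2ulm22vzqqnhp5dwcv3wi7alnmtkj/bin/bowtie2

# dirname strips the final path component, leaving the containing directory.
bowtie2_dir=$(dirname "$bowtie2_bin")

# Print the option in the form the humann command line expects.
echo "--bowtie2 $bowtie2_dir"
```

After the next run, the `Using software:` line in the log should show the bowtie2 from that directory instead of the one in the conda environment.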
