Hi all,
I am trying to process my samples with HUMAnN 3.6 and the Struo2 release of the GTDB202 database for HUMAnN. This is the command I’m running:
humann3 -vvv --input ${name}.cleaned.fastq.gz --input-format fastq.gz \
--output humann3_out --output-basename ${name} \
--threads 16 \
--protein-database $PROTEIN \
--nucleotide-database $NUC_DB \
--bypass-nucleotide-index \
--search-mode uniref90 \
--remove-temp-output
For a subset of my samples, I am getting an error like the following:
TIMESTAMP: Completed nucleotide alignment : 1150 seconds
Traceback (most recent call last):
  File "/research/bsi/projects/staff_analysis/m141127/conda/biobakery3/bin/humann3", line 33, in <module>
    sys.exit(load_entry_point('humann==3.6', 'console_scripts', 'humann3')())
  File "/research/bsi/projects/staff_analysis/m141127/conda/biobakery3/lib/python3.7/site-packages/humann/humann.py", line 1000, in main
    nucleotide_alignment_file, alignments, unaligned_reads_store, keep_sam=True)
  File "/research/bsi/projects/staff_analysis/m141127/conda/biobakery3/lib/python3.7/site-packages/humann/search/nucleotide.py", line 263, in unaligned_reads
    if int(info[config.sam_flag_index]) & config.sam_unmapped_flag != 0:
ValueError: invalid literal for int() with base 10: 'AS:i:-6'
The value reported in the ValueError varies, e.g., ‘YT:Z:UU\n’, ‘AS:i:-28’, or ‘AS:i:-6’. The error is reproducible: when I re-run the command on an affected sample, it fails again, while the other samples are not affected. I’ve also verified the md5sums of the database, so I don’t think a corrupted database is the cause. Any guidance would be greatly appreciated, and I’m happy to provide additional info.
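In case it helps with diagnosis: the traceback shows the failure happens when the value at the SAM FLAG position is converted to an integer, and the values reported look like optional alignment tags rather than flags. That would be consistent with some lines in the intermediate SAM file having fewer or shifted columns, though this is only a guess. Below is a minimal sketch for locating such lines. It is not part of HUMAnN; the function name `find_bad_flag_lines` and the demo records are made up for illustration. It assumes the run is repeated without --remove-temp-output so the intermediate SAM file is kept, and that FLAG is the second tab-separated column, as in the SAM specification.

```python
import io
import re

# SAM header lines start with "@" plus a two-letter record type and a tab
# (e.g. "@HD\t...", "@SQ\t...", "@CO\t...").
HEADER = re.compile(r"^@[A-Z]{2}\t")


def find_bad_flag_lines(handle, flag_index=1, limit=10):
    """Return up to `limit` (line_number, fields) pairs for alignment lines
    whose FLAG column is missing or is not an integer."""
    bad = []
    for line_number, line in enumerate(handle, start=1):
        if HEADER.match(line):
            continue  # skip header records
        fields = line.rstrip("\n").split("\t")
        try:
            int(fields[flag_index])
        except (IndexError, ValueError):
            bad.append((line_number, fields))
            if len(bad) >= limit:
                break
    return bad


# Made-up demo input: one header, one well-formed record, and one fragment
# whose second column is an optional tag instead of a FLAG.
demo = io.StringIO(
    "@HD\tVN:1.0\tSO:unsorted\n"
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n"
    "IIII\tAS:i:-6\tXN:i:0\n"
)
for line_number, fields in find_bad_flag_lines(demo):
    # Print only the first few columns to keep the report short.
    print(line_number, fields[:3])
```

To run this on a real file, the `demo` object would be replaced by an open file handle for the kept SAM file. Looking at the reported lines and the lines just before them should show whether records are truncated, split across lines, or contain unexpected tabs.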
Thanks!