Humann3 error for some samples but not others

sjohnson · June 6, 2023, 9:04pm

Hi all,

I am trying to process my samples with HUMAnN 3.6 and the Struo2 release of the GTDB202 database for HUMAnN. This is the command I’m using to run HUMAnN:

humann3 -vvv --input ${name}.cleaned.fastq.gz --input-format fastq.gz \
        --output humann3_out --output-basename ${name} \
        --threads 16 \
        --protein-database $PROTEIN \
        --nucleotide-database $NUC_DB \
        --bypass-nucleotide-index \
        --search-mode uniref90 \
        --remove-temp-output

For a subset of my samples, I am getting an error like the following:

TIMESTAMP: Completed nucleotide alignment : 1150 seconds

Traceback (most recent call last):
  File "/research/bsi/projects/staff_analysis/m141127/conda/biobakery3/bin/humann3", line 33, in <module>
    sys.exit(load_entry_point('humann==3.6', 'console_scripts', 'humann3')())
  File "/research/bsi/projects/staff_analysis/m141127/conda/biobakery3/lib/python3.7/site-packages/humann/humann.py", line 1000, in main
    nucleotide_alignment_file, alignments, unaligned_reads_store, keep_sam=True)
  File "/research/bsi/projects/staff_analysis/m141127/conda/biobakery3/lib/python3.7/site-packages/humann/search/nucleotide.py", line 263, in unaligned_reads
    if int(info[config.sam_flag_index]) & config.sam_unmapped_flag != 0:
ValueError: invalid literal for int() with base 10: 'AS:i:-6'

The offending value in the ValueError varies between samples, e.g., ‘YT:Z:UU\n’, ‘AS:i:-28’, or ‘AS:i:-6’. When I re-run the command on the affected samples, the error reproduces every time, but it does not occur in every sample. I’ve also verified the md5sums of the database files, so I don’t think a corrupted database is the issue. Any guidance you can provide will be greatly appreciated, and I’ll be happy to provide any additional info.

Thanks!

franzosa · June 11, 2023, 3:00pm

As a caveat, Struo2 was developed outside our group, so if this error is on their end it will be trickier for us to diagnose/solve. That said, the error you’re seeing looks like it’s arising from your SAM output having an unexpected structure, potentially because the columns are shifted: the parser expects the integer SAM flag in that position and is finding an optional tag such as ‘AS:i:-6’ instead. Since it’s not happening with every sample, my first guess would be that something unusual in the read names of some samples is disrupting the output.
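One way to test the read-name guess is to scan the FASTQ headers directly. The sketch below is only an illustration: the function names are made up here, it assumes standard 4-line FASTQ records, and what counts as "suspicious" (a tab, a non-printable character, or a header that does not start with `@`) is an assumption, not a confirmed cause of this error.

```python
import gzip
import itertools


def suspicious_read_names(fastq_lines):
    """Yield (record_number, header) for headers that look unusual.

    Assumes 4-line FASTQ records, so headers are lines 0, 4, 8, ...
    """
    headers = itertools.islice(fastq_lines, 0, None, 4)
    for record_number, header in enumerate(headers, start=1):
        name = header.rstrip("\r\n")
        # A tab would add a column if it reached the SAM read-name field;
        # non-printable characters are flagged for the same reason.
        if not name.startswith("@") or "\t" in name or not name.isprintable():
            yield record_number, name


def scan_fastq_gz(path):
    """Return all suspicious headers in a gzipped FASTQ file."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return list(suspicious_read_names(handle))


# Hypothetical usage (the file name is a placeholder):
# for record_number, name in scan_fastq_gz("sample.cleaned.fastq.gz"):
#     print(record_number, repr(name))
```

An empty result for a failing sample would suggest the read names are not the problem and that the SAM file itself is worth inspecting.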

If you can share a full SAM alignment row from a sample that worked and one from a sample that didn’t, that might point to an answer.
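To find a row worth sharing, something like the following sketch could list the SAM rows that would trip the `int()` call in the traceback. It relies only on the SAM format itself (tab-separated rows with 11 mandatory fields and the integer FLAG in the second column); the function name and constants are hypothetical, and it is not HUMAnN's own parsing code. Note that `--remove-temp-output` deletes intermediate files, so the sample would likely need to be re-run without that option to keep the SAM file.

```python
SAM_MANDATORY_FIELDS = 11  # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
SAM_FLAG_INDEX = 1         # FLAG is the second tab-separated column


def malformed_sam_rows(sam_lines):
    """Yield (line_number, reason, line) for alignment rows that look broken."""
    for line_number, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...) are not alignments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < SAM_MANDATORY_FIELDS:
            yield line_number, "only %d fields" % len(fields), line
            continue
        try:
            int(fields[SAM_FLAG_INDEX])
        except ValueError:
            reason = "non-integer flag %r" % fields[SAM_FLAG_INDEX]
            yield line_number, reason, line


# Hypothetical usage (the path is a placeholder for the retained SAM file):
# with open("path/to/sample_aligned.sam", errors="replace") as handle:
#     for line_number, reason, line in malformed_sam_rows(handle):
#         print(line_number, reason, repr(line))
```

Printing with `repr()` makes stray tabs and newlines visible, which helps show whether a row is truncated or has an extra column.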
