HUMAnN nucleotide alignment

Hi,

Thank you so much for the development of HUMAnN!

I have performed shotgun sequencing of several mouse fecal samples. My reads are 150bp.

My issue is with the nucleotide alignment step using HUMAnN. For my first run I used:

humann --input sample_cat.fasta.gz --output sampleoutput/ --memory-use maximum --threads 150

This run used UniRef90 with the default parameters:
SEARCH MODE
search mode = uniref90
nucleotide identity threshold = 0.0
translated identity threshold = 80.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 50.0
translated query coverage threshold = 90.0
nucleotide subject coverage threshold = 50.0
nucleotide query coverage threshold = 90.0

For this sample, HUMAnN reported “Unaligned reads after nucleotide alignment: 88.7167874589 %”. Across all of my samples this value ranged from 85% to 92%.

After translated alignment it reported “Unaligned reads after translated alignment: 58.3685585936 %”. My other samples ranged from 30% to 60%.
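
For anyone comparing many samples, the two percentages above can be pulled out of the run logs instead of being copied by hand. This is only a rough sketch: `unaligned_summary` is a hypothetical helper name, and it assumes each log contains lines in exactly the format quoted above. The log location shown in the usage comment is also an assumption about how HUMAnN lays out its output folder, so adjust it to wherever your logs actually are.

```shell
# Print "file <TAB> step <TAB> percent" for every "Unaligned reads after ..."
# line found in the given log files.
# Assumption: the log lines look like
#   Unaligned reads after nucleotide alignment: 88.7167874589 %
unaligned_summary() {
    local log
    for log in "$@"; do
        # -o keeps only the matching text, dropping any timestamp or prefix
        grep -oE 'Unaligned reads after (nucleotide|translated) alignment: [0-9.]+ %' "$log" |
            awk -v f="$log" '{ printf "%s\t%s\t%s\n", f, $4, $6 }'
    done
}

# Hypothetical usage (path layout is an assumption):
#   unaligned_summary sampleoutput/*_humann_temp/*.log
```

Each output row is tab-separated, so the result can be sorted or loaded into a spreadsheet directly.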

After reading many posts, I decided to relax the settings and use UniRef50:

humann --input sample.fasta.gz --output sampleoutput/ --search-mode uniref50 --translated-subject-coverage-threshold 0.0 --nucleotide-subject-coverage-threshold 0.0 --nucleotide-query-coverage-threshold 50.0 --translated-query-coverage-threshold 50.0 --memory-use maximum --threads 150

SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 0.0
translated query coverage threshold = 50.0
nucleotide subject coverage threshold = 0.0
nucleotide query coverage threshold = 50.0

Here my output was “Unaligned reads after nucleotide alignment: 88.4718670763 %”, which is marginally lower than in the first run (88.72%) but essentially unchanged.

The translated alignment result was “Unaligned reads after translated alignment: 29.4553120544 %”.

Lastly, I relaxed the settings even more:

humann --input sample.fasta.gz --output sampleoutput/ --search-mode uniref50 --translated-subject-coverage-threshold 0.0 --nucleotide-subject-coverage-threshold 0.0 --nucleotide-query-coverage-threshold 0.0 --translated-query-coverage-threshold 50.0 --memory-use maximum --threads 150

SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 0.0
translated query coverage threshold = 50.0
nucleotide subject coverage threshold = 0.0
nucleotide query coverage threshold = 0.0

Here, I got “Unaligned reads after nucleotide alignment: 88.4718670763 %” and “Unaligned reads after translated alignment: 29.4552988863 %”.
The nucleotide result is identical to my previous run, and the translated result differs by only about 0.00001 percentage points.
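
To put numbers on how little the relaxed thresholds changed, the reported percentages can be subtracted directly. A small sketch, where `pp_diff` is a hypothetical helper and the comparison assumes the runs were made on the same input reads:

```shell
# Difference in percentage points between two "unaligned" values
# (first argument minus second), rounded to four decimals.
pp_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a - b }'
}

pp_diff 88.7167874589 88.4718670763   # nucleotide, run 1 vs run 2 -> 0.2449
pp_diff 88.4718670763 88.4718670763   # nucleotide, run 2 vs run 3 -> 0.0000
pp_diff 29.4553120544 29.4552988863   # translated, run 2 vs run 3 -> 0.0000
```

So the nucleotide step moved by roughly a quarter of a percentage point at most, while nearly all of the improvement came from the translated search.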

My question is: is there something that explains the low nucleotide alignment rate? Am I doing something wrong? The nucleotide alignment did not improve after relaxing the parameters, so should I just use the first run with UniRef90? Which results should I trust?

Thanks so much for all the help and sorry for all the questions!

Hi,
Have you been able to solve this problem?
I have exactly the same problem, and the taxonomic results I find don’t make much biological sense.

Hi cgar,

No, I have not been able to solve this problem. I am hoping that someone from the HUMAnN development team can supply some answers.

@franzosa

Hi, I am hoping you can give some insight here. Thanks so much!

Hi all,

I just ran HUMAnN again with the nucleotide search bypassed, and only 26% of reads were left unaligned after translated alignment.

humann --input WT_819_cat.fasta.gz --output /bypass_nucl/ --protein-database /home/uniref90/uniref/ --search-mode uniref90 --memory-use maximum --threads 150 --bypass-nucleotide-search

I believe the ChocoPhlAn database may not contain many of the species that MetaPhlAn 4 identified, so reads from those species were counted as unaligned in the nucleotide step.
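
One rough way to check this hypothesis is to list the species in the MetaPhlAn profile that have no matching file in the ChocoPhlAn folder. The sketch below rests on assumptions I have not confirmed for every database version: that profile rows contain clade strings with an `s__Genus_species` field, and that ChocoPhlAn files are named like `g__Genus.s__Genus_species.centroids.<version>.ffn.gz`. `missing_from_chocophlan` is a hypothetical helper name, and both paths in the usage comment are placeholders.

```shell
# Print species named in a MetaPhlAn-style profile that have no matching
# file in a ChocoPhlAn directory.
# Assumptions (check against your own files):
#   - species rows contain "|s__Genus_species" followed by whitespace
#   - database file names contain ".s__Genus_species."
missing_from_chocophlan() {
    local profile="$1" db_dir="$2" species
    # Drop comment lines and strain/SGB-level ("|t__") rows, then pull out
    # the unique s__ names.
    grep -v '^#' "$profile" | grep -v '|t__' |
        grep -oE 's__[^|[:space:]]+' | sort -u |
    while read -r species; do
        # -F matches the name literally, so brackets in names are safe
        if ! ls "$db_dir" | grep -qF ".${species}."; then
            printf '%s\n' "$species"
        fi
    done
}

# Hypothetical usage (both paths are placeholders):
#   missing_from_chocophlan sample_metaphlan_bugs_list.tsv /path/to/chocophlan
```

If most of the abundant species show up in this list, that would be consistent with the nucleotide step having little to align against, but it is only a file-name check and not a confirmation of what HUMAnN actually selected in its prescreen.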