Hi,
Thank you so much for developing HUMAnN!
I have performed shotgun sequencing on several mouse fecal samples, and my reads are 150 bp long.
My issue is with the nucleotide alignment step in HUMAnN. For my first run, I used:
humann --input sample_cat.fasta.gz --output sampleoutput/ --memory-use maximum --threads 150
This run used UniRef90 with the default parameters:
SEARCH MODE
search mode = uniref90
nucleotide identity threshold = 0.0
translated identity threshold = 80.0
ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 50.0
translated query coverage threshold = 90.0
nucleotide subject coverage threshold = 50.0
nucleotide query coverage threshold = 90.0
With these settings, this sample reported “Unaligned reads after nucleotide alignment: 88.7167874589 %”. Across all of my samples, this value ranged from 85% to 92%.
After translated alignment, it reported “Unaligned reads after translated alignment: 58.3685585936 %”, and my other samples ranged from 30% to 60%.
After reading many posts, I decided to relax the coverage thresholds and switch to UniRef50:
humann --input sample.fasta.gz --output sampleoutput/ --search-mode uniref50 --translated-subject-coverage-threshold 0.0 --nucleotide-subject-coverage-threshold 0.0 --nucleotide-query-coverage-threshold 50.0 --translated-query-coverage-threshold 50.0 --memory-use maximum --threads 150
SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0
ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 0.0
translated query coverage threshold = 50.0
nucleotide subject coverage threshold = 0.0
nucleotide query coverage threshold = 50.0
Here my output was “Unaligned reads after nucleotide alignment: 88.4718670763 %”, which was essentially unchanged (only about 0.25 percentage points fewer unaligned reads than the first run).
After translated alignment, it reported “Unaligned reads after translated alignment: 29.4553120544 %”.
Lastly, I relaxed the settings even further by also setting the nucleotide query coverage threshold to 0:
humann --input sample.fasta.gz --output sampleoutput/ --search-mode uniref50 --translated-subject-coverage-threshold 0.0 --nucleotide-subject-coverage-threshold 0.0 --nucleotide-query-coverage-threshold 0.0 --translated-query-coverage-threshold 50.0 --memory-use maximum --threads 150
SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0
ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 0.0
translated query coverage threshold = 50.0
nucleotide subject coverage threshold = 0.0
nucleotide query coverage threshold = 0.0
Here, I got:
“Unaligned reads after nucleotide alignment: 88.4718670763 %”
“Unaligned reads after translated alignment: 29.4552988863 %”
The nucleotide value is identical to my previous run, and the translated value differs by only about 0.00001 percentage points.
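To put the three runs side by side, here is a minimal sketch that differences the unaligned-read percentages. It uses only the values quoted above for this one sample; the run labels and the `change` helper are my own hypothetical names, not anything from HUMAnN.

```python
# Unaligned-read percentages copied from the HUMAnN output quoted above (one sample).
# Run labels are hypothetical names for the three parameter sets.
runs = {
    "run1_uniref90_defaults": {"nucleotide": 88.7167874589, "translated": 58.3685585936},
    "run2_uniref50_query50": {"nucleotide": 88.4718670763, "translated": 29.4553120544},
    "run3_uniref50_nucquery0": {"nucleotide": 88.4718670763, "translated": 29.4552988863},
}


def change(before: str, after: str, stage: str) -> float:
    """Change in unaligned reads, in percentage points (negative = more reads aligned)."""
    return runs[after][stage] - runs[before][stage]


for stage in ("nucleotide", "translated"):
    d12 = change("run1_uniref90_defaults", "run2_uniref50_query50", stage)
    d23 = change("run2_uniref50_query50", "run3_uniref50_nucquery0", stage)
    print(f"{stage}: run1->run2 {d12:+.6f} pp, run2->run3 {d23:+.6f} pp")
```

A negative value means fewer unaligned reads, so switching to the relaxed UniRef50 settings barely moved the nucleotide step but roughly halved the unaligned fraction after translated search.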
My question is: what could explain the low nucleotide alignment rate? Am I doing something wrong? Since relaxing the parameters did not meaningfully improve the nucleotide alignment, should I just use the first run with UniRef90? Which results should I trust?
Thanks so much for all the help and sorry for all the questions!