HUMAnN nucleotide alignment

Jtoor · April 29, 2024, 8:54pm

Hi,

Thank you so much for the development of HUMAnN!

I have performed shotgun sequencing of several mouse fecal samples. My reads are 150 bp.

My issue is with the nucleotide alignment step using HUMAnN. For my first run I used:

humann --input sample_cat.fasta.gz --output sampleoutput/ --memory-use maximum --threads 150

Here, I used UniRef90 with default parameters:
SEARCH MODE
search mode = uniref90
nucleotide identity threshold = 0.0
translated identity threshold = 80.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 50.0
translated query coverage threshold = 90.0
nucleotide subject coverage threshold = 50.0
nucleotide query coverage threshold = 90.0

From this, my sample's output was “Unaligned reads after nucleotide alignment: 88.7167874589 %”. All of my samples ranged from 85% to 92% here.

The translated alignment output was “Unaligned reads after translated alignment: 58.3685585936 %”. My other samples ranged from 30% to 60%.
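A minimal sketch for pulling these two percentages out of one or more HUMAnN log files, assuming the log lines have the same form as the ones quoted above. The function name `unaligned_summary` is made up for illustration, and the location of the log files depends on your own output directory.

```shell
# Print "<log file> <tier> <percent unaligned>" (tab-separated) for every
# "Unaligned reads after ... alignment" line found in the given log files.
# Assumes the line format quoted in this post.
unaligned_summary() {
    awk '
        match($0, /Unaligned reads after (nucleotide|translated) alignment: [0-9.]+/) {
            hit = substr($0, RSTART, RLENGTH)   # the matched phrase only
            n = split(hit, words, " ")          # words[4] = tier, words[n] = percent
            printf "%s\t%s\t%s\n", FILENAME, words[4], words[n]
        }
    ' "$@"
}

# Usage (hypothetical path): unaligned_summary sampleoutput/*/*.log
```

Collecting the numbers in one table makes it easier to compare runs with different settings across all samples.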

After reading many posts, I decided to relax the settings and use UniRef50:

humann --input sample.fasta.gz --output sampleoutput/ --search-mode uniref50 --translated-subject-coverage-threshold 0.0 --nucleotide-subject-coverage-threshold 0.0 --nucleotide-query-coverage-threshold 50.0 --translated-query-coverage-threshold 50.0 --memory-use maximum --threads 150

SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 0.0
translated query coverage threshold = 50.0
nucleotide subject coverage threshold = 0.0
nucleotide query coverage threshold = 50.0

Here my output was “Unaligned reads after nucleotide alignment: 88.4718670763 %”, which was essentially unchanged.

The translated alignment was “Unaligned reads after translated alignment: 29.4553120544 %”

Lastly, I relaxed the settings even more:

humann --input sample.fasta.gz --output sampleoutput/ --search-mode uniref50 --translated-subject-coverage-threshold 0.0 --nucleotide-subject-coverage-threshold 0.0 --nucleotide-query-coverage-threshold 0.0 --translated-query-coverage-threshold 50.0 --memory-use maximum --threads 150

SEARCH MODE
search mode = uniref50
nucleotide identity threshold = 0.0
translated identity threshold = 50.0

ALIGNMENT SETTINGS
bowtie2 options = --very-sensitive
diamond options = --top 1 --sensitive --outfmt 6
evalue threshold = 1.0
prescreen threshold = 0.01
translated subject coverage threshold = 0.0
translated query coverage threshold = 50.0
nucleotide subject coverage threshold = 0.0
nucleotide query coverage threshold = 0.0

Here, I got “Unaligned reads after nucleotide alignment: 88.4718670763 %”
“Unaligned reads after translated alignment: 29.4552988863 %”
This is essentially the same as my last run.

My question is: is there something that explains the low nucleotide alignment? Am I doing something wrong? The nucleotide alignment did not improve after relaxing the parameters, so should I just use the first run with UniRef90? Which results should I trust?

Thanks so much for all the help and sorry for all the questions!

cgar · May 17, 2024, 3:46pm

Hi,
Have you been able to solve this problem?
I have exactly the same problem, and the taxonomic results I find don’t make much biological sense.

Jtoor · May 17, 2024, 8:43pm

Hi cgar,

No, I have not been able to solve this problem. I am hoping that someone from the HUMAnN development team can supply some answers.

Jtoor · May 20, 2024, 1:41pm

@franzosa

Hi, I am hoping you can give some insights here. Thanks so much!

Jtoor · May 22, 2024, 2:03pm

Hi all,

I just ran HUMAnN again with the nucleotide search bypassed, and only 26% of reads were unaligned after translated alignment.

humann --input WT_819_cat.fasta.gz --output /bypass_nucl/ --protein-database /home/uniref90/uniref/ --search-mode uniref90 --memory-use maximum --threads 150 --bypass-nucleotide-search

I believe the ChocoPhlAn database may not contain many of the species that MetaPhlAn 4 identified, so those reads were counted as unaligned.

franzosa · June 20, 2024, 9:11pm

Sorry for missing this thread! If the issue is that HUMAnN isn’t finding the species in your sample (because they aren’t in our database), then relaxing parameters won’t improve nucleotide alignment, but it will improve species-agnostic protein-level alignment, just as you’re seeing. As HUMAnN 4 transitions to MetaPhlAn 4’s SGB model, it will do a better job of identifying and mapping to species from the murine microbiome during the nucleotide search phase.
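To make that concrete, here is a rough breakdown of the numbers reported earlier in the thread. It is only a sketch: it assumes both "unaligned" percentages are expressed relative to the total input reads, and `tier_breakdown` is a made-up helper name.

```shell
# Split the reads into three shares from the two percentages HUMAnN reports.
# $1 = % unaligned after nucleotide search
# $2 = % unaligned after translated search
# Assumption: both values are percentages of the total input reads.
tier_breakdown() {
    awk -v nuc="$1" -v tr="$2" 'BEGIN {
        printf "nucleotide-aligned\t%.2f\n", 100 - nuc
        printf "translated-aligned\t%.2f\n", nuc - tr
        printf "still-unaligned\t%.2f\n", tr
    }'
}

tier_breakdown 88.7167874589 58.3685585936   # UniRef90 defaults: 11.28 / 30.35 / 58.37
tier_breakdown 88.4718670763 29.4553120544   # UniRef50, relaxed:  11.53 / 59.02 / 29.46
```

Under that assumption, the nucleotide tier accounts for about 11% of reads in both runs, while the share recovered by translated search roughly doubles when moving to UniRef50 with relaxed coverage.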

Jtoor · June 20, 2024, 9:40pm

Hi Eric,

Thanks for the reply!

I am trying to draw some conclusions about my data, at least in terms of the overall abundance of some genes in my samples rather than which microbes contribute to that abundance. While we wait for HUMAnN 4.0 to be released, do you think bypassing the nucleotide alignment is okay?

Thanks!

franzosa · June 21, 2024, 2:57pm

Yes, while there is no harm in letting the nucleotide alignment do a small amount of alignment to the species it can find, just doing pure translated search to UniRef50 is also a fine strategy for functional profiling. This is essentially how the original HUMAnN worked before we developed the tiered search.
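A translated-search-only run against UniRef50 might look like the sketch below. It reuses only flags that already appear in the commands earlier in this thread. The file names and database path are placeholders, and `translated_only_cmd` is a made-up helper that prints the command for inspection rather than running it.

```shell
# Print a HUMAnN command that skips the nucleotide tier entirely.
# $1 = input reads, $2 = output directory, $3 = directory holding a UniRef50
# DIAMOND database (all three are placeholders to replace with your own paths).
translated_only_cmd() {
    echo humann --input "$1" --output "$2" \
        --protein-database "$3" --search-mode uniref50 \
        --bypass-nucleotide-search
}

# Review the printed command, then paste it into your shell to run it.
translated_only_cmd sample.fasta.gz sample_translated_only /path/to/uniref50
```

Options such as --threads and --memory-use can be appended as in the earlier runs.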

Jtoor · June 24, 2024, 5:32pm

Hi,

Thanks for the input! Is there a reason that you specified UniRef50 instead of UniRef90 in your reply?

franzosa · June 24, 2024, 7:12pm

UniRef50 is better for communities where you expect more remote homology, since it allows reads to align at 50% identity (vs. the 80% minimum we allow for UniRef90).
