No unclassified fraction when using humann2 with uniref100

Franziska · August 11, 2020, 11:25am

Hi there!
I ran humann2 with UniRef100 as a protein database on some metatranscriptome samples from Schirmer et al. 2018. The log files look fine to me and it seems that diamond ran nicely with UniRef100.
However, the resulting gene families files have no unclassified fraction whatsoever.

But, if I understand the documentation correctly, all reads that are passed over to diamond are assigned to the “unclassified” species bin.
Am I missing something obvious?

Thanks for your help!
Franziska

lauren.j.mciver · August 24, 2020, 6:08pm

Hello Franziska,

Thank you for the detailed post. It sounds like the translated search results were all filtered out, so they do not appear in the final gene families output file. You are right that any translated search results in the gene families output file should have the “unclassified” stratification.

Would you check the HUMAnN log to see whether all of the translated search alignments were filtered? The log has a count for each filtering type. Another check would be to look at the annotations in the UniRef100 database you generated and make sure they follow the custom format HUMAnN requires: either gene_family (gene length then defaults to 1000 bases) or gene_family|gene_length.

Please feel free to post with any other issues or questions.

Thank you,
Lauren
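
For anyone wanting to run the annotation check Lauren describes, here is a minimal Python sketch. It only tests for the two ID forms named in the reply (gene_family and gene_family|gene_length) and assumes the sequence ID is the first whitespace-delimited token of each FASTA header. The function names and the example IDs are made up for illustration and are not part of HUMAnN.

```python
from collections import Counter


def classify_id(header: str) -> str:
    """Classify a FASTA header by the annotation form of its sequence ID."""
    # Assumption: the sequence ID is the first whitespace-delimited token.
    tokens = header.lstrip(">").split()
    if not tokens:
        return "other"
    fields = tokens[0].split("|")
    if len(fields) == 1 and fields[0]:
        # Family only; per the reply, gene length then defaults to 1000.
        return "gene_family"
    if len(fields) == 2 and fields[0] and fields[1].isdigit():
        return "gene_family|gene_length"
    return "other"


def summarize_headers(lines) -> Counter:
    """Count annotation forms over all header lines (those starting with '>')."""
    return Counter(classify_id(line) for line in lines if line.startswith(">"))


# Hypothetical headers, for illustration only.
example = [
    ">UniRef100_A0A000|1185",
    "MSTNPKPQRK",
    ">UniRef100_A0A001",
    ">UniRef100_A0A002|abc",
]
print(summarize_headers(example))
```

To check a real database FASTA, pass an open file handle to summarize_headers instead of the example list. A large "other" count would suggest the IDs do not follow either of the two forms above.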

