Hi there!
I ran humann2 with UniRef100 as a protein database on some metatranscriptome samples from Schirmer et al. 2018. The log files look fine to me and it seems that diamond ran nicely with UniRef100.
However, the resulting gene families files have no unclassified fraction whatsoever.
But if I understand the documentation correctly, all reads that are passed on to diamond are assigned to the “unclassified” species bin.
Am I missing something obvious?
Thanks for your help!
Franziska
Hello Franziska,
Thank you for the detailed post. It sounds like the translated search results were filtered out, so they do not appear in the final gene families output file. You are right that any translated search results in the gene families output file should have the “unclassified” stratification. Would you check the HUMAnN log to see whether all of the translated search alignments were filtered? The log contains counts for each filtering type. Another check would be to look at the annotations for the UniRef100 database you generated to make sure they follow the custom format required for HUMAnN: gene_family (defaults to a gene length of 1,000 bases) or gene_family|gene_length. Please feel free to post with any other issues/questions.
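If it helps, here is a minimal sketch of the annotation check described above. It is not part of HUMAnN; it assumes you still have the FASTA file the diamond database was built from, and the function names classify_id and summarize_fasta_headers are made up for illustration. It only looks at the first whitespace-delimited token of each header and tests it against the two forms mentioned above, so anything it reports as "other" is just worth a closer look, not necessarily wrong.

```python
from collections import Counter


def classify_id(header):
    """Classify one FASTA header line by the shape of its sequence ID.

    Only the first whitespace-delimited token after '>' is inspected.
    Returns 'gene_family', 'gene_family|gene_length', or 'other'.
    """
    tokens = header.lstrip(">").split(maxsplit=1)
    seq_id = tokens[0] if tokens else ""
    fields = seq_id.split("|")
    if len(fields) == 1 and fields[0]:
        return "gene_family"
    if len(fields) == 2 and fields[0] and fields[1].isdigit():
        return "gene_family|gene_length"
    return "other"


def summarize_fasta_headers(path):
    """Count how many headers in a FASTA file fall into each category."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                counts[classify_id(line)] += 1
    return counts
```

Calling summarize_fasta_headers on your own FASTA file returns a count per category; a large "other" count would suggest the IDs are not in either of the two forms.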
Thank you,
Lauren