I apologize if this is a duplicate post; I went through the forum to see if others were having similar problems and couldn’t find anything on it.
I have run HUMAnN 3 on my mouse microbial samples and found ~10^6 gene families (an amazing improvement over HUMAnN 2). These, however, map to only ~12 EC IDs when I use the mapping file provided. When I look up the same UniRef90 IDs in UniProt, its database maps them to many more EC IDs.
Does anyone have any insight into why there is such a huge discrepancy? Is there a way I can get an updated mapping file (hopefully short of making my own; I’m still a rookie bioinformatician, but I can certainly give it a go!)?
Thank you so much, this forum has been incredibly helpful.
Are you using the updated mapping files for HUMAnN 3? The UniRef identifiers changed between v2 and v3, so using the old UniRef -> EC mapping file could explain the low regrouping rate. Usually ~5-10% of UniRefs will have an EC annotation.
Yes, I’ve updated the mapping file for HUMAnN 3, but the majority of the ECs found in my data have 0 for all of the samples. The attached output file hopefully shows what I mean.
Humann3_FwdRevMerg_lvl4ec_relab_unstratified.tsv (390 KB)
Ah ok, this looks like it might be the known issue of giving the regroup_table script relative abundance measurements without changing the --precision (rounding) flag to a larger number. The default (3) assumes you are working with CPM units and has the effect of rounding all of the very small relative abundances to zero.
I’m going to fix this for the next HUMAnN release. For now, changing --precision to something bigger should rescue your ECs.
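To see why the default rounding wipes out relative abundances, here is a minimal Python sketch. It is not HUMAnN's own code: it uses Python's built-in round() as a stand-in for the rounding step, and the EC numbers and abundance values are made up for illustration.

```python
# Illustration only: built-in round() stands in for the rounding step,
# and the EC numbers and values below are hypothetical.
relab = {
    "1.1.1.1": 0.00042,    # small relative abundance (fraction of total)
    "2.7.7.7": 0.0000031,  # very small relative abundance
    "3.6.4.12": 0.0126,    # comparatively large relative abundance
}


def round_values(values, precision):
    """Round every abundance to the given number of decimal places."""
    return {feature: round(value, precision) for feature, value in values.items()}


# Rounding to 3 decimal places (the default mentioned above).
default = round_values(relab, 3)
# Rounding to a larger number of decimal places.
wider = round_values(relab, 10)
# The same values expressed in CPM (copies per million) survive 3 decimals.
cpm = round_values({feature: value * 1e6 for feature, value in relab.items()}, 3)

zeroed = [feature for feature, value in default.items() if value == 0]
print(zeroed)  # ['1.1.1.1', '2.7.7.7']
```

With 3 decimal places, anything below 0.0005 as a fraction becomes 0, which covers most features in a relative abundance table. The same features expressed in CPM, or rounded with a larger precision, keep nonzero values.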
This fixed my problem, thanks so much. For beginner users like me, it would be great if this were a warning in the --help text.