Hello,
I am currently analyzing metagenomic data using HUMAnN3. After generating and merging the gene families table, I attempted to regroup UniRef90 gene families to KEGG Orthologs (KOs) with the following command:
humann_regroup_table --input genefamilies.tsv --output genefamilies_ko.tsv --groups uniref90_ko
The output included the message:
Original Feature Count: 503760; Grouped 1+ times: 16702 (3.3%); Grouped 2+ times: 78 (0.0%)
indicating that only about 3.3% of the UniRef90 features were assigned to KOs.
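As a sanity check on those percentages, here is a minimal Python sketch that recomputes them from the summary line. The function name parse_regroup_summary is my own illustration and not part of HUMAnN, and the pattern assumes the message format shown above.

```python
import re


def parse_regroup_summary(message):
    """Extract the counts from a regrouping summary line and recompute the percentages.

    Assumes the format:
    'Original Feature Count: N; Grouped 1+ times: A (x%); Grouped 2+ times: B (y%)'
    """
    pattern = (
        r"Original Feature Count: (\d+); "
        r"Grouped 1\+ times: (\d+).*?"
        r"Grouped 2\+ times: (\d+)"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("summary line does not match the expected format")
    total, grouped_1plus, grouped_2plus = (int(g) for g in match.groups())
    return {
        "total": total,
        "grouped_1plus": grouped_1plus,
        "grouped_2plus": grouped_2plus,
        # Percentages are relative to the original feature count.
        "pct_1plus": 100.0 * grouped_1plus / total,
        "pct_2plus": 100.0 * grouped_2plus / total,
    }


summary = parse_regroup_summary(
    "Original Feature Count: 503760; Grouped 1+ times: 16702 (3.3%); "
    "Grouped 2+ times: 78 (0.0%)"
)
print(round(summary["pct_1plus"], 1), round(summary["pct_2plus"], 1))
```

Note that these percentages count features (rows), not abundance.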
I am using the UniRef90 database version uniref90_201901b_full.dmnd, which I believe is up to date.
The input gene families table appears to be correctly generated and contains tens of thousands of UniRef90 IDs.
No errors or warnings occurred during the regrouping or normalization steps.
However, this KO assignment rate (~3.3%) seems unexpectedly low compared to literature reports and other analyses, where KO mapping rates often range between 50% and 80%.
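Since the 3.3% figure is per feature, one check I could run is how much of the total abundance ends up in the UNGROUPED row of the regrouped table. Below is a minimal sketch of that check. It assumes a tab-separated HUMAnN table with one header line, feature IDs in the first column, stratified rows marked with "|", and UNMAPPED and UNGROUPED rows; the function name ungrouped_fraction is hypothetical.

```python
import csv


def ungrouped_fraction(path):
    """Return the share of community-level abundance assigned to UNGROUPED.

    Stratified rows (feature IDs containing '|') are skipped so abundance is
    not double counted, and UNMAPPED is excluded so the fraction is relative
    to reads that mapped to gene families. Abundance is summed across all
    sample columns.
    """
    total = 0.0
    ungrouped = 0.0
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header line
        for row in reader:
            if not row:
                continue
            feature = row[0]
            if "|" in feature or feature == "UNMAPPED":
                continue
            abundance = sum(float(value) for value in row[1:])
            total += abundance
            if feature == "UNGROUPED":
                ungrouped += abundance
    return ungrouped / total if total else float("nan")
```

For example, ungrouped_fraction("genefamilies_ko.tsv") would return a value between 0 and 1; one minus that value is the abundance-weighted share that received a KO.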
Could you please advise on:
- Common reasons or factors that might cause such a low KO regrouping rate?
- Recommended checks or steps to troubleshoot or improve the KO assignment?
- Whether the sample type or environment could significantly affect the KO mapping rate?
Any insights or suggestions would be greatly appreciated.
Thank you very much for your support!