We are using the stratified KOs generated with HUMAnN 3.0 and MetaPhlAn 3. Going through the main contributors to a KO, we sometimes hit really paradoxical results. Two examples:
- O-antigen synthesis protein (K13012) is mapped to F. prausnitzii as the main contributor to the KO abundance. But F. prausnitzii is a Gram-positive bacterium, and O-antigen is generally associated with Gram-negative bacteria.
- K04769 is described as a sporulation-associated protein, but its main contributors include F. prausnitzii and Eubacterium siraeum, neither of which is a spore-forming bacterium.
Do you expect situations like these two? How can we confidently reconcile these counterintuitive findings?
Thanks for your work,
Giacomo
This is a tough one. I agree that the biology doesn’t make much sense, and I don’t see outside support for O-antigen synthesis in F. prausnitzii (I didn’t dive into the second example as deeply). Yet KEGG seems to label it pretty confidently, and they have very rigorous definitions of how they annotate KOs (each one has a profile HMM and a threshold bit score for assignment of that KO to a query protein). UniProt in turn sources KO annotations from KEGG and we source them from UniProt.
So my best guesses would be that 1) these are false-positive annotations or 2) the HMM definitions of these KOs are too broad. The second guess seems to be supported by the Pfam domain that KEGG associates with the first KO (K13012), which represents a more generic sugar transferase activity.
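If it helps with triage, here is a minimal sketch for ranking the taxa that contribute to one KO in a stratified table, so you can see whether an unexpected species dominates or is a minor contributor. It assumes the usual HUMAnN stratified layout: tab-separated, one header row, and feature labels of the form `KO|taxon` (optionally `KO: name|taxon`). The function name `top_contributors` and the demo values are hypothetical, not part of HUMAnN, and the numbers are made up for illustration only.

```python
import csv
import io
from collections import defaultdict


def top_contributors(handle, ko, n=5):
    """Rank taxa by abundance summed across samples for a single KO.

    Returns a list of (taxon, summed_abundance, fraction_of_stratified_total).
    """
    totals = defaultdict(float)
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row (sample names)
    for row in reader:
        if not row or "|" not in row[0]:
            continue  # blank line or unstratified community total
        feature, taxon = row[0].split("|", 1)
        # Drop an optional ": name" suffix attached to the KO identifier.
        feature_id = feature.split(":", 1)[0].strip()
        if feature_id != ko:
            continue
        totals[taxon] += sum(float(value) for value in row[1:])

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
    return [
        (taxon, value, value / grand_total if grand_total else 0.0)
        for taxon, value in ranked
    ]


if __name__ == "__main__":
    # Made-up abundances, for illustration only.
    demo = (
        "# Gene Family\tsample_A\tsample_B\n"
        "K13012\t7.0\t3.0\n"
        "K13012|g__Faecalibacterium.s__Faecalibacterium_prausnitzii\t6.0\t2.0\n"
        "K13012|unclassified\t1.0\t1.0\n"
    )
    for taxon, value, fraction in top_contributors(io.StringIO(demo), "K13012"):
        print(f"{taxon}\t{value:.1f}\t{fraction:.0%}")
```

For a real file, pass an open file handle instead of the `io.StringIO` demo. A species that dominates a KO consistently across samples points more toward an overly broad KO definition than toward a sporadic mapping artifact, though this check alone cannot distinguish the two.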