Thanks for your work on developing HUMAnN3, it’s a very useful tool. I’ve been having a problem, which unfortunately persists in the current version, with detecting pathways that are known to be present in a given sample. For example, I have not been able to detect the reductive acetyl-CoA synthesis pathway (Wood-Ljungdahl) in any sample I’ve looked at to date.
Databases: full ChocoPhlAn database (full_chocophlan.v296_201901), full UniRef90 (uniref90_annotated_v201901)
What I expect: when given wgsim-generated synthetic reads derived from an acetogen (https://www.ncbi.nlm.nih.gov/assembly/GCF_000013105.1, Moorella thermoacetica), HUMAnN3 should detect CODH-PWY (https://metacyc.org/META/NEW-IMAGE?type=PATHWAY&object=CODH-PWY) or another equivalent pathway.
What happens: CODH-PWY is not detected.
Other information: I checked the MetaCyc pathway information included with HUMAnN3 and noticed this pathway/species association in the metacyc_pathways_to_organisms file:
CODH-PWY Arabidopsis thaliana,Carthamus tinctorius,Glycine max,Limnanthes douglasii,Pisum sativum,Ricinus communis,Saccharomyces cerevisiae
Since I don’t know the intent of the file, I’m not sure if this is an issue, but these species do not match those listed for this pathway on MetaCyc: Acetitomaculum ruminis, Acetobacterium carbinolicum, Acetobacterium woodii, Blautia producta, Clostridium formicaceticum, Eubacterium limosum, Moorella thermoacetica, Moorella thermoautotrophica, Sporomusa malonica, Sporomusa termitida, Syntrophococcus sucromutans, [Butyribacterium] methylotrophicum
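In case it helps with reproducing the comparison, here is a minimal sketch of the check. It assumes that each line of metacyc_pathways_to_organisms is a pathway ID followed by whitespace and a comma-separated species list, which matches the line quoted above but which I have not confirmed for the whole file. The helper name parse_pathway_organisms is hypothetical and is not part of HUMAnN3.

```python
def parse_pathway_organisms(lines):
    """Map each pathway ID to the set of organism names listed for it.

    Assumed layout (not verified against the full file):
    <pathway ID><whitespace><comma-separated organism names>
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace only
        names = parts[1].split(",") if len(parts) > 1 else []
        mapping[parts[0]] = {name.strip() for name in names if name.strip()}
    return mapping


# The line quoted above from metacyc_pathways_to_organisms
file_line = (
    "CODH-PWY Arabidopsis thaliana,Carthamus tinctorius,Glycine max,"
    "Limnanthes douglasii,Pisum sativum,Ricinus communis,Saccharomyces cerevisiae"
)

# Species listed for CODH-PWY on the MetaCyc pathway page
metacyc_species = {
    "Acetitomaculum ruminis", "Acetobacterium carbinolicum",
    "Acetobacterium woodii", "Blautia producta", "Clostridium formicaceticum",
    "Eubacterium limosum", "Moorella thermoacetica",
    "Moorella thermoautotrophica", "Sporomusa malonica",
    "Sporomusa termitida", "Syntrophococcus sucromutans",
    "[Butyribacterium] methylotrophicum",
}

bundled = parse_pathway_organisms([file_line])["CODH-PWY"]
overlap = bundled & metacyc_species

print("In bundled file:", sorted(bundled))
print("Shared with MetaCyc:", sorted(overlap))
```

For the quoted line, the two species sets share no entries. To run it against the real file, pass an open file handle to parse_pathway_organisms instead of the single-line list.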
Let me know if you need any additional information to help debug this issue, or if there are settings I need to change/databases I need to obtain to get my expected result.