The bioBakery help forum

Pathway coverage issue from humann2

Hi, I got some “interesting” pathway coverage values from humann2. The database I used was MetaCyc. At the community level, the coverage of a pathway was reported as 0, whereas the coverage of the same pathway was non-zero at the species level. I am not sure how to explain this observation (example below, with two samples). I would appreciate it if anyone could help me with this!
AST-PWY : L-arginine degradation II (AST pathway) 0.0000000000 0.0000000000

AST-PWY : L-arginine degradation II (AST pathway)|g__Escherichia.s__Escherichia_coli 0.0688571855 0.0045202237
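
To see how often this pattern occurs, the coverage table can be scanned for stratified rows whose coverage is non-zero while the matching community-level row is 0. The following is only a minimal sketch: it assumes a tab-separated table with the feature name in the first column and one numeric column per sample, and the function names `parse_table` and `find_discrepancies` are hypothetical helpers, not part of humann2. The inline table reuses the two rows shown above.

```python
def parse_table(text):
    """Parse a tab-separated table into {feature: [value per sample]}."""
    rows = {}
    for line in text.splitlines():
        # Skip blank lines and header/comment lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        rows[fields[0]] = [float(v) for v in fields[1:]]
    return rows


def find_discrepancies(rows):
    """Return (stratified feature, sample index, stratified coverage) for
    every sample where the community-level coverage is 0 but the
    stratified coverage is greater than 0."""
    hits = []
    for feature, values in rows.items():
        if "|" not in feature:
            continue  # community-level row, nothing to compare against
        pathway = feature.split("|", 1)[0]
        community = rows.get(pathway)
        if community is None:
            continue  # no matching community-level row in this table
        for index, (comm, strat) in enumerate(zip(community, values)):
            if comm == 0.0 and strat > 0.0:
                hits.append((feature, index, strat))
    return hits


# The two rows quoted in the post, written with explicit tab separators.
coverage_text = (
    "AST-PWY : L-arginine degradation II (AST pathway)\t0.0000000000\t0.0000000000\n"
    "AST-PWY : L-arginine degradation II (AST pathway)"
    "|g__Escherichia.s__Escherichia_coli\t0.0688571855\t0.0045202237\n"
)

coverage_rows = parse_table(coverage_text)
discrepancies = find_discrepancies(coverage_rows)
for feature, index, value in discrepancies:
    print(f"sample {index}: {feature} = {value}")
```

For a real run, the string would be replaced by the contents of the coverage file; the column layout should be checked against the actual output first.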


I notice the same phenomenon in my results with humann 3.0.0:

PWY-5304: superpathway of sulfur oxidation (Acidianus ambivalens)       0.0000000000
PWY-5304: superpathway of sulfur oxidation (Acidianus ambivalens)|g__Blautia.s__Ruminococcus_torques    0.4207716664
PWY-5304: superpathway of sulfur oxidation (Acidianus ambivalens)|unclassified  0.0000000000

Is there a good explanation for this?

I understand why the converse would be true, i.e. unstratified/community coverage > stratified/particular-species coverage. But the reverse is mysterious to me.
The abundances for Ruminococcus and unclassified are roughly the same order of magnitude:

PWY-5304: superpathway of sulfur oxidation (Acidianus ambivalens)       265.0315463137
PWY-5304|g__Blautia.s__Ruminococcus_torques     147.6274165202
PWY-5304|g__Blautia.s__Ruminococcus_torques|UniRef90_A0A396FLA1 147.6274165202109
PWY-5304|unclassified   117.4041297935
PWY-5304|unclassified|UniRef90_D4MZY6   117.40412979351032

I am asking in part because, in an answer to the question "Issue on metacyc pathway coverage", the recommendation is to disregard coverage and instead rely on abundance > 0. But I have found that in some instances, such as this one, the pathway abundance is based on just a single gene/reaction. (I realize you are using more detailed information on the makeup of the pathway, but for this pathway, after carefully examining other omics data from our samples, it could be a false-positive inference.) I notice that these single-gene pathway identifications usually occur when coverage is 0. So I am curious whether I can use this for filtering. Thanks!
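
The kind of filter I have in mind could look something like the sketch below. It is only an illustration under assumptions: the tables are taken to be single-sample, tab-separated, with the feature name in the first column, and `read_table` and `flag_zero_coverage` are hypothetical helper names, not humann functions. It flags community-level pathways that have abundance > 0 but coverage of 0, using the PWY-5304 values quoted above.

```python
def read_table(text):
    """Parse a single-sample tab-separated table into {feature: value}."""
    table = {}
    for line in text.splitlines():
        # Skip blank lines and header/comment lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        feature, value = line.split("\t")[:2]
        table[feature] = float(value)
    return table


def flag_zero_coverage(abundance, coverage):
    """Return community-level pathways with abundance > 0 whose
    community-level coverage is 0 (or missing from the coverage table)."""
    flagged = []
    for feature, abund in abundance.items():
        if "|" in feature:
            continue  # only consider unstratified (community) rows
        if abund > 0.0 and coverage.get(feature, 0.0) == 0.0:
            flagged.append(feature)
    return flagged


pathway_name = "PWY-5304: superpathway of sulfur oxidation (Acidianus ambivalens)"

# Values copied from the output quoted in this post.
abundance_table = read_table(
    pathway_name + "\t265.0315463137\n"
    "PWY-5304|g__Blautia.s__Ruminococcus_torques\t147.6274165202\n"
    "PWY-5304|unclassified\t117.4041297935\n"
)
coverage_table = read_table(
    pathway_name + "\t0.0000000000\n"
    + pathway_name + "|g__Blautia.s__Ruminococcus_torques\t0.4207716664\n"
    + pathway_name + "|unclassified\t0.0000000000\n"
)

candidates = flag_zero_coverage(abundance_table, coverage_table)
print(candidates)
```

Whether flagged pathways should actually be dropped is exactly what I am unsure about; the sketch only lists them as candidates for closer inspection.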

(Side note: what humann is identifying as Ruminococcus torques seems to actually be a sequence from Eubacterium rectale. See A0A396FLA1 in UniProt. Also confirmed by running kraken on the raw reads of this particular sample and on the assembly and blasting the contig.)