Hi there,
I am seeking clarification on a curious result I obtained with humann2 (v0.11.2). I am working with a set of metagenomes from mice infected with C. difficile, each sequenced to ~10 million reads. I have previously isolated the strain from the mice, generated a MAG for it from these metagenomes, and quantified both the strain and one of its effectors (toxin B) by qPCR.
C. difficile appears in the metaphlan_bugs_list.tsv files as below:
MMSP_H1-1_T1_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.01538
MMSP_H1-1_T7_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.09357
MMSP_H1-2_T21_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.12672
MMSP_H1-2_T7_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.37416
MMSP_H1-3_T21_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.43544
MMSP_H1-3_T7_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.64551
MMSP_H2-1_T7_metaphlan_bugs_list.tsv:k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Peptostreptococcaceae_noname|s__Clostridium_difficile 0.01453
However, the gene family representing the glycosyltransferase toxin B (UniRef90_T3DH50) does not appear in any of the genefamilies.tsv files, even though there are certainly reads mapping to this gene (I was able to assemble it). Other C. difficile genes and pathways do appear in the outputs. Out of interest, I checked the DIAMOND database file (uniref90_annotated.1.1.dmnd), which was downloaded using the humann2_databases script, and UniRef90_T3DH50 does not appear to be listed in it. Is there an obvious reason I am missing as to why this family would have been excluded?
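For anyone wanting to reproduce the check on the genefamilies.tsv files, a minimal sketch in Python follows. It is illustrative only, not necessarily the exact check described above. It assumes the usual tab-separated layout: a header line starting with "#", the gene family ID in the first column (optionally followed by "|g__Genus.s__Species" for stratified rows, or ": name" if names were attached), and the abundance in the second column. The function names find_family and scan_directory and the "*genefamilies.tsv" file pattern are hypothetical.

```python
import csv
from pathlib import Path


def find_family(lines, family_id):
    """Return (feature, abundance) pairs whose gene family ID equals family_id.

    `lines` is any iterable of tab-separated text lines, e.g. an open file.
    """
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, the "# Gene Family" header, and malformed rows.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        feature = row[0]
        # Drop the per-species stratification and any attached name
        # (assumed layout; adjust if your files differ).
        base_id = feature.split("|", 1)[0].split(":", 1)[0].strip()
        if base_id == family_id:
            hits.append((feature, float(row[1])))
    return hits


def scan_directory(directory, family_id, pattern="*genefamilies.tsv"):
    """Map each matching file name to the hits found in that file."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            results[path.name] = find_family(handle, family_id)
    return results


if __name__ == "__main__":
    # Report, per sample, whether the toxin B family is present.
    for name, hits in scan_directory(".", "UniRef90_T3DH50").items():
        print(name, hits if hits else "not found")
```

Matching on the exact ID rather than a substring avoids false positives from longer IDs that merely start with the same characters.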
Thanks,
Jordan