The bioBakery help forum

Setting abundance percentage cutoff to 0. Confusing results

I’ve been using HUMAnN2 for taxonomic and functional profiling of metagenomic samples, along with total RNA-seq analysis, and I’ve been seeing some confusing results. Aligning RNA reads to the RefSeq database, I identified a significant proportion of reads that aligned to Mycobacterium species. Looking at the HUMAnN2 pathway abundance output, I can see that the mycobacterial peptidoglycan synthesis pathway was also identified in most of the samples.

However, MetaPhlAn2 was unable to identify Mycobacterium in any of the samples. I’m thinking that the abundance may be very low. Given that, can I run MetaPhlAn2 with a minimum abundance filter set to 0 so that I see absolutely everything identified within the sample, even if it’s only a few reads? If not, can anyone suggest why I can find mycobacterial reads from the RNA data and the HUMAnN2 analysis but not from MetaPhlAn2?

HUMAnN2 uses MetaPhlAn2 to decide which species to work with, so something here doesn’t make sense. How are you determining that your species is found by HUMAnN2 but not MetaPhlAn2?

Hi franzosa, HUMAnN2 identified Mycobacterium-specific pathways in the pathway abundance output. I have also tried aligning the input reads to a Mycobacterium reference genome using bowtie2, and a small percentage of reads map to it.

But does your HUMAnN2 output have lines like:

PathwayXYZ|g__Mycobacterium.s__Mycobacterium_tuberculosis  12.345

? This would be HUMAnN2’s way of saying 1) I found PathwayXYZ, 2) I attributed some copies to M. tuberculosis, where 3) the pathway had 12.345 RPK of coverage. Such lines would only be possible if MetaPhlAn2 had detected M. tuberculosis.

If instead the output lines look like

PathwayXYZ  12.345
PathwayXYZ|unclassified  12.345

then HUMAnN2 is identifying the pathway from the translated search step. It could still be driven by reads from M. tuberculosis, but we didn’t see enough evidence to associate the pathway with any particular species.

This is how the pathway is shown in the output:
PWY-6385: peptidoglycan biosynthesis III (mycobacteria)

I see. In this case (confusingly) the “mycobacteria” is part of the pathway’s official name from MetaCyc, but HUMAnN2 isn’t necessarily assigning it to any particular species. If there’s no | in the name (as in my examples above), it means that the pathway is being quantified at the community level only.

The pathway could very well be coming from Mycobacterium in your sample, HUMAnN2 just hasn’t seen enough evidence (e.g. non-zero MetaPhlAn2 abundance) to make that specific conclusion, probably due to low coverage breadth.
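
A quick way to check this on your own table is to split the feature names on the | character. Below is a minimal Python sketch, assuming the usual tab-separated pathway abundance layout (feature name, then abundance). The function names and the sample rows are made up for illustration and are not part of HUMAnN2.

```python
def split_pathway_rows(lines):
    """Separate community-level rows from species-stratified rows.

    Assumes each data row is 'feature<TAB>abundance' and that header or
    comment rows start with '#'.
    """
    community, stratified = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        feature, value = line.split("\t")[:2]
        abundance = float(value)
        if "|" in feature:
            # 'Pathway|g__Genus.s__Species' or 'Pathway|unclassified'
            pathway, taxon = feature.split("|", 1)
            stratified.setdefault(pathway, {})[taxon] = abundance
        else:
            # No '|' means the row is the community-level total
            community[feature] = abundance
    return community, stratified


def taxa_for_pathway(stratified, pathway, genus=None):
    """Return the strata for one pathway, optionally limited to one genus."""
    taxa = stratified.get(pathway, {})
    if genus is None:
        return dict(taxa)
    prefix = "g__" + genus
    return {
        taxon: abundance
        for taxon, abundance in taxa.items()
        if taxon == prefix or taxon.startswith(prefix + ".")
    }


# Illustrative rows only, modeled on the example lines earlier in this thread
example = [
    "# Pathway\tsample_Abundance",
    "PathwayXYZ\t12.345",
    "PathwayXYZ|unclassified\t12.345",
]
community, stratified = split_pathway_rows(example)
print(taxa_for_pathway(stratified, "PathwayXYZ", genus="Mycobacterium"))
```

An empty result for the genus means the pathway was never attributed to it, even if the pathway's name mentions that genus.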

Thanks for your help. Does MetaPhlAn2 have a minimum abundance cutoff that can be adjusted, or does it output everything identified by default?

MetaPhlAn2 will output abundances for clades that recruited reads to a critical fraction of markers (defined by the stat_q parameter), regardless of their final relative abundance. Lowering this fraction will find more clades but also increase the risk of false positive detection events.
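
As a rough sketch of what a more permissive run could look like, the Python snippet below only builds the command line and prints it. It assumes the metaphlan2.py script with its --input_type, --stat_q and -o options. The file names and the 0.05 value are placeholders rather than recommendations, so check metaphlan2.py --help for your installed version before relying on it.

```python
import shlex


def metaphlan2_command(reads, output, stat_q):
    """Build (but do not run) a MetaPhlAn2 command with a custom stat_q.

    'reads' and 'output' are hypothetical file names chosen by the caller.
    """
    if not 0.0 <= stat_q <= 1.0:
        raise ValueError("stat_q must be a fraction between 0 and 1")
    return [
        "metaphlan2.py", reads,
        "--input_type", "fastq",
        "--stat_q", str(stat_q),
        "-o", output,
    ]


cmd = metaphlan2_command("sample.fastq", "sample_profile.txt", stat_q=0.05)
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the options are confirmed. Any extra clades that appear at a lower setting should be treated with caution, for the false positive reason given above.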

Hi Eric,

I am having the same issue in a project in which I ran both HUMAnN3 and MetaPhlAn3.

The pathway PWY-6385, which I found to differ among my groups, has been assigned to several different genera and species but not to mycobacteria, which seems strange to me (according to the description, it should only be encoded by mycobacteria).

I also looked further into my MetaPhlAn3 results, and no Mycobacterium species was detected in any of the subjects. In fact, I looked at several projects I’ve done with MetaPhlAn2 or 3 before, and none of them show any Mycobacterium abundance.

Could you explain more about this issue and the possible reasons? Thank you so much for your kind help.

Best regards,
James

One of two things is probably happening: 1) the taxonomic range annotated to the pathway in MetaCyc is conservative (and the pathway’s reactions can actually be found in a broader set of bugs) OR 2) there is a similar pathway in your sample (differing by only a few reactions) that HUMAnN has a hard time distinguishing from this pathway, possibly due to gap filling decisions.