Keeping/identifying host reads with chocophlan/metaphlan/humann?


I was trying to determine the number of reads mapping to the human or mouse genome in my microbiome metagenomes (which come from one of those two hosts). However, I noticed that my copy of the ChocoPhlAn database has no file for Mus musculus or Homo sapiens. Is there a version of the database that includes markers for these? Also, can HUMAnN2 handle input that has not been depleted of host reads?

Thank you

Hi Jenny,
The ChocoPhlAn database does not include any sequences from the human or mouse genomes.
From the Bowtie2 website you can download pre-built indexes for GRCh38 or mm10 and map your metagenomes against them to identify the host reads.
HUMAnN can handle metagenomes that have not been host-depleted, but it is best practice to remove the host reads first.
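
As a rough illustration of the mapping step, here is a minimal Python sketch that builds a Bowtie2 command against a host index and counts host reads from the alignment summary that Bowtie2 prints to stderr. It assumes single-end reads and a Unix-like system (the SAM output is discarded via /dev/null). The function names, file names, and index prefix are placeholders I made up for the example, not part of any of the tools.

```python
import re
import subprocess


def bowtie2_host_command(index_prefix, reads_fastq, unaligned_out, threads=4):
    """Build a Bowtie2 command mapping single-end reads to a host index.

    index_prefix is the path prefix of a pre-built index (placeholder);
    reads that do not align to the host are written, gzipped, to unaligned_out.
    """
    return [
        "bowtie2",
        "-x", index_prefix,         # pre-built host index, e.g. GRCh38 or mm10
        "-U", reads_fastq,          # unpaired (single-end) input reads
        "--un-gz", unaligned_out,   # non-host reads, kept for downstream use
        "-p", str(threads),
        "-S", "/dev/null",          # discard the SAM alignments
    ]


def parse_bowtie2_summary(summary_text):
    """Extract read counts from a single-end Bowtie2 alignment summary."""
    total = re.search(r"^\s*(\d+) reads; of these:", summary_text, re.MULTILINE)
    unaligned = re.search(
        r"^\s*(\d+) \([\d.]+%\) aligned 0 times", summary_text, re.MULTILINE
    )
    if total is None or unaligned is None:
        raise ValueError("unrecognised Bowtie2 summary (single-end expected)")
    total_reads = int(total.group(1))
    host_reads = total_reads - int(unaligned.group(1))
    return {
        "total_reads": total_reads,
        "host_reads": host_reads,
        "host_fraction": host_reads / total_reads if total_reads else 0.0,
    }


def count_host_reads(index_prefix, reads_fastq, unaligned_out, threads=4):
    """Run Bowtie2 and return the host read counts for one sample."""
    cmd = bowtie2_host_command(index_prefix, reads_fastq, unaligned_out, threads)
    # Bowtie2 writes its alignment summary to stderr.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_bowtie2_summary(result.stderr)
```

Calling count_host_reads with your index prefix and FASTQ file returns the total and host read counts, and the file given as unaligned_out holds the host-depleted reads that can then be passed to HUMAnN. Paired-end data would need the -1/-2 and --un-conc-gz options and a different summary parser.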


To expand on the previous reply: “unclassified” output from a non-host-depleted sample could reflect host contamination. I would either ignore those output rows or run with the --bypass-translated-search flag to restrict the analysis to microbial ORFs.
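
Both options can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, the executable defaults to "humann" (HUMAnN2 installs it as "humann2"), and the row filter assumes the usual stratified output where the feature ID in the first tab-separated column ends in "|unclassified".

```python
def humann_command(input_fastq, output_dir, bypass_translated=True,
                   executable="humann"):
    """Build a HUMAnN command line, optionally skipping translated search."""
    cmd = [executable, "--input", input_fastq, "--output", output_dir]
    if bypass_translated:
        cmd.append("--bypass-translated-search")
    return cmd


def drop_unclassified_rows(lines):
    """Return the table lines whose feature is not stratified as unclassified.

    Each line is expected to be tab-separated with the feature ID first.
    The header line and community-total rows are kept unchanged.
    """
    kept = []
    for line in lines:
        feature = line.rstrip("\n").split("\t", 1)[0]
        if feature.endswith("|unclassified"):
            continue
        kept.append(line)
    return kept
```

The command list can be passed to subprocess.run, and drop_unclassified_rows can be applied to the lines of a gene family or pathway abundance table before any downstream analysis.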