Percent mapped reads

Hi, I am new to HUMAnN3 (and to HUMAnN in general). I had no problem installing it with conda. I compared the mapping of the demo FASTQ file before and after installing the complete ChocoPhlAn and UniRef90 databases. Surprisingly, the percentage of mapped reads improved only marginally, from

Unaligned reads after nucleotide alignment: 88.3095238095 %
to
Unaligned reads after nucleotide alignment: 87.3714285714 %, for ChocoPhlAn, and

Unaligned reads after translated alignment: 83.6190476190 %
to
Unaligned reads after translated alignment: 80.3904761905 %, for UniRef90.

Is there a mistake? Or is it common for only about 20% of reads from a typical stool sample to map? (I understand that this may reflect a limitation of the NCBI database/annotation rather than of the pipeline per se.)
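For reference, the "about 20%" figure follows from the log lines above: HUMAnN reports the percentage of reads left *unaligned* after each step, so the share explained is 100 minus that value. A minimal Python sketch of that arithmetic, assuming the log lines keep exactly the format quoted above (the regex and helper name are illustrative, not part of HUMAnN):

```python
import re

# Log lines copied from the two demo runs quoted above
# (demo databases first, then the full ChocoPhlAn/UniRef90 databases).
LOG_LINES = [
    "Unaligned reads after nucleotide alignment: 88.3095238095 %",
    "Unaligned reads after nucleotide alignment: 87.3714285714 %",
    "Unaligned reads after translated alignment: 83.6190476190 %",
    "Unaligned reads after translated alignment: 80.3904761905 %",
]

# Assumed log format: "Unaligned reads after <step> alignment: <percent> %"
PATTERN = re.compile(r"Unaligned reads after (\w+) alignment: ([\d.]+) %")


def mapped_percent(line: str) -> tuple[str, float]:
    """Return (alignment step, percent of reads explained) for one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    step, unaligned = match.group(1), float(match.group(2))
    # Explained (mapped) reads are everything that is not left unaligned.
    return step, 100.0 - unaligned


if __name__ == "__main__":
    for line in LOG_LINES:
        step, mapped = mapped_percent(line)
        print(f"{step:>11}: {mapped:5.2f} % mapped")
```

With the full databases, roughly 19.6% of reads are explained after the translated search, which is where the "about 20%" comes from.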

Thanks,
Choon

This is expected: we are still using the HUMAnN 2.0 demo dataset in the 3.0 alpha release, and it is very shallow. Many of the reads do hit genes in both the demo and the full databases, but not with enough coverage to conclude that those genes are actually present, so those reads are reported as unexplained. If you lower or turn off the subject coverage filters, you should see the majority of reads explained.
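To illustrate why shallow data leaves reads unexplained, here is a toy Python model of a subject coverage filter. It is a simplified sketch, not HUMAnN's implementation: it assumes coverage is the percentage of a gene's positions touched by at least one read, and it treats reads on genes below the threshold as unexplained. The gene names, lengths, read positions, and the 50% threshold are all hypothetical values chosen for illustration; check `humann --help` for the actual coverage-threshold options and defaults.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One read alignment to a gene (hypothetical, simplified record)."""
    read_id: str
    gene: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# Hypothetical example: geneA is long and sparsely hit (shallow sequencing),
# geneB is short and well covered.
GENE_LENGTHS = {"geneA": 1000, "geneB": 300}
HITS = [
    Hit("r1", "geneA", 0, 100),
    Hit("r2", "geneA", 400, 500),
    Hit("r3", "geneA", 800, 900),
    Hit("r4", "geneB", 0, 100),
    Hit("r5", "geneB", 100, 200),
]


def subject_coverage(hits, gene_lengths):
    """Percent of each gene's positions covered by at least one hit."""
    covered = {gene: set() for gene in gene_lengths}
    for hit in hits:
        covered[hit.gene].update(range(hit.start, hit.end))
    return {g: 100.0 * len(pos) / gene_lengths[g] for g, pos in covered.items()}


def explained_reads(hits, gene_lengths, threshold):
    """Reads whose gene passes the coverage threshold; the rest stay unexplained."""
    coverage = subject_coverage(hits, gene_lengths)
    kept = {gene for gene, pct in coverage.items() if pct >= threshold}
    return {hit.read_id for hit in hits if hit.gene in kept}


if __name__ == "__main__":
    print(subject_coverage(HITS, GENE_LENGTHS))
    print(sorted(explained_reads(HITS, GENE_LENGTHS, threshold=50.0)))
    print(sorted(explained_reads(HITS, GENE_LENGTHS, threshold=0.0)))
```

In this toy case geneA reaches only 30% coverage, so with a 50% threshold its three reads are discarded and only r4 and r5 count as explained; dropping the threshold to 0 explains all five reads, mirroring the effect of turning the filters down or off.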

We’ll be making an improved demo for the official v3.0 release.

For a real stool sample, I’d expect 50-80% of reads to be mapped by HUMAnN.

Thanks for the quick response! :smile: