Zero community-level pathways detected despite gene families present

Hi,

I’m running HUMAnN 3.9 on my samples and want to confirm my understanding of the pathway output for five of them.

My command:

humann \
	-i "${local_work_dir}/${base}_host_rm_combined.fq.gz" \
	-o "${local_work_dir}" \
	--threads 8 \
	--taxonomic-profile "${local_work_dir}/${base}_profile_abs.forHUMAnN.txt" \
	--memory-use minimum \
	--output-basename "${base}"

Observed pattern across the five samples:

  • genefamilies.tsv: contains data

  • pathabundance.tsv and pathcoverage.tsv: only show:

  UNMAPPED      0.0000000000  
  UNINTEGRATED  0.0000000000

Sample breakdown:

  • Four samples: relatively small FASTQ (~0.2 Mb) + 100% unclassified MetaPhlAn profiles

  • One sample: normal-sized FASTQ (~15 Mb) + valid MetaPhlAn profile (contains taxonomic classifications)

My understanding from the documentation:

  • Pathways can be detected at both species-level and community-level

  • With 100% unclassified profiles, I wouldn’t expect species-specific pathways (the HUMAnN log says the same)

  • However, community-level pathways should still be possible if the detected gene families form complete pathway structures

My question:

Does my output simply mean that no community-level pathways were detected in any of these samples? In other words, do the gene families present fail to satisfy the criteria for any pathway, regardless of whether there’s a taxonomic classification?

Or is there something else I should investigate, particularly for the sample with valid taxonomic information?

Thanks!

The output you show suggests that something went wrong, specifically the UNMAPPED line with a value of 0, which I would not expect from a run that completed normally. Are you sure these jobs finished successfully?
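One quick way to check, as a rough sketch: count the pathway rows that are neither header lines nor the UNMAPPED/UNINTEGRATED placeholders, then scan the run log for errors. The helper name `count_real_pathways` is made up for illustration, and the output file name and log location in the comments are assumptions about the default HUMAnN layout, so point them at wherever your run actually wrote its files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HUMAnN): count rows in a pathway
# abundance table that are real pathways, i.e. not header/comment lines
# and not the UNMAPPED / UNINTEGRATED placeholder rows (stratified or not).
count_real_pathways() {
    awk -F'\t' '
        /^#/ { next }                              # skip the header line
        $1 ~ /^(UNMAPPED|UNINTEGRATED)/ { next }   # skip placeholder rows
        NF > 0 { n++ }                             # count non-empty rows
        END { print n + 0 }                        # print 0 if none found
    ' "$1"
}

# Example usage (file names are assumptions, adjust to your run):
# count_real_pathways "${local_work_dir}/${base}_pathabundance.tsv"
#
# Assumed default log location; look for errors or an early exit:
# grep -inE 'error|traceback|critical' \
#     "${local_work_dir}/${base}_humann_temp/${base}.log"
```

A count of 0 together with errors in the log would point to a failed run rather than a genuine absence of pathways.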

But your samples are also both in the “shallow shotgun” range, and those can be a challenge for MetaPhlAn + HUMAnN, often requiring retuning of the search stringency settings to get useful output. There are definitely other posts on the forum that get into specifics on this. A 15 Mbp sample ought to be rescuable. I’m not sure about 0.2 Mbp; that is extremely small (like ~1,000 HiSeq reads). I think even our HUMAnN demo file is larger than that, and it requires custom settings and databases to produce useful output!
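To see where a sample falls relative to that range, it can help to count reads and bases directly rather than going by file size. A minimal sketch, assuming a gzip-compressed FASTQ with the standard four lines per record; `fastq_stats` is a hypothetical helper name, not a HUMAnN or MetaPhlAn tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<reads> <bases>" for a gzipped FASTQ.
# Assumes exactly four lines per record (no wrapped sequence lines).
fastq_stats() {
    gzip -cd < "$1" | awk '
        NR % 4 == 2 { reads++; bases += length($0) }   # sequence lines only
        END { print reads + 0, bases + 0 }
    '
}

# Example usage (file name taken from the command in the question):
# fastq_stats "${local_work_dir}/${base}_host_rm_combined.fq.gz"
```

Dividing the second number by one million gives the depth in Mbp, which is the figure to compare against the sample sizes discussed above.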