Pathway abundance file only includes 9 species

MetaPhlAn version 4.0.6 (1 Mar 2023)

humann v3.9

I ran HUMAnN 3 and it seemed to run smoothly with no errors, producing the three final TSV files. However, when I looked at the pathway abundance file, only 9 species were represented, even though the bugs_list.tsv file lists at least 55 species-level taxa. I'm confused about why pathways would be assigned to such a small subset of the detected species.

For reference, this is the code I used to run humann (done on my institution’s HPC cluster):

module load humann/3.9

cd /scratch/g/jkirby/Matt_WGS_analysis_LK/

for fastq in 01_QC_IU_nomouse/merged_seqs/*.gz; do humann --input "$fastq" --output humann_output; done
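A quick way to see exactly which species are missing is to compare the species-level rows of the MetaPhlAn bugs list with the species that appear in the stratified rows of the pathway abundance table. The functions below are a hypothetical helper sketch, not part of HUMAnN. They assume HUMAnN's stratified row format `PATHWAY|g__Genus.s__Species`, skip MetaPhlAn 4 `t__` (SGB) rows, and ignore `UNINTEGRATED` rows, which can be stratified by species even though they hold abundance not assigned to any pathway.

```shell
# Hypothetical helper functions (POSIX sh); source this file, then call them.

# Species names from a MetaPhlAn bugs list: species rows only, t__ rows skipped.
metaphlan_species() {
  awk -F'\t' '!/^#/ && $1 ~ /s__/ && $1 !~ /t__/ {
    name = $1; sub(/.*s__/, "", name); print name
  }' "$1" | LC_ALL=C sort -u
}

# Species names from stratified rows ("PATHWAY|g__Genus.s__Species") of a
# HUMAnN pathway table; UNINTEGRATED and "|unclassified" rows are ignored.
humann_species() {
  awk -F'\t' '!/^#/ && $1 !~ /^UNINTEGRATED/ && index($1, "|") > 0 {
    n = split($1, parts, "|")
    name = parts[n]
    if (name ~ /s__/) { sub(/.*s__/, "", name); print name }
  }' "$1" | LC_ALL=C sort -u
}

# Species detected by MetaPhlAn that have no pathway rows in HUMAnN.
# $1 = bugs list, $2 = pathway abundance table.
missing_species() {
  tmpdir=$(mktemp -d)
  metaphlan_species "$1" > "$tmpdir/metaphlan.txt"
  humann_species "$2" > "$tmpdir/humann.txt"
  # comm needs both inputs sorted in the same (C) collation order.
  LC_ALL=C comm -23 "$tmpdir/metaphlan.txt" "$tmpdir/humann.txt"
  rm -r "$tmpdir"
}
```

Assuming HUMAnN's default output naming, a call for one sample might look like `missing_species humann_output/SAMPLE_humann_temp/SAMPLE_metaphlan_bugs_list.tsv humann_output/SAMPLE_pathabundance.tsv`, where `SAMPLE` is a placeholder for the input file's base name. Piping `metaphlan_species` or `humann_species` into `wc -l` gives the two counts being compared.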

If you're using MetaPhlAn 4, I would also recommend updating to HUMAnN 4 (alpha) at this point. That would rule out one possible cause: MetaPhlAn 4 finding new species that HUMAnN 3 is not aware of. This is especially likely if your samples are not from the human microbiome.
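One rough way to check this locally is to list species that MetaPhlAn reports but that have no pangenome file in the ChocoPhlAn database HUMAnN 3 searches against. This is a sketch under stated assumptions: `species_without_pangenome` is a hypothetical helper, the database directory is whatever your installation uses (for example, as shown by `humann_config --print`), and it assumes pangenome files are named like `g__Genus.s__Species.<suffix>.ffn.gz`. Species names can also differ between MetaPhlAn 4 and ChocoPhlAn, so a species listed here is worth a closer look, not a confirmed gap.

```shell
# Hypothetical helper (POSIX sh). ASSUMPTION: ChocoPhlAn pangenome files are
# named like g__Genus.s__Species.<suffix>.ffn.gz; check your own install.
# $1 = MetaPhlAn bugs list, $2 = ChocoPhlAn database directory.
species_without_pangenome() {
  bugs_list=$1
  chocophlan_dir=$2
  # Species-level rows only (t__ SGB rows skipped); keep the name after s__.
  awk -F'\t' '!/^#/ && $1 ~ /s__/ && $1 !~ /t__/ {
    name = $1; sub(/.*s__/, "", name); print name
  }' "$bugs_list" | LC_ALL=C sort -u |
  while IFS= read -r sp; do
    # Expand a glob for any file containing ".s__<species>."; if nothing
    # matches, the pattern stays literal and the -e test fails.
    set -- "$chocophlan_dir"/*".s__${sp}."*
    [ -e "$1" ] || printf '%s\n' "$sp"
  done
}
```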

Another possibility is simply that the additional species are too rare (present at low abundance) for their genomes to be well covered by reads. For example, we can detect a species as present with only ~20% of its genome covered, but pathway reconstruction needs much stronger coverage of the genome (so that all of a pathway's genes can be observed), and that requires the species to be abundant.
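To put rough numbers on the gap between "detected" and "well covered", here is an idealized estimate assuming reads land uniformly at random on the genome (the Lander-Waterman model). Real coverage is uneven, and MetaPhlAn actually detects species from marker genes, so treat this only as an order-of-magnitude illustration. With $N$ reads of length $L$ from a genome of length $G$, the mean depth $c$ and expected covered fraction $f$ are:

$$
c = \frac{N L}{G}, \qquad f(c) = 1 - e^{-c}
$$

$$
f = 0.20 \;\Rightarrow\; c = -\ln(0.80) \approx 0.22, \qquad f = 0.95 \;\Rightarrow\; c = \ln(20) \approx 3.0
$$

Since $c$ scales with the number of reads coming from that species, going from 20% to 95% genome coverage at the same sequencing depth would need the species to be roughly $3.0 / 0.22 \approx 13$ times more abundant.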