HUMAnN 4 prescreen issue and input file format clarification

Hi all,

I’m running HUMAnN 4, and the workflow runs correctly with the demo.fastq file. However, when I try it on the other demo samples or on my own dataset, I get the following message:

“No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.”

My data consists of shotgun metagenomic/metatranscriptomic samples.

I also have a related question: when running MetaPhlAn alone, I normally provide both R1 and R2 FASTQ files for a sample. Since HUMAnN accepts a single input file, should I supply the scaffold/contig FASTA instead, or can HUMAnN work directly with paired-end R1 and R2 files?

Thank you.

Can you tell us more about the other samples you’re trying? What kind of environment are they from, and what sequencing technology and sequencing depth were used?

For HUMAnN, we recommend concatenating your R1 and R2 files as a single input. HUMAnN isn’t designed for analyzing contig-level inputs: it expects QC’ed (but not assembled) short shotgun sequencing reads.
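In case it helps, here is a minimal sketch of the concatenation step in Python. It assumes the reads have already been QC’ed and that each file ends with a newline. The function names and the file names in the usage comment are hypothetical, not part of HUMAnN. Compression is inferred from a `.gz` extension only.

```python
import gzip
import shutil
from pathlib import Path


def open_maybe_gzip(path, mode):
    """Open a file in binary mode, using gzip when the name ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def concatenate_fastq(inputs, output):
    """Copy the contents of each input file, in order, into one output file."""
    with open_maybe_gzip(output, "wb") as out_handle:
        for path in inputs:
            with open_maybe_gzip(path, "rb") as in_handle:
                # Stream in chunks so large read files are never held in memory.
                shutil.copyfileobj(in_handle, out_handle)


# Example call (hypothetical file names, substitute your own):
# concatenate_fastq(
#     ["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
#     "sample_concat.fastq.gz",
# )
```

The resulting file can then be given to HUMAnN as its single input. Decompressing on read and recompressing on write is slower than a plain byte-level concatenation, but it lets gzipped and uncompressed inputs be mixed safely.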

Hi @franzosa,

The samples I’m working with are calf rectal swabs, prepared with Illumina Stranded Total RNA Prep with Ribo-Zero Plus and sequenced on a NovaSeq. MetaPhlAn 4.2.2 gives good profiles on these datasets. Until now I’ve been giving HUMAnN contig-level inputs. Following your recommendation, I’ll concatenate R1 and R2 for a few samples and test that approach. However, I’m still seeing the same prescreen error even when using the demo samples.