HUMAnN 4.0 alpha in workflow

Hi. I just started using HUMAnN 4.0, running under Slurm through the wmgx workflow. The files that HUMAnN writes to scratch have a number inserted in the middle of their names (for example, S01_2_genefamilies.tsv), but the workflow doesn’t recognize that format, so when it tries to copy them to the main output folder it throws a file-not-found error. KneadData and MetaPhlAn both run correctly, and HUMAnN produces the correct output data, just under the wrong file names. I’ve included details below, and I can email log files if needed.

Best,
Artemis

tool versions:
biobakery_workflows v3.1
kneaddata v0.12.0
MetaPhlAn version 4.1.0 (23 Aug 2023)
humann v4.0.0.alpha.1

command:

export HOST="human"
export KNEADDATA_DB_HUMAN_GENOME=/data/databases/kneaddata_2023/${HOST}
export METAPHLAN_DB=/data/databases/biobakery/bb4/metaphlan
export CHOCOPHLAN_DB=/data/databases/biobakery/bb4/humann/chocophlan_v4_alpha
export METAPHLAN_INDEX=mpa_vOct22_CHOCOPhlAnSGB_202403

biobakery_workflows wmgx --input $1 --output $2 --threads 10 --pair-identifier _R1 \
  --grid-jobs 10 --grid slurm --grid-scratch $2/scratch --grid-partition="defq" \
  --grid-tasks="humann,10000,75000,10" \
  --grid-environment="
source ~/miniforge3/etc/profile.d/conda.sh
conda activate ~/miniforge3/envs/biobakery4
export KNEADDATA_DB_HUMAN_GENOME=/data/databases/kneaddata_2023/${HOST}" \
  --contaminate-databases ${KNEADDATA_DB_HUMAN_GENOME}/ \
  --skip-nothing --remove-intermediate-output --bypass-strain-profiling \
  --qc-options="--max-memory=1000m --run-trf \
    --trimmomatic=~/biobakery_workflows_databases/Trimmomatic-0.39/ \
    --trf  ~/miniforge3/envs/biobakery4/bin/" \
  --taxonomic-profiling-options="--add_viruses --bowtie2db=${METAPHLAN_DB} \
    --index ${METAPHLAN_INDEX} --unclassified_estimation -t rel_ab_w_read_stats" \
  --functional-profiling-options="--nucleotide-database ${CHOCOPHLAN_DB} \
    --protein-database /data/databases/biobakery/bb4/humann/uniref90/uniref/ --remove-stratified-output  --memory-use minimum "

output example for one sample:

$ ls biobakery4_out/scratch/humann/main/S01*
biobakery4_out/scratch/humann/main/S01_2_genefamilies.tsv
biobakery4_out/scratch/humann/main/S01_3_reactions.tsv
biobakery4_out/scratch/humann/main/S01_4_pathabundance.tsv
biobakery4_out/scratch/humann/main/S01.log

error file output:

$ cat biobakery4_out/slurm_files/task_61_*.err
cp: cannot stat ‘biobakery4_out//scratch/humann/main/S01_genefamilies.tsv’: No such file or directory
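
For illustration only, the naming mismatch can be sketched in Python as below. The function name `expected_name` and the assumed pattern (a single numeric field inserted before the table type) are hypothetical, inferred only from the three file names in the listing above; this is not the workflows' actual code and not an official fix.

```python
import re

# Assumed pattern: <sample>_<number>_<table>.tsv, based only on the
# three HUMAnN 4 file names shown in the scratch listing above.
NUMBERED = re.compile(
    r"^(?P<sample>.+)_(?P<step>\d+)_"
    r"(?P<table>genefamilies|reactions|pathabundance)\.tsv$"
)


def expected_name(humann4_name: str) -> str:
    """Return the un-numbered name the workflow's copy step looks for.

    Names that do not match the assumed pattern (e.g. the log file)
    are returned unchanged.
    """
    match = NUMBERED.match(humann4_name)
    if match is None:
        return humann4_name
    return f"{match['sample']}_{match['table']}.tsv"


if __name__ == "__main__":
    for name in [
        "S01_2_genefamilies.tsv",
        "S01_3_reactions.tsv",
        "S01_4_pathabundance.tsv",
        "S01.log",
    ]:
        print(name, "->", expected_name(name))
```

The first name maps to S01_genefamilies.tsv, which is exactly the file the cp command above fails to find.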

Apologies, you are right that HUMAnN 4’s output file names are not consistent with the targets the workflows currently expect. We still need to publish an update to the workflows that will work with HUMAnN 4. I will try to get this done ASAP.


We have pushed an update to the workflows to make them compatible with HUMAnN 4’s updated operation and target structure: