Is long computation time in HUMAnN 3 related to not using KneadData?

I am running 150 bp paired-end metagenomic reads from soil samples on a university HPC cluster, and the DIAMOND step has not finished after running for 7 days. I am using the UniRef50 database, and it is the only one in my data/uniref folder.

I had previously done QC with Trimmomatic, so I did not use KneadData, and I concatenated my quality-controlled forward and reverse reads to make the input file. From reading the docs I had the impression that KneadData was not necessary, but after looking into it more I see that it can filter rRNA. So I'm wondering: is filtering rRNA with KneadData before running HUMAnN suggested as a way to reduce run time?
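HUMAnN does not use read-pair information, so concatenating the forward and reverse files, as described above, is a common way to build a single input. A minimal Python sketch of that step, assuming plain or gzip-compressed FASTQ files (the function name and file names are hypothetical, for illustration only):

```python
import gzip


def _open_binary(path, mode):
    """Open a plain or gzip-compressed file, chosen by the .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def concatenate_reads(forward, reverse, out):
    """Write all forward reads, then all reverse reads, into one file.

    A newline is added after a source file that lacks a trailing newline,
    so the last record of one file never merges with the first of the next.
    """
    with _open_binary(out, "wb") as dst:
        for src_path in (forward, reverse):
            last = b""
            with _open_binary(src_path, "rb") as src:
                for line in src:
                    dst.write(line)
                    last = line
            if last and not last.endswith(b"\n"):
                dst.write(b"\n")


# Hypothetical usage:
# concatenate_reads("sample_R1.trimmed.fastq.gz",
#                   "sample_R2.trimmed.fastq.gz",
#                   "sample.concat.fastq.gz")
```

Streaming line by line keeps memory use flat, which matters for large soil metagenomes.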

  1. I would greatly appreciate some clarification as to whether removing rRNA with KneadData would reduce HUMAnN's run time.
  2. If filtering rRNA with KneadData is indeed recommended, could I filter the bowtie2_unaligned.fa file from HUMAnN and then proceed with --resume? Or would I need to delete all the intermediate temp output files and start over?


Hi Grace, thanks for reaching out! How many cores are you using for your HUMAnN run? Also, how large is your input file? Unless you are running on a single core with a very large input file, that run time seems very long. Do you see any errors in the logs? If you expect contamination in your samples, I would suggest running KneadData in addition to the QC you already ran. KneadData would only reduce the HUMAnN run time if it filtered out a significant number of your reads (so that HUMAnN had significantly fewer reads to align).
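To check whether filtering would matter, one option is to count reads before and after a filtering step and see what fraction was removed. The same count can be run on the bowtie2_unaligned.fa temp file to see how many reads are passed on to the DIAMOND search. A minimal sketch, assuming unwrapped four-line FASTQ records or standard FASTA, optionally gzipped (all names here are hypothetical). Treating the run-time saving as roughly proportional to the fraction removed is a simplifying assumption, not a guarantee:

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed text file, chosen by the .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count records in a FASTA or FASTQ file.

    FASTA: one record per header line starting with '>'.
    FASTQ: assumes four lines per record (no wrapped sequences); only the
    first line is inspected, so quality lines starting with '@' are safe.
    """
    with _open_text(path) as handle:
        first = handle.readline()
        if not first:
            return 0
        if first.startswith(">"):
            return 1 + sum(1 for line in handle if line.startswith(">"))
        if first.startswith("@"):
            n_lines = 1 + sum(1 for line in handle if line.strip())
            if n_lines % 4:
                raise ValueError(f"{path}: line count is not a multiple of 4")
            return n_lines // 4
        raise ValueError(f"{path}: not recognized as FASTA or FASTQ")


def fraction_removed(before, after):
    """Fraction of reads removed by a filtering step."""
    if before == 0:
        raise ValueError("input has no reads")
    return (before - after) / before


# Hypothetical usage:
# before = count_reads("sample.concat.fastq.gz")
# after = count_reads("sample.kneaddata.fastq")
# print(f"{fraction_removed(before, after):.1%} of reads removed")
```

If only a small fraction of reads is removed, filtering is unlikely to explain a multi-day DIAMOND step, and core count or input size are the more likely causes.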

Thank you,