I am running HUMAnN on 150 bp paired-end metagenomic reads from soil samples on a university HPC cluster, and the DIAMOND (translated search) step has not finished after running for 7 days. I am using the UniRef50 database, and it is the only database in my data/uniref folder.
I had already done QC with Trimmomatic, so I did not use KneadData, and I concatenated my QC'd forward and reverse reads into a single input file. My impression from reading the docs was that KneadData was not required, but after looking into it further I see that it can also filter rRNA. So I'm wondering whether filtering rRNA with KneadData prior to running HUMAnN is recommended to reduce run time.
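For reference, my preprocessing looked roughly like this (file names, paths, and thread count are placeholders, not my exact values; the HUMAnN call is shown commented since it's the long-running step in question, and the tiny printf reads are just stand-ins so the concatenation step is self-contained here):

```shell
# Stand-in FASTQ records in place of my real Trimmomatic paired outputs:
printf '@read1\nACGTACGT\n+\nIIIIIIII\n' > sample_R1.trimmed.fastq
printf '@read2\nTGCATGCA\n+\nIIIIIIII\n' > sample_R2.trimmed.fastq

# Concatenate QC'd forward and reverse reads into a single HUMAnN input:
cat sample_R1.trimmed.fastq sample_R2.trimmed.fastq > sample_concat.fastq

# HUMAnN run, roughly (paths and --threads are placeholders):
# humann --input sample_concat.fastq --output humann_out \
#        --protein-database data/uniref --threads 16
```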
- I would greatly appreciate clarification on whether removing rRNA with KneadData would reduce HUMAnN's run time.
- If filtering rRNA with KneadData is indeed recommended, could I filter the bowtie2_unaligned.fa intermediate file from HUMAnN and proceed with --resume, or would I need to delete all the intermediate temp output files and start over?
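To be concrete, the first option I'm asking about would look something like the sketch below. Everything here is hypothetical: the temp dir and sample name are placeholders, the filtering command is a stand-in (I haven't picked an rRNA-removal tool, so a plain cp marks where it would go), and I don't know whether HUMAnN's --resume would accept a modified intermediate file, which is exactly my question.

```shell
# Placeholder temp dir and file standing in for HUMAnN's intermediate output:
mkdir -p humann_temp
printf '>seq1\nACGTACGT\n' > humann_temp/sample_bowtie2_unaligned.fa

# Hypothetical rRNA-filtering step (cp is a stand-in for whatever tool I'd use):
cp humann_temp/sample_bowtie2_unaligned.fa filtered.fa

# Swap the filtered file back in and resume the run:
mv filtered.fa humann_temp/sample_bowtie2_unaligned.fa
# humann --input sample_concat.fastq --output humann_out --resume
```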