I am a fan of the simplicity and user-friendliness of kneaddata for users working with human-derived microbiome samples, but in my experience the current implementation, a Python-only pipeline that runs its steps sequentially, does not scale to very deeply sequenced samples.
Because of the large uncompressed intermediate files, kneaddata needed more than 1 TB of memory for some of my deeply sequenced metagenomic samples, and it took several days to align the sequencing data on a single machine with multiple processors. To process these samples, I reimplemented the kneaddata steps (Trimmomatic, Bowtie2, TRF) as a Snakemake-based workflow, which allowed me to distribute the steps across a large computing cluster and reduce the number of temporary files kept at each step.
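For readers unfamiliar with Snakemake, here is a minimal sketch of how two of the kneaddata steps could be written as rules. It is not the workflow described above: the sample names, directory layout, the `db/human_index` Bowtie2 index prefix, thread counts and trimming parameters are all hypothetical placeholders, and the TRF step is omitted. Wrapping intermediate outputs in `temp()` tells Snakemake to delete them once every rule that consumes them has finished, which is one way to keep intermediate files from piling up.

```python
# Snakefile sketch (Snakemake syntax); all names and paths are hypothetical.
SAMPLES = ["sample1", "sample2"]  # hypothetical sample IDs

rule all:
    input:
        expand("clean/{sample}_R{read}.fastq.gz", sample=SAMPLES, read=[1, 2])

# Adapter/quality trimming; paired and unpaired outputs are temporary.
rule trimmomatic:
    input:
        r1="raw/{sample}_R1.fastq.gz",
        r2="raw/{sample}_R2.fastq.gz",
    output:
        r1=temp("trimmed/{sample}_R1.fastq.gz"),
        r1u=temp("trimmed/{sample}_R1.unpaired.fastq.gz"),
        r2=temp("trimmed/{sample}_R2.fastq.gz"),
        r2u=temp("trimmed/{sample}_R2.unpaired.fastq.gz"),
    threads: 8
    shell:
        "trimmomatic PE -threads {threads} {input.r1} {input.r2} "
        "{output.r1} {output.r1u} {output.r2} {output.r2u} "
        "SLIDINGWINDOW:4:20 MINLEN:50"  # placeholder trimming settings

# Host decontamination: keep only read pairs that do not align to the host index.
rule bowtie2_decontam:
    input:
        r1="trimmed/{sample}_R1.fastq.gz",
        r2="trimmed/{sample}_R2.fastq.gz",
    output:
        r1="clean/{sample}_R1.fastq.gz",
        r2="clean/{sample}_R2.fastq.gz",
    params:
        index="db/human_index",  # hypothetical Bowtie2 index prefix
        # Bowtie2 replaces '%' with 1 and 2 for the mate files.
        unaligned="clean/{sample}_R%.fastq.gz",
    threads: 16
    shell:
        "bowtie2 -p {threads} -x {params.index} -1 {input.r1} -2 {input.r2} "
        "--un-conc-gz {params.unaligned} -S /dev/null"
```

Running `snakemake --cores 16` would execute this locally; Snakemake can also submit each rule instance as a separate cluster job, although the exact option for that depends on the Snakemake version and the scheduler in use.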
I am happy to make this workflow available to others via GitHub. However, I was wondering whether the bioBakery team already has plans to switch to a workflow management system. If so, I would prefer to contribute to that effort, if welcome, rather than spend further time on a separate adaptation.