The bioBakery help forum

Kneaddata trim options, file headers, and consistent option flags

Hello,

I’ve been browsing the forum and the GitHub site for clarification, but the information I’m finding is scattered, and I think it belongs in the manual, where it currently isn’t. So I’d like to understand how kneaddata can be run and what to expect in its outputs. I’m tired of having to do multiple runs just because the flags are ambiguous.

  1. My latest install, downloaded around Jan 8, 2022 using both pip and git clone, still reports version 0.10.0 when it should be 0.11. Can you clarify this? Is a fix to the version reporting underway? How can I validate that I have the latest version even if it self-reports the wrong one?

  2. As other users have mentioned before, running PE data produces output in which the sequences are in mixed order, which is a problem for downstream assembly steps and for non-bioBakery tools. I’ve read replies saying that one should pass the --reorder flag, but it is not listed among the options in the manual. Is it still needed in the latest version? Could the manual be updated to describe this procedure and to tell users to expect mixed-order sequences in the output?

  3. I’m unclear about what --bypass-trim actually bypasses. Does it bypass only Trimmomatic, or does it also skip trimming repeats with TRF? I had assumed only the former, because there is a separate --bypass-trf flag. But there is also the --run-trim-repetitive option for removing overrepresented sequences (recommended for metagenomic data); is that separate from TRF? Does --run-trim-repetitive always need to be combined with FastQC, as in --run-trim-repetitive --fastqc FastQC? Could this please be clarified here as well as in the manual? Too many steps have “trim” in their names, and I think that adds to the confusion.

  4. Finally, the file header format: I have a separate script that reformats my sequence headers for kneaddata PE runs so that they contain no spaces and end in /1 or /2 to keep track of pairs. Is this still needed? I notice that the run still includes a “reformatting headers” step, so I’m wondering what that step does if I have already reformatted the headers to what I thought kneaddata required.
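
Regarding the version question, a minimal Python sketch (assuming Python 3.8+ and that the pip distribution name is `kneaddata`) that prints which kneaddata executable is on PATH and which version the installed package metadata records. It cannot fix a wrong self-reported number, because the metadata comes from the same packaging, but it can show which of several installs (pip vs. git clone) is actually active:

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Version string recorded in the installed distribution's metadata, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_name: str = "kneaddata") -> None:
    # Which executable the shell would run (None if it is not on PATH).
    print("executable on PATH:", shutil.which(dist_name))
    # What was recorded at install time; this mirrors the package's own
    # declared version, so it repeats a wrong number if the package declares one.
    print("metadata version:", installed_version(dist_name))


if __name__ == "__main__":
    report()
```

Comparing the printed executable path with the location of each install is what distinguishes them here; the version string alone may be identical.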
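
Regarding the mixed-order PE output, a generic workaround sketch that does not depend on --reorder: it re-pairs two FASTQ files by read ID so that mate 2 follows the order of mate 1. It assumes plain 4-line FASTQ records, holds the second file in memory, and treats two headers as mates when they match after dropping the comment after the first whitespace and any /1 or /2 suffix. The function names are hypothetical, and this is not how kneaddata itself handles pairing:

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# header, sequence, '+' line, quality
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated record at {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at {header!r}")
        yield header, seq, plus, qual


def pair_key(header: str) -> str:
    """Read ID without '@', the comment after whitespace, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def reorder_mates(records_1: Iterable[Record],
                  records_2: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return both mate lists in file-1 order, keeping only IDs found in both."""
    by_id: Dict[str, Record] = {pair_key(r[0]): r for r in records_2}
    out_1: List[Record] = []
    out_2: List[Record] = []
    for rec in records_1:
        mate = by_id.get(pair_key(rec[0]))
        if mate is not None:
            out_1.append(rec)
            out_2.append(mate)
    return out_1, out_2


def to_fastq(rec: Record) -> str:
    """Serialize one record back to 4-line FASTQ text."""
    return "\n".join(rec) + "\n"
```

With real data, open file handles for the two paired output files can be passed straight to `read_fastq`, and `to_fastq` writes each re-paired record back out. Because the second file is held in a dictionary, memory use grows with its size.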
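
Regarding the header format, a minimal sketch of the kind of rewrite described in point 4, under the assumption that "no spaces" means keeping only the read ID before the first whitespace (a real script might instead replace spaces with underscores). The function names are hypothetical, and the sketch says nothing about what kneaddata's own "reformatting headers" step does:

```python
from typing import Iterable, Iterator


def reformat_header(header: str, mate: int) -> str:
    """Rewrite '@ID comment' as '@ID/1' or '@ID/2' (assumed target format)."""
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    read_id = header[1:].split()[0]  # drop everything after the first space
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]  # avoid stacking suffixes like /1/1
    return f"@{read_id}/{mate}"


def reformat_fastq_lines(lines: Iterable[str], mate: int) -> Iterator[str]:
    """Rewrite every header line of a 4-line FASTQ stream.

    Headers are found by position (every 4th line), not by a leading '@',
    because a quality line can also start with '@'.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield reformat_header(line, mate) if i % 4 == 0 else line
```

For example, `reformat_header("@M001:1:FC:1:1101:1:2 1:N:0:1", 1)` gives `@M001:1:FC:1:1101:1:2/1`.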

Thanks a lot for your time and help with this.
Best,
Stephanie