Kneaddata trim options, file headers, and consistent options flags

SLS · January 13, 2022, 11:43am

Hello,

I’ve been browsing the forum and the GitHub site for clarification, but the information I find is scattered and, I think, belongs in the manual. I’d like to understand how kneaddata is meant to be run and what to expect in its outputs. I’m tired of having to do multiple runs just because the flags are ambiguous.

My latest install (around Jan 8 2022, tried with both pip and git clone) still reports version 0.10.0, when this should be 0.11. Any clarification? Is a fix to the version reporting underway? How can I validate that I have the latest release even if it self-reports the wrong version?
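
One way to cross-check the self-reported version is to read the version that pip recorded in the installed package metadata. The snippet below is a minimal sketch, assuming the package is registered under the distribution name `kneaddata` (an assumption; `installed_version` is a hypothetical helper name). It only shows what was recorded at install time, so it cannot prove which release the code actually corresponds to.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the version string from the installed package metadata, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No distribution with this name is installed in this environment.
        return None


if __name__ == "__main__":
    # "kneaddata" is assumed to be the distribution name used by pip.
    print(installed_version("kneaddata"))
```

If this prints the same number as the tool itself, the mismatch is more likely in the packaged version string than in the reporting code.
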
As some users have mentioned before, running paired-end (PE) data produces output in which the sequences are in mixed order, which is problematic for assembly steps and other non-bioBakery processes. I’ve read replies saying that one should pass the --reorder flag, but it is not listed among the options in the manual. Is this still needed in the latest version? Can the manual be updated to describe this procedure and to let users know to expect mixed-order sequences in the output?
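
To confirm whether a given pair of output files is actually out of sync, the read IDs can be compared record by record. This is a rough sketch, not part of kneaddata: it assumes uncompressed, well-formed 4-line FASTQ records and that mates differ only by a trailing /1 or /2 and by any comment after the first whitespace. The function names are hypothetical.

```python
import re
from itertools import zip_longest


def read_ids(fastq_path):
    """Yield each record's read ID without '@', comment, or /1 and /2 suffix."""
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            # Header is the first line of every 4-line record (assumed layout).
            if line_number % 4 == 0 and line.strip():
                read_id = line[1:].split()[0]
                yield re.sub(r"/[12]$", "", read_id)


def first_order_mismatch(r1_path, r2_path):
    """Return the 1-based index of the first unpaired record, or None if in sync."""
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs, start=1):
        # A length difference shows up as None on one side, which also mismatches.
        if id1 != id2:
            return index
    return None
```

Calling `first_order_mismatch` on the two paired output files returns `None` when every record lines up, and otherwise the position of the first record where the mates differ.
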
I’m unclear about what --bypass-trim actually bypasses. Is it only bypassing Trimmomatic, or does it also bypass trimming repeats with TRF? I had assumed only the former, because there is a separate --bypass-trf flag. But then there is also the --run-trim-repetitive option to remove overrepresented sequences (recommended for metagenomic data): is this separate from TRF? Does --run-trim-repetitive always need to be given together with FastQC, as in --run-trim-repetitive --fastqc FastQC? Can this please be clarified here as well as in the manual? Too many processes have “trim” in their name, and I think this leads to confusion.
Finally, the file header format: I have a separate script that reformats my sequence headers for kneaddata PE runs so that they contain no spaces and end in /1 and /2 to keep track of pairs. Is this still needed? I notice that the run still includes a “reformatting headers” step, so I’m wondering what happens there if I have already reformatted the headers manually to what I thought kneaddata required.
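
For concreteness, the kind of manual header reformatting described above can be sketched as follows. This is an illustration, not the poster's actual script and not kneaddata's own logic: it assumes uncompressed, well-formed 4-line FASTQ records, and it removes spaces by dropping everything after the first whitespace (one possible choice). Both function names are hypothetical.

```python
def reformat_header(header, mate):
    """Drop text after the first whitespace and end the ID with /1 or /2."""
    read_id = header.rstrip("\n").split()[0]
    # Avoid doubling the suffix if the header already carries one.
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return f"{read_id}/{mate}"


def reformat_fastq(in_path, out_path, mate):
    """Copy a FASTQ file, rewriting only the header line of each record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line_number, line in enumerate(src):
            # Header is the first line of every 4-line record (assumed layout).
            if line_number % 4 == 0:
                line = reformat_header(line, mate) + "\n"
            dst.write(line)
```

Here `mate` is 1 for the forward file and 2 for the reverse file, so a header such as `@read1 1:N:0:ATCACG` in the forward file becomes `@read1/1`.
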

Thanks a lot for your time and help with this.
Best,
Stephanie
