I am starting this topic to ask whether anyone has a rationale for choosing a particular minimum read length in kneaddata, set through --trimmomatic-options="MINLEN:".
My review of the literature revealed that researchers tend either to leave kneaddata on its default parameters or to choose a MINLEN without justifying why that exact threshold was chosen.
I am having a hard time deciding the minimum length I want my reads to be. I understand (and expect) that this question would be answered differently depending on sequencing depth, research objectives, assembly or profiling approaches, etc.
Does anyone have a rationale to share, along with the context it applies to? For example, for taxonomic profiling from reads, I would expect that we should keep reads close to 150 bp (we used a NovaSeq 6000) and drop very short ones (20-30 bp). I suspect that the few million 20 bp reads left after trimming won’t be of much use and may even favour false positives or bias abundance calculations.
Then again, what threshold should I choose? 40? 60? And what justifies these values? I feel like researchers often just refer to what others have done, but if nobody ever questions and measures the impact of such decisions, I think we’re not helping science. It might even be that choosing a MINLEN of 40 or 80 doesn’t really change anything, but I haven’t found anything supporting that either.
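To make the question more concrete, here is a minimal sketch of how one could at least measure how many reads each candidate threshold would keep. It assumes a standard 4-line-per-record FASTQ, and it assumes that a read survives when its length is at least MINLEN. The function names and the toy reads are made up for illustration; this is not part of kneaddata or Trimmomatic.

```python
import io
from typing import Dict, Iterable, List, TextIO


def fastq_read_lengths(handle: TextIO) -> List[int]:
    """Return the length of every read in a 4-line-per-record FASTQ stream."""
    lengths = []
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            lengths.append(len(line.strip()))
    return lengths


def fraction_kept(lengths: List[int], thresholds: Iterable[int]) -> Dict[int, float]:
    """For each candidate MINLEN, return the fraction of reads with length >= MINLEN.

    Assumption: reads shorter than the threshold are dropped, all others kept.
    """
    total = len(lengths)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: sum(1 for n in lengths if n >= t) / total for t in thresholds}


# Toy data (hypothetical): four trimmed reads of 20, 45, 150 and 150 bp.
demo_fastq = "".join(
    f"@read{i}\n{'A' * n}\n+\n{'I' * n}\n" for i, n in enumerate([20, 45, 150, 150])
)
demo_lengths = fastq_read_lengths(io.StringIO(demo_fastq))
demo_result = fraction_kept(demo_lengths, [30, 40, 60, 80])
print(demo_result)  # {30: 0.75, 40: 0.75, 60: 0.5, 80: 0.5}
```

On real data, the same functions could be pointed at a trimmed output file opened with gzip.open(path, "rt"). This only tells me how many reads each threshold removes, not whether the resulting profiles differ, which would still require rerunning the profiler at each threshold and comparing the results.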
All that being said, I am looking forward to hearing your rationales, or to any sources you could point me to (I have not been doing literature reviews for long, so it is entirely possible that I have missed important pieces).