Hi, I just came across the same issue.
When using KneadData, it appears that the software first outputs a SAM file and then processes this file to determine the mapping results. My understanding is that during this post-processing step, the software identifies paired reads by examining the suffix of each read’s name, looking for either ‘/1’ or ‘/2’ to differentiate between the two ends of a pair.
While this approach works seamlessly with raw sequencing data, I have found that read names in data obtained from public databases are often sanitized, with the distinguishing ‘/1’ or ‘/2’ suffixes removed. Paired reads can then be misidentified during the post-processing phase.
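To illustrate what I mean, here is a minimal sketch of suffix-based mate detection. This is not KneadData's actual code, only my reading of the behaviour; the function `mate_number` and the read names are made up for the example.

```python
def mate_number(read_name):
    """Return 1 or 2 if the name ends in '/1' or '/2', otherwise None."""
    if read_name.endswith("/1"):
        return 1
    if read_name.endswith("/2"):
        return 2
    return None


# Hypothetical names as they might look straight off the sequencer.
raw_names = ["SEQ_001/1", "SEQ_001/2"]
# Hypothetical names after a database has stripped the mate suffix.
public_names = ["SRR000001.1", "SRR000001.1"]

raw_mates = [mate_number(name) for name in raw_names]
public_mates = [mate_number(name) for name in public_names]

print(raw_mates)     # mates can be told apart
print(public_mates)  # no suffix, so the mate cannot be determined
```

With the suffixes gone, a check like this has nothing to distinguish the two ends of a pair.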
Bowtie2 actually offers built-in options to handle such cases elegantly. The --un-conc and --un parameters in bowtie2 are specifically designed to output unmapped reads in a way that retains the paired-end information, even when the read names have been altered or lack these suffixes.
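As a rough sketch of the kind of call I have in mind, the snippet below assembles a bowtie2 command line with both options. The flags `-x`, `-1`, `-2`, `-S`, `--un-conc` and `--un` are documented bowtie2 options; the helper `build_bowtie2_command`, the index name and all file names are placeholders I invented for illustration, and this is not how KneadData currently builds its command.

```python
import shlex


def build_bowtie2_command(index, reads_1, reads_2, out_prefix):
    """Assemble a bowtie2 argument list that writes unaligned reads to files."""
    return [
        "bowtie2",
        "-x", index,       # bowtie2 index basename
        "-1", reads_1,     # mate 1 reads
        "-2", reads_2,     # mate 2 reads
        # Pairs that fail to align concordantly; bowtie2 adds .1/.2 to the name.
        "--un-conc", f"{out_prefix}_unaligned_paired.fastq",
        # Unpaired reads that fail to align.
        "--un", f"{out_prefix}_unaligned_unpaired.fastq",
        "-S", f"{out_prefix}.sam",  # SAM output
    ]


# Placeholder inputs, for illustration only.
command = build_bowtie2_command(
    "host_db", "sample_R1.fastq", "sample_R2.fastq", "sample"
)
print(shlex.join(command))
```

According to the bowtie2 manual, `--un-conc` writes mate 1 and mate 2 to separate files, so the pairing would not have to be reconstructed from the read names afterwards.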
Could you maybe include an option for users to enable bowtie2’s --un-conc and --un parameters during the mapping process? This would allow for better handling of paired-end reads with modified names.