All paired-end read unmatched

Hi, I just came across the same issue.

When using KneadData, it appears that the software first outputs a SAM file and then processes this file to determine the mapping results. My understanding is that during this post-processing step, the software identifies paired reads by examining the suffix of each read’s name, looking for either ‘/1’ or ‘/2’ to differentiate between the two ends of a pair.
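To make that concrete, here is a minimal Python sketch of suffix-based mate detection. It is not KneadData's actual code; the function names `split_mate_suffix` and `classify_reads` are hypothetical and only illustrate the behavior I am assuming above.

```python
from collections import defaultdict


def split_mate_suffix(read_name):
    """Return (base_name, mate), where mate is '1', '2', or None if no suffix."""
    for mate in ("1", "2"):
        suffix = "/" + mate
        if read_name.endswith(suffix):
            return read_name[: -len(suffix)], mate
    return read_name, None


def classify_reads(read_names):
    """Group read names by base name using only the /1 and /2 suffixes.

    Returns (paired, unlabeled): base names seen with both mates, and
    the read names that carried no mate suffix at all.
    """
    mates = defaultdict(set)
    unlabeled = []
    for name in read_names:
        base, mate = split_mate_suffix(name)
        if mate is None:
            unlabeled.append(name)
        else:
            mates[base].add(mate)
    paired = sorted(base for base, seen in mates.items() if seen == {"1", "2"})
    return paired, unlabeled
```

With names such as `r1/1` and `r1/2` the pair is recovered, but if both mates are simply called `SRR1.1`, neither can be assigned to an end and no pairs are found.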

While this approach works seamlessly with raw sequencing data, I have found that when working with data obtained from public databases, the read names are often sanitized, and the distinguishing ‘/1’ or ‘/2’ suffixes are removed. Without those suffixes, the post-processing step may be unable to tell the mates apart, which would explain every paired-end read being reported as unmatched.

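Bowtie2 offers built-in options that handle this case. When the mates are supplied with -1 and -2, bowtie2 pairs them by their position in the two input files rather than by read name, and --un-conc writes the pairs that fail to align concordantly to separate mate files, so the paired-end information is retained even when the read names have been altered or lack these suffixes. The --un option does the same for unpaired reads that fail to align.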
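For illustration, here is a rough Python sketch of how such a bowtie2 call could be assembled. The helper name `build_bowtie2_command` and all file names are hypothetical, and this is not how KneadData currently invokes bowtie2. The flags themselves (-x, -1, -2, --un-conc, -S, -p) are standard bowtie2 options, and a `%` in the --un-conc path is replaced by 1 or 2 for the two mate files.

```python
import shlex


def build_bowtie2_command(index_prefix, fastq_1, fastq_2, unaligned_path,
                          sam_out="/dev/null", threads=1):
    """Build a bowtie2 argument list that writes non-concordant pairs to files.

    unaligned_path may contain '%', which bowtie2 replaces with 1 or 2
    to name the two mate output files.
    """
    return [
        "bowtie2",
        "-x", index_prefix,            # prefix of the contaminant/reference index
        "-1", fastq_1,                 # mate 1 reads
        "-2", fastq_2,                 # mate 2 reads, same order as mate 1
        "--un-conc", unaligned_path,   # pairs that fail to align concordantly
        "-S", sam_out,                 # SAM output (discarded by default here)
        "-p", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_bowtie2_command("human_db", "sample_1.fastq", "sample_2.fastq",
                                "sample_clean.%.fastq", threads=4)
    print(shlex.join(cmd))
```

The command could then be run with `subprocess.run(cmd, check=True)`. Because bowtie2 tracks mates by input order, the two output files stay in sync regardless of how the reads are named.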

Could you add an option that lets users enable bowtie2’s --un-conc and --un parameters during the mapping step? This would allow paired-end reads with modified names to be handled correctly.