The output of KneadData differs when threads are used

KneadData Version: commit 90c05f3cd25a8c74a4d804f27dd0ce52718007d8

The output FASTQ files differ in size when the -t (threads) argument is used. In the listing below, the t=8 files are larger than the t=1 files.

# t=8

-rw-r--r-- 1 root root 6126995024 Dec 24 06:13 sample1136_t8.repeats.removed.1.fastq

-rw-r--r-- 1 root root 6108005668 Dec 24 06:14 sample1136_t8.repeats.removed.2.fastq

# t=1

-rw-r--r-- 1 root root 6105388071 Dec 24 07:04 sample1136_t1.repeats.removed.1.fastq

-rw-r--r-- 1 root root 6084203934 Dec 24 07:05 sample1136_t1.repeats.removed.2.fastq
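
File size alone does not show how many reads differ, so counting records in each output can help confirm the discrepancy. The following is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; count_fastq_reads is a hypothetical helper written for illustration and is not part of KneadData.

```python
def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file.

    Assumes the standard layout of exactly four lines per record
    (header, sequence, '+', qualities) with no wrapped sequences.
    """
    with open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

Calling it on sample1136_t1.repeats.removed.1.fastq and sample1136_t8.repeats.removed.1.fastq and comparing the two numbers would show whether the t=8 run kept more reads.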

I think this is caused by line 113 in kneaddata/
Only the first temporary file is appended to tempfile_written_list, so the trf command actually runs on only one of the parallel temporary files.
I suggest adding tempfile_written_list.append(tempfile_list[output_file_number]) and datfile_to_write_list.append(datfile_list[output_file_number]) after line 123 (the two lines marked with ### in the snippet below). This fixes the issue.

for read_line in utilities.read_file_n_lines(input,2):
    if not file_handle_write:
        file_handle_write = open(tempfile_list[output_file_number],"wt")

    if lines_written > lines_per_file:
        ### tempfile_written_list.append(tempfile_list[output_file_number]) ###
        ### datfile_to_write_list.append(datfile_list[output_file_number]) ###
        file_handle_write = open(tempfile_list[output_file_number],"wt")
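
To illustrate why registering only the first chunk matters, here is a simplified, self-contained sketch. It is not the KneadData source. The function split_into_chunks, its parameters, and the use of in-memory lists instead of temporary files are all assumptions made for the example. It splits records across chunks and tracks which chunks a later step (TRF, in this thread) would be asked to process.

```python
def split_into_chunks(records, records_per_file, chunk_names, register_every_chunk):
    """Distribute records over chunks and return (chunks, registered).

    chunks maps each chunk name to the records written to it.
    registered lists the chunk names a later step would process.
    With register_every_chunk=False, only the first chunk is registered,
    which mimics the behaviour described in the post.
    """
    chunks = {}
    registered = []
    current = None
    index = 0
    written = 0
    for record in records:
        if current is None:
            # First chunk: opened and registered in both variants.
            current = chunks.setdefault(chunk_names[index], [])
            registered.append(chunk_names[index])
        elif written >= records_per_file and index + 1 < len(chunk_names):
            # Current chunk is full: roll over to the next one.
            index += 1
            written = 0
            current = chunks.setdefault(chunk_names[index], [])
            if register_every_chunk:
                # The suggested fix: register each newly opened chunk too.
                registered.append(chunk_names[index])
        current.append(record)
        written += 1
    return chunks, registered


records = list(range(10))
names = ["chunk0", "chunk1", "chunk2"]

_, buggy = split_into_chunks(records, 4, names, register_every_chunk=False)
_, fixed = split_into_chunks(records, 4, names, register_every_chunk=True)

print(buggy)  # ['chunk0']
print(fixed)  # ['chunk0', 'chunk1', 'chunk2']
```

In the first variant every record is still written to a chunk, but only chunk0 would be handed to the downstream step, so repeats in the other chunks would go undetected. That is consistent with the larger t=8 output files reported above.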

Because this post was temporarily hidden for several days, I have since found a new solution and posted it in the following topic.

Hi @weichi ,
Thank you for reaching out to the bioBakery Lab and pointing out the “TRF parallel run” issue and its fix. I have added your email and username to the Contribution section of the KneadData User’s Manual here: kneaddata/ at master · biobakery/kneaddata · GitHub. Please let me know if you would like to change any of the information listed there. Thank you again.


Thank you for reviewing and accepting my PR! I appreciate it.
The information looks correct, and no changes are needed at this time.