KneadData Version: commit 90c05f3cd25a8c74a4d804f27dd0ce52718007d8
The output FASTQ files differ in size when the -t argument is added to run with multiple threads.
For example:
# t=8
-rw-r--r-- 1 root root 6126995024 Dec 24 06:13 sample1136_t8.repeats.removed.1.fastq
-rw-r--r-- 1 root root 6108005668 Dec 24 06:14 sample1136_t8.repeats.removed.2.fastq
# t=1
-rw-r--r-- 1 root root 6105388071 Dec 24 07:04 sample1136_t1.repeats.removed.1.fastq
-rw-r--r-- 1 root root 6084203934 Dec 24 07:05 sample1136_t1.repeats.removed.2.fastq
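I think this is caused by line 113 in kneaddata/trf_parallel.py.
Only the first temporary file is appended to tempfile_written_list (and only its .dat file to datfile_to_write_list). As a result, the trf command runs on just one of the parallel temporary files. Repeats in the remaining chunks are never detected, which would explain why the -t 8 outputs are larger.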
I suggest adding tempfile_written_list.append(tempfile_list[output_file_number]) and datfile_to_write_list.append(datfile_list[output_file_number]) after line 123, so that each new chunk is recorded as soon as it is opened. This should fix the issue.
file_handle_write=None
for read_line in utilities.read_file_n_lines(input,2):
    if not file_handle_write:
        file_handle_write = open(tempfile_list[output_file_number],"wt")
        tempfile_written_list.append(tempfile_list[output_file_number])
        datfile_to_write_list.append(datfile_list[output_file_number])
    file_handle_write.write("".join(read_line))
    lines_written+=2
    if lines_written > lines_per_file:
        file_handle_write.close()
        lines_written=0
        output_file_number+=1
        # proposed addition: record each new chunk so trf also runs on it
        tempfile_written_list.append(tempfile_list[output_file_number])
        datfile_to_write_list.append(datfile_list[output_file_number])
        file_handle_write = open(tempfile_list[output_file_number],"wt")
file_handle_write.close()
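The effect of the missing appends can be reproduced outside KneadData with the small sketch below. The function `split_into_chunks` and its `apply_fix` flag are hypothetical names used only for illustration. The loop mirrors the snippet in kneaddata/trf_parallel.py but writes local files directly instead of going through KneadData's `utilities` module. It tracks only the temporary files, and the .dat list would be handled the same way.

```python
import os
import tempfile  # only needed for the demo under __main__


def split_into_chunks(records, lines_per_file, tempfile_list, apply_fix=True):
    """Write 2-line (header + sequence) records into consecutive chunk files.

    Hypothetical stand-alone version of the splitting loop in
    kneaddata/trf_parallel.py. Returns the chunk paths that were recorded,
    i.e. the files a downstream trf step would be run on.
    apply_fix=False reproduces the reported behaviour (only the first chunk
    is recorded); apply_fix=True also records every later chunk.
    """
    tempfile_written_list = []
    output_file_number = 0
    lines_written = 0
    file_handle_write = None
    for record in records:
        if file_handle_write is None:
            # First chunk: opened and recorded in both versions.
            file_handle_write = open(tempfile_list[output_file_number], "wt")
            tempfile_written_list.append(tempfile_list[output_file_number])
        file_handle_write.write("".join(record))
        lines_written += 2
        # Same ">" comparison as the original loop.
        if lines_written > lines_per_file:
            file_handle_write.close()
            lines_written = 0
            output_file_number += 1
            file_handle_write = open(tempfile_list[output_file_number], "wt")
            if apply_fix:
                # The proposed fix: record each new chunk when it is opened.
                tempfile_written_list.append(tempfile_list[output_file_number])
    if file_handle_write is not None:
        file_handle_write.close()
    return tempfile_written_list


if __name__ == "__main__":
    # Ten 2-line records; with lines_per_file=6 they span three chunk files.
    records = [(">read%d\n" % i, "ACGTACGT\n") for i in range(10)]
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, "chunk%d.fasta" % i) for i in range(5)]
        for fixed in (False, True):
            written = split_into_chunks(records, 6, paths, apply_fix=fixed)
            print("apply_fix=%s: trf would run on %d chunk file(s)"
                  % (fixed, len(written)))
```

With `lines_per_file=6`, the ten records are split across three chunk files. With `apply_fix=False`, only the first of them is returned, which matches the behaviour described above. With the fix applied, all three are returned.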