Kneaddata Reformatting file sequence identifiers ... Type error

Rachael-16 · April 30, 2021, 2:05am

Hi there, I'm currently trying to run KneadData on a high-performance computing cluster. I installed KneadData with pip into a virtual environment and downloaded the indexed human genome database.

Command:
kneaddata --input STJ-182-d5-709_S137_L001_R1_001.fastq.gz --input STJ-182-d5-709_S137_L001_R2_001.fastq.gz -db ~/human --output kneaddata_STJ-182_d5 --trimmomatic $PATH_to_Trimmomatic

I’m getting the following error:

Any guidance anyone could offer to resolve this would be much appreciated.


Decompressing gzipped file …

Reformatting file sequence identifiers …

Traceback (most recent call last):
  File "/home/rach06/kneaddata/bin/kneaddata", line 8, in <module>
    sys.exit(main())
  File "/home/rach06/kneaddata/lib/python3.6/site-packages/kneaddata/knead_data.py", line 427, in main
    args.input[index]=utilities.get_reformatted_identifiers(args.input[index],args.output_dir, temp_output_files)
  File "/home/rach06/kneaddata/lib/python3.6/site-packages/kneaddata/utilities.py", line 258, in get_reformatted_identifiers
    os.write(file_out, "".join(lines))
TypeError: a bytes-like object is required, not 'str'

sagunmaharjann · June 3, 2021, 4:06pm

Hi @Rachael-16,

Apologies for the late reply. It looks like something goes wrong while kneaddata is reformatting the sequence identifiers of R1 and R2. Could you please send me your kneaddata version and the first 4 lines of each input file (STJ-182-d5-709_S137_L001_R1_001.fastq.gz and STJ-182-d5-709_S137_L001_R2_001.fastq.gz)?

Regards,
Sagun

larabrian · January 3, 2022, 4:38pm

The reason for this error is that in Python 3 strings (str) are Unicode, while low-level I/O functions such as os.write() accept only bytes. The traceback shows a str being passed to os.write(), so the string has to be encoded to bytes before it is written. The str.encode() method does this, and its default encoding in Python 3 is "utf-8":

"python string to bytes".encode("utf-8")

The opposite conversion, from bytes to str, uses the decode() method of the bytes class:

b"python byte to string".decode("utf-8")

Python makes a clear distinction between bytes and strings. Bytes objects contain raw data (a sequence of octets), whereas strings are Unicode sequences. Conversion between these two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8), and you decode bytes to get a string. Callers of these functions should be aware that such conversions may fail, and should consider how failures are handled.
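
For the traceback in this thread, the failing call is os.write(file_out, "".join(lines)), which passes a str where a bytes-like object is required. Below is a minimal sketch of the encode-before-write pattern. The helper name write_lines and the sample FASTQ record are made up for illustration; this is not KneadData's actual code or its official fix.

```python
import os
import tempfile


def write_lines(fd, lines):
    """Write a list of str lines to an OS-level file descriptor.

    Hypothetical helper for illustration only (not part of KneadData).
    """
    # os.write() accepts only bytes-like objects, so encode the joined text first.
    data = "".join(lines).encode("utf-8")
    return os.write(fd, data)


# mkstemp() returns a raw file descriptor, the kind of handle os.write() expects.
fd, path = tempfile.mkstemp(suffix=".fastq")
try:
    # Passing a str directly reproduces the error from the traceback.
    try:
        os.write(fd, "@read1\n")
        str_error = None
    except TypeError as exc:
        str_error = str(exc)

    # Encoding first succeeds. The record below is invented sample data.
    written = write_lines(fd, ["@read1\n", "ACGT\n", "+\n", "IIII\n"])
finally:
    os.close(fd)

with open(path, "rb") as handle:
    content = handle.read()
os.remove(path)

print(str_error)
print(content.decode("utf-8"), end="")
```

If patching an installed package is not an option, upgrading to a newer KneadData release may be worth trying first, on the assumption that later versions handle Python 3 in this code path.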
