The bioBakery help forum

Kneaddata Reformatting file sequence identifiers ... Type error

Hi there, I’m currently trying to run Kneaddata on a high-performance computing cluster. I installed Kneaddata with pip into a virtual environment and downloaded the indexed human genome database.

Command:
kneaddata --input STJ-182-d5-709_S137_L001_R1_001.fastq.gz --input STJ-182-d5-709_S137_L001_R2_001.fastq.gz -db ~/human --output kneaddata_STJ-182_d5 --trimmomatic $PATH_to_Trimmomatic

I’m getting the error shown below. Any guidance on resolving it would be much appreciated.

Decompressing gzipped file ...

Reformatting file sequence identifiers ...

Traceback (most recent call last):
  File "/home/rach06/kneaddata/bin/kneaddata", line 8, in <module>
    sys.exit(main())
  File "/home/rach06/kneaddata/lib/python3.6/site-packages/kneaddata/knead_data.py", line 427, in main
    args.input[index]=utilities.get_reformatted_identifiers(args.input[index],args.output_dir, temp_output_files)
  File "/home/rach06/kneaddata/lib/python3.6/site-packages/kneaddata/utilities.py", line 258, in get_reformatted_identifiers
    os.write(file_out, "".join(lines))
TypeError: a bytes-like object is required, not 'str'
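
For anyone reading the traceback: in Python 3, os.write() only accepts bytes-like objects, while "".join(lines) on a list of str returns a str. The sketch below reproduces that message in isolation and shows one way to avoid it by encoding first. It is not kneaddata's source or an official patch. It assumes file_out is a raw file descriptor (here obtained from tempfile.mkstemp), and the FASTQ lines and the write_lines helper are made up for illustration.

```python
import os
import tempfile


def write_lines(fd, lines):
    """Write a list of str lines to a raw file descriptor (hypothetical helper)."""
    # os.write() takes bytes in Python 3, so join the text and encode it first.
    os.write(fd, "".join(lines).encode("utf-8"))


# Invented FASTQ record, used only to have something to write.
lines = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]

# Assumption: the descriptor comes from mkstemp, which returns (fd, path).
fd, path = tempfile.mkstemp(suffix=".fastq")
error_message = None
try:
    try:
        # Same call shape as in the traceback: a str passed to os.write().
        os.write(fd, "".join(lines))
    except TypeError as err:
        error_message = str(err)
    # Encoded version succeeds.
    write_lines(fd, lines)
finally:
    os.close(fd)

with open(path) as handle:
    written = handle.read()
os.remove(path)

print(error_message)
```

Under Python 2 the same os.write() call accepts a str, which is why code written that way can work there and fail under a Python 3.6 virtual environment like the one in the paths above.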

Hi @Rachael-16,

Apologies for the late reply. It looks like kneaddata is running into a problem while reformatting the sequence identifiers of the R1 and R2 files. Could you please provide the kneaddata version you are using and the first 4 lines of each input file (STJ-182-d5-709_S137_L001_R1_001.fastq.gz and STJ-182-d5-709_S137_L001_R2_001.fastq.gz)?

Regards,
Sagun
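
One possible way to pull out the first four lines requested above (one FASTQ record) from a gzipped file, sketched with only the Python standard library. The function name first_fastq_lines is hypothetical, and the sketch assumes the file is plain gzip-compressed text.

```python
import gzip
from itertools import islice


def first_fastq_lines(path, n_lines=4):
    """Return the first n_lines lines of a gzipped text file, without newlines."""
    # "rt" opens the gzip stream in text mode, so lines come back as str.
    with gzip.open(path, "rt") as handle:
        # islice stops after n_lines, so the rest of the file is never read.
        return [line.rstrip("\n") for line in islice(handle, n_lines)]
```

Calling first_fastq_lines("STJ-182-d5-709_S137_L001_R1_001.fastq.gz") and the same for the R2 file, then printing the returned lines, gives the header, sequence, separator, and quality lines of the first read in each file.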