Waafle_junctions: _csv.Error: field larger than field limit (131072)

Hi, waafle_junctions works for two cohorts, but it fails with another cohort and produces the error below. I tried adding quoting=csv.QUOTE_NONE to csv.reader( fh, dialect="excel-tab" ) in the file /home/username/pythonenvs/biobakery/lib/python3.7/site-packages/waafle/utils.py, but it did not fix the error. Any thoughts? Thanks,

Junhui

Traceback (most recent call last):
  File "/home/username/pythonenvs/biobakery/bin/waafle_junctions", line 11, in <module>
    sys.exit(main())
  File "/home/username/pythonenvs/biobakery/lib/python3.7/site-packages/waafle/waafle_junctions.py", line 429, in main
    for mate1, mate2 in concordant_hits( p_sam ):
  File "/home/username/pythonenvs/biobakery/lib/python3.7/site-packages/waafle/waafle_junctions.py", line 256, in concordant_hits
    for hit in wu.iter_sam_hits( p_sam ):
  File "/home/username/pythonenvs/biobakery/lib/python3.7/site-packages/waafle/utils.py", line 543, in iter_sam_hits
    for row in csv.reader( fh, dialect="excel-tab" ):
_csv.Error: field larger than field limit (131072)
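
For reference, this error comes from Python's csv module, which caps individual fields at 131072 bytes by default; quoting=csv.QUOTE_NONE changes quoting behavior but not that cap. The generic workaround is to raise the limit with csv.field_size_limit(). A minimal sketch, assuming it were placed near the top of waafle/utils.py (not an official WAAFLE fix; the back-off loop guards against platforms where sys.maxsize overflows the limit's underlying C long):

    import csv
    import sys

    # Raise the csv module's per-field size limit (default 131072 bytes).
    # csv.field_size_limit() can reject sys.maxsize on some platforms,
    # so back off until a value is accepted.
    _limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(_limit)
            break
        except OverflowError:
            _limit = _limit // 10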

This sounds like you have a read name, contig name, or read sequence (i.e., one of the SAM fields) that is very long (>100K characters). Can you check your SAM file and see if that's the case?
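
A quick way to check is to scan the non-header lines of the SAM file and report the longest tab-separated field. A minimal sketch (the helper name and command-line handling are just for illustration):

    import sys

    def max_sam_field_length(path):
        # Length of the longest tab-separated field across all alignment
        # lines; header lines (starting with "@") are skipped.
        longest = 0
        with open(path) as fh:
            for line in fh:
                if line.startswith("@"):
                    continue
                fields = line.rstrip("\n").split("\t")
                longest = max(longest, max(len(f) for f in fields))
        return longest

    if __name__ == "__main__":
        print(max_sam_field_length(sys.argv[1]))

If the reported value exceeds 131072, that would explain the error above.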

Thanks for your prompt response! The read names, contig names, and read sequences are all short. The contigs themselves can be ~500 kbp long, but that doesn't matter for the SAM fields. I would be grateful if you could check one sample when you have the opportunity.

The files for ERR5445742 (i.e., ERR5445742.contigs.fa.gz and ERR5445742.gff.zip) have been deposited on GitHub at junhuili/test_sample (waafle error testing).
My apologies, the SAM file is too large to be uploaded to GitHub. The FASTQ files can be downloaded here:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR544/002/ERR5445742/ERR5445742_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR544/002/ERR5445742/ERR5445742_2.fastq.gz