Does kneaddata 0.7.4 still require the /1 and /2 in the read ids?

smb2020 · January 20, 2021, 1:44pm

My question is how do we tell whether the kneaddata version requires the /1 and /2? Also, is there a way to download the data from the SRA to be compatible with kneaddata?

smb2020 · January 21, 2021, 6:31pm

Would greatly appreciate your guidance.

lauren.j.mciver · January 21, 2021, 11:53pm

Hello, yes, the latest KneadData version still requires the pair identifiers in the read headers. It accepts both the original and the new Illumina header formats.

Original format example: @HWUSI-EAS100R:6:73:941:1973#0/1
New format example: @EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1
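To check which of these two formats a file uses, a rough sketch like the one below can help. Note that `header_pair_format` is a hypothetical helper, not part of kneaddata, and its patterns are derived only from the two examples above rather than from kneaddata's own parser, so treat "none" as "matched neither example" and nothing more.

```shell
# Hypothetical helper (not part of kneaddata): classify one FASTQ header line
# against the two example formats above. The patterns are assumptions based on
# those examples only.
header_pair_format() {
  if printf '%s\n' "$1" | grep -Eq '^@[^[:space:]]+/[12]$'; then
    # No whitespace in the header, and it ends in /1 or /2
    echo original
  elif printf '%s\n' "$1" | grep -Eq '^@[^[:space:]]+[[:space:]][12]:[YN]:[0-9]+'; then
    # Read ID, whitespace, then <read number>:<filter flag>:<control number>
    echo new
  else
    echo none
  fi
}

# Example usage (reads_1.fastq is a placeholder file name):
#   header_pair_format "$(head -n 1 reads_1.fastq)"
```

Only the first header is inspected in the usage example, which assumes every read in the file follows the same format.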

In the upcoming Kneaddata release we will allow for files without the pair identifier by specifying the read1 and read2 files on the command line.

Thank you,
Lauren

drelo · February 21, 2021, 4:46pm

Hi @lauren.j.mciver, just a brief question about kneaddata. I also downloaded paired-end files from SRA, but the output of head on them looks like this:

@SRR6000869.1 1/1
CTATGACACACGCGTCATGGCCATGCAGAAGCAAGCTGCCGATCGTGAAGTGCCAACAGACC
+
A/AAA/EEAAEEAEEAEAEEEAAEAAEE/EEEEEEAEEEEA/AAEAEAEEEE/EE/E/E/EA
@SRR6000869.2 2/1
TATTCCCTGGAAAGGTAACTACTCCAGTTGGCTGGAACAGAAGACCAAGCGCATGGAGCAAGAGGAAAAGACCGCCAGCAAGCGCCGCAAGACGCTGGAACGCGAGCTGGAGTGGGTGCGCATGGCTCCCAAGGCCCGTCAGGCAAAGGG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EEEEEEAEEAEEEEAEEEEAEEEEEEEEEEEEE/EEEAAEEEEAEEEEEEEEEEEEEEE<EE/EEEEAEEEEEAAEEEEEEE<AEEEEEEEEAE/EA<AAEEEAEEEEEEEEE
@SRR6000869.3 3/1
AAAAACGGCTTAGAATAGCTTTTCTTTCCACAGTTTCATTTATATTCTTACAGAAGAGTTGAAGTATTATTGCCTCGCGTACAGTTAACATCTTTTTTTCTTCAGTTTCATCATTATACAGTGTGGCATGTTTGGCATCCAGTGTGAAACG

I tried a simple sed replacement, and the headers now look like this:
@SRR6000869.1#1/1 but that doesn’t seem to be working. Can I fix this by running a new replacement on the original file? Or should I run Trimmomatic outside of kneaddata and use kneaddata only for removing human reads? I am afraid that workaround would be harder to implement than doing both steps in a single command. I have to use this version anyway because it is the one available in our cluster environment, so I would prefer to fix this with sed before feeding the files to kneaddata. Thanks for your help!
Sorry, I also posted this question in the HUMAnN thread; I have deleted that duplicate.

lauren.j.mciver · February 22, 2021, 10:37pm

Hi, you should be able to make this change to the original file with sed; just add the -i option to edit the file in place. That should allow kneaddata to track the paired reads. If it still does not seem to be working, please post the details of what is going wrong here.
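As an illustration only, here is one way such an in-place edit could look. The exact sed expression used in this thread was not posted, so the substitution below is a guess that reproduces the `@SRR6000869.1#1/1` header shown above; `fix_sra_headers` is a hypothetical helper name, and the `1~4` address and the bare `-i` flag assume GNU sed.

```shell
# Hypothetical helper: replace the first space in each FASTQ header line with "#",
# so "@SRR6000869.1 1/1" becomes "@SRR6000869.1#1/1".
# Assumes GNU sed and a standard four-lines-per-record FASTQ file.
fix_sra_headers() {
  # "1~4" selects line 1 and every 4th line after it (the header lines), so
  # quality lines that happen to start with "@" are never treated as headers.
  # "-i" edits the file in place; keep a backup copy of the original first.
  sed -i '1~4 s/ /#/' "$1"
}

# Example usage (file names are placeholders):
#   cp reads_1.fastq reads_1.fastq.bak
#   fix_sra_headers reads_1.fastq
```

Restricting the substitution to header lines is a precaution; it keeps the sequence and quality lines byte-for-byte identical.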

Thank you,
Lauren

drelo · February 24, 2021, 11:14am

Thanks @lauren.j.mciver! The replacement seems to have worked fine. I was expecting it to run as fast as the run with the incorrect headers (although that didn’t make sense), so after 2 hours I thought something was wrong. I reran everything, and these samples take close to 3 hours to clean (whereas yesterday other samples finished in less than an hour). It was a mix of my bad choice of a ‘demo’ pair of reads and launching the run with few threads. Thanks again, it is working fine now.

lauren.j.mciver · February 24, 2021, 9:46pm

Thank you for the follow up! I am glad to hear it is now working okay.

Thank you,
Lauren
