Does kneaddata 0.7.4 still require the /1 and /2 in the read ids?

My question is how do we tell whether the kneaddata version requires the /1 and /2? Also, is there a way to download the data from the SRA to be compatible with kneaddata?

Would greatly appreciate your guidance.

Hello, Yes, the latest KneadData version still requires the pair identifiers in the read headers. It accepts both the original and the new Illumina header formats.

Original format example: @HWUSI-EAS100R:6:73:941:1973#0/1
New format example: @EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1

In the upcoming KneadData release we will support files without the pair identifier, by letting users specify the read1 and read2 files on the command line.
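On the earlier question of downloading SRA data in a compatible format: a minimal sketch, assuming the classic fastq-dump tool from sra-tools, whose --defline-seq template can write the /1 and /2 pair identifiers directly into the read headers ($ac = accession, $si = spot id, $ri = read number). The accession below is illustrative.

```shell
# Dump paired reads with pair identifiers already in the headers.
# --split-files writes mate 1 and mate 2 to separate files;
# the defline template controls the header format.
fastq-dump --split-files \
    --defline-seq '@$ac.$si/$ri' \
    --defline-qual '+' \
    SRR6000869
```

This should produce SRR6000869_1.fastq and SRR6000869_2.fastq with headers such as @SRR6000869.1/1 and @SRR6000869.1/2, which carry the pair identifier kneaddata looks for.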

Thank you,


Hi @lauren.j.mciver, just a brief question about kneaddata. I also downloaded paired-end files from SRA, but their headers look like this when inspected with head:

@SRR6000869.1 1/1
@SRR6000869.2 2/1
@SRR6000869.3 3/1

I tried a simple sed replacement, after which the headers look like this: @SRR6000869.1#1/1. But that does not seem to be working either. Can I fix this with another replacement over the original file, or should I run Trimmomatic outside of kneaddata and use kneaddata only for removing human reads? I am afraid that workaround may be harder to implement than doing both steps in a single command. I have to use this kneaddata version anyway, since it is on a cluster, so I would prefer to fix the headers with sed before feeding the files to kneaddata. Thanks for your help!
Sorry, I also posted this question in the HUMAnN thread; I just deleted that duplicate.

Hi, You should be able to make this change to the original file with sed; just add the -i option to edit the file in place. That should allow kneaddata to track the paired reads. If it does not seem to be working, please post the details of what is going wrong here.
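A concrete sed invocation along these lines might look as follows. This is a minimal sketch assuming GNU sed and fastq-dump-style headers ("@SRR... N/1"); it rewrites only the header lines (1, 5, 9, ... via GNU sed's 1~4 address) into the old-Illumina pair suffix shown earlier in this thread ("#0/1"). The file name is illustrative.

```shell
# Create a tiny one-record FASTQ with an SRA-style header for demonstration.
printf '@SRR6000869.1 1/1\nACGT\n+\nFFFF\n' > sample_R1.fastq

# In-place rewrite of header lines only:
# \1 = everything before the space (the accession.spot id),
# \2 = the trailing mate number (1 or 2).
sed -i '1~4 s|^\(@[^ ]*\) .*/\([12]\)$|\1#0/\2|' sample_R1.fastq

head -1 sample_R1.fastq   # @SRR6000869.1#0/1
```

Restricting the substitution to every fourth line avoids accidentally rewriting quality lines that happen to begin with "@".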

Thank you,

Thanks @lauren.j.mciver! The replacement worked fine. I was expecting it to run as fast as the run with the incorrect headers (even though that didn't make sense), so after two hours I thought something was wrong. I reran everything, and these samples take close to three hours to clean, while yesterday other samples finished in under an hour. It was a combination of a poor choice of 'demo' read pair and launching the job with too few threads. Thanks again, it is now working fine.

Thank you for the follow up! I am glad to hear it is now working okay.

Thank you,