The bioBakery help forum

Massive difference between paired reads' counts

I’ve been using kneaddata v0.7.10 to remove human and other host (animal) DNA from metagenomic data. I tested a paired-end dataset from pigs against the human database provided with kneaddata, and one FASTQ file ended up with around 9 times more reads than its mate. I also tried --bypass-trf, but the problem persisted. When I ran kneaddata without a reference database (so only trimming and TRF were applied), both members of each pair had the same read count, as expected, so the problem seems to arise only when a database is used.
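For anyone who wants to reproduce the comparison, here is a minimal sketch of how the read counts of two mate files can be compared. It assumes standard four-line FASTQ records (no wrapped sequences); the function names `count_reads` and `compare_pair` and the file names in the usage comment are hypothetical, not part of kneaddata.

```shell
# Count reads in a FASTQ file (plain or gzip-compressed).
# Assumes exactly four lines per record.
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}

# Print both counts; the exit status is non-zero when the mates differ.
compare_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" -eq "$n2" ]
}

# Usage (hypothetical output file names):
#   compare_pair sample_paired_1.fastq sample_paired_2.fastq || echo "mates out of balance"
```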

The IDs in the input files look like this:

@ERR1855536.1 NS500633:37:H3VL5BGXY:1:11101:10093:1034/2
@ERR1855536.2 NS500633:37:H3VL5BGXY:1:11101:16963:1035/2
@ERR1855536.3 NS500633:37:H3VL5BGXY:1:11101:17200:1037/2
@ERR1855536.4 NS500633:37:H3VL5BGXY:1:11101:5816:1038/2
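Since the headers above contain a space and end in a /1 or /2 mate suffix, it can be worth confirming that the two input files list the same reads in the same order before running kneaddata. The sketch below is only a diagnostic under the assumption of four-line FASTQ records. `read_ids` and `ids_in_sync` are hypothetical helper names, and the check does not by itself explain the count difference.

```shell
# Print one ID per read: the header line without the leading "@"
# and without a trailing /1 or /2 mate suffix.
read_ids() {
    awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/\/[12]$/, ""); print }' "$1"
}

# Succeed only if both files contain the same IDs in the same order.
ids_in_sync() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    read_ids "$1" > "$tmp1"
    read_ids "$2" > "$tmp2"
    if cmp -s "$tmp1" "$tmp2"; then status=0; else status=1; fi
    rm -f "$tmp1" "$tmp2"
    return "$status"
}

# Usage (hypothetical input file names):
#   ids_in_sync ERR1855536_1.fastq ERR1855536_2.fastq || echo "mate IDs differ"
```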

The IDs in the result files look like this:


The command used:

$ kneaddata --remove-intermediate-output --threads 32 --input {2} --input {3} \
--output $out_folder --reference-db $ref --sequencer-source NexteraPE \
 --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" --trimmomatic \
$trimmo_path --bowtie2-options "--very-sensitive --dovetail"

I’m attaching a MultiQC report of these runs, all of which show the same problem.


Edit: I have the latest version, 0.10, installed on the cluster, but for some reason the environment activation isn’t working. Anyway, I’m now making sure that I’m executing the latest version, and I got the following error (which I posted in a different thread):

kneaddata_bowtie2_discordant_pairs: error: unrecognized arguments: --mode strict

I’ll update this post when I can run the latest version.

Update: I tried the latest version, 0.10, and the problem is gone.
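Because the root cause here was an older kneaddata being picked up instead of the one I meant to run, a small guard at the top of a job script can catch this early. This is a rough sketch: `version_at_least` is a hypothetical helper, it relies on `sort -V` (available in GNU coreutils), and it assumes the version string contains the number after an optional non-numeric prefix such as "kneaddata v".

```shell
# Succeed if the version found in $1 (e.g. "kneaddata v0.10.0")
# is at least the minimum given in $2 (e.g. "0.10").
version_at_least() {
    # Drop everything before the first digit.
    have=$(printf '%s\n' "$1" | sed 's/^[^0-9]*//')
    # Version-aware sort: the minimum must come first (or be equal).
    lowest=$(printf '%s\n%s\n' "$have" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage (assumes kneaddata prints its version with --version):
#   command -v kneaddata    # shows which executable is on the PATH
#   version_at_least "$(kneaddata --version 2>&1)" 0.10 || echo "old kneaddata in use"
```

Note that a plain string comparison would not work here, since 0.7.10 and 0.10 need to be compared field by field.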