Massive difference between paired reads' counts

matrs · April 30, 2021, 3:06pm

Hello,
I’ve been using kneaddata v0.7.10 to remove human and other host (animal) DNA from metagenomic data. I tested a paired-end dataset from pigs against the human database provided with kneaddata, and one output FASTQ file ended up with around 9 times more reads than its mate. I also tried --bypass-trf, but the problem persisted. When I ran kneaddata without a decontamination database (so only trimming and TRF were applied), both files of a pair had the same number of reads, as expected, so the problem seems to arise only when a database is used.

The IDs in the input files look like this:

@ERR1855536.1 NS500633:37:H3VL5BGXY:1:11101:10093:1034/2
@ERR1855536.2 NS500633:37:H3VL5BGXY:1:11101:16963:1035/2
@ERR1855536.3 NS500633:37:H3VL5BGXY:1:11101:17200:1037/2
@ERR1855536.4 NS500633:37:H3VL5BGXY:1:11101:5816:1038/2

The IDs in the result files look like this:

@ERR1855536.22
@ERR1855536.32
@ERR1855536.52
@ERR1855536.62
@ERR1855536.72
@ERR1855536.82
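
To put a number on the mismatch, the two files of a pair can be compared with a short script. This is a minimal sketch and not part of kneaddata: it assumes uncompressed FASTQ files with exactly four lines per record, and the function names `read_ids` and `compare_pair` are made up for illustration. It normalizes each header to its first whitespace-delimited token and drops a trailing /1 or /2, which is an assumption that fits the input headers shown above but would not undo a renaming like the one seen in the result files.

```python
import sys


def read_ids(fastq_path):
    """Return the normalized read ID of every record in an uncompressed FASTQ file."""
    ids = []
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each 4-line record is a header
            fields = line.split()
            if not fields:
                continue  # tolerate a blank line at the end of the file
            token = fields[0][1:]  # first token, without the leading "@"
            if token.endswith(("/1", "/2")):
                token = token[:-2]  # drop the mate suffix (assumed convention)
            ids.append(token)
    return ids


def compare_pair(path_1, path_2):
    """Count reads in each file and the IDs that appear in only one of them."""
    ids_1 = read_ids(path_1)
    ids_2 = read_ids(path_2)
    return {
        "reads_1": len(ids_1),
        "reads_2": len(ids_2),
        "only_in_1": len(set(ids_1) - set(ids_2)),
        "only_in_2": len(set(ids_2) - set(ids_1)),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_pair.py reads_1.fastq reads_2.fastq
    print(compare_pair(sys.argv[1], sys.argv[2]))
```

If the two files are properly paired, `reads_1` and `reads_2` should be equal and both `only_in_*` values should be 0.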

The command used:

$ kneaddata --remove-intermediate-output --threads 32 --input {2} --input {3} \
--output $out_folder --reference-db $ref --sequencer-source NexteraPE \
 --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" --trimmomatic \
$trimmo_path --bowtie2-options "--very-sensitive --dovetail"

I’m attaching a MultiQC report for these runs, all of which show the same problem:

ERR1855535
ERR1855536
ERR1855537
ERR1855538

Edit: I had the latest version, 0.10, installed on the cluster, but for some reason the environment activation wasn’t working. I’m now making sure I run the latest version, and I get the following error (which I posted in a different thread):

kneaddata_bowtie2_discordant_pairs: error: unrecognized arguments: --mode strict

I’ll update this post when I can run the latest version.

matrs · May 1, 2021, 7:08pm

I tried the latest version, 0.10, and this problem is gone.
