I haven’t confirmed that this will work (so beware!), but I have been meaning to try setting decontaminate_pairs to lenient to see if that will bypass this issue.
Hi @levlitichev, I just tried that with my data and got a bowtie2 error: “fewer reads in file specified with -2 than in file specified with -1”. It’s been puzzling me because I confirmed that the number of reads was the same in each pair, both in the raw files and after going through Trimmomatic, so now I’m trying to see whether something went wrong in TRF (the step just before bowtie2).
In any case, it wasn’t very helpful for solving the original issue.
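A quick way to repeat that check at each stage (raw, after Trimmomatic, after TRF) is to count FASTQ records in both mate files. This is only a minimal sketch: it assumes uncompressed FASTQ with exactly four lines per record, and `count_reads` and `check_pair_counts` are hypothetical helper names, not part of KneadData.

```shell
# Count reads in an uncompressed FASTQ file, assuming four lines per record.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Compare the read counts of two mate files.
# Prints both counts and returns non-zero when they differ.
check_pair_counts() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Example (file names are placeholders):
# check_pair_counts sample_R1.fastq sample_R2.fastq || echo "Read counts differ"
```

Running it on the intermediate files in the KneadData output folder should show at which step the two mates go out of sync.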
I think the issue here is caused by bowtie2 not recognising the read pairs due to changes in their header line formatting introduced by KneadData at the beginning of the wrapper.
I can align my reads to the same bowtie2 index used by KneadData prior to processing them with KneadData, but if I try to use the reads that result from the TRF step, these do not get sorted into the paired unaligned output files.
My current workaround is aligning the reads myself against the bowtie2 indices prior to using KneadData to run Trimmomatic and TRF. If the modified read headers cause an issue downstream in kraken2, I might just have to drop KneadData or introduce a step to patch the modified header lines.
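For anyone wanting to try the same workaround, the manual alignment step might look roughly like the sketch below. The function name, file names and output naming are hypothetical, and bowtie2's `--un-conc` keeps pairs that fail to align concordantly, which may not match KneadData's own pair-handling rules exactly.

```shell
# Align a read pair against a Bowtie2 index and keep only the pairs that do
# NOT align concordantly (reads not assigned to the contaminant reference).
# Arguments: <index_prefix> <R1.fastq> <R2.fastq> <output_prefix>
remove_contaminant_pairs() {
    index_prefix=$1
    r1=$2
    r2=$3
    out_prefix=$4
    # --un-conc writes the unaligned pairs; "%" is replaced by 1 or 2 per mate.
    # -S /dev/null discards the SAM alignments, which are not needed here.
    bowtie2 -x "$index_prefix" -1 "$r1" -2 "$r2" \
        --un-conc "${out_prefix}_clean_R%.fastq" \
        -S /dev/null
}

# Example (paths are placeholders):
# remove_contaminant_pairs /path/to/index sample_R1.fastq sample_R2.fastq out/sample
```

The resulting `_clean_R1.fastq` and `_clean_R2.fastq` files would then be the input for the Trimmomatic and TRF steps.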
Current working hypothesis at least.
Hello,
I am dealing with the same problem: all my paired-end data are in my unpaired output.
I changed the format of my read IDs (as you can see in the picture below), but it didn’t resolve the problem…
Thank you in advance for your help
Like @fquerdasi recommended, downgrading to v0.10 worked for me.
Hi, thank you for your answer @levlitichev!
Now I am not sure what the problem is, because the names of the reads seem to be changing…
If I look in the .command.log file, I see this line:
“Reformatting file sequence identifiers…”
Does that mean that my results are correct? If the names are changed, maybe the empty “paired” files are normal?
I think the reformatting is normal. It’s how KneadData makes sure that it can match up the forward and reverse reads. The problem (with v0.12.0) is that bowtie2 still can’t figure out how to match up the forward and reverse reads, so it returns zero for paired reads (“INFO: READ COUNT: final pair1” and “pair2” above). With v0.10.0, this problem appears to be fixed (for me). This is my understanding at least.
Hello,
I found a workaround for a similar issue. If the demo files work with your installation, here’s the situation I encountered.
The headers in the demo files were formatted like this:
@A2G7T130425:1:1101:10002:20172/1
Using KneadData v0.12.0, these headers were recognized by both Trimmomatic and Bowtie2 during the decontamination process.
However, my headers looked like this:
@A00560:373:H7KT2DRX5:1:2101:1443:1016 1:N:0:CAGCCGCCTA+CACGGCTAGT
I realized I needed to remove the final part and replace the blank space with a “/” followed by the read number, modifying my headers to look like this:
@A00560:373:H7KT2DRX5:1:2101:1443:1016/1
I created a bash script to automate this process, and it worked as intended. It selects the R1 and R2 FASTQ files based on their file names, so make sure your files are named something like “SampleID_R1_something.fastq” (after gunzipping all the .gz files). Here is the script:
```bash
#!/bin/bash
# Rewrite FASTQ headers of the form "@id 1:N:0:INDEX" as "@id/1" (or "@id/2").

# Check that a directory was provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <directory>" >&2
    exit 1
fi

directory=$1

# Modify the header lines of a FASTQ file
modify_headers() {
    local input_file=$1
    local output_file=$2
    awk '{
        # Header lines are every 4th line starting at line 1 and begin with "@".
        # Only rewrite headers that contain a second, space-separated field.
        if (NR % 4 == 1 && substr($0, 1, 1) == "@" && split($0, arr, " ") > 1) {
            # Keep the read ID and append "/" plus the read number
            print arr[1] "/" substr(arr[2], 1, 1)
        } else {
            print
        }
    }' "$input_file" > "$output_file"
}

# Process all *_R1_*.fastq and *_R2_*.fastq files
for file in "$directory"/*_R1_*.fastq "$directory"/*_R2_*.fastq; do
    # Skip outputs from a previous run
    [[ "$file" == *_modified.fastq ]] && continue
    if [[ -f "$file" ]]; then
        output_file="${file%.fastq}_modified.fastq"
        echo "Modifying headers in $file..."
        modify_headers "$file" "$output_file"
        echo "Output written to $output_file"
    fi
done
```
Note: You’ll need to gunzip the FASTQ files first, run the script to change the headers, and then compress the modified files again if necessary. Simply provide the folder containing the gunzipped .fastq files, and the script will handle the rest.
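If you would rather not gunzip and recompress by hand, the same awk rewrite can be streamed through gzip. This is only a sketch: `modify_headers_gz` is a hypothetical name, and it assumes four-line FASTQ records with headers of the form `@id 1:N:0:INDEX`.

```shell
# Rewrite the headers of a gzipped FASTQ file without leaving an uncompressed
# copy on disk. Arguments: <input.fastq.gz> <output.fastq.gz>
modify_headers_gz() {
    gzip -dc "$1" | awk '{
        # Only rewrite header lines that contain a second, space-separated field
        if (NR % 4 == 1 && substr($0, 1, 1) == "@" && split($0, arr, " ") > 1) {
            print arr[1] "/" substr(arr[2], 1, 1)
        } else {
            print
        }
    }' | gzip > "$2"
}

# Example (file names are placeholders):
# modify_headers_gz SampleID_R1_001.fastq.gz SampleID_R1_001_modified.fastq.gz
```

Headers that have no space-separated second field are passed through unchanged.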
After modifying the headers, you can try running KneadData again.
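Before re-running, it may be worth confirming which header style each file actually has. A small sketch of such a check follows; `header_style` is a hypothetical helper, and the labels "slash" and "casava" are just names chosen here for the two formats shown above.

```shell
# Classify the first header line of an uncompressed FASTQ file.
# Prints "slash" for IDs ending in /1 or /2, "casava" for headers of the form
# "<id> 1:N:0:<index>", and "other" for anything else.
header_style() {
    head -n 1 "$1" | awk '{
        if ($0 ~ /\/[12]$/)              print "slash"
        else if ($2 ~ /^[12]:[YN]:/)     print "casava"
        else                             print "other"
    }'
}

# Example (file name is a placeholder):
# header_style SampleID_R1_001_modified.fastq
```

Both mates of a sample should report the same style before they are passed to KneadData.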