Hi,
We are trying to understand some discrepancies between our kneaddata command and the FastQC output.
We obtained shotgun data from a NovaSeq 6000 sequencer. Both FastQC reports (start and end) show a low level of adapter content (up to around 2% at position 137). These adapters are not removed even though we pass the --sequencer-source="TruSeq3" option.
Also, despite passing --trimmomatic-options="MINLEN:50", we end up with a small but non-negligible fraction of reads whose lengths are evenly distributed between 14 bp and <151 bp. How is that possible? Could the tandem repeats finder (TRF) be responsible and, if so, is it possible to run it before the Trimmomatic step so that the MINLEN option is relevant?
*UPDATE: disabling TRF has no effect on the issue. It seems that the order in which we list the Trimmomatic options is the order in which the trimming steps are applied, which explains this second issue: with MINLEN:50 listed before SLIDINGWINDOW:4:30, the length filter runs before the quality trimming shortens the reads.
Here is the exact command we ran:
kneaddata -v --input $SLURM_TMPDIR/${__INR1_FILENAME} --input $SLURM_TMPDIR/${__INR2_FILENAME} \
-db $__DB --bowtie2-options="--very-sensitive-local" \
-o $SLURM_TMPDIR/${__EXP_NAME} --output-prefix ${__EXP_NAME} \
--threads 23 --max-memory 30G --sequencer-source="TruSeq3" \
--trimmomatic-options="MINLEN:50 SLIDINGWINDOW:4:30" \
--run-fastqc-start --run-fastqc-end
And the corresponding log:
Running Trimmomatic ...
java -Xmx30G -jar /cvmfs/soft.mugqic/CentOS6/software/trimmomatic/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 23 -phred33 /localscratch/ronj2303.7200288.0/S-14-POLPIL-G/reformatted_identifiersji5zc25s_decompressed_iakqnu1u_NS.1821.001.IDT_i7_104---IDT_i5_104.S-14-POLPIL-G_R1 /localscratch/ronj2303.7200288.0/S-14-POLPIL-G/reformatted_identifiersc7yeowio_decompressed_qx7vb59j_NS.1821.001.IDT_i7_104---IDT_i5_104.S-14-POLPIL-G_R2 /localscratch/ronj2303.7200288.0/S-14-POLPIL-G/S-14-POLPIL-G.trimmed.1.fastq /localscratch/ronj2303.7200288.0/S-14-POLPIL-G/S-14-POLPIL-G.trimmed.single.1.fastq /localscratch/ronj2303.7200288.0/S-14-POLPIL-G/S-14-POLPIL-G.trimmed.2.fastq /localscratch/ronj2303.7200288.0/S-14-POLPIL-G/S-14-POLPIL-G.trimmed.single.2.fastq MINLEN:50 SLIDINGWINDOW:4:30
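
For anyone who wants to reproduce the read-length observation, here is a minimal sketch of how the lengths can be tallied. It assumes an uncompressed FASTQ with exactly four lines per record; the function names read_length_histogram and count_short_reads are hypothetical helpers, not part of kneaddata or Trimmomatic.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting read lengths in an uncompressed
# FASTQ file (assumes exactly 4 lines per record, sequence on line 2).

# Print "length<TAB>count" for every read length, sorted by length.
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) printf "%d\t%d\n", l, n[l] }' "$1" | sort -n
}

# Print the number of reads shorter than a minimum length
# (second argument, default 50 to match MINLEN:50).
count_short_reads() {
    awk -v min="${2:-50}" \
        'NR % 4 == 2 && length($0) < min { c++ } END { print c + 0 }' "$1"
}

# Example calls (file name taken from the log above):
#   read_length_histogram S-14-POLPIL-G.trimmed.1.fastq
#   count_short_reads S-14-POLPIL-G.trimmed.1.fastq 50
```

Running the same check after moving MINLEN:50 to the end of --trimmomatic-options (that is, "SLIDINGWINDOW:4:30 MINLEN:50") should show whether any reads below 50 bp remain.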