Trim Overrepresented Effect Downstream

Todd_Testerman · June 24, 2021, 5:23pm

Hello,

I processed a set of shotgun metagenomes about a year ago using kneaddata. At the time, the "--run-trim-repetitive" flag was not mentioned in the README or otherwise exposed to the user. I continued processing the data over the course of the year and am now finishing things up. I now see this flag listed as a recommended parameter and am wondering whether I will need to reprocess everything.

How large an effect would omitting this option likely have downstream? I looked through some of the FastQC reports, and most have no overrepresented sequences flagged, but some do.

Would there be any way to handle this without having to rerun kneaddata and start from square one?

Thanks again. I realize the program is in development, so things can change periodically; I am just hoping to rationalize whether to rerun this or not.

TNT

sagunmaharjann · July 9, 2021, 5:34pm

Hi @Todd_Testerman,

Thank you for reaching out to the bioBakery Lab. For shotgun sequences, our latest version of KneadData runs the following steps, in this order:

Trim overrepresented sequences using FastQC reports →
Trimmomatic (trim adapters, sliding window, minimum bp length) →
TRF (trim repetitive sequences) →
bowtie2 database (remove contaminant reads)

Therefore, if you use the --bypass-trim and --bypass-trf flags and do not provide the -db flag (which skips the bowtie2 step), you will be able to trim only the overrepresented sequences using FastQC.

So, the final command would look something like this:

kneaddata --input demo.fastq -o kneaddata_output --run-trim-repetitive --fastqc FastQC --bypass-trim --bypass-trf
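
If it helps to decide which samples are worth rerunning, here is a minimal sketch (not part of KneadData) that lists the FastQC reports whose "Overrepresented sequences" module was flagged as warn or fail. It assumes the FastQC zip archives have already been unzipped so that each sample has its own folder containing fastqc_data.txt under one parent directory; the function name and the directory layout are hypothetical and only for illustration.

```shell
# Sketch only: print "<status><TAB><report path>" for every FastQC report
# whose "Overrepresented sequences" module is flagged warn or fail.
# Assumed layout (hypothetical): <dir>/<sample>_fastqc/fastqc_data.txt
flagged_overrepresented() {
  for report in "$1"/*/fastqc_data.txt; do
    # Skip the unexpanded glob when no report exists.
    [ -f "$report" ] || continue
    # Module headers in fastqc_data.txt look like ">>Module name<TAB>status".
    status=$(awk -F '\t' '$1 == ">>Overrepresented sequences" { print $2; exit }' "$report")
    case "$status" in
      warn|fail) printf '%s\t%s\n' "$status" "$report" ;;
    esac
  done
}

# Example call, assuming the unzipped reports live in ./fastqc_reports:
#   flagged_overrepresented fastqc_reports
```

Only the samples this prints would be candidates for the command above; samples that FastQC marked as pass have no overrepresented sequences reported for that step to act on.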

Documentation here:

Regards,
Sagun
