Trim Overrepresented Effect Downstream

Hello,

I processed a set of shotgun metagenomes about a year ago using kneaddata. At the time, the "--run-trim-repetitive" flag was not mentioned in the readme or exposed to the user. I proceeded with data processing over the course of the year and am now finishing things up. I now see this flag listed as a recommended parameter and am wondering whether I need to reprocess everything.

How large an effect would omitting this option likely have downstream? I looked through some of the FastQC reports, and most have no overrepresented sequences flagged, but some do.

Would there be any way to handle this without having to rerun kneaddata and start from square one?

Thanks again. I realize the program is in development, so things can change periodically; I am just hoping to work out whether to rerun this or not.

TNT

Hi @Todd_Testerman ,

Thank you for reaching out to the bioBakery Lab. Our latest version of KneadData runs the following steps, in this order, for shotgun sequences:

  • Trim overrepresented sequences using FastQC reports →
  • Trimmomatic (trim adapters, sliding window, minimum bp length) →
  • TRF (trim repetitive sequences) →
  • bowtie2 database (trim contaminant reads)

Therefore, if you use --bypass-trim and --bypass-trf, and do not provide the -db flag (which skips the bowtie2 step), you will be able to trim just the overrepresented sequences using FastQC.

So, the final command would look something like this:

kneaddata --input demo.fastq  -o kneaddata_output --run-trim-repetitive --fastqc FastQC --bypass-trim --bypass-trf
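
If you have many samples, the same command can be wrapped in a small loop. The following is only a sketch under stated assumptions: the function name trim_overrepresented_only, the directory arguments, and the DRY_RUN switch are hypothetical helpers, and it assumes single-end .fastq files sitting in one directory. The kneaddata flags are exactly those in the command above.

```shell
#!/usr/bin/env bash
# Sketch: run the overrepresented-sequence-only kneaddata step on every
# .fastq file in a directory. The function name, its arguments, and the
# DRY_RUN switch are illustrative assumptions, not part of kneaddata.

trim_overrepresented_only() {
    local in_dir="$1" out_dir="$2" fq
    for fq in "$in_dir"/*.fastq; do
        # Skip the literal pattern when the glob matches no files
        [ -e "$fq" ] || continue
        # With DRY_RUN set to a non-empty value, print the command
        # instead of executing it
        ${DRY_RUN:+echo} kneaddata --input "$fq" -o "$out_dir" \
            --run-trim-repetitive --fastqc FastQC \
            --bypass-trim --bypass-trf
    done
}
```

Running it first as `DRY_RUN=1 trim_overrepresented_only my_fastq_dir kneaddata_output` (both directory names are placeholders) prints the commands so you can check them before launching the real run.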

Documentation here:

Regards,
Sagun