I have paired-end files from a shotgun metagenomics analysis (251 bp). Before starting with MetaPhlAn, I ran fastqc and fastq_screen to check the quality of my files.
I used KneadData to remove the human reads, and now it is OK. (I also noticed that all my files fail the “Per Base Sequence Content” check. Is this a problem? All the other checks are OK.)
Do I also need to remove overlapping reads between R1 and R2? How can I do that? I tried running your preprocessing.py script in Python, but I do not understand how it differs from KneadData. Can you help me?
Thanks
Michela