Question about using paired-end reads with MetaPhlAn


I’m looking to use MetaPhlAn3 to profile some metatranscriptomic data. I have three files per sample:

  1. sample1_R1.fastq.gz - forward reads that passed QC and whose matching reverse read is in the R2 file

  2. sample1_R2.fastq.gz - reverse reads that passed QC and whose matching forward read is in the R1 file

  3. sample1_unpaired.fastq.gz - reads for which only one read of the pair passed QC

I’m curious about the best way to run the profiling. Should I run MetaPhlAn on all three files with the following command (based on the help manual, metaphlan -h)?
metaphlan sample1_R1.fastq.gz,sample1_R2.fastq.gz,sample1_unpaired.fastq.gz --bowtie2out sample1.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_sample1.txt
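With several samples laid out this way, the same command can be wrapped in a loop. This is a minimal POSIX shell sketch, assuming every sample follows the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz / <sample>_unpaired.fastq.gz naming above and lives in the current directory. The names profile_sample, NPROC and DRY_RUN are hypothetical and introduced only for illustration; the MetaPhlAn options are exactly the ones used in the command above.

```shell
#!/bin/sh
# Sketch: run MetaPhlAn 3 on every sample in the current directory, passing
# R1, R2 and the unpaired file together as one comma-separated input.
# Assumption: files are named <sample>_R1.fastq.gz, <sample>_R2.fastq.gz
# and <sample>_unpaired.fastq.gz (hypothetical layout, as in the post).
set -eu

NPROC="${NPROC:-5}"   # number of threads handed to --nproc

# Build the MetaPhlAn command for one sample; print it if DRY_RUN=1,
# otherwise execute it.
profile_sample() {
  s=$1
  set -- metaphlan "${s}_R1.fastq.gz,${s}_R2.fastq.gz,${s}_unpaired.fastq.gz" \
    --input_type fastq --bowtie2out "${s}.bowtie2.bz2" \
    --nproc "$NPROC" -o "profiled_${s}.txt"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

for r1 in *_R1.fastq.gz; do
  [ -e "$r1" ] || continue          # no match: the glob stays literal, skip it
  profile_sample "${r1%_R1.fastq.gz}"   # strip the suffix to get the sample name
done
```

Setting DRY_RUN=1 prints each command instead of running it, which is a cheap way to check the file names before launching long jobs.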

Or is there an advantage (or disadvantage) to joining the paired reads before running MetaPhlAn? I read in the old Google Group that MetaPhlAn does not use the pairing information in the reads, so I am curious whether joining the pairs would help the analysis. If I did join the reads (R1 and R2), would it be acceptable to run MetaPhlAn3 on the joined reads together with the unpaired reads?

Thank you!

Hi Samantha,
the best practice would be to use the three unmerged files as single-end reads rather than merging the pairs; merging gives you no advantage. When merging, you could end up with only a very small fraction of merged reads, because the insert size can vary considerably, and whenever the insert is longer than the two reads combined there is no overlap between the two ends.
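The overlap argument can be made concrete with a little arithmetic: for reads of length L taken from both ends of a fragment (insert) of length I, the two mates overlap by 2L - I bases, so once the insert is as long as both reads combined there is nothing left to merge on. A minimal sketch, assuming the insert is at least one read long (adapters already trimmed) and using a hypothetical MIN_OVERLAP of 10 bp to stand in for whatever minimum overlap a given merging tool requires:

```shell
#!/bin/sh
# Sketch: expected overlap between the two mates of a read pair.
# Assumes insert size >= read length (no read-through into adapters).
# MIN_OVERLAP is an assumed threshold, not a value taken from any tool.
set -eu

MIN_OVERLAP="${MIN_OVERLAP:-10}"

# Print the expected overlap in bp: $1 = read length, $2 = insert size.
overlap() {
  ov=$(( 2 * $1 - $2 ))
  if [ "$ov" -lt 0 ]; then ov=0; fi   # mates do not reach each other
  echo "$ov"
}

# Exit status 0 if the expected overlap reaches MIN_OVERLAP.
can_merge() {
  [ "$(overlap "$1" "$2")" -ge "$MIN_OVERLAP" ]
}

for insert in 200 280 300 450; do
  if can_merge 150 "$insert"; then verdict=mergeable; else verdict="no usable overlap"; fi
  echo "150 bp reads, ${insert} bp insert: overlap $(overlap 150 "$insert") bp -> $verdict"
done
```

With 150 bp reads, a 280 bp insert leaves 20 bp of overlap, while inserts of 300 bp or more leave none, which is why a library with a broad insert-size distribution can yield only a small fraction of merged pairs.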


I also merged my reads for another pipeline, and because my insert sizes are short, the large majority of pairs merged.
Wouldn't using the merged reads together with the unmerged leftovers make the sequences more unique, since they are longer (as Samantha suggested)?