Paired end WGS sequence analysis

DEEPCHANDA7 · May 4, 2020, 10:40am

Hi all (@fbeghini, @NSegata)!
Please help me with a basic concept I’m struggling to understand. I want to profile already-submitted paired-end WGS data (SRR2155174_1.fastq, SRR2155174_2.fastq), but I’m confused by the following statement and have a few questions:

“MetaPhlAn 2 can also natively handle paired-end metagenomes (but does not use the paired-end information)”

What is the meaning of “paired-end information” here? What is the difference between handling paired-end metagenomes and not using the paired-end information?

Should I concatenate the forward and reverse read files, or just use the command:

$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt

Why don’t we make contigs from the two reads (as I did when analysing with the “mothur” software)?

Thanks and Regards,
DC7

bigdoyle · May 5, 2020, 2:15pm

“What is the difference between handling paired-end metagenomes and not using the paired-end information?”

The reads are treated as independent single reads when mapping to the reference. Paired-end reads are expected to align to the reference at a more-or-less fixed distance apart and in opposite orientations. These “expectations” are ignored in MetaPhlAn 2.
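As a rough illustration of what “independent single reads” means in practice: if pairing is ignored, the profiler effectively sees one pool of reads, so a comma-separated pair of files and a single concatenated file should carry the same input. Below is a minimal sketch with made-up toy reads. The toy_*.fastq file names and sequences are hypothetical, and the commented metaphlan2.py line only reuses the options from the question above; it is an assumed equivalent, not something run or verified here.

```shell
# Two tiny, made-up FASTQ files standing in for the forward and reverse reads.
printf '@read1/1\nACGTACGT\n+\nIIIIIIII\n' > toy_1.fastq
printf '@read1/2\nTTGCAAGC\n+\nIIIIIIII\n' > toy_2.fastq

# With pairing ignored, mates are just reads: pool them into one file.
cat toy_1.fastq toy_2.fastq > toy_all.fastq

# Each FASTQ record is 4 lines, so the pooled file holds 2 independent reads.
n_reads=$(( $(wc -l < toy_all.fastq) / 4 ))
echo "$n_reads"

# Assumed equivalent invocation on the pooled file (not executed here):
# metaphlan2.py toy_all.fastq --input_type fastq --bowtie2out toy.bowtie2.bz2 > toy_profile.txt
```

Nothing in the pooled file records which read was the mate of which, and that link is exactly the “paired-end information” that goes unused.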

“Why don’t we make contigs from the two reads (as I did when analysing with the ‘mothur’ software)?”

In WGS the reads are generally ≤150 nt and the fragments being sequenced are often >300 nt, so the two reads of a pair don’t overlap enough to be assembled into contigs. In 16S sequencing, the amplification primers and the sequencing method are chosen to produce overlapping paired-end reads so that they can be assembled.
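The arithmetic behind this can be sketched in a few lines of shell. The pair_overlap helper and the example lengths are illustrative assumptions, not values from any particular dataset, and the model is deliberately simple (both mates have the same length, and the fragment is at least as long as one read).

```shell
# Hypothetical helper: number of bases shared by the two mates of a read pair.
# Positive result = the mates overlap; negative = an unsequenced gap between them.
pair_overlap() {
  read_len=$1
  fragment_len=$2
  echo $(( 2 * read_len - fragment_len ))
}

# Shotgun-style example: 2 x 150 nt reads from a 350 nt fragment.
wgs_overlap=$(pair_overlap 150 350)
echo "$wgs_overlap"        # prints -50: a 50 nt gap, nothing to merge

# Amplicon-style example: 2 x 250 nt reads from a 400 nt amplicon.
amplicon_overlap=$(pair_overlap 250 400)
echo "$amplicon_overlap"   # prints 100: the mates share 100 nt
```

Only the second case leaves shared bases that a merging step could use to join the two reads.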

DEEPCHANDA7 · May 5, 2020, 5:15pm

Wow… such a nice explanation. Thanks @bigdoyle. Now it’s very clear to me.

rosepaul · December 2, 2020, 6:28pm

Thanks @bigdoyle for the explanation, I had the same question.
What would be the recommended read length (75 or 150) for paired-end sequencing to use with the MetaPhlAn pipeline for WGS?

Thanks,
Reeba

bigdoyle · December 2, 2020, 10:28pm

Longer is generally better: there is less chance of ambiguous mapping and potentially greater classification accuracy. But longer reads cost more… We run PE 150; I think most labs do.

saras22 · November 18, 2021, 7:29am

What do you mean by “fragments being sequenced often >300nt”?

DEEPCHANDA7 · November 18, 2021, 8:33am

Before DNA is sequenced, it is first fragmented. Each fragment is then sequenced from the forward end and then from the opposite (reverse) end. He meant that the whole fragment is >300 nt long, while the read sequenced from each end is only about 150 nt or less.
Thanks,
DC7

saras22 · November 18, 2021, 8:52am

@DEEPCHANDA7 Thanks dear! I got it.
