Paired-end files HUMAnN2

Erwan · April 28, 2020, 3:51pm

Hello,

I am new to shotgun metagenomics and I want to use your bioBakery toolbox, starting with KneadData, MetaPhlAn2 and HUMAnN2, to automate the analysis of shotgun data. I'm wondering about the best way to handle paired-end sequencing data.

Correct me if I'm wrong, but from what I understand, there is no benefit in providing paired-end files, as MetaPhlAn2 and HUMAnN2 will basically treat them as if they were two single-end files.

With that in mind, I am thinking about concatenating the forward and reverse files into a single file before performing any analysis. That way, I would have the same workflow whether my data are single-end or paired-end, and it would be technically easier to handle (no paired.1, paired.2, single.1, single.2 files to deal with). Is there any drawback to this approach?

A few other questions (which depend on the yes/no answer to the previous one):

When dealing with overlapping paired-end reads, do you merge them at the beginning of the process?
Is there any bioBakery tool that uses the paired-end info ?

franzosa · April 28, 2020, 4:06pm

KneadData is the only tool that uses end-pairing information (optionally). If you’re working with host-contaminated reads, we can use the end-pairing information when aligning the reads back to the host genome.

The tools that align to isolated gene sequences (including MetaPhlAn and HUMAnN) do not consider end-pairing information, and so we concatenate our first- and second-end reads into a single input file for those.
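For anyone who would rather script this step in Python than call `cat` from the shell (`cat sample_R1.fastq sample_R2.fastq > sample.fastq`), here is a minimal sketch of the same plain concatenation. It assumes FASTQ input, optionally gzip-compressed and detected only by a `.gz` suffix; the function and file names are hypothetical. It copies reads unchanged and in order, with no pairing or merging, and accepts any number of input files.

```python
import gzip
from pathlib import Path


def _open(path, mode):
    """Open a plain or gzip-compressed file, chosen by the .gz suffix (an assumption)."""
    path = Path(path)
    return gzip.open(path, mode) if path.suffix == ".gz" else open(path, mode)


def concat_fastq(inputs, output, chunk_size=1 << 20):
    """Concatenate FASTQ files into one file, like `cat R1 R2 > out`.

    Records are copied byte-for-byte in input order. If a file lacks a
    trailing newline, one is added so the next file's first record is not
    glued onto the previous file's last line.
    """
    with _open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty input file needs no separator
            with _open(path, "rb") as src:
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")


# Example (hypothetical file names):
# concat_fastq(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "sample_concat.fastq.gz")
```

Decompressing and recompressing is slower than a byte-level `cat` of gzip files (which also yields valid gzip), but it lets plain and compressed inputs be mixed.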

yctheolam · June 26, 2020, 8:03am

Hi franzosa,

I am wondering what the point is of merging paired-end sequences if HUMAnN2 (or 3) doesn't consider the end-pairing information. To provide more coverage?

Also, there are several merging programs available (e.g. bbmerge, NGmerge, vsearch). If you had to recommend one, what would be your choice? Or do you think a simple cat would work better in this context? Thank you so much!

franzosa · June 26, 2020, 2:11pm

Definitely cat: the goal is to convert the paired reads to a single input file, not to combine potentially overlapping reads.

In our experience, end-pairing information is very informative when aligning to longer sequences (contigs, genomes) when you expect both reads from a fragment to hit the same target in close proximity. When aligning to individual genes (as in most of our tools), it’s common for one read to align to a given gene while its mate overhangs that gene (aligning elsewhere or not at all). We’ve found it more straightforward to just align the reads separately rather than to check for and enforce concordant alignment in the fraction of cases where it would’ve been possible.
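To put rough numbers on the gene-versus-genome argument, here is a toy model, an illustration under simplifying assumptions rather than anything HUMAnN computes. It assumes fixed 100 bp reads, a fixed 300 bp fragment, and every fragment start position equally likely (all values chosen for illustration, not taken from the thread). Among fragments with at least one read lying entirely inside a target sequence, it returns the fraction with both reads inside, i.e. the cases where a concordant pair alignment is even possible. The function name is hypothetical.

```python
def concordant_fraction(target_len: int, read_len: int, insert_len: int) -> float:
    """Among fragments with >= 1 read fully inside the target, the fraction
    with BOTH reads fully inside it.

    Toy model (assumption): fixed read length R, fixed fragment length F,
    every integer fragment start s equally likely, no sequencing errors.
    Read 1 covers [s, s+R); read 2 covers [s+F-R, s+F); target is [0, G).
    """
    if not 0 < read_len <= insert_len:
        raise ValueError("need 0 < read_len <= insert_len")
    if target_len < read_len:
        return 0.0  # no read can fit inside the target at all
    # Read 1 inside: 0 <= s <= G-R; read 2 has the same count by symmetry.
    one_end = target_len - read_len + 1
    # Both inside: 0 <= s <= G-F (empty if the fragment is longer than the target).
    both = max(0, target_len - insert_len + 1)
    at_least_one = 2 * one_end - both  # inclusion-exclusion
    return both / at_least_one


if __name__ == "__main__":
    for name, length in [("gene", 1_000), ("genome", 5_000_000)]:
        print(f"{name}: {concordant_fraction(length, 100, 300):.3f}")
    # gene: 0.637
    # genome: 1.000
```

Under these assumptions, roughly a third of the fragments that hit a 1 kb gene could never align concordantly to it, while for a multi-megabase genome nearly all could.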

yctheolam · June 27, 2020, 10:48am

This totally makes sense to me. Thank you for answering!

Alya · August 14, 2020, 5:49pm

Is it OK to use only the forward reads with HUMAnN3? Is there an advantage to concatenating the R1 and R2 files? Also, how do you concatenate the files? Sorry for all the questions; I'm really new to metagenomics and am trying to make sure I'm on the right track.

franzosa · August 14, 2020, 6:11pm

Please see my reply on your other thread.

YikeShen · December 3, 2020, 2:20am

Hello franzosa, I used Bowtie to remove host reads, which produced a "sample#.sam" file. Should I use the concatenated forward/reverse reads, or the SAM file directly, since I also saw SAM input used in the tutorial?
Thank you! -Yike

franzosa · December 15, 2020, 5:34pm

Sorry for the delayed reply. The SAM files that HUMAnN can accept are mappings that HUMAnN itself has generated; we don’t take SAM as a generic sequence input format. You would need to find a way to dump your reads from the SAM file to (e.g.) FASTQ to start a fresh HUMAnN run.

Also, if the SAM file you have is an alignment against the host genome, you’d presumably want to only dump the unmapped reads for analysis?
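One way to do that dump without extra dependencies is sketched below. It assumes a plain-text SAM file (not BAM) with base qualities stored, keeps primary records flagged unmapped (0x4), and drops secondary (0x100) and supplementary (0x800) records. The function and file names are hypothetical, and tagging mates `/1` and `/2` is an optional naming choice, not something HUMAnN requires. samtools' `fastq` subcommand with `-f 4` is a common off-the-shelf alternative.

```python
from typing import Iterable, TextIO

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
REVERSE = 0x10         # SEQ/QUAL are stored reverse-complemented
FIRST_IN_PAIR = 0x40   # R1
SECOND_IN_PAIR = 0x80  # R2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_sam_to_fastq(sam_lines: Iterable[str], out: TextIO) -> int:
    """Write unmapped primary reads from SAM text as FASTQ; return the count."""
    written = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header (@HD, @SQ, @PG)
            continue
        fields = line.rstrip("\r\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & UNMAPPED or flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary records of reads that did not hit the host
        if qual == "*":
            raise ValueError(f"{qname}: no base qualities stored in SAM")
        if flag & REVERSE:  # undo reverse-complementing to recover the read as sequenced
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        # Optional: tag mates so R1 and R2 stay distinguishable once concatenated
        if flag & FIRST_IN_PAIR:
            qname += "/1"
        elif flag & SECOND_IN_PAIR:
            qname += "/2"
        out.write(f"@{qname}\n{seq}\n+\n{qual}\n")
        written += 1
    return written


# Example (hypothetical file names):
# with open("sample1.sam") as sam, open("sample1_unmapped.fastq", "w") as fq:
#     n = unmapped_sam_to_fastq(sam, fq)
```

This keeps an unmapped mate even when its partner aligned to the host; also requiring the mate-unmapped bit (0x8) would drop such pairs entirely.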

YikeShen · December 21, 2020, 4:06pm

Hello Eric,
Thanks for your answer. Yes, I had in fact tried giving the Bowtie output SAM files to HUMAnN3. It ran successfully, but it gave me strange, non-human-readable results. I then concatenated the forward and reverse reads with cat and ran HUMAnN3 on those.

jorondo1 · February 22, 2022, 10:36pm

Hi,
I understand paired-end samples should be concatenated before being input to the basic HUMAnN pipeline. We ran KneadData in paired-end mode for quality control. Should we include the single (unmatched) reads in the concatenation, i.e. concatenate all 4 output files from KneadData? (We ran it without decontamination because the host genome is unknown.)

Related topics

- Kneaddata fail to recognize paired end data (KneadData; 2 replies, 379 views; August 30, 2023)
- Single-end or Paired-end? (KneadData; 0 replies, 296 views; October 20, 2023)
- Humann3 Paired end reads (HUMAnN; 19 replies, 5782 views; October 30, 2024)
- Dealing with paired-end shotgun metagenomics sequencing ran on two lanes (KneadData; 0 replies, 371 views; May 13, 2022)
- Regarding the Installation and analysis of shot gun metagenomics samples (Microbial community profiling; 3 replies, 231 views; June 11, 2023)