Hi,
I’m just wondering how HUMAnN3 handles paired-end reads. I’m pretty new to metagenomics data analysis and was wondering if you can help.
Thanks,
a
I’ve written a bit about this here:
In short, we recommend concatenating your paired reads upstream of MetaPhlAn and HUMAnN.
Hi… Is it necessary to run kneaddata before running MetaPhlAn 3.0 even if I use the --ignore_eukarya option with the metaphlan command?
--ignore_eukarya will only ignore reads recruited to eukaryotic microbial marker genes; it will not ignore/exclude generic human contaminant DNA. That said, MetaPhlAn itself is fairly robust to un-removed human contamination, but downstream steps (e.g. HUMAnN’s translated search) might not be.
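If you do want to deplete host contamination first, a minimal KneadData sketch on the raw pairs would be something like the following (the database path and file names are placeholders here, and the exact input flags vary between KneadData versions, e.g. newer releases use --input1/--input2 instead of a repeated --input):
# quality trimming + human read depletion on a raw read pair
kneaddata --input sample_R1.fastq --input sample_R2.fastq --reference-db /path/to/human_genome_db --output sample_kneaddata_out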
In other words, should I just merge the two files directly? For example:
cat sample_R1.fq sample_R2.fq > merge_sample.fq
Thanks!
Correct, this is what we mean when we say to concatenate paired reads.
Hi Eric,
I just wanted to follow up on the questions above about how HUMAnN 3 handles paired-end reads. If we pass concatenated reads to HUMAnN 3 (e.g., concatenated reads = cat forward.fastq reverse.fastq), how does it treat them? Will it treat the forward and reverse reads as two separate reads, or will it consider them as a pair (e.g., will a pair of reads hitting the same gene be counted as 1 hit instead of 2 hits)?
Thank you so much!
It will treat the forward and reverse reads as if they were independent unpaired reads. This is obviously an approximation, but from my simulations it doesn’t introduce any obvious biases in the downstream gene abundances.
Hi all, could I merge the forward and reverse files with PEAR (Paired-End reAd mergeR), and then run HUMAnN 3 on the merged file?
For example,
pear -f test_1.fq -r test_2.fq -o test
Many thanks
This style of merging tries to overlap the paired reads into a new single read. This is only helpful if the single reads are long enough / sequencing fragment is short enough for the single reads to overlap. When that happens, it can be useful for applications like amplicon profiling since having a longer read gives more mapping specificity.
I don’t have experience using this procedure upstream of MetaPhlAn and HUMAnN. When we “merge” the reads for MetaPhlAn and HUMAnN we are really just concatenating them into one long file of single-end reads.
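For contrast, with the file names from your PEAR example, the concatenation-style “merge” we mean is just:
cat test_1.fq test_2.fq > test_cat.fq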
Hi Franzosa, thanks for your reply. I have run HUMAnN 3 on some demo samples using the “cat” command to concatenate the forward and reverse files. This worked successfully.
Thanks,
Jun
Great, glad to hear that worked!
I recognize from the later comments that merging the paired read files into one is unlikely to introduce significant bias. Would there be a benefit to including the unmatched reads, since HUMAnN isn’t looking at the mates anyway?
Would it make sense to merge just the forward reads from the pairs with both sets of unmatched reads, to try to cover more of the genome rather than double-counting regions captured by both mates of a pair?
Our usual procedure is to merge the still-paired reads with both sets of orphaned reads as input to our methods. The procedure of throwing out one half of the still-paired reads seems weird to me - I’m not convinced it would do much to unbias the results, but it WILL hurt your effective sequencing depth / coverage.
I have occasionally seen people do things like analyze all read1s (paired + orphaned) vs. all read2s as a form of replication, although those ought to be nearly identical.
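As a rough sketch of that split (file names here are hypothetical, following the usual KneadData paired/unmatched naming), it amounts to something like:
cat sample_kneaddata_paired_1.fastq sample_kneaddata_unmatched_1.fastq > sample_all_R1.fastq
cat sample_kneaddata_paired_2.fastq sample_kneaddata_unmatched_2.fastq > sample_all_R2.fastq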
I wonder if it is OK to use just one mate from each pair for downstream analysis instead of concatenating the reads.
Another question: what are the disadvantages of using the trimmed reads from KneadData vs. the paired reads? I understand that the trimmed reads come from Trimmomatic and the paired reads from Bowtie 2, but would that introduce any bias into the gene analysis?
The main issue with analyzing the two mates separately is that combining them can allow features to reach a coverage that either mate file alone wouldn’t. Hence you will be less sensitive to low-abundance features in the split-analysis approach.
I don’t follow the second part of the question? Removing low-quality nucleotides (trimmed reads) tends to help alignment since the low-quality part might not align, which can limit the alignment of the read itself. KneadData uses Bowtie 2 to deplete reads that map to a reference database (e.g. host genome). Either of these processes can remove pairs of reads or a single read in a pair, resulting in orphans. Hence a typical KneadData run results in a set of intact pairs + read1s and read2s that have been orphaned from their original pairs. Most of our tools don’t need the pairing information at all, but we maintain it for processes that care, e.g. assembly.
Thank you franzosa!
For clarification: the part “Most of our tools don’t need the pairing information at all, but we maintain it for processes that care” confuses me. Are the paired files where the pairing is maintained for assemblies, and should they not go into the other tools?
Which output should go into HUMAnN 3? After the KneadData run, I have these files:
kneaddata.trimmed.1.fastq kneaddata.trimmed.2.fastq
kneaddata.repeats.removed.1.fastq kneaddata.repeats.removed.2.fastq
kneaddata_paired_1.fastq kneaddata_paired_2.fastq
A tangential question to understand the workflow better: if the paired files have already had the non-pairing reads removed, then concatenating them will not make a difference or improve sensitivity, because if read1 didn’t align to a coding region, its pair will not align either. Is that correct? Am I missing something here?
Sorry I was not being very clear there. I should’ve said that KneadData maintains the pairing information during QC so we don’t lose track of it. The final output of KneadData should be four files:
$SAMPLE_kneaddata_paired_1.fastq
$SAMPLE_kneaddata_paired_2.fastq
$SAMPLE_kneaddata_unmatched_1.fastq
$SAMPLE_kneaddata_unmatched_2.fastq
The larger table in the KneadData tutorial helps to explain the other files. There may also be an output file where the four files above have been concatenated (or this might be something specific to the bioBakery workflows).
Either way, that concatenated file is the typical input to tools like MetaPhlAn and HUMAnN. Those tools use the paired reads, they just don’t consider the pairing information as they work (treating all reads as independent, which is an approximation that works well in practice). Conversely, methods like metagenome assembly use the pairing information since it provides valuable structural details about the source genomes.
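As a minimal sketch of that workflow (using $SAMPLE as a placeholder, as in the file list above, and assuming the default MetaPhlAn/HUMAnN databases are already installed):
# concatenate intact pairs + orphans into one single-end file
cat $SAMPLE_kneaddata_paired_1.fastq $SAMPLE_kneaddata_paired_2.fastq $SAMPLE_kneaddata_unmatched_1.fastq $SAMPLE_kneaddata_unmatched_2.fastq > $SAMPLE_kneaddata_cat.fastq
# taxonomic profiling with MetaPhlAn, then functional profiling with HUMAnN, both on the same concatenated reads
metaphlan $SAMPLE_kneaddata_cat.fastq --input_type fastq -o $SAMPLE_profile.txt
humann --input $SAMPLE_kneaddata_cat.fastq --output $SAMPLE_humann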
To your final question, it’s not necessarily the case that read1 aligning or failing to align to a coding region means that read2 will do the same. That’s because read1 might be inside the region while read2 is outside (or vice versa). This is part of the reason why we don’t enforce paired mapping in tools like MetaPhlAn that work with restricted gene catalogs.
Thank you franzosa!
I understand better now. In the MetaPhlAn tutorial there is a section that uses *1.fastq,*2.fastq as the input. Instead of doing that, do you suggest concatenating all four files above and using that as the input? Or, in the case of MetaPhlAn, would that be paired_1.fastq, paired_2.fastq, unmatched_1.fastq and unmatched_2.fastq?
Correct. The four post-QC files for a given sample (paired 1/2 and unmatched 1/2) are all concatenated as a single input for MetaPhlAn and HUMAnN.