HUMAnN3 paired-end reads

Hi,

I’m just wondering how HUMAnN3 handles paired-end reads? I’m pretty new to metagenomics data analysis and was wondering if you can help.

Thanks,
a

I’ve written a bit about this here:

In short, we recommend concatenating your paired reads upstream of MetaPhlAn and HUMAnN.

Hi, is it necessary to run KneadData before running MetaPhlAn 3.0 even if I use the --ignore_eukarya option with the metaphlan command?

--ignore_eukarya will only ignore reads recruited to eukaryotic microbial marker genes; it will not ignore/exclude generic human contaminant DNA. That said, MetaPhlAn itself is fairly robust to un-removed human contamination, but downstream steps (e.g. HUMAnN’s translated search) might not be.

In other words, should we just merge the two files directly? For example:

cat sample_R1.fq sample_R2.fq > merge_sample.fq

Thanks!

Correct, this is what we mean when we say to concatenate paired reads.
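
As a minimal sketch of the concatenation described above, the snippet below writes two tiny placeholder FASTQ files (the read names and sequences are made up for illustration), concatenates them, and checks that the merged file holds the reads from both inputs. The commented-out humann call at the end is an assumed invocation with a hypothetical output directory name, not a tested pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy inputs for illustration only; real files come from your sequencing run.
printf '@read1/1\nACGTACGT\n+\nIIIIIIII\n@read2/1\nTTGACCAA\n+\nIIIIIIII\n' > sample_R1.fq
printf '@read1/2\nTGCATGCA\n+\nIIIIIIII\n@read2/2\nGGTTCAAC\n+\nIIIIIIII\n' > sample_R2.fq

# Concatenate forward and reverse reads into one file of single-end reads.
cat sample_R1.fq sample_R2.fq > merge_sample.fq

# A FASTQ record is 4 lines (assuming unwrapped sequences), so reads = lines / 4.
n_reads=$(( $(wc -l < merge_sample.fq) / 4 ))
echo "reads in merge_sample.fq: ${n_reads}"

# Assumed invocation; "humann_out" is a hypothetical output directory.
# humann --input merge_sample.fq --output humann_out
```

The read count of the merged file should equal the forward count plus the reverse count, which is a quick way to confirm nothing was dropped.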

Hi Eric,

I just wanted to follow up on the questions above about how HUMAnN3 handles paired-end reads. If we pass the concatenated reads to HUMAnN3 (e.g., cat forward.fastq reverse.fastq), how does it treat them? Will it treat the forward and reverse reads as two separate reads, or will it consider them a pair (e.g., a pair of reads hitting the same gene counted as 1 hit instead of 2)?

Thank you so much!

It will treat the forward and reverse reads as if they were independent unpaired reads. This is obviously an approximation, but from my simulations it doesn’t introduce any obvious biases in the downstream gene abundances.

Hi all, could I merge the forward and reverse files with PEAR (Paired-End reAd mergeR) and then use the merged file as input to HUMAnN3?

For example,
pear -f test_1.fq -r test_2.fq -o test

Many thanks

This style of merging tries to overlap the paired reads into a new single read. This is only helpful if the single reads are long enough / sequencing fragment is short enough for the single reads to overlap. When that happens, it can be useful for applications like amplicon profiling since having a longer read gives more mapping specificity.

I don’t have experience using this procedure upstream of MetaPhlAn and HUMAnN. When we “merge” the reads for MetaPhlAn and HUMAnN we are really just concatenating them into one long file of single-end reads.

Hi Franzosa, thanks for your reply. I have run HUMAnN3 on some demo samples, using the “cat” command to concatenate the forward and reverse files. This worked successfully.

Thanks,

Jun

Great, glad to hear that worked!

Based on the earlier comments, I recognize that merging the paired read files into one is unlikely to introduce significant bias. Would there be a benefit to including the unmatched reads, since HUMAnN isn’t looking at the mates anyway?

Would it make sense to merge only the forward reads from the pairs with both sets of unmatched reads, to try to cover more of the genome rather than double-counting regions captured by both mates?

Our usual procedure is to merge the still-paired reads with both sets of orphaned reads as input to our methods. Throwing out one half of the still-paired reads seems weird to me - I’m not convinced it will do anything notable to reduce bias, but it WILL hurt your effective sequencing depth / coverage.
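
The procedure above might be sketched as follows. The file names are hypothetical stand-ins for whatever your QC step writes for still-paired and orphaned reads, and the toy records exist only so the snippet runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write one toy FASTQ record with the given read name.
make_read() { printf '@%s\nACGTACGT\n+\nIIIIIIII\n' "$1"; }

# Placeholder QC outputs: still-paired mates plus orphans from each side.
{ make_read pairA/1; make_read pairB/1; } > sample_paired_1.fq
{ make_read pairA/2; make_read pairB/2; } > sample_paired_2.fq
make_read orphanC/1 > sample_unmatched_1.fq
make_read orphanD/2 > sample_unmatched_2.fq

# Keep every read: both mates of each pair and both orphan sets.
cat sample_paired_1.fq sample_paired_2.fq \
    sample_unmatched_1.fq sample_unmatched_2.fq > sample_all.fq

# Count records via their header lines (line 1 of every 4-line record).
awk 'NR % 4 == 1' sample_all.fq | wc -l
```

Dropping sample_paired_2.fq from the cat line would give the forward-only variant asked about above, at the cost of the reverse mates' contribution to depth.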

I have occasionally seen people do things like analyze all read1s (paired + orphaned) vs. all read2s as a form of replication, although those ought to be nearly identical.