Clear guidance needed for comparing across samples with varying sequencing depth

I have come across the following discussions:

I’m trying to minimize batch effects, and therefore have to rerun each sample rarefied to some depth appropriate for the samples in the cohort being analyzed.

Would it be theoretically possible to make a post-hoc normalization utility where metaphlan/humann3 results tables could be normalized given a sequencing depth parameter? Simply scaling the counts by the number of reads is ineffective due to the contribution of many reads to a given pathway or taxon assignment. Essentially if metaphlan is giving us a probable count of a particular taxon given a set of reads, we really need the probable count of a particular taxon given a set of reads and the total number of reads in a sample.

It would be great to have some clear guidance in the manual about how to address this (very common) situation; otherwise any subsequent differential abundance analysis will just be detecting rare taxa that get observed with deeper sequencing. Advice about prevalence/abundance thresholds would be useful as well; again due to the contribution of many reads to taxonomy assignment it is not trivial to determine if a bug is below the limit of detection or not. (As compared to amplicon sequencing, where we can reasonably say "Even though we detect 2 copies of bug_x in sample A with depth of 1000 reads, and 0 copies of bug_x in sample B with depth of 100 reads, we cannot assume these are deferentially abundant because we did not sequence enough molecules in Sample B to detect the relative abundance of 1/500 in sample A.)

Thanks in advance!

1 Like

Hello, any updates? This is critical to comparing samples across different depths and I worry we and other are misleading ourselves until we have a documented way of resolving this.

Hi @admins any thoughts on this?

Hi @admins any insight on this question?

Hi @admins @Curtis_Huttenhower any advice?

Hi @nickp60
Sorry for the late reply. I have been working on integrating a read’s subsampling procedure within MetaPhlAn. This utility will allow you to execute MetaPhlAn only in a defined subsample of the reads or even re-normalize an already executed analysis (if the bowtie2out output has been saved). The code is almost ready and we expect to release with the new version of the code adding this functionality during next week. However, if you are interested in post-hoc normalization, you can have a look at the following package: GitHub - hildebra/Rarefaction: Rarefaction scripts

Hi @nickp60
We recently pushed version 4.0.2 allowing the rarefaction at execution time and also post processing the bowtie2out files.

Hi @aitor.blancomiguez, thanks for the update! What depths do you reccomend rarefying to? some ntile of the sample spread? to the depth of the smallest sample you want to include? And what about functional profiling?

It seems like there is some work on this with ANPAN, where it “applies adaptive filtering to discard samples where the species was likely not present (based on low abundance and few non-zero observations) based on the sample’s functional profile.”.

How would one apply a similar methodology to a collection of Humann outputs to do differential abundance across samples while correcting for sample depth and complexity?

I would usually recommend to rarefy to the smallest sample depth you want to analyse.
For the second part of the question related to functional profiling, Humann already uses the Metaphlan profiles to determine which species are present and the map only against the pangenomes of the detected species.

Thanks @aitor.blancomiguez . Although humann is already using the metaphlan profile, won’t it still be searching the full non-rarefied set of reads against both the (rarefied) pangenome and the protein database? Wont that still lead to uneven detection of gene families between samples of varying depths?

Yes, after the first mapping against the pangenomes of the species present in the MetaPhlAn profile, humann will map the non-classified reads against the full UniRef database. So yes, the functional profiling it will be also affected by the different sample depths. But, I think I’m not getting your question

Sorry for the confusion; it may well stem from me not understanding how humann is working under the hood, so hopefully this clarifies things.

Here is an example. Aliquots A, B, C, D of some sample have been sequenced to 100k, 1M, 10M, and 100M reads. With the rarefaction option you added to metaphlan, I can now do taxonomic profiling at the minimum sample depth of 100k reads, which is great! I should recover near-identical rarefied taxonomic profiles. However, I want to do functional profiling as well. If I run Humann on these 4 samples, the detection of genes and pathways across samples is going to vary simply due to the sampling depths. It won’t matter if the first step in Humann is mapping against the rarefied pangenome, the subsequent step will still use all the remaining reads: deeper samples will still detect more genes and pathways, and shallower samples will miss genes and pathways. The stratified output of Human may still limit the assigned pathways to the rarefied pangenome taxa, but we typically aren’t using the stratified output when doing differential abundance. I want to avoid biasing our functional profiling by sampling depth. I don’t want to accidentally call gene1 differentially abundant due to its presence in samples C and D, when A and B simply didn’t detect it due to sampling.

It seems like one solution is to model the “detectability” of a gene relative to some limit of detection determined from both the sample complexity and depths of all the samples. Working on writing this up, more to follow.

HI @nickp60
I understand now your question. You are right, as humann do not perform any subsampling, the genes and pathways not assigned to a taxa in the humann output will be biased. As humann 3.6 still does not implement a subsampling procedure within the code, probably the easiest thing to do is to, previous to your analysis, simply randomly subsample the reads and run both metaphlan and humann on the rarefied metagenomes (at the depth of the shallowest sample)

Hello @aitor.blancomiguez and @nickp60
My understanding is that subsampling (single rarefaction) is a bit controversial for differential abundance analysis. Most of the 16S community has recently settled towards using non-rarefied data for differential abundance analysis and addressing the sequencing depth issue during statistical analysis using appropriate transformations such as CLR, ILR, etc. Therefore, I was wondering if it is okay to use non-rarefied data for the differential abundance of MetaPhlAn-driven microbial taxa and HUMAnN-driven functional metagenomics data.

To the best of my knowledge, there is no appropriate statistical tool that can be used for diversity analysis on the non-rarefied data having various sequence depths. Therefore, I do agree that subsampling should be recommended to address the varying sequencing depth issue for this analysis. One particular issue here is how we will determine the read number for subsampling. What if the smallest sample depth I want to analyze does not have enough reads to capture true diversity? In 16S we use the Shannon rarefaction curve and see where the curve reached to plate. A plateau in the Shannon rarefaction curve (please an example here QIIME 2 View) indicates that the sequencing depth at/above which may be sufficient to capture actual diversity in the microbial community. I do acknowledge that shotgun metagenomics is way more complex than 16S. MetaPhlAn-4 gives a pseudo-count with the parameter “rel_ab_w_read_stats”. Therefore, I was wondering if it is okay to use those pseudo-count data for generating an alpha-rarefaction curve.

From my perspective there are two key differences here between 16S and shotgun in the regards to rarefaction (which agreed, is pretty settled for 16S):

  • there is no 1:1 relationship between reads and taxonomic assignments. Multiple markers need to be detected to call a taxon present.
  • unless it is a very deep sample of a very simple community, the samples are generally not sequenced to saturation.

In my experience alpha diversity is pretty stable to differences in depth. We usually use the inverse Simpson index, and here are the results from some an experiment I did downsampling to various depths in 3 replicates across 10 samples:

I did this with kraken2. I do need to redo the analysis with metaphlan, its been on my to-do list for a while! When I do I’ll report back, and have the Shannon index as well…