Clear guidance needed for comparing samples with varying sequencing depths

I have come across the following discussions:

I’m trying to minimize batch effects, so I have to rerun each sample after rarefying it to a depth appropriate for the cohort being analyzed.
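A minimal sketch of the read-level rarefaction step, assuming uncompressed, unwrapped FASTQ input (exactly four lines per record). `read_fastq` and `rarefy_reads` are hypothetical helper names, not part of MetaPhlAn or HUMAnN; the rarefied records would be written back out and profiled as usual. Reservoir sampling is used so the whole file never has to be held in memory.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, separator, quality), newlines stripped.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group a FASTQ line stream into 4-line records (assumes no line wrapping)."""
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def rarefy_reads(records: Iterable[FastqRecord], depth: int, seed: int = 0) -> List[FastqRecord]:
    """Keep exactly `depth` records chosen uniformly without replacement.

    Reservoir sampling (Algorithm R): the i-th record (1-based) replaces a
    reservoir slot with probability depth / i once the reservoir is full.
    """
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    seen = 0
    for seen, record in enumerate(records, start=1):
        if seen <= depth:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)  # uniform in [0, seen)
            if j < depth:
                reservoir[j] = record
    if seen < depth:
        raise ValueError(f"only {seen} reads available, cannot rarefy to {depth}")
    return reservoir
```

Usage would look like `with open("sample.fastq") as fh: reads = rarefy_reads(read_fastq(fh), depth=1_000_000, seed=42)` (file name and depth are placeholders). Because the random draws depend only on the seed and the number of records, running it with the same seed on R1 and R2 files holding the same number of reads should keep mates paired.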

Would it be theoretically possible to build a post-hoc normalization utility that normalizes MetaPhlAn/HUMAnN 3 output tables given a sequencing-depth parameter? Simply scaling the counts by the number of reads doesn’t work, because many reads contribute to any given pathway or taxon assignment. Essentially, if MetaPhlAn gives us the probable count of a particular taxon given a set of reads, what we really need is the probable count of that taxon given both the set of reads and the total number of reads in the sample.
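One naive form of such a utility would rarefy a per-taxon *read-count* profile (not relative abundances) down to a common depth by sampling without replacement. This is a hypothetical sketch, not an existing MetaPhlAn/HUMAnN feature, and it assumes integer estimated read counts per taxon are available. It only approximates rerunning on rarefied reads: the profiler applies its own detection rules to the reads it actually sees, so a taxon that falls below those rules at lower depth would be dropped by the profiler but not necessarily by this sketch. That gap is exactly the problem raised above.

```python
import random
from collections import Counter
from typing import Dict


def rarefy_counts(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Subsample a per-taxon read-count profile to `depth` reads without replacement.

    `counts` must hold non-negative integers (round estimated read counts first).
    Requires Python 3.9+ for the `counts=` argument of random.sample.
    """
    taxa = list(counts)
    weights = [counts[t] for t in taxa]
    total = sum(weights)
    if depth > total:
        raise ValueError(f"profile has {total} reads, cannot rarefy to {depth}")
    rng = random.Random(seed)
    # Each taxon appears counts[t] times in a virtual population of reads;
    # drawing `depth` of them is a multivariate hypergeometric draw.
    drawn = rng.sample(taxa, k=depth, counts=weights)
    tally = Counter(drawn)
    # Keep every taxon in the output, including those that drop to zero.
    return {t: tally[t] for t in taxa}
```

For example, `rarefy_counts({"taxon_a": 500, "taxon_b": 300, "taxon_c": 200}, 100)` (taxon names hypothetical) returns counts summing to 100, with rare taxa liable to drop to zero, as they would in a shallower library.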

It would be great to have clear guidance in the manual on how to handle this (very common) situation; otherwise, any downstream differential abundance analysis will largely just pick up rare taxa that are only observed with deeper sequencing. Advice on prevalence/abundance thresholds would be useful as well; again, because many reads contribute to each taxonomic assignment, it is not trivial to determine whether a bug is below the limit of detection. (Compare amplicon sequencing, where we can reasonably say: "Even though we detect 2 copies of bug_x in sample A at a depth of 1,000 reads, and 0 copies of bug_x in sample B at a depth of 100 reads, we cannot assume they are differentially abundant, because we did not sequence enough molecules in sample B to detect the relative abundance of 1/500 seen in sample A; at 1/500, 100 reads would be expected to contain only 0.2 copies.")
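The amplicon-style reasoning in the parenthetical can be made explicit. A minimal sketch, assuming reads are independent draws, so a taxon at relative abundance p appears at least once among n reads with probability 1 - (1 - p)^n. As noted above, that assumption does not hold cleanly for marker-based profilers, where detection depends on many reads. The function names, the 0.95 target, and the rule of treating shallow zeros as missing are illustrative choices, not anything from MetaPhlAn/HUMAnN.

```python
import math
from typing import Dict


def detection_probability(rel_abundance: float, depth: int) -> float:
    """P(at least one read from a taxon at relative abundance p among `depth` reads),
    treating reads as independent draws: 1 - (1 - p)**depth."""
    return 1.0 - (1.0 - rel_abundance) ** depth


def min_depth_for_detection(rel_abundance: float, target: float = 0.95) -> int:
    """Smallest depth n with 1 - (1 - p)**n >= target (needs 0 < p < 1, 0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - rel_abundance))


def depth_aware_prevalence(
    abundances: Dict[str, float],  # sample -> relative abundance of one taxon
    depths: Dict[str, int],        # sample -> number of reads sequenced
    min_abundance: float,
    target: float = 0.95,
) -> float:
    """Fraction of informative samples in which the taxon reaches `min_abundance`.

    A sample at or above the threshold always counts as a positive. A sample below
    it counts as a negative only if its depth gave at least `target` probability of
    detecting `min_abundance`; shallower samples are treated as missing, not absent.
    """
    positives = negatives = 0
    for sample, abundance in abundances.items():
        if abundance >= min_abundance:
            positives += 1
        elif detection_probability(min_abundance, depths[sample]) >= target:
            negatives += 1
    informative = positives + negatives
    return positives / informative if informative else float("nan")
```

With the numbers from the post, `detection_probability(1/500, 100)` is about 0.18, so the zero in sample B is uninformative and `depth_aware_prevalence({"A": 0.002, "B": 0.0}, {"A": 1000, "B": 100}, 1/500)` returns 1.0. Reaching a 95% chance of seeing a 1/500 taxon would take roughly 1,500 reads under this idealized model.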

Thanks in advance!

Hello, any updates? This is critical for comparing samples across different depths, and I worry that we and others are misleading ourselves until there is a documented way to resolve this.