Merging results of metaphlan and humman tables for two batches of the same study (two different timepoints)

Saharb · January 23, 2026, 5:38pm

Hi,

I’m facing an imminent deadline and would really appreciate your guidance at your earliest convenience.

I have two sequencing runs containing samples from different time points. For each run, I have already generated MetaPhlAn taxonomic profiles (count and relative abundance) and HUMAnN functional profiles (CPM and relative abundance). I would like to combine the results across runs so that all samples can be analyzed together for downstream analyses, but I want to ensure this is done correctly and in line with best practices.

Any advice you could provide on how to appropriately merge these outputs would be extremely helpful. My question is especially about relative abundance data for metaphlan and CPM and relative abundance data for humann.

Thank you very much for your time and help.

Sahra

WillNickols · January 23, 2026, 6:07pm

Hi Sahra,

There‘s quite a bit of documentation about how to use MaAsLin 3 (probably your first analysis step) here. There’s also information on standard visualization techniques here.

Saharb · January 23, 2026, 6:21pm

Hi Will,

I’m not sure I fully understand your response. My question is specifically about combining results from two separate sequencing runs. Each run has already been processed independently through the MetaPhlAn and HUMAnN pipelines. I would like to merge these outputs into unified tables: MetaPhlAn relative abundance profiles, and HUMAnN gene family and pathway abundance tables (both relative abundance and CPM), for downstream analysis.

I would appreciate your guidance on best practices in this situation. Is it sound to merge results after processing the runs separately, or is it recommended to reprocess all samples together in a single run before downstream analysis?

Thanks

WillNickols · January 23, 2026, 10:57pm

Ah, sorry - thought you were asking about downstream analysis more generally. Is this MetaPhlAn merging information and this HUMAnN merging information what you’re looking for? Both of these tools operate per-sample, so it shouldn’t matter whether you did single files or many files originally.

Saharb · January 23, 2026, 11:29pm

Thanks for your quick reply, Will. My main concern was whether it is appropriate to merge samples originating from different sequencing runs when the outputs are already normalized (relative abundance for MetaPhlAn and relative abundance/CPM for HUMAnN).

I wanted to confirm that it is acceptable to use the same merging scripts I typically use for combining samples within a single run, but instead apply them across two independent runs. Since both MetaPhlAn and HUMAnN perform normalization on a per-sample basis, my understanding is that I can place all *_profile.txt files from both runs into a single directory and merge them using the command below for MetaPhlAn count and relative abundance data.

merge_metaphlan_tables.py *_profile.txt > merged_abundance_table.txt

Similarly, for HUMAnN, I would copy all joined_taxonomic_profile.tsv files per database from both runs into one directory and run:

humann_join_tables --input $DIR --output joined_taxonomic_profile.tsv

Downstream, I would then account for potential batch effects by including sequencing run as a covariate in the statistical analyses. Please let me know if this approach is sound or if there are any additional considerations I should be aware of.

Thank you so much,

WillNickols · January 24, 2026, 7:30am

Yes - both MetaPhlAn and HUMAnN use per-sample normalization, so combining the batches like this should work fine. In downstream analysis, you could also include the batch as a covariate (in addition to read depth, which should always be there) if you think there are systematic changes per batch.

Saharb · January 24, 2026, 10:24pm

Thank you very much, Will, for the quick response. Just to clarify, by read depth you are referring to the read count per sample. Would you recommend using the raw read counts, or the read counts after quality control (e.g., post-QC / post-host removal)?

Thanks,

WillNickols · January 25, 2026, 12:28am

I’d use the total read count you tried to map to microbial genomes (which is probably your post-QC/host removal read count) since read count is mainly a confounder because when you map to microbial genomes, you’re more likely to see a particular species present if you have a higher total read count.

Saharb · January 25, 2026, 12:46am

Thanks, Will! that’s very helpful. For the per-sample total microbial-eligible read count (post-QC / post-host removal), would you recommend log-transforming this variable when including it as a covariate in downstream analyses?

WillNickols · January 25, 2026, 1:21am

If you’re using MaAsLin 3, I’d include it not log transformed (but leave the scale parameter TRUE like the defaults). For other analyses, it’s hard to give a blanket statement, but log transforming is likely a good idea.

Saharb · January 25, 2026, 2:33am

Thank you so much for getting back to me quickly and answering all my questions!!!

Topic		Replies	Views
Difference between metaphlan merged taxonomic abundance and humann3 joint taxonomic profile HUMAnN	2	783	August 31, 2021
Combine Output from Separate Runs of Single Sample HUMAnN	3	107	December 5, 2025
Combine HUMAnN gene families file for MaAsLin HUMAnN	3	373	December 8, 2023
Normalizing the merge abundance table MetaPhlAn	2	1074	September 24, 2020
HUMANN 3.0 function downstream analysis HUMAnN	3	1761	January 16, 2023

Merging results of metaphlan and humman tables for two batches of the same study (two different timepoints)

Related topics