FUGAsseM Coexpression-based prediction input file preparation

Dear officer,

I’ve read the tutorial and have two questions about preparing the input files for FUGAsseM’s coexpression-based prediction.

My first question concerns the MTX abundance table for protein families in each sample. As I understand it, I can use HUMAnN to generate a normalized, GO-format protein list for each sample, skipping the nucleotide-level pangenome mapping during this process. Is that correct? And once I have the normalized protein list for each sample, should I then calculate the total protein abundance for each bacterium in every sample?

My second question relates to the GO gene-cluster list. Is this list meant to identify which bacteria possess specific GO proteins (i.e., excluding proteins with 0 abundance for each bacterium)?

Best wishes,
Shuyuan

Hi Shuyuan,

Thanks for your interest in FUGAsseM!

Re1: Preparing the input MTX table from HUMAnN output is straightforward. 1) Run HUMAnN on the MTX samples, with stratification enabled, to generate the genefamilies table (i.e., UniRef90 clusters). 2) Use HUMAnN’s utility humann_renorm_table with "--mode levelwise" to renormalize the stratified features, which normalizes the expression of each UniRef90 entry within its corresponding species.
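
As a rough illustration of step 2, the sketch below wraps the renormalization call in Python. The file names are placeholders, and the choice of cpm units is an assumption, since the units are not specified above. The helper renorm_command is a hypothetical name introduced only for this example. The command is executed only if humann_renorm_table is on the PATH and the input file exists.

```python
import shutil
import subprocess
from pathlib import Path


def renorm_command(genefamilies, output, units="cpm"):
    """Build the humann_renorm_table call for level-wise renormalization.

    `units` is an assumption for this sketch; humann_renorm_table accepts
    "cpm" or "relab".
    """
    return [
        "humann_renorm_table",
        "--input", str(genefamilies),   # stratified genefamilies table from HUMAnN
        "--output", str(output),        # renormalized table to write
        "--units", units,
        "--mode", "levelwise",          # the mode recommended in step 2
    ]


# Placeholder file names; replace with your own paths.
in_table = Path("mtx_genefamilies.tsv")
out_table = Path("mtx_genefamilies_levelwise.tsv")

cmd = renorm_command(in_table, out_table)
print(" ".join(cmd))

# Run only when the HUMAnN utility is installed and the input is present.
if shutil.which("humann_renorm_table") and in_table.exists():
    subprocess.run(cmd, check=True)
```

If you have one genefamilies file per sample, you would typically merge them into a single table first (HUMAnN ships humann_join_tables for this) and renormalize the merged table.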

Re2: The second input is a mapping file linking gene families (e.g., UniRef90 identifiers) to functions (e.g., GO terms). If your goal is to predict functions at the GO term level, FUGAsseM can automatically construct informative GO term groupings when you enable the --go-mode parameter. Additional details and example workflows are available on the tutorial page (https://github.com/biobakery/biobakery/wiki/FUGAsseM#3-predicting-functions-using-fugassem-with-advanced-settings).

Best,
Yancong