Gene family analysis FDR correction: nothing is significant

SHN · March 23, 2022, 9:38pm

Good afternoon,

I am working with a gene family dataset normalized to CPM (copies per million, the output of HUMAnN3), and I am analyzing it with MaAsLin2. I know that I don't need to apply any additional normalization to the CPM values, because they are not count data.
The dataset has both the UniRef90 protein rows and the UniRef90|unclassified rows. After removing the unclassified rows, as recommended for MaAsLin2, I end up with around 2 million gene families. My issue is that after the analysis and FDR correction, no gene family remains significant. Has anyone encountered this issue? Is my approach correct, and is there anything I can do?

library(Maaslin2)
# CPM values are already normalized, so MaAsLin2's own normalization and transform are turned off
fit_data1 = Maaslin2(input_data = gene_families, input_metadata = meta1, output = "Analysis1.output", fixed_effects = c("Diagnosis"), reference = c("Diagnosis,group1"),
                     min_prevalence = 0.1, min_abundance = 0.0001, normalization = "NONE", transform = "NONE", analysis_method = "LM")
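The rows you keep before calling MaAsLin2 determine how many tests it runs, so it helps to be explicit about the filtering step. One common option is to keep only the unstratified (community-level) rows. On real files, HUMAnN's `humann_split_stratified_table` script produces that table. Below is a minimal base-R sketch of the same filter on a made-up toy table; `gf` and `keep_community_totals` are hypothetical names used only for illustration.

```r
# Toy HUMAnN-style gene family table with made-up CPM values.
# Row names follow HUMAnN's "feature|stratum" convention.
gf <- data.frame(
  S1 = c(400, 350, 300, 50, 250, 250),
  S2 = c(500, 200, 150, 50, 300, 300),
  row.names = c("UNMAPPED",
                "UniRef90_A",
                "UniRef90_A|g__Bacteroides.s__Bacteroides_fragilis",
                "UniRef90_A|unclassified",
                "UniRef90_B",
                "UniRef90_B|unclassified")
)

# Keep only community-level (unstratified) rows. Stratified rows, including
# the "|unclassified" ones, contain a "|" after the UniRef90 ID.
keep_community_totals <- function(tab, keep_unmapped = FALSE) {
  ids <- rownames(tab)
  keep <- !grepl("|", ids, fixed = TRUE)
  if (!keep_unmapped) {
    keep <- keep & ids != "UNMAPPED"
  }
  tab[keep, , drop = FALSE]
}

gene_families_toy <- keep_community_totals(gf)
print(rownames(gene_families_toy))
# [1] "UniRef90_A" "UniRef90_B"
```

Even after this filter, a real HUMAnN3 table can still contain millions of UniRef90 rows, which is the multiple-testing problem discussed below.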

Thank you,

franzosa · March 25, 2022, 3:21pm

We don’t normally recommend testing all quantified genes in this way, because of the statistical power issues you’re encountering. I would instead regroup your genes into broader functional units (e.g., ECs or GO processes) and test those; this means running thousands of tests rather than millions.
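To see why the number of tests matters so much, here is a small base-R sketch using `p.adjust` with Benjamini-Hochberg, which is MaAsLin2's default correction. It assumes a deliberately simple setup: one feature with a hypothetical raw p-value of 1e-6 and every other feature pure noise (p = 1). In that case the BH q-value of the best feature is just `min(1, p * m)`, so the same effect is significant among 2,000 tests and lost among 2,000,000. In this setup, the feature would need a raw p-value of 0.25 / 2,000,000 = 1.25e-7 to pass MaAsLin2's default `max_significance` of 0.25.

```r
# BH-adjusted q-value of the single strongest feature when every other
# feature is pure noise (p = 1). In that special case BH reduces to
# min(1, p_best * n_tests), which makes the effect of n_tests easy to see.
best_feature_q <- function(p_best, n_tests) {
  p <- c(p_best, rep(1, n_tests - 1))
  p.adjust(p, method = "BH")[1]
}

p_best <- 1e-6  # hypothetical raw p-value for a real association
q_cut  <- 0.25  # MaAsLin2's default max_significance

q_ec   <- best_feature_q(p_best, 2e3)  # thousands of EC-level tests
q_gene <- best_feature_q(p_best, 2e6)  # millions of gene-level tests

cat("q with 2,000 tests:    ", q_ec, "\n")
cat("q with 2,000,000 tests:", q_gene, "\n")
cat("significant at q <= 0.25?", q_ec <= q_cut, q_gene <= q_cut, "\n")
```

Real data have many signals at once, so BH is less harsh than this single-signal case. The scaling with the number of tests is the same, though.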

If you really want to test at gene-level resolution, you will need to narrow your list of hypotheses first. For example, you could test only the most abundant or most variable genes, or only genes from a subset of species that you have determined a priori to be interesting.
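The filtering strategies above can be sketched in base R. The two helpers are hypothetical and operate on the table before it is passed to MaAsLin2. `top_features` ranks features by mean abundance or by variance across samples. `genes_from_taxa` keeps stratified rows whose taxon was chosen in advance. The cutoff `top_n` and the taxon list are assumptions that you would need to set, and you should set them before looking at the test results.

```r
# Hypothetical helper: keep the top_n features ranked by mean abundance or by
# variance across samples (features in rows, samples in columns, as in HUMAnN tables).
top_features <- function(tab, top_n, by = c("mean", "variance")) {
  by <- match.arg(by)
  score <- if (by == "mean") rowMeans(tab) else apply(tab, 1, var)
  keep <- order(score, decreasing = TRUE)[seq_len(min(top_n, nrow(tab)))]
  tab[keep, , drop = FALSE]
}

# Hypothetical helper: keep stratified rows ("UniRef90_X|taxon") whose taxon
# is in a list chosen a priori.
genes_from_taxa <- function(tab, taxa) {
  ids <- rownames(tab)
  stratified <- grepl("|", ids, fixed = TRUE)
  taxon <- sub("^[^|]*\\|", "", ids)  # text after the first "|"
  tab[stratified & taxon %in% taxa, , drop = FALSE]
}

# Toy community-level table with made-up CPM values.
toy <- matrix(c(10, 12, 11,  9,   # UniRef90_A: abundant, stable
                 0, 40,  0, 35,   # UniRef90_B: abundant, variable
                 1,  1,  2,  1,   # UniRef90_C: rare, stable
                 0, 15,  0, 14),  # UniRef90_D: less abundant, variable
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("UniRef90_", c("A", "B", "C", "D")),
                              paste0("S", 1:4)))

rownames(top_features(toy, 2, by = "mean"))      # "UniRef90_B" "UniRef90_A"
rownames(top_features(toy, 2, by = "variance"))  # "UniRef90_B" "UniRef90_D"
```

Ranking by mean or variance does not look at the Diagnosis labels, so choosing `top_n` this way does not peek at the outcome you are testing.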

SHN · March 25, 2022, 5:05pm

Thanks for your response.
I am new to microbiome analysis and HUMAnN3, but according to other posts I should use humann_regroup_table from HUMAnN3 to get to ECs. I would appreciate it if you could point me to a link about this.
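Conceptually, regrouping maps each UniRef90 ID to a functional group and combines the values within each group. As far as I know, `humann_regroup_table` sums by default, and features with no mapping end up in an `UNGROUPED` row. The real call looks something like `humann_regroup_table --input <cpm table> --groups uniref90_level4ec --output <ec table>`, and it needs HUMAnN's utility mapping database. Below is a base-R sketch of the summing step only. The mapping pairs are made up, and the sketch assumes one group per gene, whereas real UniRef-to-EC mappings can assign a gene to several groups.

```r
# Hypothetical UniRef90 -> EC mapping (made-up pairs for illustration only).
mapping <- data.frame(
  gene = c("UniRef90_A", "UniRef90_B", "UniRef90_C"),
  ec   = c("1.1.1.1",    "1.1.1.1",    "2.7.7.7"),
  stringsAsFactors = FALSE
)

# Toy community-level gene family table with made-up CPM values.
gene_cpm <- matrix(c(100, 50,
                      30, 20,
                      10, 10,
                       5, 80),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("UniRef90_A", "UniRef90_B",
                                     "UniRef90_C", "UniRef90_D"),
                                   c("S1", "S2")))

# Sum gene rows into their group; genes without a mapping go to "UNGROUPED".
regroup_sum <- function(tab, mapping) {
  group <- mapping$ec[match(rownames(tab), mapping$gene)]
  group[is.na(group)] <- "UNGROUPED"
  rowsum(tab, group)
}

ec <- regroup_sum(gene_cpm, mapping)
print(ec)
# 1.1.1.1 = A + B (130, 70); 2.7.7.7 = C (10, 10); UNGROUPED = D (5, 80)
```

Four gene rows collapse into three group rows here. On real data, the drop from millions of UniRef90 rows to thousands of ECs is what recovers power.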

I also have another question. Does humann_regroup_table account for the UniRef90|unclassified rows? If the reference uses only translated regions for protein annotations, then I think we might lose many new or novel proteins here. Am I right?

Thanks
