Metagenomic data and min_abundance filtering

Hello, I am new to analyzing metagenomic data rather than just 16S. For my analysis, I have count data generated from Bracken. I subset the data into two files, one containing species counts and one containing genus counts (hopefully that is a good approach). Neither has been normalized prior to running the model. This type of data contains many more features than what I was seeing with ASV counts. Because of that, I am getting back a lot of bacteria significant at Q = 0.25 when using the following:

fit_func = Maaslin2(
    input_data = taxa,
    input_metadata = meta,
    output = "bacteria_species",
    fixed_effects = c("Treatment"),
    random_effects = c("Subject"),
    min_abundance = 0.1,
    min_prevalence = 0.70,
    reference = c("Treatment,BL"),
    analysis_method = "LM",   # linear model, the default
    normalization = "TSS")    # relative abundance

I thought that since I had so many results, I could set min_abundance and min_prevalence higher to filter more aggressively. However, I do not lose much when I set min_abundance to 0.1. Just to check my understanding, I set min_abundance to 100 and still got back bacteria that were unfiltered (around 400 species). Does this value refer to percent abundance, since I set my normalization to TSS? I would have thought setting min_abundance to 100 would filter everything out.

Thank you for all your help!

Using version 1.10.0

The prevalence/abundance filtering happens before normalization. If you're familiar with R, you can try taking a look at the relevant bit of source code (from the line I linked down to about line 865) to get a sense of exactly how the filters are applied. Regarding your question specifically: the min_abundance filter drops bugs that don't have enough samples above the specified value in the unnormalized input data.
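To make the order of operations concrete, here is a small sketch of that filter-then-normalize logic. This is written in Python purely for illustration (it is not MaAsLin2's actual R source), and the feature names and counts below are made up: a feature is kept if the fraction of samples whose raw count exceeds min_abundance is at least min_prevalence, and TSS normalization only happens afterwards.

```python
def prefilter(counts, min_abundance=0.0, min_prevalence=0.1):
    """Keep a feature if the fraction of samples whose *unnormalized*
    count exceeds min_abundance is at least min_prevalence."""
    n_samples = len(next(iter(counts.values())))
    kept = {}
    for feature, values in counts.items():
        prevalence = sum(v > min_abundance for v in values) / n_samples
        if prevalence >= min_prevalence:
            kept[feature] = values
    return kept

def tss_normalize(counts):
    """Total-sum scaling (relative abundance), applied only AFTER filtering."""
    n_samples = len(next(iter(counts.values())))
    col_sums = [sum(counts[f][i] for f in counts) for i in range(n_samples)]
    return {f: [v / s for v, s in zip(vals, col_sums)]
            for f, vals in counts.items()}

# Hypothetical raw Bracken-style counts, 4 samples per feature.
counts = {
    "Bacteroides": [5000, 4200, 3900, 6100],  # abundant in every sample
    "RareBug":     [0,    3,    0,    1],     # low counts, low prevalence
}

# min_abundance = 100 still keeps Bacteroides: its raw counts exceed 100
# in every sample, even though no taxon has 100% relative abundance.
kept = prefilter(counts, min_abundance=100, min_prevalence=0.70)
print(sorted(kept))  # ['Bacteroides']

normalized = tss_normalize(kept)
```

Because the comparison runs on raw counts, a threshold of 100 only removes taxa that rarely reach 100 reads, which is why ~400 abundant species can survive it.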

Ah, thanks Andrew. Always a lot of help. That would be why taxa still show up after I set min_abundance to 100: the filter is dropping features whose raw counts fall below 100, not features below 100% relative abundance.