MaAsLin2 runs forever for 530K associations

YikeShen · January 8, 2021, 1:47pm

Hello bioBakery help forum,
I am running MaAsLin2 on 530K associations from the HUMAnN 3 genefamily_relab.tsv output. I summed the UniRef90 families to find genus associations with my exposure. The last step below takes forever: it has been running for more than 10 hours and is still counting… Could you help me solve this issue? The run is on an HPC cluster, so I don’t see a memory problem.
2021-01-07 23:01:52 INFO::Counting total values for each feature

Thank you!
Yike

YikeShen · January 8, 2021, 5:19pm

The results came after 15 hours… A good reminder to request a long run time.

Kelsey_Thompson · January 11, 2021, 1:45pm

Hi!

Unfortunately, with that many associations, the tool will take a while to profile the associations within the community. You can improve the speed by reducing the search space (e.g. by filtering out low-abundance or low-prevalence features, or features that are common between your exposure and control).
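As a rough illustration of the filtering idea above, here is a minimal Python sketch of a prevalence/abundance filter applied before the table goes into MaAsLin 2. The function name, the toy table, and both thresholds are assumptions made for this example only, not recommended values. MaAsLin 2 also has its own min_abundance and min_prevalence options, so this pre-filter is mainly useful for shrinking the input file up front.

```python
def filter_features(table, min_abundance=1e-4, min_prevalence=0.1):
    """Keep features seen above min_abundance in at least min_prevalence of samples.

    table maps a feature name to a list of relative abundances, one per sample.
    Both thresholds are illustrative assumptions, not recommended defaults.
    """
    kept = {}
    for feature, values in table.items():
        if not values:
            continue  # skip features with no sample values at all
        n_present = sum(1 for v in values if v > min_abundance)
        if n_present / len(values) >= min_prevalence:
            kept[feature] = values
    return kept


# Hypothetical toy table: three features across four samples.
toy = {
    "UniRef90_A": [0.01, 0.02, 0.0, 0.03],
    "UniRef90_B": [0.0, 0.0, 0.0, 0.00001],
    "UniRef90_C": [0.0, 0.005, 0.0, 0.0],
}
filtered = filter_features(toy, min_abundance=1e-4, min_prevalence=0.5)
print(sorted(filtered))
```

With these toy thresholds only UniRef90_A survives, because it is the only feature above the abundance cutoff in at least half of the samples.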

I think from your message you are already doing this, but make sure to also reduce the stratified table down to either the bugs contributing the function or the general gene families. This will also help reduce the size of the dataset.
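To make the stratified/unstratified distinction concrete, here is a small Python sketch. It assumes the usual HUMAnN row naming, where a stratified row carries the contributing taxon after a "|" in the feature name; the function name and toy rows are hypothetical. HUMAnN also ships a humann_split_stratified_table utility for this job, which is probably the better choice on real data.

```python
def split_stratified(rows):
    """Separate community-level rows from per-taxon (stratified) rows.

    rows maps a feature name to its abundance values. A "|" in the name is
    assumed to mark a stratified row such as "UniRef90_X|g__Genus.s__Species".
    """
    unstratified, stratified = {}, {}
    for feature, values in rows.items():
        target = stratified if "|" in feature else unstratified
        target[feature] = values
    return unstratified, stratified


# Hypothetical toy rows in HUMAnN-style naming.
rows = {
    "UniRef90_A": [0.03, 0.01],
    "UniRef90_A|g__Bacteroides.s__Bacteroides_vulgatus": [0.02, 0.01],
    "UniRef90_A|unclassified": [0.01, 0.0],
    "UniRef90_B": [0.0, 0.04],
}
community, per_taxon = split_stratified(rows)
print(len(community), len(per_taxon))
```

Running MaAsLin 2 on only one of the two resulting tables cuts the number of tested features considerably compared with the combined table.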

Finally, MaAsLin can be run in parallel with the --cores flag, which could also improve your run time.

I hope this helps!
Best,
Kelsey

YikeShen · January 11, 2021, 2:22pm

Hello Kelsey,
Thank you for your answer! One more question.
I was trying to explain which species contribute to the association, and found many redundant species in the UniRef90 database output. How do people usually handle this redundancy? For my paper I only need one association per species, rather than multiple associations for the same species with different coefficients and q-values. The associations for the same species were all positive, but the adjusted q-values ranged from 0.4% to 10% and the coefficients differed by 0.1. Thank you!

himel.mallick · January 11, 2021, 5:42pm

Hi @YikeShen - in addition to @Kelsey_Thompson’s suggestion above on reducing the stratified table, I would suggest further filtering out features explainable by at most a single taxon before running MaAsLin 2. You can do that by discarding features with very high correlation with individual microbial abundances. A similar strategy was used in the original iHMP paper; see the last sentence of “Differential microbiome feature abundance” in Methods: https://www.nature.com/articles/s41586-019-1237-9. Hope this helps!
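One way to sketch the correlation filter described above in Python is shown below. This is an assumption-laden illustration, not the iHMP procedure: the choice of Pearson correlation, the 0.9 threshold, the function names, and the toy data are all hypothetical, and a rank-based correlation may suit real abundance data better.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_taxon_explained(features, taxa, threshold=0.9):
    """Drop features whose correlation with any single taxon reaches threshold.

    features and taxa map names to per-sample values in the same sample order.
    The threshold is an illustrative assumption.
    """
    kept = {}
    for name, values in features.items():
        max_r = max((pearson(values, t) for t in taxa.values()), default=0.0)
        if max_r < threshold:
            kept[name] = values
    return kept


# Hypothetical toy data: one taxon and two gene families across four samples.
taxa = {"g__Bacteroides": [0.1, 0.2, 0.3, 0.4]}
features = {
    "UniRef90_A": [0.01, 0.02, 0.03, 0.04],  # tracks the taxon exactly
    "UniRef90_B": [0.04, 0.01, 0.03, 0.02],  # does not track the taxon
}
remaining = drop_taxon_explained(features, taxa)
print(sorted(remaining))
```

In this toy case UniRef90_A is removed because it mirrors the abundance of a single taxon, while UniRef90_B is kept for testing.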

YikeShen · January 13, 2021, 6:19pm

Hello bioBakery help forum,
It looks like the most recent HUMAnN 3 attaches the “human-readable” UniRef protein name to the output table. I had already run HUMAnN 3, and it took a good chunk of run time. Is there a downstream way to attach protein names to the genefamily.tsv table?
Thank you!

Kelsey_Thompson · January 14, 2021, 9:00pm

Hi @YikeShen,

Yes, you can use the humann_rename_table command to add human-readable names to the UniRefs, similar to how the tutorial does for ECs.
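For anyone scripting this step, here is a small Python sketch that only assembles the command line for the rename utility (named humann_rename_table in HUMAnN 3) and prints it. The file names are hypothetical, and the --input, --output and --names uniref90 options are written as I understand them from the HUMAnN documentation, so please confirm with humann_rename_table --help on your install. The UniRef90 name mapping may also need to be downloaded first.

```python
def build_rename_command(input_path, output_path, names="uniref90"):
    """Assemble (but do not run) a humann_rename_table call.

    The flag names are an assumption based on the HUMAnN documentation;
    verify them with `humann_rename_table --help` before relying on this.
    """
    return [
        "humann_rename_table",
        "--input", input_path,
        "--output", output_path,
        "--names", names,
    ]


# Hypothetical file names, for illustration only.
cmd = build_rename_command("genefamilies_relab.tsv", "genefamilies_relab_names.tsv")
print(" ".join(cmd))

# To actually run it (requires HUMAnN on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Because renaming works on the finished table, the HUMAnN run itself does not need to be repeated.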

I hope this helps!
Kelsey
