I am working with a longitudinal microbiome dataset in which the number of samples differs between patients (approximately 10 per patient). I would like to find differential taxa based on an outcome of interest.
Is it OK to use this dataset as is, or does one need to keep the same number of samples for each patient, or define time windows with one sample per patient in each window?
Any help will be great.
Hi there,
For longitudinal data, I would suggest switching to our newer feature-wise tool, MaAsLin 2, in which you can set subject as a random effect. With this tool it does not matter if you have uneven numbers of samples per subject.
I hope this helps!
Best,
Kelsey
Hi Kelsey
Thanks for the response. Can MaAsLin2 also identify differential taxa based on a phenotype? Also, if we want to run LEfSe, would that only work with data from a single time point, i.e. a single sample per patient?
Thanks,
Arti
Hi Arti,
Yes, MaAsLin 2 can identify differential taxa based on phenotypic data. LEfSe is a bit more complicated when it comes to longitudinal study designs. Given the way LEfSe is designed, I believe an unbalanced study design could cause spurious hits. Additionally, if you have baseline samples, LEfSe attempts to find characteristics of your phenotypes that are different in all subclasses (say, timepoints), which with a case/control treatment design may not be in line with the question you are asking. In this case, if you would like to stick with LEfSe, I would try something like association tests stratified by time, as you suggest.
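As a rough illustration of the stratified-by-time idea, here is a minimal Python sketch that bins samples into fixed-width time windows and keeps one sample per patient in each window, so that each window could then be tested separately. The field names (`patient`, `day`, `sample_id`), the 30-day window, and the choice of the earliest sample in each window are all assumptions for the example, not something LEfSe or MaAsLin 2 requires.

```python
from collections import defaultdict

def one_sample_per_window(samples, window_days):
    """Bin samples into fixed-width time windows and keep the earliest
    sample per patient within each window (hypothetical helper)."""
    windows = defaultdict(dict)  # window index -> {patient: sample}
    for s in sorted(samples, key=lambda s: s["day"]):
        w = s["day"] // window_days
        # setdefault keeps the first (earliest) sample seen for this patient
        windows[w].setdefault(s["patient"], s)
    return {w: list(by_patient.values()) for w, by_patient in sorted(windows.items())}

# Made-up metadata: an uneven number of samples per patient
samples = [
    {"patient": "P1", "day": 0, "sample_id": "P1_a"},
    {"patient": "P1", "day": 5, "sample_id": "P1_b"},
    {"patient": "P1", "day": 35, "sample_id": "P1_c"},
    {"patient": "P2", "day": 2, "sample_id": "P2_a"},
    {"patient": "P2", "day": 40, "sample_id": "P2_b"},
    {"patient": "P2", "day": 50, "sample_id": "P2_c"},
]

strata = one_sample_per_window(samples, window_days=30)
for w, chosen in strata.items():
    print(w, [s["sample_id"] for s in chosen])
```

Each window then contains at most one sample per patient. Patients with no sample in a given window are simply absent from it, so the set of patients can still differ between windows.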
Best,
Kelsey