Analysis of longitudinal data

Hi all,

I have longitudinal HUMAnN3 results, and my bioinformatics skills are not yet strong enough to analyze them, so I would really appreciate some insights and guidance.

About the experiment: there are 2 groups, C and N. Each group has 7 mice (different mice in each group), and stool samples were collected from the mice within 24 hrs at 7 time points, T1 to T7. That gives 98 HUMAnN results (7 mice x 7 time points x 2 groups).

First, I want to check whether, within group C or N, any pathways are differentially abundant between time points. For example, if I set time point T1 as the reference, I assume pathway A at T1 will be compared with the same pathway at T2, T3,… T7?

The input_data is the unstratified CPM HUMAnN result from group N only. fixed_effects, reference, and random_effects are set as in the code below, but I am not sure whether they are correct for my research question.

fit_data <- maaslin3(
  input_data        = features_N,
  input_metadata    = meta_short,
  output            = "group_N",
  fixed_effects     = c("Timepoint"),
  reference         = "Timepoint,T1",
  random_effects    = c("MouseID"),
  normalization     = "NONE",     # Since I already did the humann_renorm_table?
  transform         = "LOG",
  max_significance  = 0.05,
  plot_associations = TRUE
)

Is there anything I should change in my current code? Are there any other tools/packages you can recommend besides MaAsLin3 if it will not work with my data (though I believe it should)?

Many thanks in advance,
Huy
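
For readers setting up a similar run, here is a minimal sketch of how the two inputs could be prepared for group N. The column names `Group`, `Timepoint`, and `MouseID`, the sample IDs, and all values are assumptions for illustration, not taken from the actual data. The sketch only shows sample IDs being lined up between the feature table and the metadata, with `Timepoint` stored as a categorical variable.

```r
# Toy metadata: 2 groups x 2 mice x 3 time points (hypothetical names and values)
meta <- expand.grid(
  Timepoint = c("T1", "T2", "T3"),
  MouseID   = c("m1", "m2"),
  Group     = c("C", "N"),
  stringsAsFactors = FALSE
)
# Mice differ between groups, so make mouse IDs unique per group
meta$MouseID <- paste(meta$Group, meta$MouseID, sep = "_")
rownames(meta) <- paste(meta$MouseID, meta$Timepoint, sep = "_")

# Toy feature table: samples in rows, pathways in columns (random values)
set.seed(1)
features <- as.data.frame(matrix(
  runif(nrow(meta) * 4),
  nrow = nrow(meta),
  dimnames = list(rownames(meta), paste0("PWY", 1:4))
))

# Keep group N only and align samples between the two tables
meta_short <- meta[meta$Group == "N", c("Timepoint", "MouseID")]
meta_short$Timepoint <- factor(meta_short$Timepoint, levels = c("T1", "T2", "T3"))
features_N <- features[rownames(meta_short), , drop = FALSE]

# Both tables should now list the same samples in the same order
stopifnot(identical(rownames(features_N), rownames(meta_short)))
```

Giving each mouse an ID that is unique across groups keeps the `MouseID` random effect from accidentally pooling two different animals if both groups are ever analyzed together.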

Yes - this all looks correct for your question. This will give you each time point compared against the baseline. Since you’re using an unstratified table, your normalized values should sum to 1 (proportions of a whole) within each sample. If this is already the case, setting normalization to none should be fine.

Will
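
One way to check this before running the model is to sum each sample's values. The following is a minimal sketch on a made-up table with pathways in rows and samples in columns; the pathway names, sample names, and numbers are hypothetical. Relative abundances should sum to about 1 per sample, while CPM values sum to about 1e6, so dividing by the column sums converts one to the other.

```r
# Toy pathway table: pathways in rows, samples in columns (hypothetical values)
cpm <- data.frame(
  S1 = c(600000, 300000, 100000),
  S2 = c(250000, 250000, 500000),
  row.names = c("PWY-A", "PWY-B", "PWY-C")
)

# Per-sample totals: CPM-normalized columns should each be close to 1e6
sample_sums <- colSums(cpm)
print(sample_sums)

# Divide each column by its total so every sample sums to 1
relab <- sweep(as.matrix(cpm), 2, sample_sums, "/")
stopifnot(all(abs(colSums(relab) - 1) < 1e-8))
```

If rows such as UNMAPPED or UNINTEGRATED were dropped after renormalizing, the totals will fall below the expected value.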

Hi Will,

Thank you for your answer. I hope you don’t mind looking at some of my results to make sure I am interpreting them correctly. I also think something is missing from the results.

  1. In the heatmap and dot plot, the prevalence value is missing for many pathways. What could explain this?

  2. In the dot plot, are the vertical lines (abundance and prevalence) the “baselines”? For example, in the dot plot for Timepoint 20, if a circle or triangle falls to the left of the line for a pathway, does that mean the pathway has lower abundance/prevalence at Timepoint 20 than at the “reference” set in the code? And does the color (dark violet/dark green) mean the decrease is significant?

  3. I have a hard time understanding the heatmap. What does the beta coefficient tell me in this case? For example, if a pathway is colored blue at time point 24, does that mean the pathway has lower abundance/prevalence at time point 24 than at the reference?

  4. Running the code above creates a folder at association_plots/Timepoint/linear containing 5 PNG files, but each one is an empty plot. What could explain this?

So far those are all my questions about the results. I hope you can help me clear things up.

Best, Huy

  1. Checking the full results file would probably be more useful, but likely the issue is that these pathways are never absent, so there’s no sense in fitting a presence/absence model to them.
  2. The vertical lines are the null hypothesis (top of the legend), which is the median of the coefficients. This is the value the coefficients are compared against to determine significance.
  3. The coefficients are the same as in the output table and show, in your case, the increase or decrease in abundance/prevalence relative to whichever time point you set as the baseline.
  4. I’m not sure what the issue is with the pngs just from looking at it. If you want to email me a chunk of the data and code that can reproduce this at willnickols@g.harvard.edu I can check what’s going on.
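
The explanation in point 1 can be checked directly on the input table by computing per-pathway prevalence. The following is a minimal sketch, assuming samples in rows and pathways in columns, with made-up values and hypothetical pathway names. A pathway detected in every sample has prevalence 1, which leaves no presence/absence variation to model.

```r
# Toy feature table: samples in rows, pathways in columns (hypothetical values)
features_N <- data.frame(
  PWY_always    = c(10, 12, 8, 15),
  PWY_sometimes = c(0, 5, 0, 7),
  row.names = paste0("S", 1:4)
)

# Fraction of samples in which each pathway is detected (value > 0)
prevalence <- colMeans(features_N > 0)
print(prevalence)

# Pathways that are never absent cannot show a prevalence association
never_absent <- names(prevalence)[prevalence == 1]
print(never_absent)
```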

Will
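
As a rough guide to reading the coefficients in point 3, the sketch below converts them to more familiar quantities. It rests on two assumptions that are worth confirming against the MaAsLin 3 documentation: that with the LOG transform the abundance coefficients are on a log base 2 scale, and that the prevalence coefficients are log-odds from a logistic model. The coefficient values are hypothetical.

```r
# Hypothetical coefficients for one pathway at one time point vs. the reference
beta_abundance  <- -1.0   # assumed to be on a log2 scale
beta_prevalence <-  0.7   # assumed to be on a log-odds scale

# Under the log2 assumption, 2^beta is the fold change vs. the reference
fold_change <- 2^beta_abundance

# Under the log-odds assumption, exp(beta) is the odds ratio of being present
odds_ratio <- exp(beta_prevalence)

print(c(fold_change = fold_change, odds_ratio = odds_ratio))
```

Under those assumptions a negative abundance coefficient (blue in the heatmap) means lower abundance than at the reference time point, and a positive prevalence coefficient means the pathway is more likely to be present.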