Input Data for MMUPHin

Hello!

The input feature-by-sample matrix should be a compilation of all the feature data from all the studies I have pre-processed, cross-referenced in the metadata file, right? Does each dataset have to be normalized/standardized separately prior to compilation? Or is that done in the “adjust_batch” step of the MMUPHin pipeline?

Thanks!

Salma

The input feature-by-sample matrix should be a compilation of all the feature data from all the studies I have pre-processed, cross-referenced in the metadata file, right?

Yes

Does each dataset have to be normalized/standardized separately prior to compilation?

No, but you need to make sure they’re either proportions or counts.
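
In case it helps, here is a minimal base-R sketch of compiling per-study tables into one feature-by-sample matrix and checking that the result is all counts or all proportions. Everything in it is made up for illustration: the toy matrices, the sample and taxon names, and the helper functions `combine_studies` and `abundance_type` are hypothetical and not part of MMUPHin. It also assumes that a feature absent from a study can be filled in as 0 and that the metadata uses a `studyID` column as in the vignette.

```r
# Two hypothetical per-study tables: features in rows, samples in columns.
# matrix() fills column by column, so each line below is one sample.
study_a <- matrix(c(10, 0, 5,
                    3, 7, 0),
                  nrow = 3,
                  dimnames = list(c("taxon1", "taxon2", "taxon3"),
                                  c("A_s1", "A_s2")))
study_b <- matrix(c(4, 6,
                    0, 9,
                    2, 2),
                  nrow = 2,
                  dimnames = list(c("taxon2", "taxon4"),
                                  c("B_s1", "B_s2", "B_s3")))

# Hypothetical helper: union of features, samples side by side,
# features missing from a study filled with 0 (an assumption).
combine_studies <- function(abd_list) {
  all_features <- sort(unique(unlist(lapply(abd_list, rownames),
                                     use.names = FALSE)))
  all_samples <- unlist(lapply(abd_list, colnames), use.names = FALSE)
  stopifnot(anyDuplicated(all_samples) == 0)  # sample names must be unique
  combined <- matrix(0,
                     nrow = length(all_features),
                     ncol = length(all_samples),
                     dimnames = list(all_features, all_samples))
  for (abd in abd_list) {
    combined[rownames(abd), colnames(abd)] <- abd
  }
  combined
}

# Hypothetical helper: is the table all proportions or all counts?
abundance_type <- function(abd) {
  stopifnot(!anyNA(abd), all(abd >= 0))
  if (all(abd <= 1)) {
    "proportions"
  } else if (all(abd == floor(abd))) {
    "counts"
  } else {
    stop("values are neither all counts nor all proportions")
  }
}

combined <- combine_studies(list(study_a, study_b))
abundance_type(combined)

# Matching metadata: one row per sample, in the same order as the columns.
meta <- data.frame(studyID = rep(c("study_a", "study_b"), times = c(2, 3)),
                   row.names = colnames(combined))
stopifnot(identical(rownames(meta), colnames(combined)))
```

The point of the type check is that a table mixing counts from one study with proportions from another would fit neither description, so it seems safest to bring every study to the same type before combining.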

Or is that done in the “adjust_batch” step of the MMUPHin pipeline?

Yes

You can find examples in section 3 of the vignette here: Performing meta-analyses of microbiome studies with MMUPHin
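
For reference, the batch adjustment call in that section looks roughly like the sketch below, which uses the CRC_abd and CRC_meta example data shipped with the package. The `requireNamespace` guard is my own addition so the snippet does nothing when MMUPHin is not installed. With your own data, the batch and covariate column names would be whatever your metadata uses.

```r
# Only runs if MMUPHin is installed (guard added for illustration).
if (requireNamespace("MMUPHin", quietly = TRUE)) {
  library(MMUPHin)
  data("CRC_abd", "CRC_meta")

  # feature_abd: combined feature-by-sample matrix (counts or proportions)
  # batch / covariates: column names in the metadata data frame
  fit_adjust_batch <- adjust_batch(
    feature_abd = CRC_abd,
    batch = "studyID",
    covariates = "study_condition",
    data = CRC_meta,
    control = list(verbose = FALSE)
  )

  # Batch-adjusted abundance matrix, same shape as the input
  CRC_abd_adj <- fit_adjust_batch$feature_abd_adj
}
```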

Thank you so much! That’s what I understood; it was that section of the vignette that confused me.

“It might be worthwhile to read through these as they perform many of the common tasks for preprocessing microbial feature abundance data in R, including sample/feature subsetting, normalization, filtering, etc.”

This made me second-guess whether or not my data should be normalised.

Thanks again!

Oh, I think that sentence is just referring to the short bit of code used to prepare CRC_abd and CRC_meta from the curatedMetagenomicData package. You can see that code by looking at the examples section of the help documentation for each: ?CRC_abd and ?CRC_meta