Input_data and input_metadata Specifications

How does the software match data to metadata? Do the two data frames have to be in the same order, or does the software match them by row or column names? The documentation says only:

input_data: The tab-delimited input file of features.
input_metadata: The tab-delimited input file of metadata.

Could these be replaced with more informative descriptions that specify the expected format precisely? I tried providing a data frame with only a subset of the samples, so the two data frames had different dimensions, and it didn’t produce any errors. Did it work correctly? Perhaps there should be a \details{} section in the Rd file explaining the expectations for row and column names.

Hi @Dario,

Sorry for any confusion and thanks for the suggestions! Before MaAsLin runs the models, it first identifies whether the samples are in the rows or the columns by using the intersect command in R on the rownames and colnames of the two input tables. It then uses that result to filter both tables down to only those samples that have both features and metadata. Finally, it reorders the two resulting data frames so that their samples are in the same order. So, to answer your question: if you provided only a subset of the samples and it ran error-free, then it identified which rows/columns your samples were in, subset the two tables to match, and ran without incident. If you want further information about a MaAsLin run, you can always check the log file that is produced; it should record details such as how many samples were included in the analysis.
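
To make those steps concrete, here is a minimal base-R sketch of the matching logic described above. It is an illustration only, not MaAsLin's actual source code: the helper name align_samples and the toy tables are hypothetical, and the real implementation may differ in its details.

```r
# Hypothetical helper (not part of MaAsLin): orient both tables so that
# samples are rows, keep only the shared samples, and put them in one order.
align_samples <- function(data, metadata) {
  if (length(intersect(rownames(data), rownames(metadata))) > 0) {
    # Samples are already rows in both tables; nothing to transpose.
  } else if (length(intersect(rownames(data), colnames(metadata))) > 0) {
    # Note: t() coerces mixed-type columns to character.
    metadata <- as.data.frame(t(metadata))
  } else if (length(intersect(colnames(data), rownames(metadata))) > 0) {
    data <- as.data.frame(t(data))
  } else if (length(intersect(colnames(data), colnames(metadata))) > 0) {
    data <- as.data.frame(t(data))
    metadata <- as.data.frame(t(metadata))
  } else {
    stop("No sample names are shared between the two tables.")
  }

  # Keep only samples present in both tables, in a common order.
  shared <- intersect(rownames(data), rownames(metadata))
  list(
    data = data[shared, , drop = FALSE],
    metadata = metadata[shared, , drop = FALSE]
  )
}

# Toy input: features have samples as columns, metadata has samples as rows,
# the metadata covers only a subset of samples and is in a different order.
features <- data.frame(
  S1 = c(10, 0), S2 = c(3, 7), S3 = c(5, 1),
  row.names = c("taxonA", "taxonB")
)
metadata <- data.frame(age = c(40, 31), row.names = c("S3", "S1"))

aligned <- align_samples(features, metadata)
print(rownames(aligned$data))
print(rownames(aligned$metadata))
```

In this toy example sample S2 has no metadata, so it is dropped from the features without an error, and both returned tables list the remaining samples in the same order.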

We do provide more details about the requirements for the file types in our tutorial and the HTML vignette associated with Bioconductor. Again, apologies for any confusion that arose while running MaAsLin, and thank you for the additional suggestions on where to place this information to avoid confusion.

I hope this helps!
Best,
Kelsey