Tutorial help-newbie struggling

Thank you for the tutorial and the awesome tool!
Hello, I have some QIIME2 files, which include a table.qza (feature table file), a tree.nwk file, a metadata .tsv file and a taxonomy.qza file. Using the biom convert command, I converted all of them into .tsv files.
Now I am trying to run MaAsLin 2 commands.
I imported them as follows.
input_metadata <- read.table(file = "/Users/bioinfo/Desktop/R_Maaslin_GHANA/sample-metadata.tsv", sep = "\t", header = TRUE, row.names = 1)
input_data <- read.table(file = "/Users/bioinfo/Desktop/R_Maaslin_GHANA/taxonomy.tsv", sep = "\t", header = TRUE, row.names = 1,
    skip = 1, comment.char = " ")
This part worked fine.
Now my problem is this:
whenever I use the following command, I get an error.
df_input_data = read.table(file = input_data, header = TRUE, sep = "\t",
    row.names = 1,
    stringsAsFactors = FALSE)
df_input_data[1:5, 1:5]
Error in read.table(file = input_data, header = TRUE, sep = "\t", row.names = 1, :
  'file' must be a character string or connection

Can someone please help? I tried loads of tricks I found online, but nothing helps.

Another question: do the files used as input data and input metadata need to have the same IDs? My taxonomy file gives me OTU IDs and the metadata has sample IDs. Would that be a problem?

I am also struggling to understand fixed and random effects, as well as the reference argument. I have 4 localities in my samples and want to see which bugs or taxa are prevalent in each location.
So I am assuming my fixed effect should be locality. There is another metadata column that indicates local density, and I believe that to be a confounder. Should I add it as a fixed effect or as a random effect? Also, as I am using four locality names as categorical data, would I need to set a reference? And if I set the reference to be one of the localities, would I include that name in the fixed effects as well?

I tried going over almost all the questions in the forum but I am still struggling. Looking forward to all your help.

I cannot edit my previous post, so I am attaching screenshots of the data here. I am looking forward to all your help.


Your first issue is that you’re calling read.table() on a data frame (input_data) instead of a file. You already have input_data read into R as a data frame after your second command. So just pass input_data to Maaslin2() and you’ll be fine.
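To make the point concrete, here is a minimal sketch that reproduces the error with a small made-up table. The file contents and sample names are invented for illustration and stand in for your taxonomy.tsv.

```r
# Toy stand-in for the real file, written to a temporary location.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("feature\tS1\tS2", "taxonA\t10\t0", "taxonB\t3\t7"), tsv_path)

# First read: a file path goes in, a data frame comes out.
input_data <- read.table(tsv_path, sep = "\t", header = TRUE, row.names = 1)
class(input_data)

# Calling read.table() again on the data frame fails, because `file`
# is now a data frame rather than a path or connection.
second_read <- tryCatch(
  read.table(file = input_data, header = TRUE, sep = "\t", row.names = 1),
  error = function(e) conditionMessage(e)
)
second_read

# The data frame is already usable, so it can be indexed directly.
input_data[1:2, 1:2]
```

In other words, the second read.table() call can simply be dropped.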

For your second question, I don’t think I perfectly understand what you’re asking, but Maaslin does a little bit of auto-inspection to try to figure out how the rows and columns line up between the metadata and the data. It will print out a message about what it detects and possibly throw an error if it can’t figure it out. For the sake of simplicity it’s usually best to manipulate your data into the format shown in the tutorial. The dplyr and tidyr packages are probably the best things for you to try if you’re not experienced with that.
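If it helps, this is a rough base-R sketch of how you could check that the sample IDs line up before calling Maaslin. The tables are synthetic, and "locality" is only a placeholder column name. It assumes your feature table has features in rows and samples in columns, and it transposes the table so that samples are rows, as in the tutorial files.

```r
# Synthetic feature table: features in rows, samples in columns.
features <- data.frame(
  S1 = c(10, 3), S2 = c(0, 7), S3 = c(5, 5),
  row.names = c("taxonA", "taxonB")
)
# Synthetic metadata: samples in rows ("locality" is a hypothetical column).
metadata <- data.frame(
  locality = c("accra", "barekuma", "accra"),
  row.names = c("S1", "S2", "S3")
)

# Sample IDs present in both tables.
shared <- intersect(colnames(features), rownames(metadata))

# Transpose so samples are rows, and put both tables in the same sample order.
features_by_sample <- as.data.frame(t(features[, shared, drop = FALSE]))
metadata_aligned <- metadata[shared, , drop = FALSE]

# TRUE when both tables list the same samples in the same order.
identical(rownames(features_by_sample), rownames(metadata_aligned))
```

The feature (OTU) IDs and the sample IDs are different axes of the table, so it is the sample IDs that need to match the metadata.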

You can try the commands below to peek at the tutorial files:

input_data <- system.file(
    'extdata','HMP2_taxonomy.tsv', package="Maaslin2")
input_metadata <- system.file(
    'extdata','HMP2_metadata.tsv', package="Maaslin2")

read.delim(input_data)[1:4, 1:4]

Third question: from your description it sounds like you want locality as a fixed effect. By “local density and I believe that to be a confounder” do you mean that local density influences bug abundances? If so that should also be a fixed effect.

Yes, if you have more than two localities you’ll need to set a reference, e.g. reference = 'locality,accra' or whatever it needs to be.
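Putting that together, a call could look roughly like the sketch below. The wrapper function is hypothetical, and "locality", "local_density" and "accra" are placeholders for whatever your metadata columns and reference level are actually called. Only the Maaslin2() arguments used in the tutorial appear here.

```r
# Hypothetical wrapper around Maaslin2(); column names and the reference
# level are placeholders, not taken from the poster's real metadata.
run_locality_model <- function(features, metadata, output_dir) {
  if (!requireNamespace("Maaslin2", quietly = TRUE)) {
    stop("The Maaslin2 package is not installed.")
  }
  Maaslin2::Maaslin2(
    input_data     = features,    # data frame or path to the feature table
    input_metadata = metadata,    # data frame or path to the metadata
    output         = output_dir,  # folder where results are written
    fixed_effects  = c("locality", "local_density"),
    reference      = c("locality,accra")  # "variable,reference level"
  )
}
```

You would then call something like run_locality_model(input_data, input_metadata, "maaslin_output"), where the output folder name is again just an example.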

Hello, thank you for your reply. All the help is really appreciated. I have two questions.

  1. Could you tell me the basis or the rationale behind choosing a reference?
  2. Something strange is happening after I read in my data. The feature table sample names are shuffled for some reason. The Google sheet is the file I am reading into R, and R is somehow changing the sample names to other ones. I have attached a screenshot of the code, along with a screenshot of my Excel sheet and of how R shows the data frame.



In your case, the reference locality is the one that all other localities will be compared against. References are usually used to set the baseline level of a factor, e.g. if your conditions were control, drugA, drugB, you’d set control to be the reference.
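As a small base-R illustration of the idea (this shows how R itself treats factor baselines, not Maaslin's internals), here is a made-up factor with "placebo" playing the role of the control:

```r
# Made-up treatment labels; by default R uses the first level
# alphabetically ("drugA") as the baseline.
condition <- factor(c("drugA", "drugB", "placebo", "drugA", "drugB", "placebo"))
levels(condition)

# Make "placebo" the reference level instead.
condition <- relevel(condition, ref = "placebo")
levels(condition)

# The design matrix now has one column per non-reference level,
# each read as "this level compared with placebo".
design <- model.matrix(~ condition)
colnames(design)
```

The reference level itself gets no column of its own: it is absorbed into the intercept, and every other level is reported relative to it.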

The issue with your input data is hard to diagnose via screenshot. It could be the first line “# Constructed from …” throwing things off, it could be the extra comma you have in your command, or it could be that the “#” in your first column name is making R ignore that line. You might try using readr::read_tsv() instead of read.table(); it tends to give more informative messages.
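If you would rather stay with read.table(), a sketch along these lines may work. The file below is a toy stand-in that I am assuming resembles yours, with one leading comment line followed by a header whose first column name starts with "#".

```r
# Toy file that mimics the assumed layout of the exported table.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "# Constructed from biom file",
  "#OTU ID\tS1\tS2",
  "otu1\t10\t0",
  "otu2\t3\t7"
), tsv_path)

feature_table <- read.table(
  tsv_path,
  sep = "\t",
  skip = 1,             # drop the leading "# Constructed from" line
  header = TRUE,
  comment.char = "",    # otherwise the "#OTU ID" header line is skipped as a comment
  check.names = FALSE,  # keep sample names exactly as written in the file
  row.names = 1
)
feature_table
```

With check.names = FALSE, R leaves the column names alone, which makes it easier to compare them against the sample IDs in your metadata.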

Thank you once again for your reply. I followed your suggestion and got this result back. My data was not parsed properly, so I used Excel to fix that.
However, now when I run it, a new problem shows up.

I can spot that you didn’t set the reference locality correctly; it needs a variable name paired with the reference value. I think the argument you want is reference = c("locality,barekuma").

Other than that, assuming Maaslin interpreted your command correctly, you may just not have any strong patterns in your data. You’ll have to inspect your output files to see if the regression statistics you got seem reasonable.

Thank you so much for all the help. I just have a few clarifications to ask for.

  1. In my fixed effects, if the second variable (local density in my case) has multiple levels, do I add references for those too?
  2. What is the rationale behind choosing a random effect? And do we add a reference level for it if it is categorical with more than 2 types?

If you want to convert it to a categorical variable and treat it as such, you could, though I’m not sure if that would be advisable. In your earlier screenshot it looks to be numeric, and converting numbers into categories usually involves a degree of information loss.
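To show what I mean by information loss, here is a tiny sketch with invented density values and arbitrary cut-offs:

```r
# Invented numeric values standing in for a "local density" column.
local_density <- c(12, 48, 51, 95, 240)

# Binning into three classes with arbitrary, made-up cut-offs.
density_class <- cut(
  local_density,
  breaks = c(0, 50, 100, Inf),
  labels = c("low", "medium", "high")
)

data.frame(local_density, density_class)
```

After binning, 12 and 48 are treated as identical, while 48 and 51 land in different classes even though they are nearly the same value.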

Random effects are usually used to handle things like repeated measurements from multiple individuals. A full explanation of what they’re for and how they work is beyond what can be done in a forum post. This course offers a good overview of random effects as part of multi-level models, though that particular lecture is pretty deep in the course, so it might be hard to just jump into that point.

In Maaslin, random effects don’t require a reference level.

Thank you for your reply and suggestion. I will go over the statistics course videos.