Picrust2 data to Maaslin2

Hello,
I wanted to thank you for such an incredible software that helps to deal with differential abundance in microbiome data! I am new to Maaslin2, and I am trying to fully grasp the entire concept of multivariate association in microbiome data. So far, I haven’t had a problem using my microbiome compositional 16S v3v4 data, but when it comes to using a metagenome prediciton table from PiCRUST2, I am not sure if I understand what Maaslin2 is giving me based on what I provide as input.

I am using the Maaslin2 package version 1.15.1, and here’s the function that I am running:

fit_data = Maaslin2(input_data = maaslin_pathways_df, input_metadata = metadata, output="test", 
                    fixed_effects="Community", normalization="NONE",transform = "NONE")

To my understanding, it is better if I leave the normalization to ‘NONE’, as the pathway prediction output from PiCRUST2 has been already normalized. Now, my problem comes when I try using a transformation approach. If I leave it as ‘none’, I get my results without any problem, but with extreme values as coefficients:


|feature|metadata|value|coef|stderr|N|N.not.0|pval|qval|
|---|---|---|---|---|---|---|---|---|
|A190|Community|IAB+|-17529.95233|565.466816|144|69|4.39E-65|8.34E-63|
|A258|Community|IAB+|-8176.950351|473.0754|144|40|9.60E-37|9.12E-35|
|A235|Community|IAB+|-9766.651068|825.5791585|144|68|6.67E-23|4.23E-21|
|A226|Community|IAB+|11065.48753|990.4606028|144|29|3.45E-21|1.64E-19|
|A108|Community|IAB+|15395.7534|1393.26793|144|82|7.17E-21|2.72E-19|
|A189|Community|IAB+|11519.99242|1082.164736|144|29|8.07E-20|2.56E-18|
|A222|Community|IAB+|10158.40909|973.2338364|144|25|2.78E-19|7.55E-18|
|A159|Community|IAB+|30014.99033|2891.98848|144|84|3.96E-19|9.40E-18|
|A276|Community|IAB+|33228.05812|3215.108635|144|99|5.13E-19|1.08E-17|
|A175|Community|IAB+|34423.63919|3337.614958|144|102|5.82E-19|1.11E-17|
|A161|Community|IAB+|35026.82739|3409.791572|144|103|7.45E-19|1.28E-17|
|A253|Community|IAB+|34817.69019|3398.549856|144|110|8.77E-19|1.28E-17|
|A281|Community|IAB+|35719.82409|3484.775709|144|114|8.50E-19|1.28E-17|
|A31|Community|IAB+|27907.52411|2768.099388|144|92|2.31E-18|3.14E-17|
|A103|Community|IAB+|-4047.986773|426.3939224|144|17|7.44E-17|9.43E-16|

My guess is that I am skipping the log2 transformation that Maaslin2 does to my data, and thus I get these values. When I try using a “LOG” transformation to my file, it gives me the following error:

Error in min(x[x > 0])/2 : non-numeric argument to binary operator

I suppose this is because I have negative values in my normalized output from PiCRUST2, correct? If that’s the case, I think I should stick to the coefficient values I am getting from Maaslin2 without applying normalization or transformation. But, when it comes to generating a volcano plot, I suppose people are ‘used to seeing Log2(FC)’. Is there a way where I can simply get the Log2(FC)? I considered using the non-normalized output from PiCRUST2 and then normalize it with Maaslin2, but given that PiCRUST2 already considers 16S variation and ASV abundance, I’d rather use that approach instead.

So I guess my questions are:

  • Should I use the maaslin2 coefficients without transformation and normalization from the output I get?
  • If so, is there a simple way to plot the coefficient numbers into a more ‘reader-friendly’ Log2(FC) using this data?
  • Should I just throw this away and use PiCRUST2 non-normalized pathway output and normalize/transform using Maaslin2?

Thanks a bunch!

Daniel CM

Hi there,

PiCRUST2 normalizes data by the estimated copy numbers in the genomes but I would suggest still analysis the data using something like TSS.

Thanks,
Jacob Nearing

Thank you for your fast response, @nearinj . I tried doing what you suggested by running this code:


fit_data = Maaslin2(input_data = maaslin_pathways_df, input_metadata = metadata, output="test", 
                    fixed_effects="Community", normalization="TSS",transform = "NONE")

Which gave me the following error:


[1] "Warning: Deleting existing log file: test/maaslin2.log"
2024-07-16 23:40:06.509337 INFO::Writing function arguments to log file
2024-07-16 23:40:06.517296 INFO::Verifying options selected are valid
2024-07-16 23:40:06.51873 INFO::Determining format of input files
2024-07-16 23:40:06.519922 INFO::Input format is data samples as rows and metadata samples as rows
2024-07-16 23:40:06.529866 INFO::Formula for fixed effects: expr ~  Community
2024-07-16 23:40:06.531565 INFO::Filter data based on min abundance and min prevalence
2024-07-16 23:40:06.532707 INFO::Total samples in data: 144
2024-07-16 23:40:06.533835 INFO::Min samples required with min abundance for a feature not to be filtered: 14.400000
2024-07-16 23:40:06.547121 INFO::Total filtered features: 113
2024-07-16 23:40:06.548561 INFO::Filtered feature names from abundance and prevalence filtering: A2, A3, A6, A7, A8, A13, A17, A20, A24, A27, A28, A29, A37, A40, A41, A48, A51, A52, A53, A54, A55, A56, A57, A63, A64, A65, A66, A68, A69, A74, A75, A76, A78, A79, A80, A82, A83, A85, A86, A90, A91, A97, A98, A100, A104, A107, A109, A117, A118, A119, A122, A123, A127, A134, A137, A138, A142, A143, A144, A145, A146, A147, A149, A151, A155, A157, A158, A162, A172, A178, A182, A184, A192, A196, A197, A198, A199, A202, A205, A207, A210, A213, A214, A215, A216, A234, A237, A241, A242, A243, A244, A245, A247, A248, A255, A257, A259, A264, A265, A266, A267, A271, A272, A273, A277, A278, A285, A288, A294, A295, A297, A301, A302
2024-07-16 23:40:06.561035 INFO::Total filtered features with variance filtering: 0
2024-07-16 23:40:06.562391 INFO::Filtered feature names from variance filtering:
2024-07-16 23:40:06.563462 INFO::Running selected normalization method: TSS
Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
In addition: Warning message:
In vegan::decostand(features_norm, method = "total", MARGIN = 1,  :
  input data contains negative entries: result may be non-sense

But when I run it without any normalization, it runs smoothly and I get the output files I am looking for. Is there a reason for this? This is the code that I run without normalizing it:

fit_data = Maaslin2(input_data = maaslin_pathways_df, input_metadata = metadata, output="test", 
                    fixed_effects="Community", normalization="NONE",transform = "NONE")

I am also attaching the output file from PiCRUST2. I modified the naming of the pathways to avoid any ‘naming’ problems when running Maaslin, so now my pathways are coded.
path_abun_unstrat_renamed.txt (470.3 KB)

Thanks in advance!

Daniel CM

Hello,

I am also attaching the full code in case that could help you understand better what I did. Sorry for the spam

setwd(path_picrust2)
maaslin_pathways_df<-(read.delim("pathways_out/path_abun_unstrat_renamed.txt",
                                 header = TRUE))                                  #I had to rename the file, as the pathways names were generating an error in Maaslin2
colnames(maaslin_pathways_df) <- sub("^.", "", colnames(maaslin_pathways_df), 
                                     perl = TRUE)                                 #R was adding an 'X' at the beginning of my sample names, so I removed that 'X'.
maaslin_pathways_df<-t(maaslin_pathways_df)
colnames(maaslin_pathways_df)<-maaslin_pathways_df[1,]
maaslin_pathways_df<-maaslin_pathways_df[-1,]
maaslin_pathways_df<-as.data.frame(maaslin_pathways_df)
View(maaslin_pathways_df)
metadata<-as.data.frame(sample_data(ps))
class(metadata)<-"data.frame"
#class(metadata)

fit_data = Maaslin2(input_data = maaslin_pathways_df, input_metadata = metadata, output="test", 
                    fixed_effects="Community", normalization="TSS",transform = "NONE")

Hi @daniel.castanedamogo ,

I was able to run your data using “TSS” normalization using some dumby metadata with the following code:

test <- read.table("~/Downloads/path_abun_unstrat_renamed.txt", sep="\t", header=T, row.names=1)


random_data <- data.frame(row.names = colnames(test),
                          rand_meta = sample(c(1,0), ncol(test), replace = T))



library(Maaslin2)

nfit_data = Maaslin2(input_data = test, input_metadata = random_data, output="~/Desktop/TEST/m2_question", 
                    fixed_effects="rand_meta", normalization="NONE",transform = "NONE")


nfit_data = Maaslin2(input_data = test, input_metadata = random_data, output="~/Desktop/TEST/m2_question", 
                     fixed_effects="rand_meta", normalization="TSS",transform = "NONE")

Which indicates to me that potentially there might be some issues in the way you are loading the data.

Thanks,
Jacob

Hi Jacob,

Thanks for your response. Sorry, I don’t understand what you mean by loading the data. My code looks like this:


path_picrust2 = "/Users/danielcm/Desktop/Sycuro/Projects/Diabetes/picrust2/picrust_july2024/default_run/default_output/"
setwd(path_picrust2)
maaslin_pathways_df<-(read.delim("pathways_out/path_abun_unstrat_renamed.txt",
                                 header = TRUE))                                  #I had to rename the file, as the pathways names were generating an error in Maaslin2
colnames(maaslin_pathways_df) <- sub("^.", "", colnames(maaslin_pathways_df), 
                                     perl = TRUE)                                 #R was adding an 'X' at the beginning of my sample names, so I removed that 'X'.
maaslin_pathways_df<-t(maaslin_pathways_df)
colnames(maaslin_pathways_df)<-maaslin_pathways_df[1,]
maaslin_pathways_df<-maaslin_pathways_df[-1,]
maaslin_pathways_df<-as.data.frame(maaslin_pathways_df)
View(maaslin_pathways_df)
metadata<-as.data.frame(sample_data(ps))
class(metadata)<-"data.frame"
#class(metadata)

fit_data = Maaslin2(input_data = maaslin_pathways_df, input_metadata = metadata, output="test", 
                    fixed_effects="Community", normalization="TSS",transform = "NONE")
fit_data = Maaslin2(input_data = maaslin_pathways_df, input_metadata = metadata, output="test", 
                    fixed_effects="Community", normalization="NONE",transform = "NONE")

I am unsure if it’s how I’m loading the data, as it works fine when I set the normalization to ‘NONE’, but then it crashes with ‘TSS’. I also read somewhere that the error I am getting

(Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
In addition: Warning message:
In vegan::decostand(features_norm, method = "total", MARGIN = 1,  :
  input data contains negative entries: result may be non-sense

could be due to a version issue, but I think I have the latest Maaslin2 installed.

Here’s my metadata in case that helps:
metadata_file.txt (20.3 KB)

Thanks again,

Daniel

Hi @daniel.castanedamogo,

It looks like you are reading in your feature table as characters and then not converting them to numerics before passing them into maaslin2.

Try adding the fellow line of code before inputting your data into the maaslin2 workflow:

 maaslin_pathways_df[] <- lapply(maaslin_pathways_df, as.numeric)

thanks,
Jacob

Oh my god! That did the job! I’m unsure why it was saved as characters, but now that’s fixed and working fine! Thank you so much!