Dear bioBakery devs,
I am a novice in metagenomic/metatranscriptomic analysis and am slowly progressing along the steep learning curve.
I came across your article “Statistical approaches for differential expression analysis in metatranscriptomics” from @YancongZhang, and since I have both MGX and MTX data, the methods you propose for normalizing MTX data with paired MGX data are something I’d like to apply.
The experiment compares the effect of phytoremediation on a contaminant. We sampled from both the planted and unplanted mesocosms over time, which generated longitudinal data with 4 time points. At each time point I have 8 samples, as in this post.
I have transformed both the DNA and RNA datasets with a TSS transformation. I then kept only the genes that are present in each of the 2 conditions and in 3 out of 4 samples in either condition, to have more “confidence” in the differential abundance results. This still leaves me with 7 850 344 genes. I’ve seen in this post that @franzosa advised replacing the removed genes with a single per-sample “other” feature that absorbs the removed mass, which I haven’t done yet.
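To make the preprocessing concrete, here is a minimal sketch of the TSS step, a prevalence filter, and the “other” feature on toy data. The matrix, the gene and sample names, and the simplified rule (detected in at least 3 of 4 samples, ignoring the per-condition part) are hypothetical placeholders for illustration, not the actual dataset or the exact filter described above.

```r
# Toy count matrix: rows = genes, columns = samples (hypothetical values)
counts <- matrix(
  c(10,  0,  5,  5,
     0,  0,  1,  0,
    20, 30, 10, 15,
     0,  2,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("S", 1:4))
)

# TSS: divide every sample (column) by its total so each column sums to 1
tss <- sweep(counts, 2, colSums(counts), "/")

# Simplified prevalence rule (assumption): keep genes detected in >= 3 of 4 samples
keep <- rowSums(tss > 0) >= 3

# Collapse everything that was filtered out into one per-sample "other" row,
# so the removed mass is retained and each column still sums to 1
filtered <- rbind(
  tss[keep, , drop = FALSE],
  other = colSums(tss[!keep, , drop = FALSE])
)

print(filtered)
print(colSums(filtered))
```

With the “other” row in place, the filtered table keeps the same per-sample total as the unfiltered TSS table, which is the point of the suggestion.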
I ran the following model :
> fit_data <- MTXmodel(D42_0.75_ubi_rna_t,
> D42_sed_sdata,
> output="MTXmodel_test",
> fixed_effects = c("Sample_type"),
> random_effects = NULL,
> reference = "Sample_type,No_plant",
> min_abundance = 0.0,
> min_prevalence = 0.0,
> normalization = 'NONE',
> transform = "LOG",
> correction="BH",
> max_significance=0.05,
> standardize = FALSE,
> input_dnadata = D42_0.75_ubi_gene_t,
> rna_dna_flt = "lenient" )
and here is part of the output:
> 2024-05-02 14:50:55.54114 INFO::Total samples in data: 8
> 2024-05-02 14:50:55.542702 INFO::Min samples required with min abundance for a feature not to be filtered: 0.000000
> 2024-05-02 14:56:06.15202 INFO::Total filtered features: 4024563
From what I understand, the model filtered out 4024563 features. Still, the run timed out after 5 hours of computing on a cluster with 120 GB of RAM.
My questions are :
- Would you reduce the number of features even further?
- If so, would you rather use the min_abundance and min_prevalence parameters of the model? Would that solve the problem created by the manual subsetting, which causes the libraries to differ in size?
- Does the MTXmodel function automatically detect which model number to apply? In my case it would be M4.
Many thanks,
Simon