Why am I getting this error when running LEfSe in conda?

Could you please help me understand why I am getting this error?
I am working with the latest version of LEfSe in a conda environment.
I have already run the tutorial command (attached below, from your shared workflow example) without any error.

Could you please advise whether it is preferable to convert my data into relative abundance?

# (updatedR) r01mt19@kitcat:~/intestineFinal/intestineSmartData/Final Final: STEPS/Chall ds mr vs veg/tables$ lefse_format_input.py lefseConda.txt lefseConda.in -c 1 -s 2 -u 3 -o 1000000
# (updatedR) r01mt19@kitcat:~/intestineFinal/intestineSmartData/Final Final: STEPS/Chall ds mr vs veg/tables$ lefse_run.py lefseConda.in lefseConda.res
# Number of significantly discriminative features: 1 ( 1 ) before internal wilcoxon
# R[write to console]: Error in lda.default(x, grouping, ...) : 
#   variable 1 appears to be constant within groups
# 
# R[write to console]: In addition: 
#   R[write to console]: Warning messages:
#   
#   R[write to console]: 1: package ‘survival’ was built under R version 4.1.1 
# 
# R[write to console]: 2: package ‘mvtnorm’ was built under R version 4.1.1 
# 
# R[write to console]: 3: package ‘coin’ was built under R version 4.1.1 
# 
# Traceback (most recent call last):
#   File "/home/r01mt19/.conda/envs/updatedR/bin/lefse_run.py", line 10, in <module>
#   sys.exit(lefse_run())
# File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/lefse/lefse_run.py", line 90, in lefse_run
# if params['rank_tec'] == 'lda': lda_res,lda_res_th = test_lda_r(cls,feats,class_sl,params['n_boots'],params['f_boots'],params['lda_abs_th'],0.0000000001,params['nlogs'])
# File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/lefse/lefse.py", line 206, in test_lda_r
# z = robjects.r('z <- suppressWarnings(lda(as.formula('+f+'),data=sub_d,tol='+str(tol_min)+'))')
# File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/rpy2/robjects/__init__.py", line 438, in __call__
# res = self.eval(p)
# File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 198, in __call__
# return (super(SignatureTranslatedFunction, self)
#         File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/rpy2/robjects/functions.py", line 125, in __call__
#         res = super(Function, self).__call__(*new_args, **new_kwargs)
#         File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
#         cdata = function(*args, **kwargs)
#           File "/home/r01mt19/.conda/envs/updatedR/lib/python3.9/site-packages/rpy2/rinterface.py", line 680, in __call__
#         raise embedded.RRuntimeError(_rinterface._geterrmessage())
#         rpy2.rinterface_lib.embedded.RRuntimeError: Error in lda.default(x, grouping, ...) : 
#           variable 1 appears to be constant within groups

lefseConda.txt (223.2 KB)

Hello,
This issue arises when LDA is run on a singular covariance matrix. I believe this is happening because, in your data, the “class” and “subclass” variables are essentially the same (i.e. marine vs vegetable). Since that is the case, simply specify the class variable and remove the subclass variable from your dataset. Additionally, I recommend removing the “Rep1”, “Rep2”, etc. designations from your subject IDs, since LEfSe assumes that samples taken from the same unit (the site in your case, I believe) share the same subject ID.
I hope that helps!
Best,
Meg
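
For anyone hitting the same error, the two conditions mentioned above (a subclass that merely relabels the class, and a feature that does not vary within any group) can be checked before running LEfSe. The snippet below is only a diagnostic sketch in plain Python; the function names and the toy labels and values are made up for illustration and are not part of LEfSe or taken from the attached file.

```python
from collections import defaultdict


def subclass_is_redundant(classes, subclasses):
    """Return True when class and subclass labels map one-to-one."""
    class_to_sub = defaultdict(set)
    sub_to_class = defaultdict(set)
    for c, s in zip(classes, subclasses):
        class_to_sub[c].add(s)
        sub_to_class[s].add(c)
    one_sub_per_class = all(len(v) == 1 for v in class_to_sub.values())
    one_class_per_sub = all(len(v) == 1 for v in sub_to_class.values())
    return one_sub_per_class and one_class_per_sub


def constant_within_groups(classes, values):
    """Return True when a feature takes a single value inside every class."""
    by_class = defaultdict(set)
    for c, v in zip(classes, values):
        by_class[c].add(v)
    return all(len(vals) == 1 for vals in by_class.values())


# Hypothetical toy data: one label per sample, one value per sample.
classes = ["marine", "marine", "vegetable", "vegetable"]
subclasses = ["mr", "mr", "veg", "veg"]
feature_a = [0.0, 0.0, 5.0, 5.0]  # no variation inside either class
feature_b = [1.0, 3.0, 2.0, 7.0]  # varies inside both classes

print(subclass_is_redundant(classes, subclasses))
print(constant_within_groups(classes, feature_a))
print(constant_within_groups(classes, feature_b))
```

If the first check is true, the subclass row adds no information beyond the class row. If the second check is true for a feature, that feature has zero within-group variance, which is the situation the `lda` error message describes.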

Many thanks for your response.
Do I need both a class and a subclass? Does it make any difference if there is no subclass (given that the class and subclass produce the same grouping, just with different labels)?

PICRUSt2 users advised me to convert the count table to relative abundances (rather than absolute counts) before statistical analysis with LEfSe. Could you please comment on that?

Kind regards
M

A subclass is not necessary; a class is needed in order to have groups to test between. My understanding is that LEfSe can take count data as input and will make the necessary transformation to relative abundances if needed.
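
To illustrate what such a transformation looks like, here is a minimal sketch of per-sample scaling to a fixed total. It assumes that this is roughly what the `-o 1000000` option in the commands above does; I have not checked LEfSe's own implementation, and the function name and toy counts below are hypothetical.

```python
def normalize_samples(table, total=1_000_000.0):
    """Scale each sample (column) so that its feature values sum to `total`.

    `table` maps a feature name to a list with one value per sample.
    Samples whose values sum to zero are left as zeros.
    """
    n_samples = len(next(iter(table.values())))
    sums = [sum(row[i] for row in table.values()) for i in range(n_samples)]
    return {
        feature: [v * total / s if s > 0 else 0.0 for v, s in zip(row, sums)]
        for feature, row in table.items()
    }


# Hypothetical raw counts for two features across two samples.
counts = {"taxon_A": [10.0, 0.0], "taxon_B": [30.0, 50.0]}
scaled = normalize_samples(counts)
print(scaled)
```

After scaling, every sample has the same total, so the values are relative abundances multiplied by a constant and can be compared across samples with different sequencing depths.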


Looking more closely at my command, I made changes to my input file as advised. The original command was:
lefse_format_input.py lefseConda.txt lefseConda.in -c 1 -s 2 -u 3 -o 1000000
I kept the class in the first row, removed the subclass, and used -u for the subject sample names. Note that removing Rep1, Rep2, etc. would leave all of the replicates with the same sample ID.
lefse_format_input.py lefseConda1.txt lefseConda1.in -c 1 -u 2 -o 1000000
Downstream analysis of the first command (left) and of the second command (right):


lefseConda.txt & lefseConda1.txt

I don’t understand why the results differ when the input file format changes.
I also tried removing Rep1, Rep2, etc., and that gave no differential features between the groups.
Please let me know if you need any further information.