HAllA no FDR adjustment

SimonWetzel · November 20, 2022, 11:04am

Hi everyone,

Is it possible to run the newest HAllA version without any FDR adjustment?

Thank you in advance,
Simon

andrewGhazi · November 20, 2022, 6:47pm

Sorry, the fdr_method argument only accepts the values allowed by statsmodels.stats.multitest.multipletests:

https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
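
For reference, here is a small sketch of what that restriction looks like in practice. The p-values are made up, and the snippet assumes only that NumPy and statsmodels are installed; it calls multipletests directly rather than going through HAllA.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.01, 0.04, 0.03, 0.20])  # made-up p-values for illustration

# A supported method returns four items; pvalues2qvalues keeps only the first two.
reject, qvals = multipletests(pvals, alpha=0.05, method="fdr_bh")[:2]
print(reject, qvals)

# A name statsmodels does not know, such as "none", raises a ValueError.
try:
    multipletests(pvals, alpha=0.05, method="none")
    none_supported = True
except ValueError as err:
    none_supported = False
    print("statsmodels refused method='none':", err)
```

Method names listed in the statsmodels documentation linked above include bonferroni, holm, fdr_bh and fdr_by.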

SimonWetzel · November 21, 2022, 7:59am

Thank you for your reply! Would it be possible to bring back the “no_adjustment” argument from earlier (legacy) HAllA versions, or to install an earlier HAllA release?

andrewGhazi · November 21, 2022, 9:08pm

I think it would be relatively simple to add a control flow block to this function, but I don’t have bandwidth right now to implement and test it. Something like:

if method == "none": return(pvalues)

If you’re comfortable tinkering with the code and installing from source, you could give that a try.

https://github.com/biobakery/halla/blob/master/halla/utils/stats.py#L185

def pvalues2qvalues(pvalues, method='fdr_bh', alpha=0.05):
    '''Perform p-value correction for multiple tests (Benjamini/Hochberg)
    Args:
    - pvalues: a 1D-array of pvalues
    - alpha  : family-wise error rate
    Return a tuple (adjusted p-value array, boolean array [True = reject]
    '''
    return(multipletests(pvalues, alpha=alpha, method=method)[:2])

SimonWetzel · November 22, 2022, 9:39am

Thank you very much! I added a control flow block to the function like this:

def pvalues2qvalues(pvalues, method="none", alpha=0.05):
    '''Perform p-value correction for multiple tests (Benjamini/Hochberg)
    Args:
    - pvalues: a 1D-array of pvalues
    - alpha  : family-wise error rate
    Return a tuple (adjusted p-value array, boolean array [True = reject]
    '''
    if method == "none":
        return(pvalues)
    return(multipletests(pvalues, alpha=alpha, method=method)[:2])

Then I created a new conda environment and set it up for HAllA by running:

python3 setup.py develop

When I run HAllA now, the new option seems to be accepted, as the config parameters below report (fdr method : none):

halla -x crohn_bacterial_species.txt -y crohn_fungal_species.txt -o Crohn_Species_none --fdr_method="none"
Setting config parameters (irrelevant parameters will be ignored)…
preprocess:
max freq thresh : 1
transform funcs : None
discretize bypass if possible : True
discretize func : None
discretize num bins : None
association:
pdist metric : spearman
hierarchy:
sim2dist set abs : True
sim2dist func : None
linkage method : average
permute:
iters : 1000
func : gpd
speedup : True
stats:
fdr alpha : 0.05
fdr method : none
fnr thresh : 0.2
rank cluster : best
output:
dir : Crohn_Species_none
verbose : True
== Loading and preprocessing data ==
Preprocessing step completed:

X shape (# features, # size) (32, 17)

Y shape (# features, # size) (16, 17)
== Completed; total duration: 0:00:00.024488 ==

== Performing HAllA ==
– Step 1: Computing pairwise similarities, p-values, and q-values –
Generating the similarity table…
Generating the p-value table…
100%|██████████████████████████████████████████| 32/32 [00:00<00:00, 207.13it/s]
Generating the q-value table…
Traceback (most recent call last):
  File "/usr/local/bin/halla", line 11, in <module>
    load_entry_point('HAllA', 'console_scripts', 'halla')()
  File "/home/simonw/Schreibtisch/R Scripts/DNA Sequencing/5_ High-sensitivity pattern discovery (HAllA)/halla-master/scripts/halla.py", line 258, in main
    instance.run()
  File "/home/simonw/Schreibtisch/R Scripts/DNA Sequencing/5_ High-sensitivity pattern discovery (HAllA)/halla-master/halla/main.py", line 395, in run
    self._compute_pairwise_similarities()
  File "/home/simonw/Schreibtisch/R Scripts/DNA Sequencing/5_ High-sensitivity pattern discovery (HAllA)/halla-master/halla/main.py", line 112, in _compute_pairwise_similarities
    self.fdr_reject_table, self.qvalue_table = pvalues2qvalues(self.pvalue_table.flatten(), config.stats['fdr_method'], config.stats['fdr_alpha'])
ValueError: too many values to unpack (expected 2)
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 13, in <module>
    import fnmatch, glob, traceback, errno, sys, atexit, locale, imp, stat
  File "/usr/lib/python3.8/imp.py", line 31, in <module>
    warnings.warn("the imp module is deprecated in favour of importlib; "
DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses

Original exception was:
Traceback (most recent call last):
  File "/usr/local/bin/halla", line 11, in <module>
    load_entry_point('HAllA', 'console_scripts', 'halla')()
  File "/home/simonw/Schreibtisch/R Scripts/DNA Sequencing/5_ High-sensitivity pattern discovery (HAllA)/halla-master/scripts/halla.py", line 258, in main
    instance.run()
  File "/home/simonw/Schreibtisch/R Scripts/DNA Sequencing/5_ High-sensitivity pattern discovery (HAllA)/halla-master/halla/main.py", line 395, in run
    self._compute_pairwise_similarities()
  File "/home/simonw/Schreibtisch/R Scripts/DNA Sequencing/5_ High-sensitivity pattern discovery (HAllA)/halla-master/halla/main.py", line 112, in _compute_pairwise_similarities
    self.fdr_reject_table, self.qvalue_table = pvalues2qvalues(self.pvalue_table.flatten(), config.stats['fdr_method'], config.stats['fdr_alpha'])
ValueError: too many values to unpack (expected 2)

However, the error above then occurs, saying that there are too many values to unpack. Do you have any idea how to solve this?

Best,
Simon

andrewGhazi · November 22, 2022, 2:45pm

From this part

self.fdr_reject_table, self.qvalue_table = pvalues2qvalues(self.pvalue_table.flatten(), config.stats['fdr_method'], config.stats['fdr_alpha'])
ValueError: too many values to unpack (expected 2)

It looks like naively returning the input doesn’t give the right size or type or something. Try running that multipletests()[:2] function with some test data to figure out what a typical return value is supposed to look like.

SimonWetzel · November 22, 2022, 4:58pm

Seems like multipletests returns a tuple:

#Return a tuple (adjusted p-value array, boolean array [True = reject])

I tried this, which didn’t work:

if method == "none": 
    return(tuple(pvalues))

Does this error maybe occur because the if-branch doesn’t return a tuple that also contains the boolean array [True = reject]?
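
A likely explanation, sketched in plain Python with made-up p-values: tuple(pvalues) produces one tuple entry per p-value rather than a pair of arrays, so unpacking it into two names still fails whenever there are more than two p-values. The variable names below are illustrative only.

```python
pvalues = [0.5, 0.3, 0.5]  # made-up p-values; HAllA would pass a flattened array

as_tuple = tuple(pvalues)
print(len(as_tuple))  # one entry per p-value, not (reject, qvalues)

try:
    # Mirrors the two-name unpacking done by the caller in halla/main.py.
    reject, qvalues = as_tuple
    unpack_error = None
except ValueError as err:
    unpack_error = str(err)

print(unpack_error)
```

So the branch would need to return exactly two objects, each the same length as the input.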

SimonWetzel · November 22, 2022, 5:06pm

If I run multipletests with test data…

import statsmodels.stats.multitest as s  # assumed: "s" is an alias for this module
pvals = [0.5, 0.3, 0.5]
s.multipletests(pvals)

I get this output:
(array([False, False, False]), array([0.75 , 0.657, 0.75 ]), 0.016952427508441503, 0.016666666666666666)
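
The output above puts the boolean reject array first and the adjusted p-values second, which matches the order the caller unpacks them in (fdr_reject_table, then qvalue_table). Based on that, here is a minimal sketch of a "none" branch that returns the same shape. The "none" option is hypothetical and not part of upstream HAllA, the rule of rejecting where p <= alpha is an assumption, and this has not been run through the full HAllA pipeline.

```python
import numpy as np


def pvalues2qvalues(pvalues, method="fdr_bh", alpha=0.05):
    """Return (reject, qvalues), in the order the caller unpacks them.

    method="none" is a hypothetical extra option, not upstream HAllA behaviour.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    if method == "none":
        # No correction: pass the raw p-values through as "q-values" and
        # reject wherever the raw p-value is at or below alpha (assumed rule).
        return pvalues <= alpha, pvalues
    # Imported here only so the "none" branch of this sketch runs without statsmodels.
    from statsmodels.stats.multitest import multipletests
    return multipletests(pvalues, alpha=alpha, method=method)[:2]


reject, qvalues = pvalues2qvalues([0.5, 0.3, 0.01], method="none")
print(reject, qvalues)
```

Both returned arrays have one entry per input p-value, so the two-name unpacking in main.py should no longer raise.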
