Uncertainty about HAllA execution time (very slow on my data)

Dear HAllA team,

I have encountered a problem running the latest HAllA, version 0.8.17: so far the run has taken around 14 hours to calculate similarity between the features of two datasets.

Is there an expected or acceptable running time for a task of this size? Knowing that would help me tell normal behavior from abnormal.

I assigned 8 cores (with the --nproc parameter), but it does not work in the WSL environment on my Windows 10 laptop.

My data, command, and system information are listed below:

20 features and 15 samples are used from first datesets

19837 features and 15 samples are used from second datesets

Output files will be written to: C:\Users\path\dir

0:03:14.560000 h:m:s similarity caluclation between two datasets features time —

  • Intel i7-4720HQ CPU and 16 GB of memory, at 18% and 40% utilization respectively.

  • The program runs in a Python 2.7 environment created with Anaconda.

  • The example data runs quickly and without errors.

  • I followed the instructions; the command was:

halla -X huichang_2020-02-18_genus_halla.txt \
      -Y huichang_2020-02-18_genes_halla.txt \
      --nproc 8 \
      -o pouchitis_output \
      -m spearman --header -q 0.05
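
For a rough sense of scale, the dataset sizes reported above imply 20 × 19837 = 396,740 feature pairs, each over 15 samples. The sketch below is not HAllA's algorithm, which does more than a single pairwise pass. It only times a naive, pure-Python Spearman correlation and extrapolates to that pair count, which may help judge whether hours of runtime look plausible for the correlation step alone. The helper names `ranks`, `spearman`, and `estimate_seconds` are hypothetical, and the timing uses random data rather than the real tables.

```python
import random
import time

N_X, N_Y, N_SAMPLES = 20, 19837, 15   # sizes reported in the HAllA log
n_pairs = N_X * N_Y                   # every X feature against every Y feature


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant feature has no defined correlation
    return cov / (vx * vy) ** 0.5


def estimate_seconds(trials=1000):
    """Time `trials` correlations on random data and scale up to n_pairs."""
    rng = random.Random(0)
    x = [rng.random() for _ in range(N_SAMPLES)]
    y = [rng.random() for _ in range(N_SAMPLES)]
    start = time.time()
    for _ in range(trials):
        spearman(x, y)
    per_pair = (time.time() - start) / trials
    return per_pair * n_pairs


if __name__ == "__main__":
    print("feature pairs: %d" % n_pairs)
    print("naive single-core estimate: %.1f s" % estimate_seconds())
```

The estimate covers one pass of pairwise correlations on a single core. Any clustering, permutation testing, or multiple-testing correction in the real run comes on top of it.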

Thank you very much. Any suggestions are welcome. I look forward to your reply.

Best regards,

Derek

Hello Derek,

Sorry for the slow response. HAllA was developed and tested on Linux and Mac OS platforms. If you are running on Windows, using HAllA in Docker should resolve the sub-process issue you are seeing. Once multiple processes are running, your runs should speed up, though I am not sure exactly how much time they might take. Sorry, we will gather benchmarks so we have this data available in the future.

Thank you,
Lauren