Plotting Hallagram with big datasets

Hello HAllA users,

I’m currently using HAllA for a multi-omics study on the gut microbiome. The first dataset is a long table combining metatranscriptomic, metagenomic and 16S V4 data (~30,000 rows) across 50 samples, which I’m running against the second dataset, a metabolomic table.

The run has finished and everything seemed to go well, but the plots are not in the output folder as expected. I have this message on my console:

--- Writing plotting outputs to output_folder --- 22:33:03.557676 h:m:s plotting results time

Is this a bug in HAllA, or is it because my files are huge?
Is it worth running the Hallagram function?

  • similarity_table.txt (394M)
  • hypotheses_tree.txt (398K)
  • associations.txt (2.1M)

Do you have any recommendations?

Thank you very much!!


Hi Marie, my first suggestion is to try running the hallagram command with the -i option pointing to your output path to see if it can successfully generate a plot. However, the latest version of HAllA has a slightly different output structure from earlier versions, so you may need to re-run halla from the start. If the significant associations detected involve an extremely large number of features, you may want to raise the --block_num option from its default of 30 to show more of them.
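
In case it helps, here is a minimal sketch of that hallagram call wrapped in Python. It only uses the two options mentioned above (-i and --block_num). The folder name "output_folder" comes from your console message, and the value 50 is just an illustration, not a recommendation. The helper names are made up for this example, and your HAllA version may need other options, so please check the command's own help text first.

```python
import shutil
import subprocess


def build_hallagram_command(output_dir, block_num=None):
    """Assemble the hallagram call as an argument list.

    Only -i and --block_num are used; both are taken from the reply above.
    """
    cmd = ["hallagram", "-i", str(output_dir)]
    if block_num is not None:
        # Assumption: a larger value shows more blocks than the default of 30.
        cmd += ["--block_num", str(block_num)]
    return cmd


def run_hallagram(output_dir, block_num=None):
    """Run hallagram if it is installed; otherwise just print the command."""
    cmd = build_hallagram_command(output_dir, block_num)
    if shutil.which(cmd[0]) is None:
        print("hallagram not found on PATH; would run:", " ".join(cmd))
        return None
    # check=False so a plotting failure is reported instead of raising.
    return subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # "output_folder" and 50 are placeholder values for illustration.
    run_hallagram("output_folder", block_num=50)
```

If the command fails on the existing output folder, that would be consistent with the output structure having changed between versions, in which case re-running halla from the start is the next thing to try.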