Hi Biobakery Team!
I am noticing that the sample2markers.py script consumes very large amounts of memory. With an input sam.bz2 of 134M, it ends up using 80GB of memory. Is this a bug? Or is there a way of executing it on fragments of the alignment and recombining the results, to make it more scalable?
Hi @nickp60
Thanks for reporting this, we have never experienced such high RAM consumption when executing sample2markers.py. Could it be possible to share the input sam file so we can get a better idea of what is going on?
In that case I think it is expected: the memory consumption of sample2markers.py grows linearly with the number of cores used. However, if you are interested, we are currently working on a new version that should speed up the process while keeping memory consumption stable. You can check an alpha version of the code in this branch of the mpa repository: GitHub - biobakery/MetaPhlAn at sample2markers_speedup
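Since memory scales with the number of workers, one workaround in the meantime is to run with a single worker process. A minimal sketch (the flag names here are assumptions based on recent releases; please confirm against `sample2markers.py --help` for your installed version):

```shell
# Hypothetical invocation, for illustration only; verify flag names with
# `sample2markers.py --help`. Fewer worker processes should mean lower
# peak memory, since consumption grows roughly linearly with the core count.
CMD="sample2markers.py -i sample.sam.bz2 -o consensus_markers/ -n 1"
echo "$CMD"   # inspect, then run with: eval "$CMD"
```

Trading wall-clock time for a smaller memory footprint this way may let the 134M input finish within your node's RAM limit.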