High Memory Usage of sample2markers.py

Hi Biobakery Team!
I am noticing that the sample2markers.py script is taking up a very large amount of memory. With an input sam.bz2 of 134M, it ends up consuming 80GB of memory. Is this a bug? Or is there a way of running it on fragments of the alignment and recombining the results to make it more scalable?

Thanks in advance!

Hi @nickp60
Thanks for reporting this. We have never seen such high RAM consumption when running sample2markers.py. Would it be possible to share the input SAM file so we can get a better idea of what is going on?

Sure, what's the best email for you? I’ll send a download link. Thanks so much!

Here are the samtools stats, for the record:
sample2markers_highmem.stats.txt (30.8 KB)

Hi @nickp60
How many procs were you using for the sample2markers execution?