Long running time with the --bypass-prescreen mode

Hello everyone,

I am interested in the functional potential of the microbial communities in some soil metagenomic samples. After following the standard HUMAnN3 pipeline, I obtained some decent results for pathways and gene families. However, the relative abundance of the unmapped fraction is pretty high, at roughly 93%. I therefore thought I might improve the mapping rate by skipping the taxonomic profiling step with the --bypass-prescreen flag.
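For reference, the unmapped share can be read directly from a HUMAnN output table. Below is a minimal Python sketch that is not part of HUMAnN itself. It assumes a single-sample gene-family (or pathway) TSV in the HUMAnN 3 layout: a header line starting with `#`, an `UNMAPPED` row next to the community-level totals, and stratified rows that contain a `|`. The helper name `unmapped_fraction` and the toy table are hypothetical.

```python
import csv
import io


def unmapped_fraction(table_text: str) -> float:
    """Share of UNMAPPED among the unstratified rows of a one-sample HUMAnN table.

    Assumes the HUMAnN 3 TSV layout: a '#'-prefixed header, one value column,
    and stratified rows written as 'FEATURE|taxon'.
    """
    total = 0.0
    unmapped = 0.0
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        # Skip blank lines and the '# Gene Family ...' / '# Pathway ...' header.
        if not row or row[0].startswith("#"):
            continue
        feature, value = row[0], float(row[1])
        # Stratified rows only split the community totals by taxon; counting them would double-count.
        if "|" in feature:
            continue
        total += value
        if feature == "UNMAPPED":
            unmapped = value
    return unmapped / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relab-normalized table, for illustration only.
    example = (
        "# Gene Family\tsample_Abundance\n"
        "UNMAPPED\t0.93\n"
        "UniRef90_A\t0.05\n"
        "UniRef90_A|g__X.s__Y\t0.05\n"
        "UniRef90_B\t0.02\n"
    )
    print(round(unmapped_fraction(example), 4))  # 0.93
```

Only unstratified rows enter the denominator, so the result matches the share that UNMAPPED takes in a relab-normalized table.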

With the --bypass-prescreen flag turned on, HUMAnN3 did not finish within 3 days, using 187 GB of memory and 48 cores on a university HPC, for a ~15 GB sample (obtained by merging 150 bp paired-end reads). With the same settings but without --bypass-prescreen, the same sample took 16 hours.

So my questions here are:

  1. Am I using --bypass-prescreen for the right purpose? Can it actually improve the mapping rate?
  2. I expect it to take much longer, since it uses the whole ChocoPhlAn database. Can anyone give me a sense of how much longer it is going to take?
  3. The job stopped running after 3 days. I'd like to restart it with --resume, but that didn't work. Is there a compatibility issue when using --resume together with --bypass-prescreen?

Any insight would be much appreciated!

Best,
Rui

--bypass-prescreen will index the entire ChocoPhlAn database for a very broad nucleotide-level search, which is not usually the best course (the option exists mostly for testing purposes). For a soil sample I'd recommend switching to the UniRef50 database, which lets HUMAnN perform more relaxed mapping during the translated search phase (thus increasing the mapping rate of your unclassified reads).
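Following this advice could look roughly like the Python sketch below, which only assembles the commands and does not run them. It assumes HUMAnN 3's `humann_databases --download uniref uniref50_diamond <folder>` installer and the `--protein-database` option of `humann`. The `build_commands` helper, the paths, and the assumption that the download unpacks into a `uniref` subfolder are illustrative, so check them against your own installation. --bypass-prescreen is deliberately left out so the default prescreen is kept.

```python
import shlex
from pathlib import Path


def build_commands(db_root: str, input_fastq: str, out_dir: str, threads: int = 48):
    """Return (download, run) argument lists; nothing is executed here."""
    # Install the UniRef50 DIAMOND database (assumed to unpack into <db_root>/uniref).
    download = ["humann_databases", "--download", "uniref", "uniref50_diamond", db_root]
    # Rerun HUMAnN with the default taxonomic prescreen (no --bypass-prescreen),
    # pointing the translated search at UniRef50 instead of UniRef90.
    run = [
        "humann",
        "--input", input_fastq,
        "--output", out_dir,
        "--protein-database", str(Path(db_root) / "uniref"),
        "--threads", str(threads),
    ]
    return download, run


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    for cmd in build_commands("/scratch/humann_dbs", "sample_merged.fastq", "sample_uniref50"):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a job script, or each list can be passed to subprocess.run(cmd, check=True) once the paths are confirmed. Writing to a fresh output folder keeps this run's intermediate files separate from the earlier UniRef90 run.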


Thank you for your advice, I will give it a try!