Long running time with the --bypass-prescreen mode

ruizzhan · June 7, 2021, 3:32am

Hello everyone,

I am interested in the functional potential of the microbial communities in some soil metagenomic samples. After following the standard HUMAnN3 pipeline, I obtained decent results for pathways and gene families. However, the unmapped fraction is high, at roughly 93% relative abundance. I therefore thought I might be able to improve the mapping rate by skipping the taxonomic profiling step with the --bypass-prescreen flag.

With --bypass-prescreen turned on, HUMAnN3 did not finish within 3 days on a university HPC, using 48 cores and 187 GB of memory, for a sample of ~15 GB (obtained by merging 150 bp paired-end reads). The same sample with the same settings but without --bypass-prescreen took 16 hours.

So my questions here are:

1. Am I using --bypass-prescreen for the right purpose? Can it actually improve the mapping rate?
2. I expect it to take much longer, since it uses the whole ChocoPhlAn database. Can anyone give me a sense of how much longer?
3. The job stopped after 3 days. I tried to restart it with --resume, but that did not work. Is there a compatibility issue between --resume and --bypass-prescreen?

Any insight would be much appreciated!

Best,
Rui

franzosa · June 9, 2021, 7:48pm

--bypass-prescreen will index the entire ChocoPhlAn database for a really broad nucleotide-level search, which is not usually the best course (the option is mostly there for testing purposes). For a soil sample I’d recommend switching to the UniRef50 database, which will allow HUMAnN to pursue more relaxed mapping during the translated search phase (thus increasing your unclassified mapping rate).
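
As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles a HUMAnN command line pointing the translated search at a UniRef50 database instead of passing --bypass-prescreen. The file names, the database directory, and the helper name build_humann_command are hypothetical placeholders, and the sketch only prints the command rather than running it. It assumes the UniRef50 DIAMOND database has already been downloaded to that directory.

```python
import shlex


def build_humann_command(input_path, output_dir, protein_db_dir, threads=48):
    """Assemble the argument list for a HUMAnN run against a chosen protein database.

    All paths are placeholders. protein_db_dir is assumed to hold an
    already-downloaded UniRef50 DIAMOND database (for example one fetched
    with the humann_databases utility).
    """
    return [
        "humann",
        "--input", input_path,
        "--output", output_dir,
        # Point the translated search at UniRef50 rather than the default database.
        "--protein-database", protein_db_dir,
        "--threads", str(threads),
        # Note: --bypass-prescreen is deliberately left out.
    ]


cmd = build_humann_command("sample.fastq", "humann_out", "/path/to/uniref50")
# Print a shell-safe version of the command for inspection or a job script.
print(shlex.join(cmd))
```

The printed line can be pasted into an HPC job script once the placeholder paths are replaced; the thread count simply mirrors the 48 cores mentioned in the question.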

ruizzhan · June 10, 2021, 2:26am

Thank you for your advice, I will give it a try!
