I am interested in the functional potential of the microbial communities in some soil metagenomic samples. After running the standard HUMAnN3 pipeline, I obtained decent results for pathways and gene families. However, the relative abundance of the unmapped fraction is quite high, at roughly 93%. I therefore thought I might be able to improve the mapping rate by skipping the taxonomic prescreen step with the --bypass-prescreen flag.
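For context, here is a minimal sketch of how an unmapped fraction like this can be read off a single-sample HUMAnN gene families table. It assumes a tab-separated table with a header line starting with "#", an UNMAPPED row, and one abundance column; the function name and the toy values are hypothetical and are not my real data.

```python
import csv
import io


def unmapped_fraction(tsv_text):
    """Return UNMAPPED / total, counting only unstratified rows."""
    total = 0.0
    unmapped = 0.0
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and the header line.
        if not row or row[0].startswith("#"):
            continue
        feature, value = row[0], float(row[1])
        # Stratified rows ("feature|taxon") repeat the community total,
        # so they are skipped to avoid double counting.
        if "|" in feature:
            continue
        total += value
        if feature == "UNMAPPED":
            unmapped = value
    return unmapped / total if total else 0.0


# Toy table (made-up values) in the assumed layout.
example = (
    "# Gene Family\tsample_Abundance-RPKs\n"
    "UNMAPPED\t93.0\n"
    "UniRef90_A\t5.0\n"
    "UniRef90_A|g__X.s__y\t5.0\n"
    "UniRef90_B\t2.0\n"
)
print(f"{unmapped_fraction(example):.2%}")
```

With a real table, the text passed in would come from reading the gene families file produced by the run.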
With the --bypass-prescreen flag turned on, HUMAnN3 did not complete the task in 3 days, using 187 GB of memory (48 cores) on a university HPC, for a sample of ~15 GB (obtained by merging 150 bp paired-end reads). With the same settings but without --bypass-prescreen, the process took 16 hours.
So my questions here are:
- Am I using the --bypass-prescreen option for the right purpose? Can I actually improve the mapping rate with it?
- I expect it to take much longer, since it will use the whole ChocoPhlAn database. Can I get any perspective on how much longer it is going to take?
- The job stopped running after 3 days. I tried to get it running again with --resume, but it didn’t work. Is there a compatibility issue when using both --resume and --bypass-prescreen?
Any insight would be much appreciated!