ShortBRED for Nanopore data

shortbred_quantify.py v0.9.5

Hello! I have been attempting to use shortBRED to quantify relative gene abundance for a small set of genes (4 total) in ONT Nanopore sequence data.

I am curious whether anyone has advice on settings that would increase the number of hits without compromising the results. If anyone has experience applying this tool to long-read sequence data (particularly Nanopore data), I’d be grateful to hear about it!

My most recent run used a command like this:

```shell
shortbred_quantify.py \
  --markers markers.faa \
  --wgs {nanopore_reads.faa} \
  --results results.txt \
  --tmp tmp_shortbred_dir \
  --usearch <usearch_path>
```

I have not yet experimented with settings such as --id or --minreadBP, but I plan to as part of optimizing the run. Hence the call for help.
Thank you!

Sorry for the slow reply! We don’t have much experience with nanopore reads, but off the top of my head I wouldn’t expect that you’d need to make major changes to the parameters. I know nanopore has a higher error rate than other methods, but I don’t think it’s high enough to merit lowering the --id threshold (for example).

Please report back if you learn anything interesting / have any helpful tips in this process as I’m sure it will be useful to other ShortBRED users!

Hello!

We found that lowering the threshold did make an appreciable difference in the results. Given that Nanopore’s R9 basecalling accuracy averaged around Q9, we tried a threshold of 80% just to see what would happen. We also added a housekeeping gene (gyrA) as a control marker. What surprised us was that, before we lowered the threshold, gyrA was not detected at all.
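As a rough sanity check on that choice: the Phred scale relates quality Q to per-base error probability by Q = -10·log10(P_error). The small Python sketch below converts Q to expected per-base accuracy; on that scale Q9 works out to roughly 87% and Q20 to 99%, so an 80% identity cutoff sits a few points below the expected R9 accuracy. It is only a back-of-the-envelope estimate: it treats a mean Q as if it applied uniformly to every base, and ShortBRED’s --id applies to alignments against the markers, so it may not map one-to-one onto read accuracy.

```python
def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score Q to expected per-base accuracy.

    Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10).
    """
    p_error = 10 ** (-q / 10)
    return 1.0 - p_error


if __name__ == "__main__":
    # Q9 ~ mean R9 accuracy quoted above; Q20 ~ the 99% figure for R10.
    for q in (9, 10, 20):
        print(f"Q{q}: ~{phred_to_accuracy(q):.1%} per-base accuracy")
```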

Upon lowering the threshold, we saw some of our target genes emerge (which we knew were present from BLAST searches), but we decided to take an alternative approach because lowering the % identity almost felt like we were forcing an outcome.

I think ShortBRED will benefit greatly from the newer R10 chemistry, which brings Nanopore basecalling accuracy up to 99%+.

In our experience, the best option is to find a % identity threshold you are comfortable with. Given Nanopore’s accuracy with R9 chemistry, I’d say 80% feels the most “fair”.
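One way to find that threshold systematically is to rerun the same command across several --id values and compare the outputs (for example, checking at which value the gyrA control starts to be detected). The Python sketch below only assembles, and optionally runs, the command from earlier in the thread. It assumes --id takes a fraction rather than a percent (the default appears to be 0.95, so 80% would be 0.80), and the file paths and output naming scheme are hypothetical placeholders; check `shortbred_quantify.py --help` for your version before relying on it.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical placeholder paths: replace with your own files.
MARKERS = "markers.faa"
READS = "nanopore_reads.faa"
USEARCH = "/path/to/usearch"


def build_command(identity: float, outdir: Path) -> list[str]:
    """Assemble one shortbred_quantify.py call for a given --id value."""
    tag = f"id{round(identity * 100)}"  # e.g. 0.80 -> "id80"
    return [
        "shortbred_quantify.py",
        "--markers", MARKERS,
        "--wgs", READS,
        "--results", str(outdir / f"results_{tag}.txt"),
        "--tmp", str(outdir / f"tmp_{tag}"),
        "--usearch", USEARCH,
        "--id", f"{identity:.2f}",  # assumed to be a fraction, not a percent
    ]


def run_sweep(identities: list[float], outdir: Path, dry_run: bool = True) -> list[list[str]]:
    """Print (and, unless dry_run is True, execute) one command per identity value."""
    commands = [build_command(i, outdir) for i in identities]
    if not dry_run:
        outdir.mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only prints the commands it would execute.
    run_sweep([0.95, 0.90, 0.85, 0.80], Path("id_sweep"))
```

Keeping each run’s results and tmp directory separate (tagged by identity) makes it easy to compare the result tables side by side afterwards.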

Very interesting - thanks for the update and suggestions! Including gyrA as a control for read map-ability was very clever. :slight_smile:
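If you want to use the gyrA control quantitatively, one option is to express each target’s abundance relative to it. The Python sketch below assumes the results file is a tab-delimited table with a header row containing `Family` and `Count` columns (verify this against your own results.txt). The control ID "gyrA" is a hypothetical placeholder for whatever family name your markers file assigns it, and the toy numbers are made up for illustration.

```python
import csv
import io


def normalize_to_control(results_text: str, control: str = "gyrA") -> dict[str, float]:
    """Return each family's Count divided by the control family's Count.

    Assumes a tab-delimited table with a header row that includes
    'Family' and 'Count' columns.
    """
    reader = csv.DictReader(io.StringIO(results_text), delimiter="\t")
    counts = {row["Family"]: float(row["Count"]) for row in reader}
    # An undetected control (as happened at the default threshold) makes
    # the ratio meaningless, so fail loudly instead of dividing by zero.
    if counts.get(control, 0.0) == 0.0:
        raise ValueError(f"control marker {control!r} missing or not detected")
    return {family: count / counts[control] for family, count in counts.items()}


if __name__ == "__main__":
    # Toy input with made-up values, not real ShortBRED output.
    toy = "Family\tCount\tHits\tTotMarkerLength\ngyrA\t20.0\t40\t300\ngeneX\t5.0\t8\t150\n"
    print(normalize_to_control(toy))  # {'gyrA': 1.0, 'geneX': 0.25}
```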