The bioBakery help forum

ShortBRED identify gives no markers

Hello,
I am trying to screen my metagenomic libraries for antimicrobial peptides. Therefore, I downloaded the database from APD3 (paper: https://academic.oup.com/nar/article/44/D1/D1087/2503090; sequences: https://wangapd3.com/APD_sequence_release_09142020.fasta). After filtering for peptides of bacterial origin only, I tried to build markers using shortbred_identify.py against UniRef90.

However, I was not able to get any markers. After grouping my input proteins with CD-HIT, no sequences were sorted. The input proteins are quite small (min: 2, 25% quartile: 19, median: 30, 75% quartile: 44, max: 100 amino acids).

Do I need to adjust flags like --markerlength or --minAln (the minimum for a short, high-identity region)? I played around with these flags but only got errors.

Best,
Philipp

Hmm, those parameters all concern the sizes of marker regions within full-length proteins, which is what we designed ShortBRED to target (the idea being to screen out non-unique regions of longer proteins that might induce false positives when profiling with short reads). Your starting proteins/peptides are so small that they might be hitting a size limit early in the pipeline, e.g. during clustering inside CD-HIT.

Since your peptides are so small to begin with (on the order of a read length or smaller), you might try searching your reads directly against the peptides with an accelerated search (with the peptides indexed as a database). As a ShortBRED-like filter, you could also map the peptides against UniRef90 and note whether they occur as subsequences of longer proteins. If some do, I’d be less confident about their abundances from the initial search. (The second step would be analogous to ShortBRED saying that it could not identify a unique marker sequence for a protein.)
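
To make the second step concrete, here is a minimal Python sketch of the "does this peptide occur inside a longer protein?" check. It is only an illustration under some assumptions: it uses exact substring matching on toy, made-up sequences rather than a real alignment search, so it would not scale to all of UniRef90 and would miss near-identical matches. The function names (`read_fasta`, `find_embedded_peptides`) and the `min_len` cutoff of 8 residues are hypothetical choices for this example, not part of ShortBRED.

```python
import io
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Use the first whitespace-delimited token as the record id.
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_embedded_peptides(
    peptides: Dict[str, str],
    references: Dict[str, str],
    min_len: int = 8,  # hypothetical cutoff: skip peptides too short to be informative
) -> Dict[str, List[str]]:
    """Map each peptide id to the reference ids that contain it exactly.

    Only references strictly longer than the peptide count, since the
    question is whether the peptide is a subsequence of a longer protein.
    """
    hits: Dict[str, List[str]] = {}
    for pep_id, pep_seq in peptides.items():
        if len(pep_seq) < min_len:
            continue
        containing = [
            ref_id
            for ref_id, ref_seq in references.items()
            if len(ref_seq) > len(pep_seq) and pep_seq in ref_seq
        ]
        if containing:
            hits[pep_id] = containing
    return hits


# Toy data (made-up sequences, for illustration only).
peptide_fasta = io.StringIO(">pepA\nGIGKFLHSAK\n>pepB\nKWKLFKKIEK\n>pepC\nAC\n")
reference_fasta = io.StringIO(
    ">ref1 toy protein\nMSTGIGKFLHSAKKFGKAFV\n>ref2\nMKKLLPTAAAGLLLLAAQPAMA\n"
)

peptides = dict(read_fasta(peptide_fasta))
references = dict(read_fasta(reference_fasta))
flagged = find_embedded_peptides(peptides, references)
print(flagged)
```

Peptides that show up in `flagged` are the ones whose abundances from the direct read search deserve less confidence, since reads from the longer protein could also match them.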