This is saying that USEARCH thinks your database is a nucleotide database, and so it wants to know if it should be searching the given (plus) strand or also considering the reverse-complement (minus) strand. However, the databases for ShortBRED should all be protein sequences. Was mymarkers.faa generated with shortbred_identify? Can you confirm that it contains protein sequences?
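One quick way to check this is to look at which letters the sequences use. The snippet below is only a rough sketch and is not part of ShortBRED or USEARCH: the function names and the 0.9 threshold are assumptions chosen for illustration. It counts how many sequence characters are A, C, G, T, U, or N; a fraction near 1.0 suggests nucleotides, while protein sequences normally score much lower.

```python
# Hypothetical helper, not part of ShortBRED: guess whether FASTA records
# hold nucleotide or protein sequences from their letter composition.
NUCLEOTIDE_CHARS = set("ACGTUN")


def nucleotide_fraction(fasta_lines):
    """Return the fraction of sequence letters that are A, C, G, T, U, or N."""
    total = 0
    nucleotide = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and header lines; only sequence lines are counted.
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            if ch.isalpha():
                total += 1
                if ch in NUCLEOTIDE_CHARS:
                    nucleotide += 1
    return nucleotide / total if total else 0.0


def looks_like_nucleotide(fasta_lines, threshold=0.9):
    """Heuristic only: the 0.9 threshold is an assumed cut-off, not a standard."""
    return nucleotide_fraction(fasta_lines) >= threshold
```

To run it on a real file, pass the open file handle, for example `looks_like_nucleotide(open("mymarkers.faa"))`. Very short or low-complexity protein sequences can fool a composition check like this, so it is worth looking at a few records by eye as well.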
Thank you very much for your kind reply! I double-checked, and my marker.faa file does contain nucleotide sequences. It looks like my original input was the nucleotide fasta protein homolog model rather than the protein fasta protein homolog model. I have switched to the protein file, and the result is below:
One or more of the sequences in your input file has an id that ShortBRED cannot use as a valid folder name during the clustering step, so ShortBRED has stopped. Please edit ** protein_fasta_protein_homolog_model.fasta ** to remove any slashes,asterisks, etc. from the fasta ids. The program utils/AdjustFastaHeadersForShortBRED.py in the ShortBRED folder can do this for you. ShortBRED halted on this gene/protein:gb|ACT97415.1|ARO:3002999|CblA-1
I suppose at this point I just need to parse the output marker.faa to remove the slashes and underscores. Is this a common step? I am hoping this fixes the error so that I can get what I need and move forward without any more hiccups.
Thank you so much for your attention, please let me know if I can provide any more information that may help you in troubleshooting!
Indeed, this is common, which is why we include the AdjustFastaHeadersForShortBRED utility to help with formatting. The set of characters allowed in FASTA headers is broader than the set allowed in file names, so some of the special characters have to be cleaned up before ShortBRED runs.
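For anyone curious what this kind of cleanup looks like, here is a minimal sketch. It is not the code in utils/AdjustFastaHeadersForShortBRED.py, and its output may differ from what that utility produces. The function names, the set of characters treated as safe (letters, digits, `_`, `.`, `-`), and the choice of `_` as the replacement are all assumptions made for illustration.

```python
import re

# Assumed "safe" set for illustration: letters, digits, underscore, dot, hyphen.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_fasta_header(header_line):
    """Replace unsafe characters in the id (first token) of a '>' header line."""
    body = header_line[1:].strip()
    parts = body.split(None, 1)
    if not parts:
        return ">"
    clean_id = UNSAFE_CHARS.sub("_", parts[0])
    # Any description after the id is kept unchanged.
    if len(parts) > 1:
        return ">" + clean_id + " " + parts[1]
    return ">" + clean_id


def sanitize_fasta(lines):
    """Yield FASTA lines with header ids cleaned; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield sanitize_fasta_header(line)
        else:
            yield line


print(sanitize_fasta_header(">gb|ACT97415.1|ARO:3002999|CblA-1"))
# prints: >gb_ACT97415.1_ARO_3002999_CblA-1
```

Note that two different ids can map to the same cleaned id under a replacement scheme like this, so it is worth checking the output for duplicates before running ShortBRED.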
I just wanted to reach out to you and thank you for your kindness and attentiveness in answering my question. I was able to complete my analysis with your help! Excellent set of tools you have in BioBakery, and I am looking forward to using them much more in the future. All the best.