Specifying strands in USEARCH within shortbred_quantify

Hi ShortBRED team & users,

I am trying to run shortbred_quantify on some paired-end reads, using the following command:

shortbred_quantify.py --markers mymarkers.faa --wgs R1_001-paired.fastq R2_001-paired.fastq --results results.txt --tmp tmp_quantify --avgreadBP 101

I am getting the following error, asking to specify strands in USEARCH:

---Fatal error---
Must specify -strand plus or both with nt db
('Using this version of usearch: ', u'v10.0.240')

Are there any workarounds for this issue, or known solutions?

Thank you in advance.

This is saying that USEARCH thinks your database is a nucleotide database, and so it wants to know if it should be searching the given (plus) strand or also considering the reverse-complement (minus) strand. However, the databases for ShortBRED should all be protein sequences. Was mymarkers.faa generated with shortbred_identify? Can you confirm that it contains protein sequences?
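One quick way to confirm what a marker file contains is to count how many residues fall in the nucleotide alphabet. The snippet below is only a rough sketch and is not part of ShortBRED: the function name `looks_like_nucleotide` and the 0.9 cutoff are assumptions chosen for illustration, and a protein file with unusually many A, C, G, and T residues could in principle fool it.

```python
from typing import Iterable

# Letters treated as nucleotide codes (assumption: plain A/C/G/T/U plus N).
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_nucleotide(fasta_lines: Iterable[str], threshold: float = 0.9) -> bool:
    """Heuristic guess: True if most residues are nucleotide letters.

    `threshold` is an arbitrary illustrative cutoff, not a ShortBRED or
    USEARCH setting.
    """
    total = 0
    nucleotide = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            if ch.isalpha():
                total += 1
                if ch in NUCLEOTIDE_CHARS:
                    nucleotide += 1
    if total == 0:
        raise ValueError("no sequence residues found")
    return nucleotide / total >= threshold
```

To try it on a real file, something like `with open("mymarkers.faa") as fh: print(looks_like_nucleotide(fh))` should work; a result of `True` suggests the file holds nucleotide sequences and would trigger the strand error above.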

Hello Dr. Franzosa,

Thank you very much for your kind reply! I have double-checked, and it appears that my marker file contains nucleotide sequences. It looks like my original input was the nucleotide FASTA for the protein homolog model rather than the protein FASTA, so I have corrected the command below:

#1. Make markers from CARD

shortbred_identify.py --goi protein_fasta_protein_homolog_model.fasta --ref uniref100.fasta --markers my_markers.faa

I am getting an error that says the following:

One or more of the sequences in your input file has an id that ShortBRED cannot use as a valid folder name during the clustering step, so ShortBRED has stopped. Please edit ** protein_fasta_protein_homolog_model.fasta ** to remove any slashes,asterisks, etc. from the fasta ids. The program utils/AdjustFastaHeadersForShortBRED.py in the ShortBRED folder can do this for you. ShortBRED halted on this gene/protein:gb|ACT97415.1|ARO:3002999|CblA-1

I suppose at this point I just need to clean up the FASTA IDs in the input file to remove the slashes and other special characters. Is this a common step? I am hoping this fixes the error so that I can move forward without any more hiccups.

Thank you so much for your attention, please let me know if I can provide any more information that may help you in troubleshooting!

Lauren

Indeed, this is common (hence including the AdjustFastaHeadersForShortBRED utility to help with formatting). The space of characters that are allowed in FASTA headers is broader than the space allowed for file names, so we have to clean up some of the special characters before ShortBRED runs.
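For anyone curious what this kind of cleanup involves, here is a minimal sketch. It is not the code in AdjustFastaHeadersForShortBRED.py, whose exact replacement rules may differ; the names `sanitize_header` and `sanitize_fasta`, the set of allowed characters, and the choice to keep only the first whitespace-delimited token of each header are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Anything outside letters, digits, underscore, dot, and hyphen is replaced.
# This allowed set is an assumption, not ShortBRED's own rule.
UNSAFE = re.compile(r"[^A-Za-z0-9_.\-]")


def sanitize_header(header_line: str, replacement: str = "_") -> str:
    """Return a FASTA header whose ID is safe to use as a folder name.

    Only the first whitespace-delimited token (the ID) is kept; any
    free-text description after it is dropped.
    """
    body = header_line[1:].strip()
    seq_id = body.split()[0] if body else ""
    return ">" + UNSAFE.sub(replacement, seq_id)


def sanitize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with header IDs cleaned and sequences untouched."""
    for line in lines:
        if line.startswith(">"):
            yield sanitize_header(line) + "\n"
        else:
            yield line


print(sanitize_header(">gb|ACT97415.1|ARO:3002999|CblA-1"))
# prints ">gb_ACT97415.1_ARO_3002999_CblA-1"
```

One caveat with any scheme like this: two different IDs can collapse to the same cleaned ID (for example `a|b` and `a:b`), so it is worth checking the output for duplicates before running shortbred_identify.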

Hello Dr. Franzosa,

I just wanted to reach out to you and thank you for your kindness and attentiveness in answering my question. I was able to complete my analysis with your help! Excellent set of tools you have in BioBakery, and I am looking forward to using them much more in the future. All the best.

Lauren
