Specifying strands in USEARCH within shortbred_quantify

LRB · October 1, 2021, 10:10pm

Hi ShortBRED team & users,

I am trying to run shortbred_quantify on some paired-end reads. I am using the following command:

shortbred_quantify.py --markers mymarkers.faa --wgs R1_001-paired.fastq R2_001-paired.fastq --results results.txt --tmp tmp_quantify --avgreadBP 101

I am getting the following error, asking to specify strands in USEARCH:

---Fatal error---
Must specify -strand plus or both with nt db
('Using this version of usearch: ', u'v10.0.240')

Are there any workarounds for this issue, or known solutions?

Thank you in advance.

franzosa · October 12, 2021, 3:08pm

This is saying that USEARCH thinks your database is a nucleotide database, and so it wants to know if it should be searching the given (plus) strand or also considering the reverse-complement (minus) strand. However, the databases for ShortBRED should all be protein sequences. Was mymarkers.faa generated with shortbred_identify? Can you confirm that it contains protein sequences?
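A quick way to check this is to look at which letters the sequences in the marker file use. The snippet below is only a rough sketch, not part of ShortBRED: the function names and the 0.9 threshold are assumptions made for illustration. It reports the fraction of sequence characters that are A/C/G/T/U/N, which should be close to 1.0 for a nucleotide file and much lower for protein sequences.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def fraction_nucleotide(fasta_path, max_records=100):
    """Return the fraction of sequence letters that are A, C, G, T, U or N.

    Only the first `max_records` FASTA records are inspected.
    """
    total = 0
    nucleotide = 0
    records = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records += 1
                if records > max_records:
                    break
                continue
            for ch in line.upper():
                if ch.isalpha():
                    total += 1
                    if ch in NUCLEOTIDE_CHARS:
                        nucleotide += 1
    return nucleotide / total if total else 0.0


def looks_like_nucleotide(fasta_path, threshold=0.9):
    """Heuristic guess; the 0.9 threshold is an arbitrary assumption."""
    return fraction_nucleotide(fasta_path) >= threshold
```

For example, `looks_like_nucleotide("mymarkers.faa")` returning `True` would suggest the markers were built from nucleotide input. This is a heuristic, so a short protein made up mostly of A, C, G and T residues could fool it.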

LRB · October 12, 2021, 3:42pm

Hello Dr. Franzosa,

Thank you very much for your kind reply! I have double checked and it appears that my marker.faa file contains nucleotide sequences. It looks like my original input was the nucleotide FASTA of the protein homolog model rather than the protein FASTA of the protein homolog model, so I have corrected that below:

#1. Make markers from CARD

shortbred_identify.py --goi protein_fasta_protein_homolog_model.fasta --ref uniref100.fasta --markers my_markers.faa

I am getting an error that says the following:

One or more of the sequences in your input file has an id that ShortBRED cannot use as a valid folder name during the clustering step, so ShortBRED has stopped. Please edit ** protein_fasta_protein_homolog_model.fasta ** to remove any slashes,asterisks, etc. from the fasta ids. The program utils/AdjustFastaHeadersForShortBRED.py in the ShortBRED folder can do this for you. ShortBRED halted on this gene/protein:gb|ACT97415.1|ARO:3002999|CblA-1

I suppose at this point I just need to parse the output marker.faa to remove the slashes and underscores. Is this a common step? Hoping that this fixes the error and I can get what I need to move forward without any hiccups.

Thank you so much for your attention, please let me know if I can provide any more information that may help you in troubleshooting!

Lauren

franzosa · October 12, 2021, 4:37pm

Indeed, this is common (hence including the AdjustFastaHeadersForShortBRED utility to help with formatting). The space of characters that are allowed in FASTA headers is broader than the space allowed for file names, so we have to clean up some of the special characters before ShortBRED runs.
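To illustrate the kind of cleanup involved, here is a minimal sketch. It is not the code of AdjustFastaHeadersForShortBRED.py; the function names and the set of characters treated as safe (letters, digits, underscore, period, hyphen) are assumptions chosen for this example. It keeps the first whitespace-delimited token of each header and replaces every other character with an underscore.

```python
import re

# Assumption: treat only these characters as safe in an ID used as a folder name.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def sanitize_header(header_line):
    """Return a cleaned FASTA header line (including the leading '>').

    Only the ID (the first token after '>') is kept; any description
    following it is dropped.
    """
    tokens = header_line[1:].split()
    seq_id = tokens[0] if tokens else ""
    return ">" + UNSAFE_CHARS.sub("_", seq_id)


def sanitize_fasta(in_path, out_path):
    """Copy a FASTA file, rewriting header lines and leaving sequences untouched."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(sanitize_header(line.rstrip("\n")) + "\n")
            else:
                dst.write(line)
```

With this sketch, the header `>gb|ACT97415.1|ARO:3002999|CblA-1` becomes `>gb_ACT97415.1_ARO_3002999_CblA-1`. Two different IDs could collapse to the same cleaned ID, so it is worth checking the output for duplicates before running shortbred_identify.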

LRB · October 24, 2021, 1:43am

Hello Dr. Franzosa,

I just wanted to reach out to you and thank you for your kindness and attentiveness in answering my question. I was able to complete my analysis with your help! Excellent set of tools you have in BioBakery, and I am looking forward to using them much more in the future. All the best.

Lauren
