Creating markers after blasting

luayou · March 9, 2020, 10:38am

Hi there!

I am trying to create some markers, and I am running into the problem described in this thread: https://groups.google.com/forum/#!topic/shortbred-users/paafRPifkM8

I have already run BLAST (blast/2.2.29) of my proteins against all of UniRef90, which took 48 days. So if you can help me solve the problem using this output, I would be very grateful.

Thank you,
Laura.

franzosa · March 9, 2020, 3:49pm

Hi Laura - Looks like there’s a bunch of things going on in the thread you linked. Could you clarify what specific problem you’re having? Thanks.

luayou · March 9, 2020, 10:39pm

Hi Franzosa!

Thank you for replying to me! Sure, here is the error that I got when I ran the following command: “shortbred_identify.py --cdhit /hpcfs/apps/cd-hit/4.6.1/cd-hit --usearch {main}/usearch11.0.667_i86linux32 --goiclust {main}/attempt2/clust/clust.renamed.faa --goiblast {main}/attempt2/blastresults/selfblast.txt --refblast {main}/attempt2/blastresults/refblast.txt --map_in ${main}/attempt2/clust/clust.map --markers TAII.markers.faa --threads 12”

As I understand it from the thread I linked, the problem comes from differences between the headers of the FASTA files and the BLAST query IDs. In fact, I found that the clust.faa headers differ from the IDs of the BLAST queries. The difference was easy to fix, since BLAST takes as the protein ID only the characters before the first space of the clust.faa header, so I changed the headers of clust.faa, but the problem persists. I also looked for the protein ‘WP_080765506.1’ that is reported in the error: it is present in clust.faa, clust.map, and selfblast.txt, but not in refblast.txt. I don’t know whether that could be related to the problem.

Finding overlap with reference database…
Finding overlap with family consensus database…
Checking dependencies…
Checking to make sure that installed version of usearch can make databases…
Traceback (most recent call last):
  File "/hpcfs/home/ciencias/biologia/postgrado/l.avellaneda50/.conda/envs/concoct_env/bin/shortbred_identify.py", line 364, in <module>
    dictGOICounts = pb.MarkX(dictGOIGenes,dictGOICounts)
  File "/hpcfs/home/ciencias/biologia/postgrado/l.avellaneda50/.conda/envs/concoct_env/bin/src/process_blast.py", line 491, in MarkX
    dictOverlap[strName][i] = dictOverlap[strName][i] + 9999999
KeyError: 'WP_080765506.1'
DONE

Thank you,
Laura.

luayou · March 24, 2020, 8:46am

Hi there!

I haven’t been able to fix this problem yet, and I really want to use ShortBRED with this set of proteins. If someone can help me, I would be grateful.

Thank you,
Laura

luayou · March 31, 2020, 7:35am

Hi! I was wondering whether you have had a chance to look at my latest post. Thank you.

franzosa · April 1, 2020, 3:24pm

Sorry for the long delay. My best guess is still that this results from something in the sequence renaming (e.g. the sequence’s name in the FASTA of genes of interest is not the same as the version in the BLAST output, perhaps due to an extra space at the end, removal of the .1, etc.).
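
A rough way to test for this kind of mismatch is sketched below. It compares the raw FASTA headers with the query IDs in the BLAST output and prints any name that appears in only one of them, using `repr` so that trailing spaces are visible. This is only a sketch under assumptions: it assumes the BLAST results are tab-delimited with the query ID in the first column, the function names are made up for illustration, and it is not part of ShortBRED.

```python
def fasta_headers(lines):
    """Return each FASTA header exactly as written (minus '>' and the newline)."""
    return [line[1:].rstrip("\r\n") for line in lines if line.startswith(">")]


def blast_query_ids(lines):
    """Return the set of query IDs, assuming tab-delimited BLAST output
    with the query ID in the first column (an assumption, check your format)."""
    ids = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def compare_ids(fasta_lines, blast_lines):
    """Return (headers only in the FASTA, query IDs only in the BLAST output)."""
    in_fasta = set(fasta_headers(fasta_lines))
    in_blast = blast_query_ids(blast_lines)
    return sorted(in_fasta - in_blast), sorted(in_blast - in_fasta)


def report(fasta_path, blast_path):
    """Print the names that do not match between the two files."""
    with open(fasta_path) as fasta, open(blast_path) as blast:
        only_fasta, only_blast = compare_ids(fasta, blast)
    for name in only_fasta:
        print("only in FASTA:", repr(name))
    for name in only_blast:
        print("only in BLAST:", repr(name))
```

Calling `report("clust.renamed.faa", "selfblast.txt")` (adjust the paths to your own files) should print nothing if every name matches exactly. Note that a genuine sequence with no BLAST hits at all would also show up as "only in FASTA", so an entry in that list is not necessarily a naming problem.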

If that’s not the case, I found a similar error from a long time ago that resulted from a duplicated sequence ID, though I believe that issue was corrected in the software. Still, worth a double check that the renaming process didn’t induce duplicated sequence IDs in your files.
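
For the duplicate check, something along these lines would do. Again, this is a minimal sketch and not ShortBRED code: the function name is hypothetical, and it assumes the sequence ID is the part of the header before the first whitespace, which is how BLAST truncates query names.

```python
from collections import Counter


def duplicated_ids(fasta_lines):
    """Return {sequence ID: count} for every ID that occurs more than once.

    The ID is taken as the header text before the first whitespace
    (an assumption that mirrors how BLAST reports query names).
    """
    counts = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    )
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

For example, `with open("clust.renamed.faa") as fh: print(duplicated_ids(fh))` should print an empty dictionary if the renaming did not introduce any repeated IDs.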
