Bad subject id header

I am encountering a problem when I try to run WAAFLE. It reports:

“LETHAL ERROR: bad subject id header”. Could you or someone from your group help us resolve this?

Thanks for all you do.

Did you create your own database? If so, are you sure it conforms to the formatting specification laid out in the manual? If not, and you’re using either our full database or the demo database, could you give some more detail about where this happens in the WAAFLE workflow (e.g. which script you were running at the time)?

Yes, I created my own database. To test the pipeline, I downloaded a reference E. coli sequence from GenBank and used the “makeblastdb” command to build the database. I then ran my query, contig.fasta, against it to get the blastout file. However, when I passed that file to waafle_genecaller, I got the error I described.

Gotcha: the sequences you downloaded won’t be in the format that WAAFLE expects. Please see “Formatting a sequence database for WAAFLE” on the manual page:

I read the manual and did that, but I still got the error. Below is the format of the reference file:

AE005174.2 | Escherichia coli O157:H7 str. EDL933 genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA

Sorry, for some reason the leading “>” gets deleted when I paste on this forum, but there is a leading “>” in the file.

Also, the bad subject ID error I am getting is specifically “LETHAL ERROR: bad subject id header: AE005174.2” when I input the sequence above.

A leading > character starts a quotation block in Discourse (it is part of Markdown syntax), which is why it disappears when you paste.

I think what’s happening is that BLAST is cutting off everything after the first space in the header when it writes your search output (you can confirm this by inspecting the .blastout file). So when WAAFLE goes to look up the taxon name, it isn’t there.
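To make the truncation concrete, here is a minimal sketch. It assumes, as described above, that the subject ID BLAST reports is just the first whitespace-delimited word of the FASTA header, and that WAAFLE needs an “ID|label” pair to survive in that ID. Both helper names (blast_subject_id, has_taxon_field) are hypothetical and only approximate this behavior; they are not WAAFLE’s or BLAST’s actual code.

```python
def blast_subject_id(header: str) -> str:
    """Approximate the subject ID BLAST reports: the first whitespace-delimited word.

    Assumption: matches the truncation described in this thread, not BLAST's exact rules.
    """
    return header.lstrip(">").split()[0]


def has_taxon_field(subject_id: str) -> bool:
    """Hypothetical check: True if the ID still holds a non-empty 'ID|label' pair."""
    gene_id, sep, label = subject_id.partition("|")
    return bool(gene_id and sep and label)


for header in [
    ">AE005174.2 | Escherichia coli O157:H7 str. EDL933 genome",
    ">AE005174.2|Escherichia_coli_O157:H7_str._EDL933_genome",
]:
    sid = blast_subject_id(header)
    print(sid, has_taxon_field(sid))
# AE005174.2 False
# AE005174.2|Escherichia_coli_O157:H7_str._EDL933_genome True
```

With the spaced header, only “AE005174.2” survives, which is exactly the ID quoted in the error message.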

The best format would be a header with no spaces at all, e.g.

AE005174.2|Escherichia_coli_O157:H7_str._EDL933_genome
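If you have many references to fix, a small script can rewrite the headers before you run makeblastdb. This is a minimal sketch, assuming headers look like the example in this thread (“ID | free-text label”): it trims spaces around each “|”-delimited field and replaces the remaining runs of whitespace with underscores, leaving sequence lines untouched. The function and file names are hypothetical, and the script only removes spaces; it does not otherwise check the header against WAAFLE’s formatting specification in the manual.

```python
import re
from typing import Iterable, Iterator


def fix_header(header: str) -> str:
    """Remove spaces from a FASTA header while keeping its '|' separators."""
    body = header[1:] if header.startswith(">") else header
    # Trim spaces around each '|'-delimited field, then join the words inside it with '_'.
    fields = [re.sub(r"\s+", "_", field.strip()) for field in body.split("|")]
    return ">" + "|".join(fields)


def fix_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite header lines and pass sequence lines through unchanged."""
    for line in lines:
        line = line.rstrip("\r\n")
        yield fix_header(line) if line.startswith(">") else line


def fix_fasta_file(in_path: str, out_path: str) -> None:
    """Write a copy of in_path with space-free headers to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in fix_fasta(src):
            dst.write(line + "\n")
```

For example, fix_fasta_file("ref.fasta", "ref.fixed.fasta") turns the header above into “>AE005174.2|Escherichia_coli_O157:H7_str._EDL933_genome”; you would then rebuild the BLAST database from the fixed file and rerun the search.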

Thanks. That worked.