Hi,
I’m using ShortBRED to identify virulence factors with the provided database. Some family annotations include a description of the virulence factor or an ID that I can look up at NCBI. However, the entries derived from MvirDB in particular often carry no helpful information, or rather, I have no idea how to use what is given to find out more about the virulence factor. I tried BLASTing the protein sequences, but this returns a lot of matches and I don’t know how to pick the right one.
So I have two questions:
- How can I get information about results that have neither a gb/gi ID nor a textual description?
- Is there a way to attach the virulence factor information automatically, without looking up each ID manually?
Examples:
virulence|9992|vfid|16697|vsiid|21770|ssid|RecName__virulence|25075|vfid|60054|vsiid|80921|ssid|tetracycline_virulence|25078|vfid|60060|vsiid|80927|ssid|RecName__TM_#01
MNRTVMMALVIIFLDA
→ Here I have the “tetracycline_virulence” information, which is great. I extract such descriptions with regular expressions.
VFDB|VFG000049(gb|NP_880889)_virulence|13251|vfid|20418|vsiid|39888|ssid|type_virulence|13251|vfid|20418|vsiid|39889|ssid|type
→ Here I have NP_880889, which is not optimal but still good.
virulence|9883|vfid|16588|vsiid|21130|ssid|SubName_
→ I have no idea how to get any information about this virulence factor.
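In case it helps to see what I do at the moment, here is a minimal sketch of my header parsing. The field layout, the function name `parse_family_header` and the placeholder list are my own assumptions, inferred only from the three examples above and not from any documented ShortBRED or MvirDB format.

```python
import re

# Assumed layout of one MvirDB-derived entry, inferred from the examples above:
#   virulence|<id>|vfid|<id>|vsiid|<id>|ssid|<short description>
MVIR_ENTRY = re.compile(
    r"virulence\|(\d+)\|vfid\|(\d+)\|vsiid\|(\d+)\|ssid\|([^|_]*)"
)

# Assumed layout of an NCBI accession embedded in a VFDB entry: (gb|...) or (gi|...)
ACCESSION = re.compile(r"\((?:gb|gi)\|([^)]+)\)")

# Tokens I treat as uninformative (my assumption: truncated name prefixes).
PLACEHOLDERS = {"RecName", "SubName", ""}


def parse_family_header(header):
    """Split a family header into accessions, MvirDB ids and descriptions."""
    entries = [
        {"virulence": vid, "vfid": vfid, "vsiid": vsiid, "description": desc}
        for vid, vfid, vsiid, desc in MVIR_ENTRY.findall(header)
    ]
    # Keep only descriptions that are not placeholders, without duplicates.
    descriptions = sorted({e["description"] for e in entries} - PLACEHOLDERS)
    return {
        "accessions": ACCESSION.findall(header),
        "mvir_entries": entries,
        "descriptions": descriptions,
    }


if __name__ == "__main__":
    example = "virulence|9883|vfid|16588|vsiid|21130|ssid|SubName_"
    print(parse_family_header(example))
```

For the three headers above this gives me the description “tetracycline” for the first, the accession NP_880889 plus the fragment “type” for the second, and only the numeric ids for the third, which is exactly the case I don’t know how to resolve.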
Thanks in advance!