Question about ICTV taxonomy version and naming in BaqLaVa

Dear BaqLaVa team,

I hope this message finds you well.

I am currently working on a project where I need to standardize viral annotations from NCBI to ICTV species names, and I would like to clarify which ICTV taxonomy version and naming conventions BaqLaVa is using.

In brief, I have a table where each virus has:

  • a Genome (NCBI gi) identifier, and

  • an NCBI virus name (e.g. Enterobacteria phage T7, Mycobacterium phage Che9c, etc.).

Using the ICTV Virus Metadata Resource (VMR), I am mapping NCBI virus names to ICTV species names via the Virus name(s) → Species columns. During this process, I encountered a few issues and would greatly appreciate your guidance on the following points:

  1. ICTV MSL version used in BaqLaVa

    • Which ICTV MSL release (e.g. MSL #37, #38, #39, #40, etc.) is BaqLaVa currently based on?

    • Do you keep the ICTV taxonomy fixed to a particular MSL version, or is it updated regularly in BaqLaVa? If so, how frequently are these updates made?

  2. Relationship between BaqLaVa taxonomy and ICTV VMR

    • Are the virus names/IDs used in BaqLaVa directly derived from a specific ICTV VMR version, or do you maintain an internal curated list of ICTV species and virus names?

    • In other words, to what extent can we assume that BaqLaVa’s taxonomy is synchronized with the current ICTV VMR?

  3. Handling of legacy names – example: Enterobacteria phage T7

    As an example, NCBI still commonly uses the name “Enterobacteria phage T7”, whereas ICTV has updated the species name over time as follows:

    • Enterobacteria phage T7 → Escherichia virus T7 → Teseptimavirus T7 (current binomial name in the latest ICTV release).

    In the VMR file I am using, I do not see “Enterobacteria phage T7” listed under Virus name(s) anymore, but I do see names such as “Escherichia phage T7” mapped to the species Teseptimavirus T7.

    • How does BaqLaVa handle such legacy names?

    • For example, if a sequence historically referred to as Enterobacteria phage T7 is present in BaqLaVa, under which name/species does it appear (e.g. Teseptimavirus T7)?

  4. Recommended workflow to map NCBI GI / NCBI virus name to BaqLaVa and ICTV

    My goal is to connect:

    • NCBI GI / NCBI virus name → BaqLaVa IDs or names → ICTV species names,
      in a way that is consistent with the taxonomy used by BaqLaVa.

    • Do you provide (or recommend) an official mapping table (e.g. TSV/CSV) linking BaqLaVa virus IDs/names to ICTV species names?

    • If not, what would you recommend as the best practice for users who start from NCBI information (GI and NCBI virus name) and want to harmonize their results with the taxonomy used in BaqLaVa and ICTV?

To summarize, I would like to know:

  • which ICTV MSL version BaqLaVa is based on,

  • how BaqLaVa handles legacy NCBI names such as Enterobacteria phage T7 relative to the current ICTV binomial species names, and

  • whether there is a recommended or official way to map NCBI identifiers and names to BaqLaVa’s viral taxonomy and ICTV species names.

Any clarification or documentation you could share would be extremely helpful for my current analysis and for ensuring that my annotations are consistent with BaqLaVa.

Thank you very much for your time and help.

Best regards,

Hello @alsdnr2295!

BAQLaVa taxonomy currently uses ICTV MSL #38. It is not permanently fixed to MSL #38: each new BAQLaVa database release will be updated to match the ICTV taxonomy current at the time of that release. BAQLaVa sources viral genomes directly from the ICTV reference accession numbers and uses the corresponding Scientific Name (genus and species name) for each virus. To that extent, the current database is synced exclusively with MSL #38, so BAQLaVa species names should exactly match ICTV taxonomic names. BAQLaVa does not reference any “virus names” in its official metadata.

Consistent with the above, BAQLaVa does not use or reference legacy NCBI names. Since ICTV considers Enterobacteria phage T7 == Escherichia virus T7 == Teseptimavirus T7, BAQLaVa’s Teseptimavirus T7 is the correct viral genome bin you are looking for.

Since viral taxonomy is in ongoing flux, I do not have a single recommended workflow. It is indeed a messy space at present and a problem not yet fully solved! Here are a couple of options to help you map taxonomy, especially for what seem to be older virus names:

  • Using the NCBI reference sequence accession: for example, NCBI’s “Enterobacteria phage T7, complete genome” reference sequence is NC_001604, which corresponds to Teseptimavirus T7 in ICTV’s MSL #38 (you can download the VMR, Virus Metadata Resource, directly from ICTV; it has RefSeq and GenBank accession numbers for all viral species).
  • Using NCBI’s Entrez search system: searching the taxonomy database for the term Enterobacteria phage T7 will return a taxid (“IdList”) of 10760. This taxid can in turn be searched in the taxonomy database, which returns the Scientific Name at each taxonomic rank; the ScientificName affiliated with taxid 10760 at the rank of species is Teseptimavirus T7.
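The Entrez route in the second option could be sketched roughly as follows. This is a minimal illustration, not something BAQLaVa ships: it assumes the public E-utilities endpoints (esearch.fcgi and efetch.fcgi) and their usual XML layout (an IdList of Id elements, and a Taxon record with a LineageEx list). The helper names (esearch_url, parse_idlist, species_name, legacy_name_to_species) are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL of NCBI E-utilities (assumed; check NCBI's usage policy before bulk queries).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="taxonomy"):
    """Build an ESearch URL for a free-text term such as a legacy virus name."""
    return f"{EUTILS}/esearch.fcgi?" + urllib.parse.urlencode({"db": db, "term": term})


def efetch_url(taxid, db="taxonomy"):
    """Build an EFetch URL that returns the XML record for one taxid."""
    query = {"db": db, "id": taxid, "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?" + urllib.parse.urlencode(query)


def parse_idlist(esearch_xml):
    """Return the taxids listed under <IdList> in an ESearch response."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.findall("./IdList/Id")]


def species_name(efetch_xml):
    """Return the ScientificName at rank 'species', or None if there is none.

    Both the record itself and its lineage are checked, because the taxid
    found for a legacy name may sit at or below the species rank.
    """
    taxon = ET.fromstring(efetch_xml).find("./Taxon")
    if taxon is None:
        return None
    for node in [taxon] + taxon.findall("./LineageEx/Taxon"):
        if node.findtext("Rank") == "species":
            return node.findtext("ScientificName")
    return None


def http_get(url):
    """Fetch a URL and decode the body as UTF-8 text (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def legacy_name_to_species(name):
    """Chain ESearch and EFetch: legacy name -> first taxid -> species name."""
    taxids = parse_idlist(http_get(esearch_url(name)))
    if not taxids:
        return None
    return species_name(http_get(efetch_url(taxids[0])))

# Example call (needs network access):
#   legacy_name_to_species("Enterobacteria phage T7")
```

Note that NCBI's taxonomy may be ahead of or behind MSL #38, so a species name obtained this way is worth checking against the MSL release the BAQLaVa database was built on.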

There are also combinations of the above approaches that may suit your needs, depending on the source of your virus names. For example, searching NCBI directly for Enterobacteria phage T7 shows that the recognized organism name is actually Escherichia phage T7 (https://www.ncbi.nlm.nih.gov/nuccore/NC_001604, “ORGANISM” entry), which, as you observed, can be converted directly with the MSL file. The Entrez nuccore database also lets you run this query in an automated fashion (e.g. querying the nuccore database for NC_001604 will return the text from the website above).
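A combined lookup along these lines might look like the sketch below. It rests on several assumptions: the VMR spreadsheet has been exported to a tab-separated file, its columns are named "Species", "Virus name(s)" and "Virus REFSEQ accession" (check these against the VMR release you download), and the GenBank flat text has already been retrieved, for instance from efetch.fcgi with db=nuccore, rettype=gb and retmode=text. All function names are hypothetical.

```python
import csv
import re


def organism_from_genbank(flat_text):
    """Return the value of the ORGANISM line in a GenBank flat-file record."""
    match = re.search(r"^[ \t]*ORGANISM[ \t]+(.+)$", flat_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def build_vmr_index(tsv_handle,
                    species_col="Species",
                    name_col="Virus name(s)",
                    refseq_col="Virus REFSEQ accession"):
    """Index a TSV export of the VMR by virus name and RefSeq accession.

    Column names are assumptions. Cells holding several values are split
    on ';', and segment labels such as 'RNA1: ' are dropped from accessions.
    """
    index = {}
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        species = (row.get(species_col) or "").strip()
        if not species:
            continue
        names = (row.get(name_col) or "").split(";")
        accessions = [a.split(":")[-1] for a in (row.get(refseq_col) or "").split(";")]
        for key in names + accessions:
            key = key.strip().lower()
            if key:
                index[key] = species
    return index


def to_species(index, key):
    """Look up a virus name or accession; retry without a '.N' version suffix."""
    key = key.strip().lower()
    if key in index:
        return index[key]
    return index.get(re.sub(r"\.\d+$", "", key))
```

With an index built from the VMR, `to_species(index, organism_from_genbank(record))` would give the ICTV species for a fetched record, and passing the accession instead covers cases where the organism name is not listed under Virus name(s). Names that are absent from the VMR return None and still need the taxonomy lookup or manual review.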

Happy searching!
Jordan