ShortBRED Identify and CARD 2023

Hello,

I am currently trying to run shortbred_identify.py on the 2023 CARD release against UniRef90, using the following resources on my institute's cluster:

(I have already analysed my data using the 2017 pre-computed markers provided by bioBakery, but a more current marker set would be preferable.)

```
#!/bin/bash
#SBATCH --time=336:00:00
#SBATCH --cpus-per-task=96
#SBATCH --mem=500G
#SBATCH --partition=long
```
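For comparison, here is a minimal sketch of a complete job script that passes the allocated CPUs on to ShortBRED. It assumes the run is launched with the `--goi`, `--ref`, `--markers`, `--tmp` and `--threads` options of shortbred_identify.py (worth checking against `shortbred_identify.py -h` for the installed version). The file and directory names are hypothetical placeholders. As far as I know, `--threads` defaults to 1, so if it is omitted most of the 96 requested cores would sit idle during the BLAST steps.

```shell
#!/bin/bash
#SBATCH --time=336:00:00
#SBATCH --cpus-per-task=96
#SBATCH --mem=500G
#SBATCH --partition=long

# Sketch only: all file and directory names below are hypothetical placeholders.
set -euo pipefail

# SLURM sets SLURM_CPUS_PER_TASK from --cpus-per-task; fall back to 1 if unset.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Build CARD markers against UniRef90, using every allocated core for BLAST.
shortbred_identify.py \
    --goi card_2023_proteins.faa \
    --ref uniref90.fasta \
    --markers card_2023_markers.faa \
    --tmp card_2023_identify_tmp \
    --threads "${THREADS}"
```

Reading the thread count from `SLURM_CPUS_PER_TASK` keeps it in step with the allocation, so changing `--cpus-per-task` is enough to scale the run up or down.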

The job has been stuck at the following stage for about four days:

```
Making BLAST database for the family consensus sequences…
Making BLAST database for the reference protein sequences…
BLASTing the consensus family sequences against themselves…
BLASTing the consensus family sequences against the reference protein sequences…
```

Does anyone have experience with how long this step might take, and whether I should request additional resources for it? Alternatively, is the bioBakery team planning to pre-compute a more current CARD marker set in the near future?

Many thanks


Hello, did you ever make this marker database? This would be such a helpful resource.