Hi there,
I am trying to run ShortBRED. shortbred_identify.py runs fine; however, when I run shortbred_quantify.py, I get the following error message in the log file:
  File "/opt/share/software/packages/shortbred-0.9.4/pyvirt/local/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 589, in parse
    raise ValueError("Unknown format '%s'" % format)
ValueError: Unknown format 'unknown'
The whole log file looks like this:
Tested usearch. Appears to be working.
Tested blastp. Appears to be working.
Tested muscle returned a nonzero exit code (typically indicates failure). Please check to ensure the program is working. Will continue running.
Path for cdhit appears to be fine. This program returns an error [exit code=1] when tested and working properly, so ShortBRED does not check it.
Tested makeblastdb. Appears to be working.
Usearch appears to be working.
Clustering proteins of interest...
================================================================
Program: CD-HIT, V4.7 (+OpenMP), Feb 01 2021, 15:06:42
Command: /opt/share/software/packages/cdhit-4.6.8/bin/cd-hit
-i ./SeqForSpartina.fasta -o
tmp126341618579194354/clust/clust.faa -d 0 -c 0.85 -b
10 -g 1
Started: Fri Apr 16 15:19:54 2021
================================================================
Output
----------------------------------------------------------------
total seq: 25
longest and shortest : 954 and 109
Total letters: 10886
Sequences have been sorted
Approximated minimal memory consumption:
Sequence : 0M
Buffer : 1 X 10M = 10M
Table : 1 X 65M = 65M
Miscellaneous : 0M
Total : 75M
Table limit with the given memory limit:
Max number of representatives: 1279296
Max number of word counting entries: 90502859
comparing sequences from 0 to 25
25 finished 25 clusters
Apprixmated maximum memory consumption: 76M
writing new database
writing clustering information
program completed !
Total CPU time 0.11
Protein sequences clustered.Creating folders for each protein family...
Making a fasta file for each protein family...
Aligning sequences in each family, producing consensus sequences...
Making BLAST database for the family consensus sequences...
Making BLAST database for the reference protein sequences...
BLASTing the consensus family sequences against themselves...
Warning: [blastp] Number of threads was reduced to 80 to match the number of available CPUs
BLASTing the consensus family sequences against the reference protein sequences...
Warning: [blastp] Number of threads was reduced to 80 to match the number of available CPUs
Finding overlap with reference database...
Finding overlap with family consensus database...
Found True Markers...
No Quasi Markers needed...
Tmp markers saved to tmp126341618579194354/framecheck/FirstMarkers.faa
Processing complete! Final markers saved to ./markersforSpartina.fasta
Checking dependencies...
Checking to make sure that installed version of usearch can make databases...
Tested usearch. Appears to be working.
Treating input as a wgs file...
usearch v7.0.1090_i86linux32, 4.0Gb RAM (528Gb total), 80 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch
Licensed to: hgruber@mpi-bremen.de
00:00 19Mb Reading input
00:00 22Mb 1.0% Masking
00:00 22Mb 100.0% Masking
00:00 35Mb 1.0% Word stats
00:00 35Mb 100.0% Word stats
00:00 73Mb 0.0% Building slots
00:01 73Mb 0.8% Building slots
00:02 73Mb 53.8% Building slots
00:02 73Mb 100.0% Building slots
00:02 60Mb 1.0% Build index
00:02 64Mb 100.0% Build index
00:02 64Mb 0.0% Rows
00:02 64Mb 100.0% Rows
00:02 64Mb Buffers
00:02 80Mb 1.0% Seqs
00:02 80Mb 100.0% Seqs
00:02 64Mb 100.0% completed, split 1 (97 seqs)
00:02 64Mb Total 1 splits, 97 seqs
List of files in WGS set:./SPAdes_Spartina1.contigs.fa
List of files in WGS set (after unpacking tarfiles):./SPAdes_Spartina1.contigs.fa
Working on file 1 of 1
Traceback (most recent call last):
  File "/opt/share/software/packages/shortbred-0.9.4/shortbred_quantify.py", line 522, in <module>
    for seq in SeqIO.parse(streamWGS, strFormat):
  File "/opt/share/software/packages/shortbred-0.9.4/pyvirt/local/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 589, in parse
    raise ValueError("Unknown format '%s'" % format)
ValueError: Unknown format 'unknown'
DONE!
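In case it helps narrow things down: I have not checked how shortbred_quantify.py sets strFormat. My unconfirmed guess is that it is derived from the file name, and that the `.fa` extension of SPAdes_Spartina1.contigs.fa is not recognized. Below is a small stdlib-only Python 3 sketch I would use to test that idea. It is NOT ShortBRED code: the extension table and the helpers format_from_extension and sniff_format are hypothetical. It compares an extension-only guess with a guess based on the first non-blank line of the file ('>' for FASTA, '@' for FASTQ).

```python
import os
import sys

# HYPOTHETICAL extension table, for illustration only; it is not copied from
# shortbred_quantify.py. ".fa" is deliberately missing to show the failure mode.
NARROW_EXTENSIONS = {
    ".fasta": "fasta",
    ".fna": "fasta",
    ".fastq": "fastq",
    ".fq": "fastq",
}


def format_from_extension(path, table=NARROW_EXTENSIONS):
    """Guess a Biopython format name from the file extension alone."""
    ext = os.path.splitext(path)[1].lower()
    return table.get(ext, "unknown")


def sniff_format(path):
    """Guess the format from the first non-blank line of an uncompressed file.

    FASTA records start with '>' and FASTQ records start with '@'.
    """
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                return {">": "fasta", "@": "fastq"}.get(stripped[0], "unknown")
    return "unknown"


if __name__ == "__main__":
    # Print: path, extension-based guess, content-based guess.
    for path in sys.argv[1:]:
        print(path, format_from_extension(path), sniff_format(path))
```

Running it as `python3 check_format.py ./SPAdes_Spartina1.contigs.fa` (script name hypothetical) should report `fasta` in the last column if the contigs file starts with a '>' header line. The middle column only reflects the hypothetical table above, not what ShortBRED actually does.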
Can someone help me solve this problem?
Thank you!