Site-package __init__.py Error "Unknown format"

ekroeber · April 18, 2021, 11:38am

Hi there,

I want to run ShortBRED. shortbred_identify.py runs fine, but when I run shortbred_quantify.py I get the following error message in the log file:

 File "/opt/share/software/packages/shortbred-0.9.4/pyvirt/local/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 589, in parse
    raise ValueError("Unknown format '%s'" % format)
ValueError: Unknown format 'unknown'

The whole log file looks like this:

Tested usearch. Appears to be working.
Tested blastp. Appears to be working.
Tested muscle returned a nonzero exit code (typically indicates failure). Please check to ensure the program is working. Will continue running.
Path for cdhit appears to be fine. This program returns an error [exit code=1] when tested and working properly, so ShortBRED does not check it.
Tested makeblastdb. Appears to be working.
Usearch appears to be working.

Clustering proteins of interest...
================================================================
Program: CD-HIT, V4.7 (+OpenMP), Feb 01 2021, 15:06:42
Command: /opt/share/software/packages/cdhit-4.6.8/bin/cd-hit
         -i ./SeqForSpartina.fasta -o
         tmp126341618579194354/clust/clust.faa -d 0 -c 0.85 -b
         10 -g 1

Started: Fri Apr 16 15:19:54 2021
================================================================
                            Output                              
----------------------------------------------------------------
total seq: 25
longest and shortest : 954 and 109
Total letters: 10886
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 0M
Buffer          : 1 X 10M = 10M
Table           : 1 X 65M = 65M
Miscellaneous   : 0M
Total           : 75M

Table limit with the given memory limit:
Max number of representatives: 1279296
Max number of word counting entries: 90502859


comparing sequences from          0  to         25

       25  finished         25  clusters

Apprixmated maximum memory consumption: 76M
writing new database
writing clustering information
program completed !

Total CPU time 0.11
Protein sequences clustered.Creating folders for each protein family...
Making a fasta file for each protein family...
Aligning sequences in each family, producing consensus sequences...
Making BLAST database for the family consensus sequences...
Making BLAST database for the reference protein sequences...
BLASTing the consensus family sequences against themselves...
Warning: [blastp] Number of threads was reduced to 80 to match the number of available CPUs
BLASTing the consensus family sequences against the reference protein sequences...
Warning: [blastp] Number of threads was reduced to 80 to match the number of available CPUs
Finding overlap with reference database...
Finding overlap with family consensus database...
Found True Markers...
No Quasi Markers needed...

Tmp markers saved to tmp126341618579194354/framecheck/FirstMarkers.faa

Processing complete! Final markers saved to ./markersforSpartina.fasta
Checking dependencies...
Checking to make sure that installed version of usearch can make databases...
Tested usearch. Appears to be working.
Treating input as a wgs file...
usearch v7.0.1090_i86linux32, 4.0Gb RAM (528Gb total), 80 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch

Licensed to: hgruber@mpi-bremen.de

00:00  19Mb Reading input
00:00  22Mb    1.0% Masking
00:00  22Mb  100.0% Masking
00:00  35Mb    1.0% Word stats
00:00  35Mb  100.0% Word stats
00:00  73Mb    0.0% Building slots
00:01  73Mb    0.8% Building slots
00:02  73Mb   53.8% Building slots
00:02  73Mb  100.0% Building slots
00:02  60Mb    1.0% Build index   
00:02  64Mb  100.0% Build index
00:02  64Mb    0.0% Rows       
00:02  64Mb  100.0% Rows
00:02  64Mb Buffers     
00:02  80Mb    1.0% Seqs
00:02  80Mb  100.0% Seqs
00:02  64Mb 100.0% completed, split 1 (97 seqs)
00:02  64Mb Total 1 splits, 97 seqs

List of files in WGS set:./SPAdes_Spartina1.contigs.fa

List of files in WGS set (after unpacking tarfiles):./SPAdes_Spartina1.contigs.fa 

Working on file 1 of 1
Traceback (most recent call last):
  File "/opt/share/software/packages/shortbred-0.9.4/shortbred_quantify.py", line 522, in <module>
    for seq in SeqIO.parse(streamWGS, strFormat):
  File "/opt/share/software/packages/shortbred-0.9.4/pyvirt/local/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 589, in parse
    raise ValueError("Unknown format '%s'" % format)
ValueError: Unknown format 'unknown'
DONE!

Can someone help me to solve that problem?

Thank you!

ekroeber · April 19, 2021, 6:12am

Solved:

All input files must be named with the full .fasta extension, not just .fa!
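
For anyone who would rather not rename the original assemblies, here is a minimal sketch of one way to apply this fix. It creates a `.fasta` alias (a symlink by default, or a copy) next to a `.fa` file, and the alias can then be passed to `--wgs`. The helper name `alias_with_fasta_extension` is hypothetical and not part of ShortBRED. It also assumes, based only on the fix above, that the file extension is what matters to shortbred_quantify.py.

```python
import os
import shutil


def alias_with_fasta_extension(path, use_symlink=True):
    """Return a path ending in .fasta that holds the same sequences as `path`.

    Hypothetical helper, not part of ShortBRED. Files that do not end in
    .fa are returned unchanged.
    """
    root, ext = os.path.splitext(path)
    if ext.lower() != ".fa":
        return path  # already .fasta (or something else); leave it alone

    target = root + ".fasta"
    if os.path.lexists(target):
        return target  # alias was created earlier; reuse it

    if use_symlink:
        # Symlink to the absolute path so the link works from any directory.
        os.symlink(os.path.abspath(path), target)
    else:
        shutil.copyfile(path, target)
    return target
```

For example, `alias_with_fasta_extension("./SPAdes_Spartina1.contigs.fa")` would create `./SPAdes_Spartina1.contigs.fasta` next to the original. Use `use_symlink=False` on file systems that do not support symlinks.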

ppelayo · February 28, 2022, 12:12am

Hi there,
I am getting the same error but I can’t seem to get rid of it. Could you please elaborate on how you fixed the issue?

Here is what I submit:
./shortbred_quantify.py --markers markers_85_sialidases.fasta --wgs /n/holyscratch01/vmrc_Stanford_MTG/MG_1063201209/MG_1063201209.R2.fq --results 1063201209R1_results.txt --tmp 1063201209R1_quantify_tmp

Here is the error I get:
Working on file 1 of 1
Traceback (most recent call last):
  File "./shortbred_quantify.py", line 527, in <module>
    for seq in SeqIO.parse(streamWGS, strFormat):
  File "/n/home00/ppelayo/.conda/envs/ShortBRED_env/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 680, in parse
    raise ValueError("Unknown format '%s'" % format)
ValueError: Unknown format 'unknown'
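
The earlier fix in this thread was to rename `.fa` to `.fasta`, which suggests the format is inferred from the file name. By analogy, a `.fq` file might need a `.fastq` extension, but nothing in this thread confirms that. Before renaming anything, it can help to check what the file actually contains. Below is a minimal sketch of such a check. The function name `sniff_sequence_format` is hypothetical, and it only handles uncompressed plain-text files. It relies on FASTA records starting with `>` and FASTQ records starting with `@`.

```python
def sniff_sequence_format(path):
    """Guess 'fasta' or 'fastq' from the first non-blank line of a text file.

    Hypothetical helper, not part of ShortBRED or Biopython. Returns
    'unknown' for empty files or anything that starts differently.
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"  # FASTA header line
            if line.startswith("@"):
                return "fastq"  # FASTQ record header line
            return "unknown"
    return "unknown"
```

If this reports `fastq` for a `.fq` file, giving the file (or a symlink to it) a name ending in `.fastq` would be the analogous thing to try.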
