Hi,
I ran ShortBRED with contig files and the results were not as expected, so I wanted to run it again with the error-corrected, joined sequence FASTA files. However, I always get a
Non-printing character 0x00 in sequence FASTA file
error. This is what I see when I check the log file:
Tested usearch. Appears to be working.
Tested blastp. Appears to be working.
Tested muscle returned a nonzero exit code (typically indicates failure). Please check to ensure the program is working. Will continue running.
Path for cdhit appears to be fine. This program returns an error [exit code=1] when tested and working properly, so ShortBRED does not check it.
Tested makeblastdb. Appears to be working.
Usearch appears to be working.
Clustering proteins of interest...
================================================================
Program: CD-HIT, V4.7 (+OpenMP), Feb 01 2021, 15:06:42
Command: /opt/share/software/packages/cdhit-4.6.8/bin/cd-hit
-i ./SeqForSpartina.fasta -o
tmp69981618818831066/clust/clust.faa -d 0 -c 0.85 -b
10 -g 1
Started: Mon Apr 19 09:53:51 2021
================================================================
Output
----------------------------------------------------------------
total seq: 25
longest and shortest : 954 and 109
Total letters: 10886
Sequences have been sorted
Approximated minimal memory consumption:
Sequence : 0M
Buffer : 1 X 10M = 10M
Table : 1 X 65M = 65M
Miscellaneous : 0M
Total : 75M
Table limit with the given memory limit:
Max number of representatives: 1279296
Max number of word counting entries: 90502859
comparing sequences from 0 to 25
25 finished 25 clusters
Apprixmated maximum memory consumption: 76M
writing new database
writing clustering information
program completed !
Total CPU time 0.11
Protein sequences clustered.Creating folders for each protein family...
Making a fasta file for each protein family...
Aligning sequences in each family, producing consensus sequences...
Making BLAST database for the family consensus sequences...
Making BLAST database for the reference protein sequences...
BLASTing the consensus family sequences against themselves...
Warning: [blastp] Number of threads was reduced to 64 to match the number of available CPUs
BLASTing the consensus family sequences against the reference protein sequences...
Warning: [blastp] Number of threads was reduced to 64 to match the number of available CPUs
Finding overlap with reference database...
Finding overlap with family consensus database...
Found True Markers...
No Quasi Markers needed...
Tmp markers saved to tmp69981618818831066/framecheck/FirstMarkers.faa
Processing complete! Final markers saved to ./markersforSpartina.fasta
Checking dependencies...
Checking to make sure that installed version of usearch can make databases...
Tested usearch. Appears to be working.
Treating input as a wgs file...
usearch v7.0.1090_i86linux32, 4.0Gb RAM (528Gb total), 64 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch
Licensed to: hgruber@mpi-bremen.de
00:00 19Mb Reading input
00:00 22Mb 1.0% Masking
00:00 22Mb 100.0% Masking
00:00 35Mb 1.0% Word stats
00:00 35Mb 100.0% Word stats
00:00 73Mb 0.0% Building slots
00:01 73Mb 24.7% Building slots
00:02 73Mb 78.1% Building slots
00:02 73Mb 100.0% Building slots
00:02 60Mb 1.0% Build index
00:02 64Mb 100.0% Build index
00:02 64Mb 0.0% Rows
00:02 64Mb 100.0% Rows
00:02 64Mb Buffers
00:02 80Mb 1.0% Seqs
00:02 80Mb 100.0% Seqs
00:02 64Mb 100.0% completed, split 1 (97 seqs)
00:02 64Mb Total 1 splits, 97 seqs
List of files in WGS set:./joinedMSAP1.fasta
List of files in WGS set (after unpacking tarfiles):./joinedMSAP1.fasta
Working on file 1 of 1
usearch v7.0.1090_i86linux32, 4.0Gb RAM (528Gb total), 64 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com/usearch
Licensed to: hgruber@mpi-bremen.de
00:00 19Mb Reading /scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/markersforSpartina.fasta.udb
00:00 57Mb Database loaded
00:00 58Mb 0.1% Searching, 0.0% matched
00:01 62Mb 0.5% Searching, 0.4% matched
00:02 62Mb 1.2% Searching, 0.4% matched
00:03 62Mb 2.0% Searching, 0.4% matched
00:04 62Mb 2.9% Searching, 0.4% matched
00:05 62Mb 3.7% Searching, 0.4% matched
00:06 62Mb 4.5% Searching, 0.4% matched
00:07 62Mb 5.3% Searching, 0.4% matched
00:08 62Mb 6.1% Searching, 0.4% matched
fastaseqsource.cpp(242):
/opt/extern/bremen/symbiosis/phyloFlash_old/tools/usearch7 --usearch_local /scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/fasta.fna --db /scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/markersforSpartina.fasta.udb --id 0.95 --userout /scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/wgs_01out_01.out --userfields query+target+id+alnlen+mism+opens+qlo+qhi+tlo+thi+evalue+bits+ql+tl+qs+ts --maxaccepts 1 --maxrejects 32 --threads 1
---Fatal error---
Non-printing character 0x00 in sequence FASTA file '/scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/fasta.fna' line 1610696
Traceback (most recent call last):
File "/opt/share/software/packages/shortbred-0.9.4/shortbred_quantify.py", line 558, in <module>
iThreads=args.iThreads,dID=args.dID,iAccepts=args.iMaxHits, iRejects=args.iMaxRejects,strUSEARCH=args.strUSEARCH )
File "/opt/share/software/packages/shortbred-0.9.4/src/quantify_functions.py", line 232, in RunUSEARCH
"--maxrejects",str(iRejects),"--threads", str(iThreads)])
File "/usr/lib/python2.7/subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/extern/bremen/symbiosis/phyloFlash_old/tools/usearch7', '--usearch_local', '/scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/fasta.fna', '--db', '/scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/markersforSpartina.fasta.udb', '--id', '0.95', '--userout', '/scratch/ekroeber/tmp.942458/joinedMSAP1_shortbred_tmp/wgs_01out_01.out', '--userfields', 'query+target+id+alnlen+mism+opens+qlo+qhi+tlo+thi+evalue+bits+ql+tl+qs+ts', '--maxaccepts', '1', '--maxrejects', '32', '--threads', '1']' returned non-zero exit status 1
DONE!
What am I doing wrong? How can I solve this problem? I need the results ASAP, since I want to include them in the revisions for a manuscript that is due for resubmission very soon.
Can someone help me, please?
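
In case it helps with diagnosis: the fatal error above says the file contains 0x00 (NUL) bytes, so one thing I could check is whether they are already present in my input, ./joinedMSAP1.fasta. Below is a minimal sketch of such a check, not anything from ShortBRED itself. The function names find_nul_lines and strip_nul_bytes are hypothetical helpers I made up for illustration. The line number in the error refers to ShortBRED's temporary fasta.fna, so it may not match the line number in the original input.

```python
def find_nul_lines(path, limit=10):
    """Return up to `limit` 1-based line numbers that contain a 0x00 byte."""
    hits = []
    # Read in binary mode so NUL bytes are seen exactly as stored on disk.
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            if b"\x00" in line:
                hits.append(number)
                if len(hits) >= limit:
                    break
    return hits


def strip_nul_bytes(src, dst):
    """Copy src to dst with every 0x00 byte removed; return how many were removed."""
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            removed += line.count(b"\x00")
            fout.write(line.replace(b"\x00", b""))
    return removed
```

For example, find_nul_lines("./joinedMSAP1.fasta") would list the first affected lines, and strip_nul_bytes("./joinedMSAP1.fasta", "./joinedMSAP1.clean.fasta") would write a cleaned copy (the output file name is only an example). I am not sure stripping is the right fix, though: NUL bytes may mean the file was truncated or corrupted when it was written, in which case regenerating it would be safer than cleaning it.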