Hi,
I didn’t find any information about this and I’m worried that my results are not correct because I used ShortBRED with zipped files. So my question is, can I use fastq.gz files as input for ShortBRED quantify?
Thanks in advance!
Hi @Thalla,
The input files for ShortBRED have to be in FASTA format (.faa). Please feel free to have a look at the example input in the tutorial: shortbred · biobakery/biobakery Wiki · GitHub.
Regards,
Sagun
Hi @sagunmaharjann,
I would assume that a zipped FASTA file is still a FASTA file. However, the documentation does not state clearly whether both zipped and unzipped files are meant, or only the latter.
I have now looked it up in the code, which I should have done in the first place. So thanks for your answer! It pushed me in the right direction.
The code was easy to follow.
The tutorial says that FASTA files are needed for ShortBRED quantify, so I assumed FASTQ is fine, too. In the code, fastq is indeed mentioned as an example:
    aaFileInfo is array of string arrays, each with details on the file so ShortBRED
    knows how to process it efficiently. Each line has the format:
        [filename, format, "large" or "small", extract method, and corresponding tarfile (if needed)]
    An example:
        ['SRS011397/SRS011397.denovo_duplicates_marked.trimmed.1.fastq', 'fastq', 'large', 'r:bz2', '/n/CHB/data/hmp/wgs/samplesfqs/SRS011397.tar.bz2']
As for the zipped/unzipped question, there is a line in the quantify script that checks which extraction method is needed:
    strExtractMethod = sq.CheckExtract(strWGS)
The corresponding function definition looks like this:
    def CheckExtract(strWGS):
        if strWGS.find(".tar.bz2") > -1:
            strExtractMethod = 'r:bz2'
        elif strWGS.find(".tar.gz") > -1:
            strExtractMethod = 'r:gz'
        elif strWGS.find(".gz") > -1:
            strExtractMethod = 'gz'
        elif strWGS.find(".bz2") > -1:
            strExtractMethod = 'bz2'
        else:
            strExtractMethod = ""
        return strExtractMethod
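To double-check how this check behaves for different file names, here is a minimal standalone sketch. It re-implements the same substring logic under a name of my own (`check_extract`) and does not import ShortBRED. The file names are made up for illustration, and the sketch only shows which extraction method would be selected, not how ShortBRED reads the file afterwards.

```python
def check_extract(path):
    """Standalone copy of the substring checks quoted above (not the ShortBRED module itself)."""
    # Order matters: the tar variants are tested before the plain .gz / .bz2 cases.
    if path.find(".tar.bz2") > -1:
        return "r:bz2"
    elif path.find(".tar.gz") > -1:
        return "r:gz"
    elif path.find(".gz") > -1:
        return "gz"
    elif path.find(".bz2") > -1:
        return "bz2"
    return ""


# Hypothetical file names, used only to exercise each branch.
examples = [
    "sample.fastq.gz",
    "sample.fastq",
    "samples.tar.gz",
    "sample.fastq.bz2",
    "samples.tar.bz2",
]

results = {name: check_extract(name) for name in examples}
for name, method in results.items():
    print(f"{name!r} -> {method!r}")
```

So a file called something like sample.fastq.gz ends up in the 'gz' branch, and an uncompressed file gets an empty extraction method. One thing to keep in mind: find() matches anywhere in the string, so a ".gz" in a directory name would also trigger the 'gz' branch.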
All in all, I think it is safe to assume that fastq.gz files are handled correctly.
Regards,
Thalla