Fastq.gz file as input for ShortBRED?

Hi,

I didn’t find any information about this, and I’m worried that my results are not correct because I ran ShortBRED on zipped files. So my question is: can I use fastq.gz files as input for ShortBRED quantify?

Thanks in advance!

Hi @Thalla,

The input files for ShortBRED have to be in FASTA format (.faa). Please feel free to look at the example input in the tutorial: shortbred · biobakery/biobakery Wiki · GitHub.

Regards,
Sagun

Hi @sagunmaharjann,

I would assume that a zipped FASTA file is still a FASTA file. However, the documentation does not clearly state whether both zipped and unzipped files are accepted, or only unzipped ones.
I have now looked it up in the code. I should have done this in the first place :sweat_smile:. So thanks for your answer! It pushed me in the right direction.

The code was easy to follow :+1: :slightly_smiling_face:

The tutorial says that .fasta files are needed for ShortBRED quantify, but I assume FASTQ is fine, too, because the code mentions fastq as an example format:

```
aaFileInfo is array of string arrays, each with details on the file so ShortBRED
knows how to process it efficiently. Each line has the format:
[filename, format, "large" or "small", extract method, and corresponding tarfile (if needed)]
An example:
['SRS011397/SRS011397.denovo_duplicates_marked.trimmed.1.fastq', 'fastq', 'large', 'r:bz2', '/n/CHB/data/hmp/wgs/samplesfqs/SRS011397.tar.bz2']
```
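The extract method strings in that example look like they map directly onto Python's standard library: 'r:bz2' and 'r:gz' are valid modes for tarfile.open(), while plain 'gz' and 'bz2' suggest a single gzip or bzip2 file. Below is a minimal sketch of reading a reads file under that assumption. I haven't checked that ShortBRED opens files exactly this way; iter_read_lines is a hypothetical helper, not part of ShortBRED.

```python
import bz2
import gzip
import os
import tarfile
import tempfile
from typing import Iterator, Optional


def iter_read_lines(filename: str, extract_method: str,
                    tar_path: Optional[str] = None) -> Iterator[bytes]:
    """Yield the raw lines of a reads file according to its extract method.

    Assumed meaning of the method strings (mirroring the aaFileInfo example):
      'r:bz2' / 'r:gz' -> filename is a member inside the tarball at tar_path
      'gz' / 'bz2'     -> filename is a single compressed file
      ''               -> filename is an uncompressed file
    """
    if extract_method in ("r:bz2", "r:gz"):
        if tar_path is None:
            raise ValueError("tar_path is required for tar archives")
        # 'r:bz2' and 'r:gz' are valid tarfile.open() modes.
        with tarfile.open(tar_path, mode=extract_method) as tar:
            member = tar.extractfile(filename)
            if member is None:
                raise ValueError(f"{filename} is not a regular file in {tar_path}")
            yield from member
    elif extract_method == "gz":
        with gzip.open(filename, "rb") as fh:
            yield from fh
    elif extract_method == "bz2":
        with bz2.open(filename, "rb") as fh:
            yield from fh
    else:
        with open(filename, "rb") as fh:
            yield from fh


# Usage: write a tiny one-read FASTQ as .gz and read it back.
record = b"@read1\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.fastq.gz")
    with gzip.open(path, "wb") as out:
        out.write(record)
    lines = list(iter_read_lines(path, "gz"))

print(len(lines), lines[1])
```

The helper yields bytes lines, so the four lines of the FASTQ record come back unchanged from the gzip file; decode them if text is needed.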

Regarding the zipped/unzipped question, there is a line in the quantify script that checks which extraction method is needed:

```python
strExtractMethod = sq.CheckExtract(strWGS)
```

And the corresponding function definition looks like this:

```python
def CheckExtract(strWGS):
    if strWGS.find(".tar.bz2") > -1:
        strExtractMethod = 'r:bz2'
    elif strWGS.find(".tar.gz") > -1:
        strExtractMethod = 'r:gz'
    elif strWGS.find(".gz") > -1:
        strExtractMethod = 'gz'
    elif strWGS.find(".bz2") > -1:
        strExtractMethod = 'bz2'
    else:
        strExtractMethod = ""

    return strExtractMethod
```
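To double-check which branch a given file name ends up in, here is a standalone copy of the same logic, renamed check_extract and using `in` instead of `find(...) > -1` (the two are equivalent for this purpose), run on a few made-up file names:

```python
def check_extract(str_wgs: str) -> str:
    """Same branching as ShortBRED's CheckExtract quoted above."""
    # Tar archives are checked first, since "x.tar.gz" also contains ".gz".
    if ".tar.bz2" in str_wgs:
        return "r:bz2"
    elif ".tar.gz" in str_wgs:
        return "r:gz"
    elif ".gz" in str_wgs:
        return "gz"
    elif ".bz2" in str_wgs:
        return "bz2"
    return ""


# Made-up file names, just to see which branch each one hits.
for name in ["sample.fastq.gz", "sample.fastq.bz2", "sample.fastq", "reads.tar.gz"]:
    print(f"{name} -> {check_extract(name)!r}")
```

This prints 'gz', 'bz2', '' and 'r:gz' respectively. Note that the check is a substring search rather than a suffix check, so the order of the branches matters, and a name that merely contains ".gz" somewhere in the middle would also be treated as gzip-compressed.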

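All in all, I think it is safe to assume that fastq.gz files are handled correctly: a file name ending in .fastq.gz contains ".gz" but not ".tar.gz" or ".tar.bz2", so CheckExtract returns 'gz' for it.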

Regards,
Thalla