"OSError: Invalid data stream" when running panphlan_profiling.py

I am new to PanPhlAn 3 and I get an error when running panphlan_profiling.py.
The command I ran is the following:
(panphlan) wangzhenyu@wangzhenyudeMacBook-Pro panphlan % panphlan_profiling.py \
    -i ./map_results/ \
    -p ./Bifidobacterium_pseudocatenulatum/Bifidobacterium_pseudocatenulatum_pangenome.tsv \
    --o_matrix ./profiling_results/matrix.csv

STEP 1. Processing genes informations from pangenome file…
Number of reference genomes: 18
Average number of gene-families per genome: 1761
Total number of pangenome gene-families 4550

STEP 2. Create coverage matrix
Traceback (most recent call last):
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/bin/panphlan_profiling.py", line 932, in <module>
    main()
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/bin/panphlan_profiling.py", line 864, in main
    dna_samples_covs = read_map_results(args.i_dna, args.verbose)
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/bin/panphlan_profiling.py", line 299, in read_map_results
    dna_samples_covs[dna_sample_id] = read_gene_cov_file(os.path.join(i_dna, dna_covs_file))
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/bin/panphlan_profiling.py", line 285, in read_gene_cov_file
    for line in f:
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/lib/python3.7/bz2.py", line 215, in readline
    return self._buffer.readline(size)
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Users/wangzhenyu/opt/miniconda3/envs/panphlan/lib/python3.7/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

Has anyone else run into this problem and found a way to solve it?
Thanks for reading and answering.

Did you already solve this? It happens to me as well.

Hi,

What is the content of your map_results/ folder? It should contain only the compressed output files of panphlan_mapping.py. When the --input parameter is given a folder path, the profiling script reads every file in that folder. It might be that one of them is not a mapping output, which breaks the reading step because it expects bz2-compressed data.
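
One quick way to check the folder is to try opening every file in it as bz2 and list the ones that fail. Below is a minimal Python sketch using only the standard library. The helper name find_non_bz2_files is hypothetical and is not part of PanPhlAn.

```python
import bz2
from pathlib import Path


def find_non_bz2_files(folder):
    """Return the names of regular files in `folder` that cannot be read as bz2.

    Hypothetical helper for illustration; not part of PanPhlAn.
    """
    bad = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        try:
            with bz2.open(path, "rb") as handle:
                handle.read(1)  # forces the decompressor to parse the header
        except (OSError, EOFError):
            # OSError: not a bz2 stream; EOFError: empty or truncated file
            bad.append(path.name)
    return bad
```

Calling find_non_bz2_files("./map_results/") would also flag hidden files, for example a macOS .DS_Store, which are easy to miss in a file browser.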

To avoid that, you can pass a text file listing the paths to the mapping results to the panphlan_profiling.py --input parameter. I usually use that setup to avoid this kind of issue.
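
If you want to build that list automatically, something like the following could work. This is only a sketch: write_mapping_list is a made-up helper, and I am assuming that the list file is plain text with one path per line and that the mapping outputs end in .tsv.bz2, so please check both against your own setup.

```python
from pathlib import Path


def write_mapping_list(map_dir, list_path, suffix=".tsv.bz2"):
    """Write the paths of mapping outputs in `map_dir` to `list_path`.

    Hypothetical helper; the one-path-per-line format is an assumption.
    """
    paths = sorted(
        p for p in Path(map_dir).iterdir()
        if p.is_file()
        and p.name.endswith(suffix)
        and not p.name.startswith(".")  # skip hidden files such as .DS_Store
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, write_mapping_list("./map_results/", "map_results_list.txt") writes only the .tsv.bz2 files to the list and leaves everything else out.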

Hope this can help

Hi, thanks for your reply. The folder contains only tsv.bz2 files, but I am still having this issue.

STEP 2. Create coverage matrix
[I] Reading mapping result file: Sample2.Ecoli.tsv.bz2
[I] Reading mapping result file: .DS_Store
Traceback (most recent call last):
  File "/Users/arrow/miniconda3/bin/panphlan_profiling.py", line 763, in <module>
    main()
  File "/Users/arrow/miniconda3/bin/panphlan_profiling.py", line 709, in main
    dna_samples_covs = read_map_results(args.i_dna, args.verbose)
  File "/Users/arrow/miniconda3/bin/panphlan_profiling.py", line 286, in read_map_results
    dna_samples_covs[dna_sample_id] = read_gene_cov_file(os.path.join(i_dna, dna_covs_file))
  File "/Users/arrow/miniconda3/bin/panphlan_profiling.py", line 272, in read_gene_cov_file
    for line in f:
  File "/Users/arrow/miniconda3/lib/python3.9/bz2.py", line 208, in readline
    return self._buffer.readline(size)
  File "/Users/arrow/miniconda3/lib/python3.9/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Users/arrow/miniconda3/lib/python3.9/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

I will try your idea.
Thank you again. :)

Pablo