HELP! OSError: Invalid data stream

Hello,
Thanks for your great contributions to the advancement of bioinformatics analysis.
I hit an error when running MetaPhlAn and have no idea how to handle it, even after searching Google and reinstalling the software. Please help me; the details are as follows.

1. code

srun -c 64 metaphlan --nproc 64 ~/test.dir/SRR1976948_1.fastq --input_type fastq --bowtie2db ./ -x mpa_vOct22_CHOCOPhlAnSGB_202212 > ~/test.metaphlan

2. error

Traceback (most recent call last):
  File "/public5/home/sch7108/.conda/envs/metagenome/bin/metaphlan", line 10, in <module>
    sys.exit(main())
  File "/public5/home/sch7108/.conda/envs/metagenome/lib/python3.9/site-packages/metaphlan/metaphlan.py", line 1084, in main
    mpa_pkl = pickle.load( a )
  File "/public5/home/sch7108/.conda/envs/metagenome/lib/python3.9/bz2.py", line 161, in peek
    return self._buffer.peek(n)
  File "/public5/home/sch7108/.conda/envs/metagenome/lib/python3.9/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/public5/home/sch7108/.conda/envs/metagenome/lib/python3.9/_compression.py", line 103, in read
    data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

3. data

head ~/test.dir/SRR1976948_1.fastq

@SRR1976948.1 1/1
NTGGTACCATCCAGAGTGCAGCTATCAGATATTGTCCTTCTGTAGAAGATAAGATAATTAGGTTCCCCGAAAAAGAATCTCATCAGATATTCTTAGAACCAGAGGGATATAATACTGAAGAAATATATTTACAGGGATTTTTTACCAGTCTACCTGCTGATGTCCAACAGGAAGCCTTGCACACTATTGAAGGTTTGGAAAATTGCAAGATTATGCGCTACGGATATGCTATTGAATATGACATTATATAT
+
#55,<<<BBBBBBBBBF@FFFFF?FFBCCF=EGDFGFGHFHDFBDDBC/D?FH@EGDGHHHHHHHHHFHCBHBHEHHHHHHDHHFGBFBFHGHHHHFHHHFHFHHEEHHHHHHHHHFFFFDFFFFHDFHHHHFFFE@FFFEEEEEEECEEEEEEEECEECEEEEEE?CA?A?CEEEEECCEEACEACC:??EECEE8::::CEAEE:CEAEEEACEE?;?EA8AAECCEEEE?:CECEEAECEEEEE##

4. system info

1. python version : 3.7.3.final.0
2. virtual packages : __linux=3.10.0=0
   __glibc=2.17=0
   __unix=0=0
   __archspec=1=x86_64
3. MetaPhlAn version 4.0.4 (17 Jan 2023)
4. PhyloPhlAn version 3.0.67 (24 August 2022)

5. database info

mpa_latest
mpa_vOct22_CHOCOPhlAnSGB_202212.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.3.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.4.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.md5
mpa_vOct22_CHOCOPhlAnSGB_202212_marker_info.txt.bz2
mpa_vOct22_CHOCOPhlAnSGB_202212.pkl
mpa_vOct22_CHOCOPhlAnSGB_202212.rev.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.rev.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2
mpa_vOct22_CHOCOPhlAnSGB_202212_species.txt.bz2
mpa_vOct22_CHOCOPhlAnSGB_202212.tar
mpa_vOct22_CHOCOPhlAnSGB_202212_VINFO.csv
mpa_vOct22_CHOCOPhlAnSGB_202212_VSG.fna.bz2

I have the same issue. It looks like something is wrong with the file ‘mpa_vOct22_CHOCOPhlAnSGB_202212.pkl’.

When I read this file the same way the source code does, I get the same error: “OSError: Invalid data stream”. The previous version, ‘mpa_vJan21_CHOCOPhlAnSGB_202103.pkl’, loads fine.

So you can try the previous database version as a test.
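
For anyone who wants to repeat that check, here is a minimal sketch of it. It assumes the .pkl file is a bz2-compressed pickle, which is what the traceback in the first post suggests (pickle.load reading through bz2.py). The function name check_mpa_pkl is made up for this example and is not part of MetaPhlAn.

```python
import bz2
import pickle


def check_mpa_pkl(pkl_path):
    """Try to load a bz2-compressed pickle.

    Returns (True, None) if the file loads, otherwise (False, reason).
    Hypothetical helper for illustration; not a MetaPhlAn function.
    """
    try:
        with bz2.BZ2File(pkl_path, "r") as handle:
            pickle.load(handle)
    except (OSError, EOFError, pickle.UnpicklingError) as exc:
        # A corrupted bz2 stream surfaces as OSError ("Invalid data stream"),
        # a truncated file as EOFError, and a missing file as an OSError subclass.
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


# Example call (the path is a placeholder for your own database directory):
# ok, reason = check_mpa_pkl("mpa_vOct22_CHOCOPhlAnSGB_202212.pkl")
```

Unpickling can execute code stored in the file, so only run this on a database downloaded from a source you trust.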

Hi @Richard_Liu @licui
Thanks for reporting this problem. I apologise for the inconvenience. Indeed, we have just detected that the pkl file uploaded to our servers had been corrupted. I have re-uploaded all the data and retested the databases, and everything seems to be working. Please re-download the database and let me know if you encounter any other problems.
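
After re-downloading, one way to confirm that a file arrived intact is to compare its MD5 digest with the accompanying .md5 file. The sketch below is only an illustration: it assumes the .md5 file follows the usual md5sum layout (hex digest as the first whitespace-separated token), which has not been confirmed for the MetaPhlAn server files, and both function names are invented for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(data_path, md5_path):
    """Compare a file's digest with the one recorded in an .md5 file.

    Assumption: the first whitespace-separated token in the .md5 file
    is the expected hex digest (md5sum-style output).
    """
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of_file(data_path) == expected
```

Reading in chunks keeps memory use flat even for the roughly 20 GB index archive.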

Was struggling with this OSError: Invalid data stream error all day yesterday…

Just started a fresh conda environment (conda create -n metaphlan4), and installed metaphlan (conda install -c bioconda metaphlan). Tried to install the database, got this:

Downloading https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA4XDP85WHon_eHvztxkamTa/file_list.txt?dl=1

Warning: Unable to download https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA4XDP85WHon_eHvztxkamTa/file_list.txt?dl=1

Traceback (most recent call last):
  File "/usr/local/bin/miniconda3/envs/metaphlan4/bin/metaphlan", line 10, in <module>
    sys.exit(main())
  File "/usr/local/bin/miniconda3/envs/metaphlan4/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 1187, in main
    pars['index'] = check_and_install_database(pars['index'], pars['bowtie2db'], pars['bowtie2_build'], pars['nproc'], pars['force_download'])
  File "/usr/local/bin/miniconda3/envs/metaphlan4/lib/python3.6/site-packages/metaphlan/metaphlan.py", line 589, in check_and_install_database
    index = resolve_latest_database(bowtie2_db, ls_f['mpa_latest'], force_redownload_latest)
UnboundLocalError: local variable 'ls_f' referenced before assignment

Same error occurs if I use metaphlan --install or run a full metaphlan analysis command line.

Should it be trying to download something from Dropbox?

Thanks for any suggestions…
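
A note on the UnboundLocalError itself: it appears right after the "Unable to download" warning, which suggests (this is a guess, not a reading of the MetaPhlAn source) that ls_f is only assigned when the file list download succeeds. The toy functions below are hypothetical and only reproduce that general pattern.

```python
def resolve_index_unsafe(download_ok):
    """Toy stand-in for the failing pattern; not MetaPhlAn code."""
    if download_ok:
        listing = {"mpa_latest": "example_index"}
    # When download_ok is False, 'listing' was never assigned, so the
    # next line raises UnboundLocalError instead of a clear message.
    return listing["mpa_latest"]


def resolve_index_safe(download_ok):
    """Same logic, but the failed download is handled explicitly."""
    listing = None
    if download_ok:
        listing = {"mpa_latest": "example_index"}
    if listing is None:
        return None  # caller can report "download failed" cleanly
    return listing["mpa_latest"]
```

In other words, the UnboundLocalError is a symptom; the failed download is the actual problem to chase.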

Hi @MikeC
It seems your system is trying to install an old version of MetaPhlAn (probably version 3). You can check the version by running $ metaphlan --version
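
If the version check needs to happen inside a script, something like the following could work. It is a sketch that assumes the output contains a line of the form "MetaPhlAn version 4.0.5 (23 Feb 2023)", as quoted elsewhere in this thread; the function names are made up for the example.

```python
import re
import subprocess

# Matches "MetaPhlAn version 4.0.5" as well as the shorter "MetaPhlAn version 3.0".
VERSION_RE = re.compile(r"MetaPhlAn version (\d+)\.(\d+)(?:\.(\d+))?")


def parse_metaphlan_version(text):
    """Return (major, minor, patch) parsed from version output, or None."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def installed_metaphlan_version():
    """Run 'metaphlan --version' and parse the result; None if not on PATH."""
    try:
        result = subprocess.run(
            ["metaphlan", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_metaphlan_version(result.stdout)
```

A script could then stop early when the major version is below 4, before any database download is attempted.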

You’re right…

Yesterday, I ran:

conda install -c conda-forge -c bioconda metaphlan

and got MetaPhlAn version 4.0.5 (23 Feb 2023).

Today, I ran (in a new environment):

conda install -c bioconda metaphlan

and got MetaPhlAn version 3.0 (25 Feb 2020).

Is inclusion of the conda-forge channel necessary to get version 4?

Hi @MikeC
It depends on the conda installation; sometimes it is necessary because of package inconsistencies.

Ok, I’ve reverted to yesterday’s environment (that has metaphlan v 4.0.5 installed), and I’m re-downloading the databases.

Fingers crossed…

Thank you for your help.

Still something wrong. Cleaned out prior database files at /usr/local/bin/miniconda3/envs/metap/lib/python3.7/site-packages/metaphlan/metaphlan_databases, and re-ran install:

(metap environment) # metaphlan --install

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_latest
Downloading file of size: 0.00 MB
0.01 MB 25600.00 % 60.35 MB/sec 0 min -0 sec
Downloading MetaPhlAn database
Please note due to the size this might take a few minutes

\Downloading and uncompressing indexes

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar
Downloading file of size: 20348.27 MB
20348.27 MB 100.00 % 17.53 MB/sec 0 min -0 sec
Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.md5
Downloading file of size: 0.00 MB

Warning: Unable to extract /usr/local/bin/miniconda3/envs/metap/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212_bt2.tar.

Downloading and uncompressing additional files

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.tar

Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.tar

Downloading http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.md5

Warning: Unable to download http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.md5

File "/usr/local/bin/miniconda3/envs/metap/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.md5" not found!
File "/usr/local/bin/miniconda3/envs/metap/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212.tar" not found!

MD5 checksums not found, something went wrong!

Not a disk space issue, I have terabytes of free space. The (corrupted) databases I downloaded yesterday using the same command worked fine.

Ideas?

Stand by – I might have a disk space issue after all… Installation was mistakenly pointing to a 1 TB SSD drive rather than my much larger data partition…
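
A quick way to rule disk space in or out before starting the download is to check the free space on the target partition. The sketch below uses the 20348.27 MB index size from the log above; the factor of 2 for headroom (archive plus extracted files) is an assumption of this example, not a documented requirement, and the function name is invented.

```python
import shutil


def has_room_for_download(target_dir, download_mb, headroom_factor=2.0):
    """Check whether target_dir's partition has enough free space.

    headroom_factor is an assumed safety margin to cover the archive
    plus its extracted contents; adjust it to your own setup.
    Returns (enough_space, free_mb).
    """
    free_mb = shutil.disk_usage(target_dir).free / (1024 * 1024)
    return free_mb >= download_mb * headroom_factor, free_mb


# Example call (the directory is a placeholder for your database path):
# enough, free_mb = has_room_for_download("/path/to/metaphlan_databases", 20348.27)
```

shutil.disk_usage reports on the partition that holds the given path, so it also reveals when an install path points at a smaller drive than intended.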

Deleted everything and started over, from the installation of miniconda.

I did have to include the conda-forge channel to get MetaPhlAn version 4.0.5 (23 Feb 2023).

Installed the databases without issue.

Ran a test analysis on a metagenome, and metaphlan completed without errors.

Thank you for your help!

hi @aitor.blancomiguez, thanks for updating the file. It works well.

With the new database, I found a bug in the HUMAnN 3 config.py. It sets metaphlan_v4_db_version="vJan21", but the latest database name starts with "vOct". The error looks like this:

config.metaphlan_v3_db_version+" or "+metaphlan_v4_db_version+" . Please update your version of MetaPhlAn to at least v3.0."
NameError: name 'metaphlan_v4_db_version' is not defined

I guess this might be a bug, so I am reporting it for your reference.

Hi @licui
The current version of HUMAnN (3.6) is not yet compatible with the latest version of the MetaPhlAn database (vOct22). My suggestion is to keep them in two separate environments: one with HUMAnN 3.6 and vJan21 for HUMAnN profiling, and another with MetaPhlAn and vOct22 if you are interested in taxonomic profiles from the latest database.

Hello, I am running HUMAnN 3 but it has failed many times. I downloaded the MetaPhlAn database (vJan21) with wget into the MetaPhlAn installation directory, because you said that HUMAnN 3.6 is not yet compatible with the latest version of the MetaPhlAn database (vOct22). But when I run HUMAnN 3 again, it still downloads the database automatically and fails again. Below is the content of my database directory:

mpa_vJan21_CHOCOPhlAnSGB_202103.1.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.2.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.3.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.4.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103_bt2.tar
mpa_vJan21_CHOCOPhlAnSGB_202103.rev.1.bt2l
mpa_vJan21_CHOCOPhlAnSGB_202103.rev.2.bt2l

I find that it lacks some files, such as

mpa_latest, ***.pkl, ***_VINFO.csv

compared with the database downloaded by the metaphlan --install --bowtie2db command. Can you tell me how to resolve this problem? Alternatively, I would like to know how to use metaphlan --install --bowtie2db to download the old database version. Thank you very much!
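
To see at a glance which files are absent from a manually downloaded database directory, a small comparison script can help. The expected file set below is taken from the directory listing in the first post of this thread, not from the MetaPhlAn documentation, so treat it as an assumption; whether mpa_latest is needed when an index name is given explicitly is also not confirmed here. The function name is invented for this example.

```python
import os

# Suffixes seen next to the index name in the first post's listing
# (assumed, not documented, to be the files MetaPhlAn looks for).
EXPECTED_SUFFIXES = (
    ".1.bt2l", ".2.bt2l", ".3.bt2l", ".4.bt2l",
    ".rev.1.bt2l", ".rev.2.bt2l", ".pkl",
)


def missing_database_files(db_dir, index_name):
    """List expected database files that are not present in db_dir."""
    present = set(os.listdir(db_dir))
    expected = [index_name + suffix for suffix in EXPECTED_SUFFIXES]
    expected.append("mpa_latest")
    return [name for name in expected if name not in present]


# Example call (the directory is a placeholder):
# missing_database_files("/path/to/metaphlan_databases",
#                        "mpa_vJan21_CHOCOPhlAnSGB_202103")
```

For the directory contents listed above, this would report the .pkl file and mpa_latest as missing, which matches what was noticed by eye.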