Sorry, I realize this has been asked a few times for strainphlan, but I have the same problem and could not work out which step is failing. I use the most lenient settings for the sample and marker thresholds, but it still throws this error message. I chose my clade_markers based on the metaphlan results, so I expect these markers to be present in at least a quarter of my samples.
When I run it with --print-clades-only, it doesn’t return any clades either.
Wed Mar 29 21:44:51 2023: Start StrainPhlAn 4.0.6 execution
Wed Mar 29 21:44:51 2023: Creating temporary directory...
Wed Mar 29 21:44:51 2023: Done.
Wed Mar 29 21:44:51 2023: Filtering markers and samples...
Wed Mar 29 21:44:51 2023: Getting markers from main samples...
Wed Mar 29 21:44:51 2023: Done.
Wed Mar 29 21:44:51 2023: Getting markers from main references...
Wed Mar 29 21:44:51 2023: Done.
Wed Mar 29 21:44:51 2023: Removing bad markers / samples...
Error message:
Wed Mar 29 21:44:51 2023: [Error] Phylogeny can not be inferred. Too many samples were discarded.
Wed Mar 29 21:44:51 2023: Stop StrainPhlAn execution.
Tmp output:
|-tmp1ijb_u1t
| |-t__SGBxxxx.fna
| |-blastn
Is there a way to tell at least whether this is a problem with the quality of my sequences (so more upstream) or something fixable in the parameters (so downstream)? The .pkl files of my generated consensus_markers range in size from 5.5 MB to 28.1 MB.
Did you run metaphlan with the database version Jan21 or Oct22? In version 4.0.6, Oct22 is the default database, so if you ran MetaPhlAn with the previous database version (Jan21), it will lead to this kind of result.
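If you are not sure which database a profile was generated with, the header comments of the MetaPhlAn output usually tell you. Below is a minimal sketch, assuming the leading '#' lines of the profile contain a tag of the form mpa_vJan21_... or mpa_vOct22_...; the function name find_db_tag and the regular expression are illustrative and not part of MetaPhlAn.

```python
import re
from typing import Optional

# Assumption: the profile's leading '#' comment lines contain a database tag
# such as "mpa_vJan21_CHOCOPhlAnSGB_202103". The pattern is illustrative.
DB_TAG = re.compile(r"mpa_v[A-Za-z0-9_]+")


def find_db_tag(profile_path: str) -> Optional[str]:
    """Return the first database tag found in the header comments, or None."""
    with open(profile_path, "r") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # header is over; stop at the first data row
            match = DB_TAG.search(line)
            if match:
                return match.group(0)
    return None
```

Running find_db_tag on each profile and comparing the tags should show whether all samples were profiled against the same database as the one strainphlan is using.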
Okay, thank you. It seems the metaphlan results we have were generated from vJan21. The analysis I’m trying out is based on these metaphlan results so I would rather adjust to accommodate the vJan21 data. Is there a way to download chocophlan vJan21 instead or an earlier version of strainphlan that supports vJan21?
Traceback (most recent call last):
  File "/dir/anaconda3/envs/mph4/bin/strainphlan", line 8, in <module>
    sys.exit(main())
  File "/dir/anaconda3/envs/mph4/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 624, in main
    strainphlan_runner.run_strainphlan()
  File "/dir/anaconda3/envs/mph4/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 452, in run_strainphlan
    self.print_clades()
  File "/dir/anaconda3/envs/mph4/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 370, in print_clades
    species2samples = self.detect_clades(markers2species)
  File "/dir/anaconda3/envs/mph4/lib/python3.7/site-packages/metaphlan/strainphlan.py", line 353, in detect_clades
    sample = ConsensusMarkers(pkl_file=sample_path)
  File "/dir/anaconda3/envs/mph4/lib/python3.7/site-packages/metaphlan/utils/consensus_markers.py", line 100, in __init__
    self.from_pkl(pkl_file)
  File "/dir/anaconda3/envs/mph4/lib/python3.7/site-packages/metaphlan/utils/consensus_markers.py", line 93, in from_pkl
    pkl_file)[1] == ".bz2" else pickle.load(open(pkl_file, "rb"))
ValueError: unsupported pickle protocol: 5
Hi @ange
It looks like sample2markers was run with Python 3.8+, which generated the .pkl files with pickle protocol 5, while you were running strainphlan with a Python version lower than 3.8 (in this case 3.7), which cannot read that protocol. I would update Python to 3.8 or later in the environment where you are running strainphlan.
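To confirm this before rebuilding the environment, you can read the protocol number straight from each .pkl file without unpickling it. This is a minimal sketch: pickle protocols 2 and higher start with the byte 0x80 followed by the protocol number, and the .bz2 branch mirrors the extension check visible in the traceback. The helper name pickle_protocol is hypothetical and not part of MetaPhlAn.

```python
import bz2
import os
from typing import Optional


def pickle_protocol(path: str) -> Optional[int]:
    """Return the pickle protocol declared at the start of a file.

    Protocols 0 and 1 carry no protocol header, so None is returned for
    them (and for files that are too short to be a pickle).
    """
    # Files ending in .bz2 are decompressed on the fly; others are read raw.
    opener = bz2.open if os.path.splitext(path)[1] == ".bz2" else open
    with opener(path, "rb") as handle:
        head = handle.read(2)
    # Protocol 2+ pickles begin with the PROTO opcode (0x80) and a version byte.
    if len(head) == 2 and head[0] == 0x80:
        return head[1]
    return None
```

Any file for which this returns 5 can only be loaded with Python 3.8 or later, since that is the release that introduced protocol 5.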