I am going to use StrainPhlAn 4 to obtain strain-level data from MGS data, but when I run the command sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -n 8, I don’t get any .pkl files, only json.bz2 files. I don’t know how to solve this problem. Please help me, thanks.
I have solved it. In StrainPhlAn 4.1, the JSON files replace the .pkl files.
So how did you solve it? Did you use an older version of StrainPhlAn or MetaPhlAn?
Hi, this is not an issue. Since StrainPhlAn 4.1, the output of sample2markers is in JSON format (not .pkl anymore), as described in the changelog of MetaPhlAn 4.1. The only difference is that now you will provide JSON files during the strainphlan command execution instead of .pkl.
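For anyone scripting this step, here is a minimal sketch of picking up whichever marker files are present and passing them to strainphlan. The function name `build_print_clades_command` and the directory layout are assumptions for illustration only; the flags (`--samples`, `--output_dir`, `--print_clades_only`) are the ones that appear in the workflow command quoted later in this thread.

```python
from pathlib import Path


def build_print_clades_command(markers_dir, output_dir):
    """Return an argv list for strainphlan using the marker files found.

    Hypothetical helper: prefers the newer .json.bz2 markers and falls
    back to legacy .pkl files if no JSON markers exist.
    """
    markers_dir = Path(markers_dir)
    samples = sorted(markers_dir.glob("*.json.bz2")) or sorted(markers_dir.glob("*.pkl"))
    if not samples:
        raise FileNotFoundError(f"no .json.bz2 or .pkl marker files in {markers_dir}")
    return [
        "strainphlan",
        "--samples", *map(str, samples),
        "--output_dir", str(output_dir),
        "--print_clades_only",
    ]
```

The returned list can be handed to `subprocess.run(...)` as-is. Preferring JSON over pickle is a design choice of this sketch, not something the tools require.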
Hi, I am facing the same situation but with biobakery_workflow wmgx. The workflow expects a pkl file not a json.bz2. Do you know how I can modify the workflow in order to change the expected extension?
Thanks
Hi there, I’ve managed to solve it by replacing:
1 biobakery_workflows/utilities.py
2 biobakery_workflows/tasks/shotgun.py
More details here: Add strainphlan wildcard in one more location · biobakery/biobakery_workflows@c9a6adb · GitHub
Best regards
Took me forever to find this post, but so glad I did (thank you to the OP, all commenters, and especially BlueK77). That being said, I’m going to add more information in case anybody else needs to find this in the future and, like me, has to do more legwork and local code edits beyond what was listed here to fix the issue.
- Important to list for people trying to find this post: the specific error (non-fatal) that gets printed to the command line and in the anadama.log is:
2025-02-14 12:12:18,899 LoggerReporter task_failed ERROR: task 68, strainphlan_sample2markers____HD42R4_subsample : Failed! Error message : Failed to produce target `/Users/nyb/test/biobakery_wf_output/strainphlan/HD42R4_subsample_bowtie2/HD42R4_subsample_bowtie2.pkl'. Original exception: Traceback (most recent call last):
File "/Users/nyb/miniforge3/envs/biobakery_workflows/lib/python3.11/site-packages/anadama2/runners.py", line 219, in _get_task_result
targ_compares.append(list(target.compare()))
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nyb/miniforge3/envs/biobakery_workflows/lib/python3.11/site-packages/anadama2/tracked.py", line 379, in compare
stat = os.stat(self.name)
^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/nyb/test/biobakery_wf_output/strainphlan/HD42R4_subsample_bowtie2/HD42R4_subsample_bowtie2.pkl'
It shows up when “strainphlan_sample2markers____[your sample name]” fails to complete the run.
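If you want to confirm that you are hitting this exact issue, a quick check is to see whether a json.bz2 file sits next to the .pkl path named in the error. This is only a sketch: `find_json_sibling` is a hypothetical helper, and it assumes sample2markers wrote the JSON file into the same folder with the same base name.

```python
from pathlib import Path


def find_json_sibling(missing_target):
    """Return the .json.bz2 file next to a missing .pkl target, or None.

    Assumption: the JSON marker file shares the folder and base name
    of the .pkl target reported in the AnADAMA2 error message.
    """
    candidate = Path(missing_target).with_suffix(".json.bz2")
    return candidate if candidate.exists() else None
```

Pass it the path from the `Failed to produce target` message. If it returns a path, the markers were produced and only the expected extension is wrong.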
- In BlueK77’s comment, there’s a reference to a change made to utilities.py. I don’t see any evidence of a change made to that file on the GitHub link, and reading my local copy of the file, I don’t see any reference to MetaPhlAn versioning written there. As a result, I did not touch/edit that file. Not knocking the comment, just clarifying in case anybody else gets confused like I was.
- BlueK77 only lists one link to a GitHub code edit they made, but technically there were two edits they made in separate commits which are both important (thank you). I’ve listed both links here so people know to check for both in their local files:
- There’s one problem with those edits: they don’t account for the fact that new versions of MetaPhlAn continue to be released, and the command checks the version against a static string. I have MetaPhlAn version 4.1.1, which means it doesn’t look for and detect the json.bz2 files that get created. The janky workaround for this is to edit your local files the same as shown in BlueK77’s git commits but update what’s written to match the version you’re using (e.g. in my case “4.1.1” instead of “4”). The proper way to fix this, though, would be for someone who knows Python to update the code so that if your installed version of MetaPhlAn is 4.1 or later (where the JSON output was introduced, per the reply above), it uses the json.bz2 extension, and otherwise the .pkl extension. I may circle back and make that commit to the GitHub repo myself when I’m done analyzing this data (TBD), but I’m just starting to learn Python, so I imagine someone else could do it cleaner/faster than me.
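A rough sketch of what that version check could look like, in case it helps whoever picks this up. This is not the code in biobakery_workflows: `marker_extension` is a hypothetical helper, the 4.1 threshold is taken from the reply above saying the JSON output started with 4.1, and the assumed input is a version string such as the one printed by `metaphlan --version`.

```python
import re


def marker_extension(version_text, json_since=(4, 1)):
    """Pick the sample2markers output extension for a MetaPhlAn version.

    Hypothetical helper: compares (major, minor) numerically instead of
    matching a fixed string, so later releases keep working.
    """
    match = re.search(r"(\d+)(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)  # treat a bare "4" as 4.0
    return "json.bz2" if (major, minor) >= json_since else "pkl"
```

Comparing tuples of integers avoids the static-string problem described above: 4.1.1, 4.2, or 5.0 would all select json.bz2 without another code edit, as long as the output format itself doesn’t change again.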
- There’s another file that needs to be edited (at least on my system…) in order to get this all to work. I have two separate shotgun.py files in different locations. The file BlueK77 mentioned is here:
/Users/nyb/biobakery_workflows/biobakery_workflows/tasks
While the second file is here:
/Users/nyb/miniforge3/envs/biobakery_workflows/lib/python3.11/site-packages/biobakery_workflows/tasks/shotgun.py
The two files differ in content. In the second file, I needed to update line 798 so that extension="pkl" instead read extension="json.bz2". After that, the error went away, as shown below:
strainphlan_markers_temp = utilities.name_files(sam_files, output_folder, subfolder="strainphlan", extension="json.bz2", create_folder=True)
That’s not a great way of fixing it though. Again, it should be rewritten to handle both options with a wildcard (similar to how BlueK77 fixed the first shotgun.py file to change the extension based on the version of MetaPhlAn you’re using, but with the addition of the recommendation I made to allow for continued releases of newer MetaPhlAn versions). The edit I listed only makes sense if you are using version 4.1 or later and the output file type/extension doesn’t change in future versions.
- Finally, after getting past this error, the tasks labeled strainphlan_print_clades / order_clade_list / strainphlan_clade_[insert number] still fail when they go to run.
If you update line 823 in that last shotgun.py file to use “json.bz2” (instead of “pkl”) as shown below, it will fix the error. Again, this will only work if the version of MetaPhlAn you are using is 4.1 or later and the output file type/extension doesn’t change in future versions.
```
workflow.add_task(
    "strainphlan --samples [args[0]]/*/*.json.bz2 --output_dir [args[0]] --print_clades_only > [targets[0]] "+options,
    depends=strainphlan_markers,
    targets=clade_list,
    args=os.path.abspath(os.path.join(os.path.dirname(strainphlan_markers[0]),"..")),
    name="strainphlan_print_clades")
```