Usearch error in metawibele preprocess

Hi again, Yancong, and thanks for your work on MetaWIBELE. I ran the MetaWIBELE-characterize workflow with my own processed non-redundant protein sequences, gene counts, and metadata files. I have encountered other errors before and solved them with your help. This time I encountered a new error in the mspminer step:

(Oct 09 00:19:31) [27/45 -  60.00%] **Failed   ** Task 31: mspminer
(Oct 09 00:19:31) [27/45 -  60.00%] **Ready    ** Task 30: ln__counts_file
(Oct 09 00:19:31) [27/45 -  60.00%] **Started  ** Task 30: ln__counts_file
(Oct 09 00:19:31) [28/45 -  62.22%] **Completed** Task 30: ln__counts_file
(Oct 09 00:19:31) [28/45 -  62.22%] **Ready    ** Task 24: store__uniref_taxa
(Oct 09 00:19:31) [28/45 -  62.22%] **Started  ** Task 24: store__uniref_taxa
(Oct 09 00:19:37) [29/45 -  64.44%] **Completed** Task 24: store__uniref_taxa
(Oct 09 00:19:37) [30/45 -  66.67%] **Failed   ** Task 34: mspminer_msp
(Oct 09 00:19:37) [31/45 -  68.89%] **Failed   ** Task 36: mspminer_msp_uniref_annotation
(Oct 09 00:19:37) [32/45 -  71.11%] **Failed   ** Task 38: mspminer_msp_taxonomy_annotation
(Oct 09 00:19:37) [33/45 -  73.33%] **Failed   ** Task 40: mspminer_protein
(Oct 09 00:19:37) [34/45 -  75.56%] **Failed   ** Task 46: mspminer_protein_family_taxonomy
(Oct 09 00:19:37) [35/45 -  77.78%] **Failed   ** Task 50: store__mymsp_detail_taxa
(Oct 09 00:19:37) [36/45 -  80.00%] **Failed   ** Task 49: store__mymsp_detail_taxa_all_family
(Oct 09 00:19:37) [37/45 -  82.22%] **Failed   ** Task 48: store__mymsp_detail_taxa_family
(Oct 09 00:19:37) [38/45 -  84.44%] **Failed   ** Task 44: msp_protein_family
(Oct 09 00:19:37) [39/45 -  86.67%] **Failed   ** Task 42: mspminer_protein_family
(Oct 09 00:19:37) [40/45 -  88.89%] **Failed   ** Task 69: summary_function_annotation
(Oct 09 00:19:37) [41/45 -  91.11%] **Failed   ** Task 76: finalize_annotation
(Oct 09 00:19:37) [42/45 -  93.33%] **Failed   ** Task 73: summary_all_annotation
(Oct 09 00:19:37) [43/45 -  95.56%] **Failed   ** Task 66: summary_function_annotation
(Oct 09 00:19:37) [44/45 -  97.78%] **Failed   ** Task 74: finalize_annotation
(Oct 09 00:19:37) [45/45 - 100.00%] **Failed   ** Task 71: summary_all_annotation
Run Finished
Task 31 failed
  Name: mspminer
  Original error:
  Error executing action 0. Original Exception:
  Traceback (most recent call last):
    File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/runners.py", line 200, in _run_task_locally
      action_func(task)
    File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/helpers.py", line 107, in actually_sh
      ret = _sh(s, **kwargs)
    File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0].decode('utf-8'), ret[1].decode('utf-8')))
  anadama2.util.ShellException: [Errno 255] Command `mspminer /parastor/home/zhangwj02/MetaWIBELE/Res_First1/abundance_annotation/MSPminer_setting.cfg >/parastor/home/zhangwj02/MetaWIBELE/Res_First1/abundance_annotation/metawibele_MSPminer_msp.run_mspminer.log 2>&1' failed.

Could you please tell me how to deal with this?

Because of this error, I tried running the preprocessing workflow again from the raw fastq data. Everything went smoothly at first, until usearch reported an error:

10/16/2024 01:17:07 PM - LoggerReporter - INFO: task 886, format_protein_sequences : completed successfully
10/16/2024 01:17:07 PM - LoggerReporter - INFO: task 888, usearch__sorting : ready and waiting for resources
10/16/2024 01:17:07 PM - LoggerReporter - INFO: task 888, usearch__sorting : starting to run
10/16/2024 01:17:07 PM - LoggerReporter - INFO: Executing with shell:  usearch -sortbylength /parastor/home/zhangwj02/MetaWIBELE/Res_First_Rawdata/finalized/First_raw_combined_protein.faa -fastaout /parastor/home/zhangwj02/MetaWIBELE/Res_First_Rawdata/finalized/First_raw_combined_protein.sorted.faa -minseqlength 0 >/parastor/home/zhangwj02/MetaWIBELE/Res_First_Rawdata/finalized/First_raw_combined_protein.sorted.log 2>&1
10/16/2024 01:17:07 PM - LoggerReporter - ERROR: task 888, usearch__sorting :  Failed! Error message : Error executing action 0. Original Exception:
Traceback (most recent call last):
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/runners.py", line 200, in _run_task_locally
    action_func(task)
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/helpers.py", line 107, in actually_sh
    ret = _sh(s, **kwargs)
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
    raise ShellException(proc.returncode, msg.format(cmd, ret[0].decode('utf-8'), ret[1].decode('utf-8')))
anadama2.util.ShellException: [Errno 1] Command `usearch -sortbylength /parastor/home/zhangwj02/MetaWIBELE/Res_First_Rawdata/finalized/First_raw_combined_protein.faa -fastaout /parastor/home/zhangwj02/MetaWIBELE/Res_First_Rawdata/finalized/First_raw_combined_protein.sorted.faa -minseqlength 0 >/parastor/home/zhangwj02/MetaWIBELE/Res_First_Rawdata/finalized/First_raw_combined_protein.sorted.log 2>&1 ' failed.

Could you please help me with these two problems? Thank you very much!

Hi there,

Re1: Could you check the detailed log file, metawibele_MSPminer_msp.run_mspminer.log, and see if you can find more information about the error? Additionally, make sure that mspminer is set up correctly and runs smoothly on your inputs on its own (that is, independently of MetaWIBELE).

Re2: Could you check the First_raw_combined_protein.sorted.log file for detailed error information? Note that, due to the license requirements for usearch, we have replaced it with seqkit (an open-source tool) since MetaWIBELE v0.4.8.
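
Both checks can be sketched in Python. This is only a rough helper, not part of MetaWIBELE: `tail` and `run_and_report` are hypothetical names, and the command and log file are the ones that appear in the tracebacks. The number in `[Errno 255]` is the exit code of the shell command, so rerunning the same command outside the workflow should show whether the tool itself returns a non-zero code.

```python
import subprocess
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file (fewer if the file is shorter)."""
    with open(path, errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


def run_and_report(cmd, log_path):
    """Run cmd (a list of arguments), send stdout and stderr to log_path,
    and return the exit code of the command."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

For example, `run_and_report(["mspminer", "MSPminer_setting.cfg"], "manual_run.log")` followed by `tail("manual_run.log")` would show the exit code and the last lines of output for a manual run (the config path here is a placeholder for your own).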

Thanks!
Yancong

Hi Yancong. The log file metawibele_MSPminer_msp.run_mspminer.log does not seem to contain any error message:

Progress: 99%
Progress: 99%
Progress: 100%
Metagenomic species creation done.

Printing Metagenomic Species Pan-genomes...
Done.

Time Statistics:
Reading count matrix: 0 min 19 sec
Computing number of mapped reads in samples: 0 min 0 sec
Sqrt transformation: 0 min 0 sec
Filtering rare genes: 0 min 2 sec
Creating genes bins: 0 min 1 sec
Creating seeds: 10 min 46 sec
Printing all seeds: 0 min 1 sec
Merging seeds: 0 min 7 sec
Printing merged seeds: 0 min 0 sec
Extracting core seeds: 0 min 1 sec
Printing core seeds: 0 min 0 sec
Creating Metagenomic Species Pan-genomes: 0 min 40 sec
Printing Metagenomic Species Pan-genomes: 0 min 0 sec
Total: 11 min 57 sec

MSPminer generates many result files in the path /abundance_annotation/MSPminer_output/.

I also ran mspminer independently with my input gene count file. I got similar results and no error occurred, but each result file (such as all_msps.tsv and genes_bins.tsv) is larger than the corresponding file produced by the MetaWIBELE run. I don’t know where the problem is.
Thank you again for your help.

Hi there,

It looks like you already got mspminer’s results in your previous MetaWIBELE run, so the error you pointed out in your first message may be caused by other steps. To debug further, could you show me the header of your input abundance file? The current version of MetaWIBELE-characterize requires the header of the abundance input file to start with “ID …” (i.e., the first column in the first row must be labeled “ID”). If the input file is not properly formatted, it may cause issues during processing.
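
A quick way to check the header requirement, as a minimal sketch: it assumes the abundance file is tab-separated, and the function names are made up for illustration.

```python
def first_header_field(path, sep="\t"):
    """Return the first column label from the first row of a delimited text file."""
    with open(path, newline="") as handle:
        header = handle.readline().rstrip("\r\n")
    return header.split(sep)[0]


def header_starts_with_id(path):
    """True if the first column of the first row is labeled exactly 'ID'."""
    return first_header_field(path) == "ID"
```

Calling `header_starts_with_id` on the abundance file should return `True` for a file that meets the requirement; the comparison is exact, so a label such as `id` or ` ID` would return `False`.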

As for the mspminer results, it’s possible that the config file you used for your independent run differed from the one used when running with MetaWIBELE. Different config files can produce different results.
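
To see whether the two config files actually differ, a plain text diff is probably enough. The sketch below uses Python’s standard `difflib` and makes no assumption about the config format; `diff_text_files` is a hypothetical helper name.

```python
import difflib


def diff_text_files(path_a, path_b):
    """Return unified-diff lines between two text files (empty list if identical)."""
    with open(path_a) as file_a, open(path_b) as file_b:
        lines_a = file_a.read().splitlines()
        lines_b = file_b.read().splitlines()
    return list(
        difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b, lineterm="")
    )
```

Here one path would be the MSPminer_setting.cfg written under abundance_annotation by MetaWIBELE, and the other the config used for the independent run. Lines starting with `-` appear only in the first file and lines starting with `+` only in the second.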

Thanks!
Yancong

Okay, Yancong, my input files are as follows, respectively.

Abundance file (first five rows):

ID	CZ121.1	HLD.8	QH100	QH105	QH53	QH57	QH63	QH66	QH72	QH76	QH80	CZ122.1	QH84	QH85	QH89	QH91	QH95	CZ123	QH99	CZ123.1	CZ157	CZ19	CZ53.1	CZ98.1	HLD.1	HLD.11	HLD.15	HLD.19	HLD.6	HLD.3	CZ121	QH51	QH55	QH59	QH61	QH68	QH70	QH74	QH78	QH82	QH87	QH93	QH97	CZ138	CZ156.1	CZ29.1	CZ3	CZ53	CZ73.1	CZ77.1	CZ79.1	CZ86.1	CZ89	CZ89.1	HLD.13	HLD.17	CZ4	HLD.10	CZ89.2	HLD.4	HLD.5	HLD.14	HLD.9	QH52	QH56	QH62	CZ122	QH67	QH73	QH77	QH81	QH88	QH90	QH94	QH98	CZ20	CZ29	CZ64	CZ77.2	CZ79	CZ86	HLD.18	HLD.20	CZLJE	HLD.12	HLD.7	QH101	QH103	QH106	QH54	QH58	QH60	QH64	QH65	QH69	QH71	QH75	QH79	QH83	QH86	QH92	QH96	CZ124	CZ135	CZ156	CZ73	CZ77	CZ98	HLD.16	HLD.2
CZ121.1_scaffold1_1_11154	3	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	19	0	0	0	0	0	0
CZ121.1_scaffold2_1_21037	292	66	241	0	0	0	0	1	6	0	0	0	0	3	0	0	0	0	7	0	4	0	0	0	0	0	5	13	2	0	2	0	26	0	23	1	0	0	0	0	0	0	2	284	14	0	30	0	0	0	0	0	0	0	0	0	0	0	0	3	0	0	0	0	5	0	0	0	0	0	0	0	43	0	0	0	0	2	1	1	0	2	3	1	2	341	0	1	0	0	0	3	2	1	0	0	13	0	0	0	0	0	0	0	18	0	0	0	0	0
CZ121.1_scaffold1_1_11154_2	37	0	0	0	26	18	0	0	29	0	11	5	0	0	0	0	0	0	0	0	0	6	0	0	0	0	0	0	17	6	4	2	44	0	0	14	0	0	0	0	0	0	0	42	0	0	0	0	0	0	7	3	0	0	0	25	11	0	0	0	0	0	0	0	0	0	4	3	0	4	0	0	0	2	0	3	0	0	1	4	2	0	0	0	11	1	3	0	1	7	0	40	3	2	0	0	0	7	0	0	19	0	0	71	0	0	0	0	0	0
CZ121.1_scaffold2_1_21037_2	466	149	6	0	0	0	0	0	7	0	0	0	0	0	0	0	0	0	11	0	0	0	15	0	0	0	1	0	0	0	0	0	58	0	0	0	0	0	0	0	0	0	7	490	0	0	38	48	0	0	0	0	0	0	0	0	0	1	0	3	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	3	0	0	31	0	0	0	0	0	0	0	0	0	0	18	0	0	0	0	0	0	0	0	0	0	0	0	0

Protein sequence file (first six rows):

>CZ121.1_scaffold1_1_11154
LLQGCDKEVVELLGKLQRNGFLQALEAAENNKFDPDCYYTDFENCYSDFCAIFEGLKSYVSEYKVQYIRPLTLAEIAFFEARMSLAKQAKGGIPHIYMKGCN*
>CZ121.1_scaffold2_1_21037
MFLKTESFEHNGVTVTLSELSALQRIEHLALMKRQAEQAESDSNRKFTVEDAIRTGAFVVAMSLWHNHPQKTKQPSMNEAVKQIEQEVLTTWPTEAISHAENVVYRLSGMYEFVVNNTPEQTEDAGPAEPVSAGKCSTVS*

Metadata file (first six rows):

SID	diagnosis
CZ121.1	ICH
HLD.8	ICH
QH100	Control
QH105	Control
QH53	Control

Thanks for dealing with my problem!

Thanks for sending. Did you run MetaWIBELE-characterize successfully using the demo data (metawibele · biobakery/biobakery Wiki · GitHub)?

Best,
Yancong

Hi Yancong, I reran MetaWIBELE-characterize using the demo data, and the same error occurred:

10/21/2024 10:35:20 AM - LoggerReporter - INFO: task 30, ln__counts_file : completed successfully
10/21/2024 10:35:20 AM - LoggerReporter - INFO: task 24, store__uniref_taxa : ready and waiting for resources
10/21/2024 10:35:20 AM - LoggerReporter - INFO: task 24, store__uniref_taxa : starting to run
10/21/2024 10:35:20 AM - LoggerReporter - INFO: Executing with shell:  cp -f /parastor/home/zhangwj02/MetaWIBELE/Res_demo_inter/global_homology_annotation/metawibele_protein_annotation.uniref90_annotation.tsv /parastor/home/zhangwj02/MetaWIBELE/Res_demo_inter/finalized/metawibele_protein_annotation.uniref90_annotation.tsv
10/21/2024 10:35:20 AM - anadama2.helpers - INFO: Execution complete. Stdout:
Stderr:
10/21/2024 10:35:20 AM - LoggerReporter - INFO: task 24, store__uniref_taxa : completed successfully
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 34, mspminer_msp :  Failed! Error message : Task failed because parent task `31' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 36, mspminer_msp_uniref_annotation :  Failed! Error message : Task failed because parent task `34' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 38, mspminer_msp_taxonomy_annotation :  Failed! Error message : Task failed because parent task `36' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 40, mspminer_protein :  Failed! Error message : Task failed because parent task `34' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 46, mspminer_protein_family_taxonomy :  Failed! Error message : Task failed because parent task `40' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 50, store__mymsp_detail_taxa :  Failed! Error message : Task failed because parent task `46' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 49, store__mymsp_detail_taxa_all_family :  Failed! Error message : Task failed because parent task `46' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 48, store__mymsp_detail_taxa_family :  Failed! Error message : Task failed because parent task `46' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 44, msp_protein_family :  Failed! Error message : Task failed because parent task `40' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 42, mspminer_protein_family :  Failed! Error message : Task failed because parent task `40' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 69, summary_function_annotation :  Failed! Error message : Task failed because parent task `42' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 76, finalize_annotation :  Failed! Error message : Task failed because parent task `42' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 73, summary_all_annotation :  Failed! Error message : Task failed because parent task `50' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 66, summary_function_annotation :  Failed! Error message : Task failed because parent task `42' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 74, finalize_annotation :  Failed! Error message : Task failed because parent task `48' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: task 71, summary_all_annotation :  Failed! Error message : Task failed because parent task `48' failed
10/21/2024 10:35:20 AM - LoggerReporter - ERROR: AnADAMA run finished with errors.
Traceback (most recent call last):
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/bin/characterize.py", line 385, in <module>
    main(parse_cli_arguments())
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/bin/characterize.py", line 380, in main
    workflow.go()
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/workflow.py", line 804, in go
    self._handle_finished()
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/workflow.py", line 836, in _handle_finished
    raise RunFailed()
anadama2.workflow.RunFailed

The strange thing is that I ran MetaWIBELE-characterize on the demo data before and it worked smoothly. I don’t know what went wrong this time. How can I solve it?

Are there any differences in your installation or runtime environment between the successful demo run last time and the failed run this time? For example, software updates, changes to the metawibele.cfg file, or modifications to the running command?
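
If a package listing from the working setup happens to be available (for example, saved `pip freeze` output), comparing it with the current one can narrow this down. The sketch below is only an illustration: it assumes two such listings exist as text, the helper names are made up, and lines that are not in `name==version` form are skipped.

```python
def parse_freeze(text):
    """Parse 'name==version' lines (pip freeze style) into a dict; other lines are skipped."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            packages[name.lower()] = version
    return packages


def changed_packages(before, after):
    """Return {name: (old, new)} for packages that were added, removed, or changed.
    None marks a package that is absent on that side."""
    names = set(before) | set(after)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(names)
        if before.get(name) != after.get(name)
    }
```

Passing the parsed "before" and "after" listings to `changed_packages` returns only the packages whose versions differ, which is usually a much shorter list to review than the full environment.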

If you’re still unsure what caused the issue, I might need more details for deeper troubleshooting. Could you redirect all the log messages printed on the screen from the demo run into a file and share it with me, along with the metawibele.cfg file you used and the corresponding output folder?

Thanks!
Yancong

Hi, Yancong,
Since the successful run, I have installed or upgraded some software, so I don’t know where the problem lies. I have compressed the required files and sent them to your email. Thank you again for helping me solve the problem!
Best wishes,
WenjinZhang