MetaWIBELE error about MSPminer

Hi, Yancong, thanks for your work on MetaWIBELE, it is a very powerful tool!
I installed MetaWIBELE with conda and then installed its dependencies. Because I could not download PSORTb and the InterProScan-related dependencies were not configured successfully, I ran the MetaWIBELE-characterize workflow with the following command:

srun -n 1 metawibele characterize --input-sequence "/parastor/home/zhangwj02/MetaWIBELE/Res_First/output_proteins.fasta" \
--input-count "/parastor/home/zhangwj02/MetaWIBELE/Res_First/combined_expression.tsv" \
--input-metadata "/parastor/home/zhangwj02/MetaWIBELE/Res_First/Metadata_information.tsv" \
--output /parastor/home/zhangwj02/MetaWIBELE/Res_First/ \
--bypass-psortb --bypass-interproscan

I received an error (full report attached):

09/18/2024 03:24:48 AM - LoggerReporter - INFO: task 59, abundance_normalization : completed successfully
09/18/2024 03:24:48 AM - LoggerReporter - INFO: task 63, abundance_annotator : ready and waiting for resources
09/18/2024 03:24:48 AM - LoggerReporter - INFO: task 63, abundance_annotator : starting to run
09/18/2024 03:24:49 AM - LoggerReporter - INFO: Executing with shell:  metawibele_abundance_annotator -a /parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_genecatalogs_nrm.tsv -c /parastor/home/zhangwj02/temp/metawibele_proteinfamilies.clstr -m /parastor/home/zhangwj02/MetaWIBELE/Res_First/Metadata_information.tsv -f protein -t DNA_abundance -o /parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_DNA_proteinfamilies.ORF.abundance.detail.tsv >/parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_DNA_proteinfamilies.ORF.abundance.detail.log 2>&1
09/18/2024 03:36:21 AM - anadama2.helpers - INFO: Execution complete. Stdout:
Stderr:
09/18/2024 03:36:21 AM - LoggerReporter - INFO: task 63, abundance_annotator : completed successfully
09/18/2024 03:36:21 AM - LoggerReporter - INFO: task 51, sum_to_protein_family_abundance : ready and waiting for resources
09/18/2024 03:36:21 AM - LoggerReporter - INFO: task 51, sum_to_protein_family_abundance : starting to run
09/18/2024 03:36:22 AM - LoggerReporter - INFO: Executing with shell:  metawibele_sum_to_protein_family_abundance -i /parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/combined_expression.refined.tsv -c /parastor/home/zhangwj02/temp/metawibele_proteinfamilies.clstr -o /parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_proteinfamilies_counts.all.tsv >/parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_proteinfamilies_counts.all.log 2>&1
09/18/2024 03:36:46 AM - LoggerReporter - ERROR: task 51, sum_to_protein_family_abundance :  Failed! Error message : Error executing action 0. Original Exception:
Traceback (most recent call last):
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/runners.py", line 200, in _run_task_locally
    action_func(task)
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/helpers.py", line 107, in actually_sh
    ret = _sh(s, **kwargs)
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
    raise ShellException(proc.returncode, msg.format(cmd, ret[0].decode('utf-8'), ret[1].decode('utf-8')))
anadama2.util.ShellException: [Errno 1] Command `metawibele_sum_to_protein_family_abundance -i /parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/combined_expression.refined.tsv -c /parastor/home/zhangwj02/temp/metawibele_proteinfamilies.clstr -o /parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_proteinfamilies_counts.all.tsv >/parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_proteinfamilies_counts.all.log 2>&1' failed.
Out:
Err:

09/18/2024 03:36:46 AM - LoggerReporter - ERROR: task 53, abundance_filtering :  Failed! Error message : Task failed because parent task `51' failed

It looks like an error occurred in the sum_to_protein_family_abundance (Task 51) step of the mspminer process. Within a few minutes of this error, MetaWIBELE stopped running and generated an incomplete result file as shown below:

The output files in MSPminer_output look fine and are the same as the files generated by running the ./mspminer settings.ini command when I tested mspminer alone. The following figure shows all the result files in the abundance_annotation folder:

The error in metawibele_proteinfamilies_counts.all.log is as follows:

(metaw) [zhangwj02@mu03 MSPminer_1_1_3]$ cat "/parastor/home/zhangwj02/MetaWIBELE/Res_First/abundance_annotation/metawibele_proteinfamilies_counts.all.log"
09/18/2024 03:36:22 AM - metawibele.config - INFO: ### Start sum_to_protein_family_abundance step ####
09/18/2024 03:36:22 AM - metawibele.config - INFO: Get cluster info ......starting
09/18/2024 03:36:46 AM - metawibele.config - INFO: Get cluster info ......done
09/18/2024 03:36:46 AM - metawibele.config - INFO: Assign counts to protein families ......starting
Traceback (most recent call last):
  File "/parastor/home/zhangwj02/miniconda3/envs/mw/bin/metawibele_sum_to_protein_family_abundance", line 10, in <module>
    sys.exit(main())
  File "/parastor/home/zhangwj02/.local/lib/python3.7/site-packages/metawibele/characterize/sum_to_protein_family_abundance.py", line 196, in main
    assign_counts (pep_cluster, values.t, values.i, values.o)
  File "/parastor/home/zhangwj02/.local/lib/python3.7/site-packages/metawibele/characterize/sum_to_protein_family_abundance.py", line 125, in assign_counts
    mys = titles[myindex]
KeyError: 1

I’d really appreciate it if you could help me with this problem. Thanks!

Hi there,

It seems that the process was unable to recognize the header line of your input abundance file, combined_expression.tsv. The header should be formatted as "ID ..." (i.e., the first column in the first row must be labeled "ID"). Could you adjust the header of combined_expression.tsv and try again to see whether that resolves the issue?
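
If it helps, here is a minimal Python sketch for checking and relabeling that first header cell. It is not part of MetaWIBELE: the function names first_header_cell and relabel_first_column are made up for illustration, and it assumes the abundance table is tab-separated and that only the first cell of the header row needs to change.

```python
import csv
import io


def first_header_cell(handle):
    """Return the first cell of the first row of a tab-separated table.

    Hypothetical helper (not a MetaWIBELE function). Returns an empty
    string if the table has no rows.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    return header[0] if header else ""


def relabel_first_column(src, dst, label="ID"):
    """Copy a tab-separated table from src to dst, renaming only the
    first header cell to `label`. All other lines are copied unchanged.

    Hypothetical helper; assumes the first line is the header row.
    """
    first_line = src.readline()
    if not first_line:
        return  # empty input, nothing to copy
    cells = first_line.rstrip("\r\n").split("\t")
    cells[0] = label
    dst.write("\t".join(cells) + "\n")
    for line in src:
        dst.write(line)


if __name__ == "__main__":
    # Small in-memory table standing in for an abundance file.
    sample = io.StringIO("gene\tS1\tS2\ng1\t3\t5\n")
    print(first_header_cell(sample))  # prints "gene"

    sample.seek(0)
    fixed = io.StringIO()
    relabel_first_column(sample, fixed)
    fixed.seek(0)
    print(first_header_cell(fixed))  # prints "ID"
```

With real files, you could open combined_expression.tsv for reading and a new file for writing, pass both handles to relabel_first_column, and then point --input-count at the new file.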

MetaWIBELE-characterize typically takes the outputs of MetaWIBELE-preprocess as its inputs. Users can supply their own inputs to MetaWIBELE-characterize, but those inputs must be formatted as required.

Thanks!
Yancong