Task combine_gene_sequences crashing

Hi Yancong,

We can now use your tool, but we always have to run the preprocess step twice.
The step that fails is 'combine_gene_sequences.py'.


The command ran successfully for our test set when we ran it again, but the first try failed even though we hadn't changed anything.
However, we need to merge your tool into our pipeline, and in this situation we would have to manually build a soft link or re-run every time, which is not ideal for us.
Do you have any ideas or a solution?

Thanks in advance,
Yuzie

Hi Yuzie,
To help me debug further, could you fill me in on your running context:

  1. Could you tell me your running command? I’d like to know which parameters you used.
  2. Did you check the log files for the failed first run? I am curious whether any errors or warnings were reported (you can check both the global log file and the local logs for each step).
  3. Did you try a small set of data to see whether it works? If a small set runs successfully, resource distribution in your computing environment may be related to this failure. In that case, you may want to balance the size of your input data against your computing resources, e.g. split the input into batches for running (a rough sketch of one way to batch follows below).
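
For example, batching could be done with symlinks along these lines. This is purely illustrative: the *_1.fastq / *_2.fastq naming and the batch size of 10 pairs are assumptions, so adjust them to your actual data:

    # group read pairs into batches of 10 via symlinks (illustrative naming)
    mkdir -p batches
    i=0
    for r1 in fastq/*_1.fastq; do
        batch="batches/batch_$(( i / 10 ))"
        mkdir -p "$batch"
        ln -s "$(realpath "$r1")" "$batch/"
        ln -s "$(realpath "${r1%_1.fastq}_2.fastq")" "$batch/"
        i=$(( i + 1 ))
    done
    # then run preprocess on each batch directory separately
    for batch in batches/batch_*; do
        metawibele preprocess --input "$batch" --output "preprocess_$(basename "$batch")" --extension-paired "_1.fastq,_2.fastq" --extension ".fastq"
    done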

Thanks!
Yancong

Hello Yancong, thanks for your helpful reply.

  1. The running command is as follows:
    metawibele preprocess --input fastq/ --output preprocess/ --output-basename batch_22 --extension-paired "_1.fastq,_2.fastq" --extension ".fastq"
    (We used the defaults without changing your metawibele.cfg, except the threads option, which we changed to 30.)
    Our raw reads were cleaned with KneadData to exclude the human genome and possible contamination.
  2. We retained the log files and the standard output from the first run, and I will share them with you.
  3. We also tried the test reads from your GitHub repository, and it still crashed at the same step until we ran it a second time.

Regards,
Yuzie
batch_22_combined_gene_log.txt (767 Bytes)
preprocess-1_log.txt (44.5 KB)

Hi Yuzie,

Thanks for sharing this information. Testing the MetaWIBELE preprocess module with different parameter settings suggests there is a small communication delay between tasks run in parallel and tasks run in series when the number of jobs is not specified in MetaWIBELE v0.4.4. In theory, when running tasks in parallel we need to apply at least two jobs; otherwise it may cause a crash.

A quick-and-dirty way to solve this issue: you could use the '--local-jobs' parameter in the running command to specify how many jobs to run (e.g. --local-jobs 2). I successfully ran it with the demo data in the Docker image:

    metawibele preprocess --input raw_reads/ --output preprocess/ --extension-paired "_R1.fastq.gz,_R2.fastq.gz" --extension ".fastq.gz" --local-jobs 2
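
Applied to your batch_22 command, that would be:

    metawibele preprocess --input fastq/ --output preprocess/ --output-basename batch_22 --extension-paired "_1.fastq,_2.fastq" --extension ".fastq" --local-jobs 2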

Alternatively, I have tweaked the modules to address this kind of communication delay in our latest development version on GitHub. If you are interested, you could try that development version instead.
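
For reference, one common way to install a development version directly from a repository is with pip; the biobakery/metawibele location below is my assumption about where the repository lives, so adjust it if yours differs:

    pip install git+https://github.com/biobakery/metawibele.git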

Thanks!
Yancong
