Task combine_gene_sequences crashing

Hi Yancong,

We can now use your tool, but we always have to run the preprocess step twice.
The step that fails is 'combine_gene_sequences.py'.


The command ran successfully for our test set when we ran it again, but the first try failed even though we hadn't changed anything.
However, we need to merge your tool into our pipeline, and in this situation we would have to manually build a soft link or re-run every time, which is not ideal for us.
Do you have any ideas or a solution?

Thanks in advance,
Yuzie

Hi Yuzie,
To help me debug further, could you fill me in on your running context:

  1. Could you tell me your running command? I’d like to know which parameters you used.
  2. Did you check the log files for the failed first run? I am curious whether any errors or warnings were reported (you can check both the global log file and the local logs for each step).
  3. Did you try a small set of data to see whether it works? If a small set runs successfully, resource distribution in your computing environment may be related to this failure. In that case, you may want to balance the size of your input data against your computing resources, e.g. split the input into batches for running (a rough sketch of one way to batch follows below).
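
For example, batching could be done with symlinks along these lines. This is purely illustrative: the *_1.fastq / *_2.fastq naming and the batch size of 10 pairs are assumptions, so adjust them to your actual data:

    # group read pairs into batches of 10 via symlinks (illustrative naming)
    mkdir -p batches
    i=0
    for r1 in fastq/*_1.fastq; do
        batch="batches/batch_$(( i / 10 ))"
        mkdir -p "$batch"
        ln -s "$(realpath "$r1")" "$batch/"
        ln -s "$(realpath "${r1%_1.fastq}_2.fastq")" "$batch/"
        i=$(( i + 1 ))
    done
    # then run preprocess on each batch directory separately
    for batch in batches/batch_*; do
        metawibele preprocess --input "$batch" --output "preprocess_$(basename "$batch")" --extension-paired "_1.fastq,_2.fastq" --extension ".fastq"
    done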

Thanks!
Yancong

Hello Yancong, thanks for your helpful reply.

  1. The running command is as follows:
    metawibele preprocess --input fastq/ --output preprocess/ --output-basename batch_22 --extension-paired "_1.fastq,_2.fastq" --extension ".fastq"
    (We used the defaults without changing your metawibele.cfg, except the threads option, which we changed to 30.)
    Our raw reads were cleaned with KneadData to exclude the human genome and possible contamination.
  2. We retained the log files and the standard output from the first run, and I will share them with you.
  3. We also tried the test reads from your GitHub repository, and it still crashed at the same step until we ran it a second time.

Regards,
Yuzie
batch_22_combined_gene_log.txt (767 Bytes)
preprocess-1_log.txt (44.5 KB)

Hi Yuzie,

Thanks for sharing this information. Testing the MetaWIBELE preprocess module with different parameter settings suggests there is a small communication delay between tasks run in parallel and tasks run in series when the number of jobs is not specified in MetaWIBELE v0.4.4. In theory, when running tasks in parallel we need to apply at least two jobs; otherwise it may cause a crash.

A quick-and-dirty way to solve this issue: you could use the '--local-jobs' parameter in the running command to specify how many jobs to run (e.g. --local-jobs 2). I successfully ran it with the demo data in the Docker image:

    metawibele preprocess --input raw_reads/ --output preprocess/ --extension-paired "_R1.fastq.gz,_R2.fastq.gz" --extension ".fastq.gz" --local-jobs 2
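
Applied to your batch_22 command, that would be:

    metawibele preprocess --input fastq/ --output preprocess/ --output-basename batch_22 --extension-paired "_1.fastq,_2.fastq" --extension ".fastq" --local-jobs 2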

Alternatively, I have tweaked the modules to address this kind of communication delay in our latest development version on GitHub. If you are interested, you could try that development version instead.
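
For reference, one common way to install a development version directly from a repository is with pip; the biobakery/metawibele location below is my assumption about where the repository lives, so adjust it if yours differs:

    pip install git+https://github.com/biobakery/metawibele.git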

Thanks!
Yancong
