Wmgx_wmtx workflow rna_dna_norm.py failure + why is metagenome needed in metatranscriptome analysis?

hilasha2 · December 1, 2021, 3:57pm

I have two questions -

I tried to run the metagenome-metatranscriptome (wmgx_wmtx) workflow in biobakery_workflows.
The workflow fails whenever it reaches the rna_dna_norm.py step.
I ran the workflow on samples from the Human Microbiome Project 2 (HMP2), using some metatranscriptome samples and their matching metagenome samples.
I first ran the workflow with a TSV mapping file matching the metagenome and metatranscriptome samples (I tried to follow the tutorial closely):

# wts	wms
CSM67UGO	CSM67UGO
CSM79HI3	CSM79HI3
CSM79HP4	CSM79HP4
ESM5ME9U	ESM5ME9U
HSM6XRVK	HSM6XRVK
HSM67VD2	HSM67VD2
HSMA33OT	HSMA33OT
MSM6J2RS	MSM6J2RS
MSM9VZMI	MSM9VZMI
PSM7J154	PSM7J154
PSM7J182	PSM7J182
PSMA264U	PSMA264U

This is the command I used at first:

biobakery_workflows wmgx_wmtx --input-metagenome $INPUT_PATH/HMP2_samples_metagenomics --input-metatranscriptome $INPUT_PATH/HMP2_samples_metatranscriptomics --input-mapping $INPUT_PATH/mapping_samples_file.tsv --output $OUTPUT_PATH --threads 10 --local-jobs 10 --pair-identifier _R1 --qc-options="--trimmomatic ~/miniconda3/envs/$ENV_NAME/share/trimmomatic-0.39-2/ -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/human_genome_bowtie2 -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/human_transcriptome_bowtie2 -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/ribosomal_RNA_bowtie2" --remove-intermediate-output --bypass-strain-profiling

However, I got the following error message:

  Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "/~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `rna_dna_norm.py --input-dna $OUTPUT_PATH/whole_metagenome_shotgun/humann/merged/pathabundance.tsv --input-rna $OUTPUT_PATH/whole_metatranscriptome_shotgun/humann/merged/pathabundance.tsv --output $OUTPUT_PATH/humann/rna_dna_norm/paths --reduce-sample-name --mapping $INPUT_PATH/mapping_samples_file.tsv' failed. 
  Out: b'Reading RNA table\nReading DNA table\n'
  Err: b'The rna/dna sample names do not match. Please check the formatting of the mapping file.\n'

As mentioned above, the run fails when calling the rna_dna_norm.py command.
From the error message, I thought that the problem was with the mapping file. I checked it, and I don’t think it has any mistakes. It is a tab-delimited file and matches exactly the paired metagenome and metatranscriptome samples (I do have paired-end metagenome samples, if it matters).
At any rate, I realized I might not need the mapping file.
The sample names appear in the same order in the ecs/genefamilies/pathabundance.tsv tables of the metagenome and metatranscriptome analyses. These tables are the inputs for rna_dna_norm.py, and since the sample order and names match, I thought the mapping file was unnecessary.
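One way to double-check the mapping file against the two merged tables, independently of the workflow, is sketched below. This is only an illustration and not how rna_dna_norm.py does its own matching: the function names are hypothetical, the comparison is a plain exact-string match, and the table headers in the demo are made up.

```python
import csv
import io


def header_samples(handle):
    """Return the sample column labels from the first line of a tab-delimited table."""
    header = next(csv.reader(handle, delimiter="\t"))
    return header[1:]  # the first column holds the feature name


def mapping_pairs(handle):
    """Return (rna, dna) name pairs from a two-column mapping file.

    Blank lines and lines starting with '#' are skipped. A line with fewer
    than two tab-separated fields raises ValueError, which catches files
    that are space-delimited by accident.
    """
    pairs = []
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"line {line_number}: expected two tab-separated columns")
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def unmatched(pairs, rna_samples, dna_samples):
    """List mapping-file names that are absent from each table header (exact match)."""
    rna_missing = [rna for rna, _ in pairs if rna not in rna_samples]
    dna_missing = [dna for _, dna in pairs if dna not in dna_samples]
    return rna_missing, dna_missing


# Demo with made-up headers (hypothetical, not taken from the real output files).
mapping = io.StringIO("# wts\twms\nCSM67UGO\tCSM67UGO\nCSM79HI3\tCSM79HI3\n")
rna_table = io.StringIO("# Pathway\tCSM67UGO_Abundance\tCSM79HI3\n")
dna_table = io.StringIO("# Pathway\tCSM67UGO\tCSM79HI3\n")

pairs = mapping_pairs(mapping)
rna_missing, dna_missing = unmatched(
    pairs, header_samples(rna_table), header_samples(dna_table)
)
print("RNA names not found:", rna_missing)
print("DNA names not found:", dna_missing)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles for the mapping file and the two merged tables. Any name reported as not found is worth comparing by eye against the actual column headers, since a suffix or trailing whitespace would be enough to break an exact match.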

So I ran the command again, this time without the mapping file.
This time I got a different error message:

Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `rna_dna_norm.py --input-dna $OUTPUT_PATH/whole_metagenome_shotgun/humann/merged/genefamilies.tsv --input-rna $OUTPUT_PATH/whole_metatranscriptome_shotgun/humann/merged/genefamilies.tsv --output $OUTPUT_PATH/humann/rna_dna_norm/genes --reduce-sample-name' failed. 
  Out: b'Reading RNA table\nReading DNA table\nCompute unstratified features\nNormalize DNA\nNormalize RNA\nCompute stratified features\nNormalize DNA\nNormalize RNA\nCompute only classified features\nNormalize DNA\nNormalize RNA\nWriting unstratified table\n'
  Err: b'Traceback (most recent call last):\n  File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 272, in <module>\n    main()\n  File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 258, in main\n    output_unstrat_file)\n  File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 180, in write_file\n    file_handle.write("\\t".join(column_labels)+"\\n")\nTypeError: a bytes-like object is required, not \'str\'\n'

I presume that I’ll have to debug the code in order to find out what’s the problem, but before I do that I thought I’d ask you whether this is a known issue or whether I am doing something wrong.

As for my second question: why does the metagenome analysis run together with the metatranscriptome analysis? Is there a way to run the metatranscriptome analysis without any metagenome samples?

hilasha2 · December 6, 2021, 5:16pm

I think I found one problem.
At line 179 of the file biobakery_workflows/scripts/rna_dna_norm.py, in the function:

def write_file(column_labels, row_labels, data, file):

The output file is opened in mode ‘wb’ (binary) instead of ‘w’ (text). In Python 3, a file opened in binary mode accepts only bytes, so writing a str to it raises the TypeError shown above:

with open(file, "wb") as file_handle:

I changed this line to:

with open(file, "w") as file_handle:

I did manage to run the workflow that way.
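For anyone who wants to reproduce the error outside the workflow, here is a minimal sketch. The helper `write_header` is hypothetical and only mimics the header-writing line from the traceback; it is not the script's actual `write_file` function.

```python
import os
import tempfile


def write_header(path, column_labels, mode):
    """Write a tab-joined header line to path, opening the file in the given mode."""
    with open(path, mode) as file_handle:
        file_handle.write("\t".join(column_labels) + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "table.tsv")
    labels = ["# Gene Family", "sample_1", "sample_2"]  # made-up labels

    # Binary mode: Python 3 refuses to write a str to a bytes-only handle.
    try:
        write_header(out_path, labels, "wb")
    except TypeError as error:
        print("wb failed:", error)

    # Text mode: the same call succeeds.
    write_header(out_path, labels, "w")
    with open(out_path) as check:
        print(repr(check.read()))
```

Under Python 2 the "wb" mode would have worked because str and bytes were the same type, which may be why the script was written that way, but that is a guess on my part.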
However, I got tables with many NaNs. I presume they result from dividing 0 by 0. I also got Inf values, and again I presume they result from dividing a nonzero number by zero. Is that correct?
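To make the presumption concrete, here is a small sketch of the IEEE 754 floating-point convention, which array libraries such as NumPy follow when dividing element-wise. The function `rna_dna_ratio` is hypothetical and the abundances are made up; I have not checked how rna_dna_norm.py actually computes its ratios or whether it applies any smoothing first.

```python
import math


def rna_dna_ratio(rna, dna):
    """Ratio of two non-negative abundances under the IEEE 754 convention.

    Plain Python raises ZeroDivisionError for float division by zero, so the
    special cases are spelled out here: 0/0 gives NaN, x/0 (x > 0) gives Inf.
    """
    if dna == 0:
        return math.nan if rna == 0 else math.inf
    return rna / dna


# Made-up abundances for three features in one sample.
rna_values = [0.0, 5.0, 2.0]
dna_values = [0.0, 0.0, 4.0]

ratios = [rna_dna_ratio(r, d) for r, d in zip(rna_values, dna_values)]
print(ratios)  # [nan, inf, 0.5]
```

If that is what happens in the script, NaN would mark features absent from both the RNA and DNA tables of a sample, and Inf would mark features detected in RNA but not in DNA.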

For the pathways file I get:

While the gene families file looks like this:

(samples were taken from the HMP2 database).

I hope that these results are typical.
