The bioBakery help forum

Wmgx_wmtx workflow rna_dna_norm.py failure + why is metagenome needed in metatranscriptome analysis?

I have two questions:

  1. I tried to run the metagenome-metatranscriptome (wmgx_wmtx) workflow in biobakery_workflows.
    The workflow fails whenever it reaches the rna_dna_norm.py step.
    I ran the workflow on samples from the Human Microbiome Project 2 (HMP2), using some metatranscriptome samples and their matching metagenome samples.
    I first ran the workflow with a TSV mapping file that matches the metagenome and metatranscriptome samples (I tried to follow the tutorial closely):
# wts	wms
CSM67UGO	CSM67UGO
CSM79HI3	CSM79HI3
CSM79HP4	CSM79HP4
ESM5ME9U	ESM5ME9U
HSM6XRVK	HSM6XRVK
HSM67VD2	HSM67VD2
HSMA33OT	HSMA33OT
MSM6J2RS	MSM6J2RS
MSM9VZMI	MSM9VZMI
PSM7J154	PSM7J154
PSM7J182	PSM7J182
PSMA264U	PSMA264U

This is the command I used at first:

biobakery_workflows wmgx_wmtx --input-metagenome $INPUT_PATH/HMP2_samples_metagenomics --input-metatranscriptome $INPUT_PATH/HMP2_samples_metatranscriptomics --input-mapping $INPUT_PATH/mapping_samples_file.tsv --output $OUTPUT_PATH --threads 10 --local-jobs 10 --pair-identifier _R1 --qc-options="--trimmomatic ~/miniconda3/envs/$ENV_NAME/share/trimmomatic-0.39-2/ -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/human_genome_bowtie2 -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/human_transcriptome_bowtie2 -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/ribosomal_RNA_bowtie2" --remove-intermediate-output --bypass-strain-profiling 

However, I got the following error message:

  Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "/~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `rna_dna_norm.py --input-dna $OUTPUT_PATH/whole_metagenome_shotgun/humann/merged/pathabundance.tsv --input-rna $OUTPUT_PATH/whole_metatranscriptome_shotgun/humann/merged/pathabundance.tsv --output $OUTPUT_PATH/humann/rna_dna_norm/paths --reduce-sample-name --mapping $INPUT_PATH/mapping_samples_file.tsv' failed. 
  Out: b'Reading RNA table\nReading DNA table\n'
  Err: b'The rna/dna sample names do not match. Please check the formatting of the mapping file.\n'

As mentioned above, the run fails when calling the rna_dna_norm.py command.
Based on the error message, I thought the problem was with the mapping file. I checked it, and I don’t think it contains any mistakes. It is a tab-delimited file and it matches the paired metagenome and metatranscriptome samples exactly (the metagenome samples are paired-end, in case that matters).
At any rate, I realized I might not need the mapping file.
The sample names appear in the same order in the ecs/genefamilies/pathabundance.tsv tables of the metagenome and metatranscriptome analyses. These tables are the inputs for rna_dna_norm.py, and since the sample names and their order match, I thought the mapping file was unnecessary.
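
One way to test that assumption is to compare the names in the mapping file against the column headers of the two merged tables before running the normalization step. The following is only a minimal sketch and not part of biobakery_workflows: the helper names (read_mapping, table_samples, missing_samples) are hypothetical, the table headers in the demo are made up, and it assumes a two-column, tab-delimited mapping file with # comment lines and merged tables whose first header cell names the feature column. It does not reproduce whatever renaming --reduce-sample-name performs.

```python
import csv
import io


def read_mapping(handle):
    """Return (rna_sample, dna_sample) pairs from a two-column TSV mapping file."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, short lines, and '#' header/comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def table_samples(handle):
    """Return the sample column names from the header line of a merged table."""
    header = handle.readline().rstrip("\n").split("\t")
    return header[1:]  # the first cell names the feature column


def missing_samples(pairs, rna_columns, dna_columns):
    """List mapping entries that have no identically named table column."""
    rna_missing = [rna for rna, _ in pairs if rna not in rna_columns]
    dna_missing = [dna for _, dna in pairs if dna not in dna_columns]
    return rna_missing, dna_missing


# Demo on in-memory stand-ins; replace them with open(...) on the real files.
mapping = io.StringIO("# wts\twms\nCSM67UGO\tCSM67UGO\nCSM79HI3\tCSM79HI3\n")
rna_table = io.StringIO("# Pathway\tCSM67UGO\tCSM79HI3_Abundance\n")  # made-up header
dna_table = io.StringIO("# Pathway\tCSM67UGO\tCSM79HI3\n")  # made-up header

pairs = read_mapping(mapping)
rna_missing, dna_missing = missing_samples(
    pairs, table_samples(rna_table), table_samples(dna_table)
)
print("RNA names without a matching column:", rna_missing)
print("DNA names without a matching column:", dna_missing)
```

Any name reported here is present in the mapping file but has no identically spelled column in the corresponding table, which is one possible cause of a "sample names do not match" error.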

So I ran the command again, this time without the mapping file.
This time I got a different error message:

Error executing action 0. Original Exception: 
  Traceback (most recent call last):
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
      action_func(task)
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
      ret = _sh(s, **kwargs)
    File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
      raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
  anadama2.util.ShellException: [Errno 1] Command `rna_dna_norm.py --input-dna $OUTPUT_PATH/whole_metagenome_shotgun/humann/merged/genefamilies.tsv --input-rna $OUTPUT_PATH/whole_metatranscriptome_shotgun/humann/merged/genefamilies.tsv --output $OUTPUT_PATH/humann/rna_dna_norm/genes --reduce-sample-name' failed. 
  Out: b'Reading RNA table\nReading DNA table\nCompute unstratified features\nNormalize DNA\nNormalize RNA\nCompute stratified features\nNormalize DNA\nNormalize RNA\nCompute only classified features\nNormalize DNA\nNormalize RNA\nWriting unstratified table\n'
  Err: b'Traceback (most recent call last):\n  File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 272, in <module>\n    main()\n  File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 258, in main\n    output_unstrat_file)\n  File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 180, in write_file\n    file_handle.write("\\t".join(column_labels)+"\\n")\nTypeError: a bytes-like object is required, not \'str\'\n'

I presume I’ll have to debug the code to find out what the problem is, but before I do that I thought I’d ask whether this is a known issue or whether I am doing something wrong.

  2. As for my second question: why does the metagenome analysis run together with the metatranscriptome analysis? Is there a way to run the metatranscriptome analysis without any metagenome samples?

I think I found one problem.
It is at line 179 of the file biobakery_workflows/scripts/rna_dna_norm.py, in the function:

def write_file(column_labels, row_labels, data, file):

You open the output file in ‘wb’ mode (binary) instead of ‘w’ (text). Under Python 3, writing a str to a file opened in binary mode raises the TypeError shown above:

with open(file, "wb") as file_handle:

I changed this line to:

with open(file, "w") as file_handle:

With this change, I managed to run the workflow.
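
The behaviour behind this fix can be reproduced outside the workflow. Below is a minimal sketch, assuming Python 3: a file opened with "wb" accepts only bytes, so writing the str built by "\t".join(...) raises a TypeError, while a file opened with "w" accepts it. The function write_table_sketch is a hypothetical stand-in that reuses only the signature and the header-writing line quoted above; its row-writing loop is an assumption, not the code in rna_dna_norm.py.

```python
import os
import tempfile


def binary_write_fails(path):
    """Return True if writing a str to a file opened with "wb" raises TypeError."""
    try:
        with open(path, "wb") as file_handle:
            file_handle.write("\t".join(["# Pathway", "sample1"]) + "\n")
    except TypeError:
        return True
    return False


def write_table_sketch(column_labels, row_labels, data, file):
    """Write a tab-delimited table in text mode (hypothetical stand-in)."""
    with open(file, "w") as file_handle:
        file_handle.write("\t".join(column_labels) + "\n")
        # Assumed layout: one row label followed by that row's values.
        for label, values in zip(row_labels, data):
            file_handle.write("\t".join([label] + [str(v) for v in values]) + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table.tsv")
    print("str written to 'wb' handle fails:", binary_write_fails(path))
    write_table_sketch(["# Pathway", "sample1"], ["PWY-1"], [[0.5]], path)
    with open(path) as file_handle:
        print(file_handle.read(), end="")
```

An alternative that keeps "wb" would be to encode each string to bytes before writing, but switching to text mode is the smaller change.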
However, the resulting tables contain many NaNs. I presume these come from 0/0 divisions. I also got Inf values, which I presume come from dividing a nonzero number by zero. Is that correct?
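
That reading is consistent with IEEE 754 floating-point rules, which array libraries such as NumPy follow by default: 0/0 gives NaN and a positive number divided by 0 gives Inf. The sketch below mimics those rules in plain Python (where / would otherwise raise ZeroDivisionError) and counts each kind of value. The function names and the toy abundances are hypothetical, and the sketch does not reproduce how rna_dna_norm.py actually computes its ratios.

```python
import math


def ieee_ratio(rna, dna):
    """Divide rna by dna using IEEE 754 conventions for a zero denominator."""
    if dna == 0:
        if rna == 0:
            return math.nan  # 0/0 is undefined
        return math.copysign(math.inf, rna)  # nonzero/0 is infinite
    return rna / dna


def summarize(ratios):
    """Count NaN, Inf, and finite values in a list of ratios."""
    counts = {"nan": 0, "inf": 0, "finite": 0}
    for value in ratios:
        if math.isnan(value):
            counts["nan"] += 1
        elif math.isinf(value):
            counts["inf"] += 1
        else:
            counts["finite"] += 1
    return counts


# Toy abundances for one sample: (RNA, DNA) per feature.
features = [(0.0, 0.0), (2.0, 0.0), (3.0, 6.0)]
ratios = [ieee_ratio(rna, dna) for rna, dna in features]
print(ratios)             # [nan, inf, 0.5]
print(summarize(ratios))  # {'nan': 1, 'inf': 1, 'finite': 1}
```

Under this reading, NaN marks features absent from both the RNA and DNA tables for a sample, and Inf marks features detected in RNA but not in DNA.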

For the pathways file I get:

While the gene families file looks like this:

(samples were taken from the HMP2 database).

I hope that these results are typical.