I have two questions -
- I have tried to run the metagenome-metatranscriptome (wmgx_wmtx) workflow in biobakery_workflows.
The workflow fails whenever it reaches the rna_dna_norm.py step.
I ran the workflow on samples from the Human Microbiome Project 2 (HMP2), using some metatranscriptome samples and their matching metagenome samples.
I first ran the workflow with a TSV mapping file that pairs the metagenome and metatranscriptome samples (I tried to follow the tutorial closely):
# wts wms
CSM67UGO CSM67UGO
CSM79HI3 CSM79HI3
CSM79HP4 CSM79HP4
ESM5ME9U ESM5ME9U
HSM6XRVK HSM6XRVK
HSM67VD2 HSM67VD2
HSMA33OT HSMA33OT
MSM6J2RS MSM6J2RS
MSM9VZMI MSM9VZMI
PSM7J154 PSM7J154
PSM7J182 PSM7J182
PSMA264U PSMA264U
This is the command I used at first:
biobakery_workflows wmgx_wmtx --input-metagenome $INPUT_PATH/HMP2_samples_metagenomics --input-metatranscriptome $INPUT_PATH/HMP2_samples_metatranscriptomics --input-mapping $INPUT_PATH/mapping_samples_file.tsv --output $OUTPUT_PATH --threads 10 --local-jobs 10 --pair-identifier _R1 --qc-options="--trimmomatic ~/miniconda3/envs/$ENV_NAME/share/trimmomatic-0.39-2/ -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/human_genome_bowtie2 -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/human_transcriptome_bowtie2 -db $INPUT_PATH/biobakery_workflows_databases/kneaddata_db/ribosomal_RNA_bowtie2" --remove-intermediate-output --bypass-strain-profiling
However, I got the following error message:
Error executing action 0. Original Exception:
Traceback (most recent call last):
File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
action_func(task)
File "/~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
ret = _sh(s, **kwargs)
File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
anadama2.util.ShellException: [Errno 1] Command `rna_dna_norm.py --input-dna $OUTPUT_PATH/whole_metagenome_shotgun/humann/merged/pathabundance.tsv --input-rna $OUTPUT_PATH/whole_metatranscriptome_shotgun/humann/merged/pathabundance.tsv --output $OUTPUT_PATH/humann/rna_dna_norm/paths --reduce-sample-name --mapping $INPUT_PATH/mapping_samples_file.tsv' failed.
Out: b'Reading RNA table\nReading DNA table\n'
Err: b'The rna/dna sample names do not match. Please check the formatting of the mapping file.\n'
As mentioned above, the run fails when calling the rna_dna_norm.py command.
Based on the error message, I thought the problem was with the mapping file. I checked it and could not find any mistakes. It is a tab-delimited file and matches the paired metagenome and metatranscriptome samples exactly (my metagenome samples are paired-end, in case that matters).
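A check like this can also be scripted. The following is only a minimal sketch, not how rna_dna_norm.py itself does the matching: the helper names (`read_mapping`, `read_sample_names`, `unmatched`) are hypothetical, the tables are tiny made-up stand-ins for the merged HUMAnN outputs, and the `_R1` suffix in the DNA header is an invented example of a paired-end name that would not match.

```python
import csv
import io


def read_mapping(handle):
    """Return (rna, dna) sample name pairs from a tab-delimited mapping file."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the "# wts<TAB>wms" style header line.
        if not row or row[0].startswith("#"):
            continue
        # A line separated by spaces instead of tabs ends up as one column.
        if len(row) != 2:
            raise ValueError("expected 2 tab-separated columns, got: %r" % (row,))
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def read_sample_names(handle):
    """Return the sample column labels from the header line of a merged table."""
    header = handle.readline().rstrip("\n").split("\t")
    return header[1:]  # the first column holds the feature names


def unmatched(pairs, rna_samples, dna_samples):
    """List mapping entries that do not appear in the table headers."""
    rna_missing = [rna for rna, _ in pairs if rna not in rna_samples]
    dna_missing = [dna for _, dna in pairs if dna not in dna_samples]
    return rna_missing, dna_missing


# Toy data (assumption): real input would be the mapping file and the two
# merged pathabundance.tsv tables, opened with open(path).
mapping = io.StringIO("# wts\twms\nCSM67UGO\tCSM67UGO\nCSM79HI3\tCSM79HI3\n")
rna_table = io.StringIO("# Pathway\tCSM67UGO\tCSM79HI3\n")
dna_table = io.StringIO("# Pathway\tCSM67UGO\tCSM79HI3_R1\n")

pairs = read_mapping(mapping)
rna_missing, dna_missing = unmatched(
    pairs, read_sample_names(rna_table), read_sample_names(dna_table)
)
print("RNA names not found:", rna_missing)
print("DNA names not found:", dna_missing)
```

If either list is non-empty, the column labels in the merged tables differ from the names in the mapping file, which would be one possible explanation for the "sample names do not match" message.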
At any rate, I realized I might not need the mapping file.
The order of the sample names in the ecs/genefamilies/pathabundance.tsv tables is the same in the metagenome and metatranscriptome analyses. These tables are the inputs for rna_dna_norm.py, and since the sample names and their order match, I thought the mapping file was unnecessary.
So I ran the command again, this time without the mapping table.
However, this time I got a different error message:
Error executing action 0. Original Exception:
Traceback (most recent call last):
File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/runners.py", line 201, in _run_task_locally
action_func(task)
File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/helpers.py", line 89, in actually_sh
ret = _sh(s, **kwargs)
File "~/miniconda3/envs/$ENV_NAME/lib/python3.7/site-packages/anadama2/util/__init__.py", line 320, in sh
raise ShellException(proc.returncode, msg.format(cmd, ret[0], ret[1]))
anadama2.util.ShellException: [Errno 1] Command `rna_dna_norm.py --input-dna $OUTPUT_PATH/whole_metagenome_shotgun/humann/merged/genefamilies.tsv --input-rna $OUTPUT_PATH/whole_metatranscriptome_shotgun/humann/merged/genefamilies.tsv --output $OUTPUT_PATH/humann/rna_dna_norm/genes --reduce-sample-name' failed.
Out: b'Reading RNA table\nReading DNA table\nCompute unstratified features\nNormalize DNA\nNormalize RNA\nCompute stratified features\nNormalize DNA\nNormalize RNA\nCompute only classified features\nNormalize DNA\nNormalize RNA\nWriting unstratified table\n'
Err: b'Traceback (most recent call last):\n File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 272, in <module>\n main()\n File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 258, in main\n output_unstrat_file)\n File "~/miniconda3/envs/$ENV_NAME/bin/rna_dna_norm.py", line 180, in write_file\n file_handle.write("\\t".join(column_labels)+"\\n")\nTypeError: a bytes-like object is required, not \'str\'\n'
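This TypeError is what Python 3 raises when a str is written to a file handle opened in binary mode. The snippet below is a minimal reproduction under that assumption. I have not confirmed how rna_dna_norm.py actually opens its output file, so the `"wb"` mode here is a guess, and the column labels are made up.

```python
import os
import tempfile

# Made-up labels standing in for the header row of the output table.
column_labels = ["# Gene Family", "CSM67UGO", "CSM79HI3"]
line = "\t".join(column_labels) + "\n"
path = os.path.join(tempfile.mkdtemp(), "demo.tsv")

# Assumption: the handle was opened in binary mode. In Python 3, writing a
# str to such a handle raises TypeError (Python 2 accepted it silently).
error_message = None
with open(path, "wb") as file_handle:
    try:
        file_handle.write(line)
    except TypeError as err:
        error_message = str(err)
print("binary mode:", error_message)

# Opening the file in text mode accepts str and avoids the error.
with open(path, "w") as file_handle:
    file_handle.write(line)

with open(path) as file_handle:
    written = file_handle.read()
print("text mode wrote:", repr(written))
```

If this is indeed the cause, it would point to a Python 2 to Python 3 compatibility problem in the script rather than to anything in the input data.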
I presume that I’ll have to debug the code in order to find out what’s the problem, but before I do that I thought I’d ask you whether this is a known issue or whether I am doing something wrong.
- As for my second question: why does the metagenome analysis run together with the metatranscriptome analysis? Is there a way to run the metatranscriptome analysis without any metagenome samples?