HUMAnN in two steps

Hello:

I tried to run bioBakery workflows, and it worked for KneadData and MetaPhlAn. HUMAnN gave me an error inside the workflow, so I tried to run the program separately.

When I used the MetaPhlAn SAM and TSV files as input, everything in the output was UNMAPPED.

humann --input EC1_bowtie2.sam --taxonomic-profile EC1_taxonomic_profile.tsv --output /media/microviable/e/EC1/humann --threads 4 --o-log /media/microviable/e/EC1/humann/EC1.log
Creating output directory: /media/microviable/e/EC1/humann
Output files will be written to: /media/microviable/e/EC1/humann
Process the sam mapping results …
Computing gene families …
Computing pathways abundance and coverage …
Output files created:
/media/microviable/e/EC1/humann/EC1_bowtie2_genefamilies.tsv
/media/microviable/e/EC1/humann/EC1_bowtie2_pathabundance.tsv
/media/microviable/e/EC1/humann/EC1_bowtie2_pathcoverage.tsv

When I used the KneadData FASTQ as input, I got a different error:

humann --input EC1.fastq --output /media/microviable/e/EC1/humann --threads 4 --o-log /media/microviable/e/EC1/humann/EC1.log
Output files will be written to: /media/microviable/e/EC1/humann
Running metaphlan …
Found g__Lachnospiraceae_unclassified.s__Eubacterium_rectale : 34.83% of mapped reads

Total species selected from prescreen: 71
Selected species explain 99.94% of predicted community composition
Creating custom ChocoPhlAn database …
Running bowtie2-build …
Running bowtie2 …
Total bugs from nucleotide alignment: 71
g__Roseburia.s__Roseburia_intestinalis: 81376 hits

Total gene families from nucleotide alignment: 145662
Unaligned reads after nucleotide alignment: 45.3554444501 %
Running diamond …
Aligning to reference database: uniref90_201901b_full.dmnd
CRITICAL ERROR: Error executing: /home/microviable/miniconda3/envs/humann3/bin/diamond blastx --query /media/microviable/e/EC1/humann/EC1_humann_temp/EC1_bowtie2_unaligned.fa --evalue 1.0 --threads 4 --top 1 --outfmt 6 --db /media/microviable/e/biobakery_workflows_databases/humann/uniref/uniref90_201901b_full --out /media/microviable/e/EC1/humann/EC1_humann_temp/tmpo2hu4iuj/diamond_m8_4nfyskk2 --tmpdir /media/microviable/e/EC1/humann/EC1_humann_temp/tmpo2hu4iuj
Error message returned from diamond :
diamond v0.9.36.137 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /media/microviable/e/EC1/humann/EC1_humann_temp/tmpo2hu4iuj
Opening the database… [0.562s]
Percentage range of top alignment score to report hits: 1
Reference = /media/microviable/e/biobakery_workflows_databases/humann/uniref/uniref90_201901b_full.dmnd
Sequences = 87296736
Letters = 29247941583
Block size = 2000000000
Opening the input file… [0.33s]
Opening the output file… [0.001s]
Loading query sequences… [16.939s]
Masking queries… [23.073s]
Building query seed set… [0.057s]
Algorithm: Double-indexed
Building query histograms… [6.458s]
Allocating buffers… [0s]
Loading reference sequences… [0s]
Error: Unexpected end of input.

So I tried to use the SAM and TSV files from this run instead, and that worked.

I don’t know what the problem is.

Any help?

The first process you’re describing, i.e. using the MetaPhlAn SAM output as input to HUMAnN, is not going to do what you want. The MetaPhlAn SAM output only includes mappings of reads to MetaPhlAn marker genes, but HUMAnN needs a SAM file that covers ALL your reads. Starting HUMAnN from SAM input is mostly meant for two atypical use cases: (1) restarting a HUMAnN run that has already finished mapping your reads to pangenomes, or (2) using HUMAnN to process the alignment output of another program.
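
If you want to see what a given SAM file actually contains before handing it to HUMAnN, a rough sketch like the one below tallies the alignment records. It relies only on the standard SAM layout (header lines start with `@`, column 2 is the FLAG, bit 0x4 means "unmapped", column 3 is the reference name). The function name `summarize_sam` is made up for this illustration and is not part of HUMAnN or MetaPhlAn.

```python
from collections import Counter


def summarize_sam(lines):
    """Tally alignment records from an iterable of SAM text lines.

    Returns (total_records, unmapped_records, Counter of reference names).
    Hypothetical helper for illustration; not a HUMAnN or MetaPhlAn function.
    """
    total = 0
    unmapped = 0
    references = Counter()
    for line in lines:
        # Skip header lines (they start with '@') and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        total += 1
        if flag & 0x4:  # bit 0x4: the read is unmapped
            unmapped += 1
        else:
            references[fields[2]] += 1  # column 3: reference sequence name
    return total, unmapped, references
```

You could call it as `summarize_sam(open("EC1_bowtie2.sam"))` and compare the total against the number of reads in your FASTQ. If the SAM file holds far fewer records than you have reads, or the reference names are all marker genes, it is not the kind of file HUMAnN expects as input.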

The second run (starting from FASTQ) looks OK in principle though. Are you sure the input file is OK / not corrupted? Are you able to run the HUMAnN demo successfully? The latter would be a good test that your software installation is working OK.
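
For the "is the input file OK" question, one quick structural check is sketched below. It only verifies that the FASTQ is made of complete four-line records with matching sequence and quality lengths, so a pass does not prove the file is fine, but a failure points to truncation or corruption. The function names `open_text` and `check_fastq` are hypothetical helpers written for this reply, not part of HUMAnN or KneadData.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(path):
    """Return the number of records; raise ValueError at the first bad record.

    Assumes the common four-lines-per-record FASTQ layout.
    A truncated .gz file will typically raise EOFError from the gzip module.
    """
    count = 0
    with open_text(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # clean end of file
            if not header.strip():
                continue  # tolerate blank lines, e.g. at the end of the file
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual_raw = handle.readline()
            number = count + 1
            if not qual_raw:
                raise ValueError(f"record {number}: file ends mid-record")
            qual = qual_raw.rstrip("\r\n")
            if not header.startswith("@"):
                raise ValueError(f"record {number}: header does not start with '@'")
            if not plus.startswith("+"):
                raise ValueError(f"record {number}: separator does not start with '+'")
            if len(seq) != len(qual):
                raise ValueError(f"record {number}: sequence/quality length mismatch")
            count += 1
    return count
```

Running `check_fastq("EC1.fastq")` should return the read count if the structure is intact. Running the demo is still the better test of the installation itself, since this only looks at the input file.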

The demo gives me the same error:

humann -i demo.fastq.gz -o sample_results
Creating output directory: /media/microviable/e/Bike/sample_results
Output files will be written to: /media/microviable/e/Bike/sample_results
Decompressing gzipped file …

Running metaphlan …

Found g__Bacteroides.s__Bacteroides_dorei : 57.96% of mapped reads
Found g__Bacteroides.s__Bacteroides_vulgatus : 42.04% of mapped reads

Total species selected from prescreen: 2

Selected species explain 100.00% of predicted community composition

Creating custom ChocoPhlAn database …

Running bowtie2-build …

Running bowtie2 …

Total bugs from nucleotide alignment: 2
g__Bacteroides.s__Bacteroides_vulgatus: 1274 hits
g__Bacteroides.s__Bacteroides_dorei: 1318 hits

Total gene families from nucleotide alignment: 548

Unaligned reads after nucleotide alignment: 87.6571428571 %

Running diamond …

Aligning to reference database: uniref90_201901b_full.dmnd

CRITICAL ERROR: Error executing: /home/microviable/miniconda3/envs/humann3/bin/diamond blastx --query /media/microviable/e/Bike/sample_results/demo_humann_temp/demo_bowtie2_unaligned.fa --evalue 1.0 --threads 1 --top 1 --outfmt 6 --db /media/microviable/e/biobakery_workflows_databases/humann/uniref/uniref90_201901b_full --out /media/microviable/e/Bike/sample_results/demo_humann_temp/tmpy1jh_2do/diamond_m8_4pm0nhda --tmpdir /media/microviable/e/Bike/sample_results/demo_humann_temp/tmpy1jh_2do

Error message returned from diamond :
diamond v0.9.36.137 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /media/microviable/e/Bike/sample_results/demo_humann_temp/tmpy1jh_2do
Opening the database… [0.554s]
Percentage range of top alignment score to report hits: 1
Reference = /media/microviable/e/biobakery_workflows_databases/humann/uniref/uniref90_201901b_full.dmnd
Sequences = 87296736
Letters = 29247941583
Block size = 2000000000
Opening the input file… [0.003s]
Opening the output file… [0s]
Loading query sequences… [0.018s]
Masking queries… [0.159s]
Building query seed set… [0.011s]
Algorithm: Double-indexed
Building query histograms… [0.029s]
Allocating buffers… [0s]
Loading reference sequences… [0s]
Error: Unexpected end of input.