Disk space to run Humann (v3.0.0.alpha.3)

lsaona · July 21, 2020, 7:16pm

Hi!

I am trying to run Humann3 using:
MetaPhlAn version 3.0.1 (25 Jun 2020)
Bowtie2 version 2.3.5.1

When I ran the pipeline on the demo.fastq file, it completed without problems, but now that I am using my own data the process is killed while bowtie2 is running. I suspect a disk space problem, but I have close to 500 GB of free space. How much space do I need to run HUMAnN?

The command I used is:

humann --threads 16 -i myfastq.fastq.gz -o humann_results/

(myfastq.fastq.gz = ca. 16 GB)

The files created before the process is killed are:

-rw-rw-r-- 1 ubuntu ubuntu 753 Jul 21 15:31 35_merge.log
-rw-rw-r-- 1 ubuntu ubuntu 94636370802 Jul 17 20:52 35_merge_bowtie2_aligned.sam
-rw-rw-r-- 1 ubuntu ubuntu 699428317 Jul 21 14:22 35_merge_bowtie2_aligned.tsv
-rw-rw-r-- 1 ubuntu ubuntu 33883528 Jul 17 17:38 35_merge_bowtie2_index.1.bt2
-rw-rw-r-- 1 ubuntu ubuntu 12709036 Jul 17 17:38 35_merge_bowtie2_index.2.bt2
-rw-rw-r-- 1 ubuntu ubuntu 515843 Jul 17 17:38 35_merge_bowtie2_index.3.bt2
-rw-rw-r-- 1 ubuntu ubuntu 12709029 Jul 17 17:38 35_merge_bowtie2_index.4.bt2
-rw-rw-r-- 1 ubuntu ubuntu 33883528 Jul 17 17:39 35_merge_bowtie2_index.rev.1.bt2
-rw-rw-r-- 1 ubuntu ubuntu 12709036 Jul 17 17:39 35_merge_bowtie2_index.rev.2.bt2
-rw-rw-r-- 1 ubuntu ubuntu 32357776654 Jul 17 21:49 35_merge_bowtie2_unaligned.fa
-rw-rw-r-- 1 ubuntu ubuntu 63596850 Jul 17 17:38 35_merge_custom_chocophlan_database.ffn
-rw-rw-r-- 1 ubuntu ubuntu 18442082 Jul 17 17:37 35_merge_metaphlan_bowtie2.txt
-rw-rw-r-- 1 ubuntu ubuntu 8841 Jul 17 17:38 35_merge_metaphlan_bugs_list.tsv

In addition, a directory called tmp844ry3z_ is created with huge files:
-rw------- 1 ubuntu ubuntu 23 Jul 17 17:38 bowtie2_stderr_gz0mtlu9
-rw------- 1 ubuntu ubuntu 5850 Jul 17 17:39 bowtie2_stdout_n5w2u0yg
-rw------- 1 ubuntu ubuntu 1844074034 Jul 17 21:49 temp_alignmentsz5gocnra
-rw------- 1 ubuntu ubuntu 84166608864 Jul 17 15:58 tmp1kqg2zip
-rw------- 1 ubuntu ubuntu 84394629336 Jul 17 15:45 tmpr7r_f6w6

Thanks!!

lauren.j.mciver · July 22, 2020, 4:20pm

Hello, thanks for the detailed file listing! It looks like the run you posted got through the nucleotide search portion with bowtie2, as I see the *_aligned.[sam/tsv] files, but likely got stuck in the next step in the workflow. I think the large tmp output files that you see are from the translated search portion with diamond. A compressed input file of 16 GB is a lot of reads, which is great! However, you might have a lot of alignments in the translated search portion of the run, which could make those files very large (it looks like they are ~80 GB each).

With 500 GB of disk space, running only a few jobs at a time and setting the --remove-temp-output option to remove the intermediate output files (if you do not need these intermediate alignment files for future reference) should solve the issue with running out of disk space.

Thanks,
Lauren
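
To put the listing in perspective, here is a minimal Python sketch that adds up the largest files from the first post. The sizes are copied from the `ls -l` output above; the helper name `parse_ls_sizes` is only illustrative and not part of HUMAnN. For this single sample the large intermediate files come to just under 300 GB.

```python
# Largest files from the `ls -l` listings in the first post (sizes in bytes).
LISTING = """\
-rw-rw-r-- 1 ubuntu ubuntu 94636370802 Jul 17 20:52 35_merge_bowtie2_aligned.sam
-rw-rw-r-- 1 ubuntu ubuntu 699428317 Jul 21 14:22 35_merge_bowtie2_aligned.tsv
-rw-rw-r-- 1 ubuntu ubuntu 32357776654 Jul 17 21:49 35_merge_bowtie2_unaligned.fa
-rw------- 1 ubuntu ubuntu 1844074034 Jul 17 21:49 temp_alignmentsz5gocnra
-rw------- 1 ubuntu ubuntu 84166608864 Jul 17 15:58 tmp1kqg2zip
-rw------- 1 ubuntu ubuntu 84394629336 Jul 17 15:45 tmpr7r_f6w6
"""


def parse_ls_sizes(listing):
    """Map file name -> size in bytes from `ls -l` style lines."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # `ls -l` rows have 9 fields: mode, links, owner, group, size,
        # month, day, time, name. Skip "total ..." rows and anything else.
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        sizes[fields[-1]] = int(fields[4])
    return sizes


sizes = parse_ls_sizes(LISTING)
total = sum(sizes.values())

# Print the files from largest to smallest, then the overall total.
for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{size / 1e9:8.1f} GB  {name}")
print(f"{total / 1e9:8.1f} GB  total")
```

Multiplying that per-sample footprint by the number of jobs running at once gives a rough idea of how quickly 500 GB can fill up.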

lsaona · July 22, 2020, 11:01pm

Hi Lauren, thank you for your answer. Unfortunately I get the same error.
I used:
humann --threads 16 -i myfastq.fastq.gz -o humann_results/ --remove-temp-output

and this is what I see on the screen:

Output files will be written to: /home/ubuntu/sample/35/humann_results
Decompressing gzipped file …

Removing spaces from identifiers in input file …

Running metaphlan …

Found g__Dactylococcopsis.s__Dactylococcopsis_salina : 54.70% of mapped reads
Found g__Paraburkholderia.s__Paraburkholderia_fungorum : 10.53% of mapped reads
Found g__Halorubrum.s__Halorubrum_sp_AJ67 : 8.76% of mapped reads
Found g__Paraburkholderia.s__Paraburkholderia_insulsa : 8.30% of mapped reads
Found g__Halorubrum.s__Halorubrum_tebenquichense : 7.33% of mapped reads
Found g__Cutibacterium.s__Cutibacterium_acnes : 7.09% of mapped reads
Found g__Halorubrum.s__Halorubrum_hochstenium : 1.09% of mapped reads
Found g__Phormidium.s__Phormidium_willei : 0.64% of mapped reads
Found g__Phormidium.s__Phormidium_sp_OSCR : 0.62% of mapped reads
Found g__Coleofasciculus.s__Coleofasciculus_chthonoplastes : 0.51% of mapped reads
Found g__Halothece.s__Halothece_sp_PCC_7418 : 0.45% of mapped reads

Total species selected from prescreen: 11

Selected species explain 100.00% of predicted community composition

Creating custom ChocoPhlAn database …

Running bowtie2-build …

Running bowtie2 …

Killed.

Even though I used --remove-temp-output, the temp files were still created:
$ ls humann_results/
35_merge_humann_temp_mir4tggp

$ ls -l humann_results/35_merge_humann_temp_mir4tggp/
total 130091300
-rw-rw-r-- 1 ubuntu ubuntu 11895 Jul 22 21:47 35_merge.log
-rw-rw-r-- 1 ubuntu ubuntu 94636370834 Jul 22 21:17 35_merge_bowtie2_aligned.sam
-rw-rw-r-- 1 ubuntu ubuntu 6030850248 Jul 22 21:42 35_merge_bowtie2_aligned.tsv
-rw-rw-r-- 1 ubuntu ubuntu 33883528 Jul 22 20:50 35_merge_bowtie2_index.1.bt2
-rw-rw-r-- 1 ubuntu ubuntu 12709036 Jul 22 20:50 35_merge_bowtie2_index.2.bt2
-rw-rw-r-- 1 ubuntu ubuntu 515843 Jul 22 20:50 35_merge_bowtie2_index.3.bt2
-rw-rw-r-- 1 ubuntu ubuntu 12709029 Jul 22 20:50 35_merge_bowtie2_index.4.bt2
-rw-rw-r-- 1 ubuntu ubuntu 33883528 Jul 22 20:51 35_merge_bowtie2_index.rev.1.bt2
-rw-rw-r-- 1 ubuntu ubuntu 12709036 Jul 22 20:51 35_merge_bowtie2_index.rev.2.bt2
-rw-rw-r-- 1 ubuntu ubuntu 32357779725 Jul 22 22:13 35_merge_bowtie2_unaligned.fa
-rw-rw-r-- 1 ubuntu ubuntu 63596850 Jul 22 20:50 35_merge_custom_chocophlan_database.ffn
-rw-rw-r-- 1 ubuntu ubuntu 18442082 Jul 22 20:49 35_merge_metaphlan_bowtie2.txt
-rw-rw-r-- 1 ubuntu ubuntu 8878 Jul 22 20:50 35_merge_metaphlan_bugs_list.tsv
drwx------ 2 ubuntu ubuntu 137 Jul 22 21:47 tmpkkhj2mqe

$ ls -l humann_results/35_merge_humann_temp_mir4tggp/tmpkkhj2mqe/
total 166411460
-rw------- 1 ubuntu ubuntu 23 Jul 22 20:50 bowtie2_stderr_ughmyd5s
-rw------- 1 ubuntu ubuntu 5904 Jul 22 20:51 bowtie2_stdout_121156w0
-rw------- 1 ubuntu ubuntu 1844080034 Jul 22 22:13 temp_alignmentsxfvg2ax5
-rw------- 1 ubuntu ubuntu 84394629336 Jul 22 19:35 tmp9vlxgji_
-rw------- 1 ubuntu ubuntu 84166608864 Jul 22 19:49 tmpo9mtdn_f

Can you help me?
Thank you!

lsaona · July 22, 2020, 11:05pm

Maybe it is not a disk space problem, since after the process is killed I still have free space:

$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nvme1n1 1048064000 610119504 437944496 59% /home/ubuntu/sample
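
As a side note on reading the `df` output: the "Available" column is in 1K blocks. A small Python sketch, assuming only the standard library, converts that figure to GiB and reads the free space of a filesystem directly. The function names `kib_blocks_to_gib` and `free_gib` are illustrative.

```python
import shutil


def kib_blocks_to_gib(blocks_1k):
    """Convert a count of 1 KiB blocks (as printed by df) to GiB."""
    return blocks_1k / 1024**2


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


# "Available" value copied from the df output in this post.
print(f"available per df: {kib_blocks_to_gib(437944496):.1f} GiB")
# Live reading for the current directory (will differ on every machine).
print(f"free right now:   {free_gib('.'):.1f} GiB")
```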

lauren.j.mciver · July 23, 2020, 4:06pm

Hi, thanks for the follow-up info! I agree with you: I don’t think it is a disk space issue. Your run is possibly being killed because it runs out of memory while processing the bowtie2 results. If you are running a couple of runs at once, try running just one at a time and see if this helps!

Thanks,
Lauren
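
One way to test the memory theory is sketched below in Python, under some assumptions: a Unix-like system (the `resource` module is not available on Windows), and a wrapper function named `run_and_report` that is purely illustrative and not part of HUMAnN. A bare "Killed" message usually means the process received SIGKILL, which is what the Linux out-of-memory killer sends, so the sketch reports both the exit status and the peak memory of the largest child process.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` and return (returncode, killed_by_sigkill, peak_rss).

    peak_rss is the largest resident set size among child processes that
    have been waited for so far. It is in kilobytes on Linux (bytes on macOS).
    """
    proc = subprocess.run(cmd)
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # On POSIX, subprocess reports death by signal N as returncode -N.
    killed = proc.returncode == -signal.SIGKILL
    return proc.returncode, killed, peak_rss


# Harmless demonstration with a trivial child process.
rc, killed, peak = run_and_report([sys.executable, "-c", "pass"])
print(f"returncode={rc} killed_by_sigkill={killed} peak_rss={peak}")
```

Passing the command from this thread, for example `run_and_report(["humann", "--threads", "16", "-i", "myfastq.fastq.gz", "-o", "humann_results/"])`, would show whether the run ended with SIGKILL and how close the peak came to the machine's RAM. If a subprocess such as bowtie2 is killed rather than humann itself, the return code may instead be an ordinary non-zero value, so the system log (for example `dmesg`) is worth checking as well.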
