Hi there,
I have been running HUMAnN 3 and everything was working great until about a week ago (who knows what the naughty coding fairies have done).
HUMAnN 3 still runs, but it now takes a very long time and generates multiple Bowtie 2 index files. This is how I have been running the HUMAnN portion of my batch jobs:
humann --protein-database /projects/emye7956/software/anaconda/envs/humann_env/uniref \
--nucleotide-database /projects/emye7956/software/anaconda/envs/humann_env/chocophlan/ \
--input "$fpathc" \
--output "$output_dir" -v && echo "ALL DONE WITH ${foutput} AT LAST :D"
My MetaPhlAn databases are in /projects/emye7956/software/anaconda/envs/humann_env/lib/python3.7/site-packages/metaphlan/metaphlan_databases
And look like this:
mpa_latest mpa_vOct22_CHOCOPhlAnSGB_202212.pkl
mpa_vOct22_CHOCOPhlAnSGB_202212.1.bt2l mpa_vOct22_CHOCOPhlAnSGB_202212.rev.1.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.2.bt2l mpa_vOct22_CHOCOPhlAnSGB_202212.rev.2.bt2l
mpa_vOct22_CHOCOPhlAnSGB_202212.3.bt2l mpa_vOct22_CHOCOPhlAnSGB_202212_VINFO.csv
mpa_vOct22_CHOCOPhlAnSGB_202212.4.bt2l README.txt
mpa_vOct22_CHOCOPhlAnSGB_202212.fna
Here is the temp output directory for one sample that ran to completion but took half a day (note the multiple Bowtie 2 index files, which take a long time to generate):
MG773_humann_temp:
MG773_bowtie2_aligned.sam
MG773_bowtie2_aligned.tsv
MG773_bowtie2_index.1.bt2
MG773_bowtie2_index.2.bt2
MG773_bowtie2_index.3.bt2
MG773_bowtie2_index.4.bt2
MG773_bowtie2_index.rev.1.bt2
MG773_bowtie2_index.rev.2.bt2
MG773_custom_chocophlan_database.ffn
MG773_cleancombined.log
MG773_metaphlan_bowtie2.txt
MG773_metaphlan_bugs_list.tsv
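In case it helps anyone compare against their own output, here is a minimal sketch of how the per-sample index files in a temp directory could be listed and counted. The function names `list_bt2_index` and `count_bt2_index` are made up for illustration (they are not part of HUMAnN), and the sketch assumes the `<sample>_bowtie2_index*.bt2` naming seen in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of HUMAnN).
# Assumes index files are named <sample>_bowtie2_index*.bt2 (or .bt2l),
# as in the temp directory listing above.

# Print each index file with its size and modification time.
list_bt2_index() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*_bowtie2_index*.bt2' -o -name '*_bowtie2_index*.bt2l' \) \
        -exec ls -lh {} +
}

# Print only the number of index files found.
count_bt2_index() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*_bowtie2_index*.bt2' -o -name '*_bowtie2_index*.bt2l' \) \
        | wc -l | tr -d ' '
}
```

For the directory above, `count_bt2_index MG773_humann_temp` would report 6, and `list_bt2_index MG773_humann_temp` would show when each index file was written.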
And my config file looks like this:
[database_folders]
nucleotide = data/chocophlan_DEMO
protein = data/uniref_DEMO
utility_mapping = data/misc
[run_modes]
resume = True
verbose = False
bypass_prescreen = False
bypass_nucleotide_index = False
bypass_nucleotide_search = False
bypass_translated_search = False
threads = 40
[alignment_settings]
evalue_threshold = 1.0
prescreen_threshold = 0.01
translated_subject_coverage_threshold = 50.0
translated_query_coverage_threshold = 90.0
nucleotide_subject_coverage_threshold = 50.0
nucleotide_query_coverage_threshold = 90.0
[output_format]
output_max_decimals = 10
remove_stratified_output = False
remove_column_description_output = False
Any help would be very much appreciated!
Thanks so much in advance!