Problem with a new biobakery workflows installation

Hello.

I am trying to reinstall biobakery workflows with these commands:

conda create --name biobakery
conda activate biobakery
conda install -c biobakery biobakery_workflows
conda install -c biobakery leveldb    ### This did not resolve the missing leveldb dependency
pip install leveldb  ### This did
biobakery_workflows wmgx --help

The last command produced the following output:

sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 1
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 2
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 3
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 4
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 5
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
sinfo: error: slurm_set_addr: Unable to resolve "localcluster"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: Resource temporarily unavailable
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 1
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 2
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 3
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 4
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 5
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
sinfo: error: slurm_set_addr: Unable to resolve "localcluster"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: Resource temporarily unavailable
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 1
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 2
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 3
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 4
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 5
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
sinfo: error: slurm_set_addr: Unable to resolve "localcluster"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: Resource temporarily unavailable
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 1
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 2
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 3
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 4
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 5
sinfo: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
sinfo: error: slurm_set_addr: Unable to resolve "localcluster"
sinfo: error: Unable to establish control machine address
slurm_load_partitions: Resource temporarily unavailable
usage: wmgx.py [-h] [--version]
               [--input-extension {fastq.gz,fastq,fq.gz,fq,fasta,fasta.gz,fastq.bz2,fq.bz2,bam}]
               [--barcode-file BARCODE_FILE]
               [--dual-barcode-file DUAL_BARCODE_FILE]
               [--index-identifier INDEX_IDENTIFIER]
               [--min-pred-qc-score MIN_PRED_QC_SCORE] [--threads THREADS]
               [--pair-identifier PAIR_IDENTIFIER] [--interleaved]
               [--bypass-quality-control]
               [--contaminate-databases CONTAMINATE_DATABASES]
               [--qc-options QC_OPTIONS] [--qc-scratch QC_SCRATCH]
               [--functional-profiling-options FUNCTIONAL_PROFILING_OPTIONS]
               [--remove-intermediate-output] [--bypass-functional-profiling]
               [--bypass-strain-profiling] [--run-strain-gene-profiling]
               [--bypass-taxonomic-profiling] [--run-assembly]
               [--strain-profiling-options STRAIN_PROFILING_OPTIONS]
               [--taxonomic-profiling-options TAXONOMIC_PROFILING_OPTIONS]
               [--max-strains MAX_STRAINS] [--strain-list STRAIN_LIST]
               [--assembly-options ASSEMBLY_OPTIONS] -o OUTPUT [-i INPUT]
               [--config CONFIG] [--local-jobs JOBS] [--grid-jobs GRID_JOBS]
               [--grid GRID] [--grid-partition GRID_PARTITION]
               [--grid-benchmark {on,off}] [--grid-options GRID_OPTIONS]
               [--grid-submit-sleep GRID_SUBMIT_SLEEP]
               [--grid-environment GRID_ENVIRONMENT]
               [--grid-scratch GRID_SCRATCH] [--grid-time-max GRID_TIME_MAX]
               [--grid-mem-max GRID_MEM_MAX] [--dry-run] [--skip-nothing]
               [--quit-early] [--until-task UNTIL_TASK]
               [--exclude-task EXCLUDE_TASK] [--target TARGET]
               [--exclude-target EXCLUDE_TARGET]
               [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

A workflow for whole metagenome shotgun sequences

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --input-extension {fastq.gz,fastq,fq.gz,fq,fasta,fasta.gz,fastq.bz2,fq.bz2,bam}
                        the input file extension
                        [default: fastq.gz]
  --barcode-file BARCODE_FILE
                        the barcode file
                        [default: ]
  --dual-barcode-file DUAL_BARCODE_FILE
                        the string to identify the dual barcode file
                        [default: ]
  --index-identifier INDEX_IDENTIFIER
                        the string to identify the index files
                        [default: _I1_001]
  --min-pred-qc-score MIN_PRED_QC_SCORE
                        the min phred quality score to use for demultiplexing
                        [default: 2]
  --threads THREADS     number of threads/cores for each task to use
                        [default: 1]
  --pair-identifier PAIR_IDENTIFIER
                        the string to identify the first file in a pair, must proceed the file extension (ie R1_001.fastq.gz)
                        [default: .R1]
  --interleaved         indicates whether or not sequence files are interleaved
                        [default: False]
  --bypass-quality-control
                        do not run the quality control tasks
  --contaminate-databases CONTAMINATE_DATABASES
                        the path (or comma-delimited paths) to the contaminate
                        reference databases for QC
                        [default: /home/microviable/biobakery_workflows_databases/kneaddata_db_human_genome]
  --qc-options QC_OPTIONS
                        additional options when running the QC step
                        [default: ]
  --qc-scratch QC_SCRATCH
                        scratch space to be used when running the QC step
                        [default: ]
  --functional-profiling-options FUNCTIONAL_PROFILING_OPTIONS
                        additional options when running the functional profiling step
                        [default: ]
  --remove-intermediate-output
                        remove intermediate output files
  --bypass-functional-profiling
                        do not run the functional profiling tasks
  --bypass-strain-profiling
                        do not run the strain profiling tasks (StrainPhlAn)
  --run-strain-gene-profiling
                        run the gene-based strain profiling tasks (PanPhlAn)
  --bypass-taxonomic-profiling
                        do not run the taxonomic profiling tasks (a tsv profile for each sequence file must be included in the input folder using the same sample name)
  --run-assembly        run the assembly and annotation tasks
  --strain-profiling-options STRAIN_PROFILING_OPTIONS
                        additional options when running the strain profiling step
                        [default: ]
  --taxonomic-profiling-options TAXONOMIC_PROFILING_OPTIONS
                        additional options when running the taxonomic profiling step
                        [default: ]
  --max-strains MAX_STRAINS
                        the max number of strains to profile
                        [default: 20]
  --strain-list STRAIN_LIST
                        input file with list of strains to profile
                        [default: ]
  --assembly-options ASSEMBLY_OPTIONS
                        additional options when running the assembly step
                        [default: ]
  -o OUTPUT, --output OUTPUT
                        Write output to this directory
  -i INPUT, --input INPUT
                        Find inputs in this directory 
                        [default: /media/microviable/g/DATOSSECUENCIACION/Script test]
  --config CONFIG       Find workflow configuration in this folder 
                        [default: only use command line options]
  --local-jobs JOBS     Number of tasks to execute in parallel locally 
                        [default: 1]
  --grid-jobs GRID_JOBS
                        Number of tasks to execute in parallel on the grid 
                        [default: 0]
  --grid GRID           Run gridable tasks on this grid type 
                        [default: slurm]
  --grid-partition GRID_PARTITION
                        Partition/queue used for gridable tasks.
                        Provide a single partition or a comma-delimited list
                        of short/long partitions with a cutoff.
                        [default: serial_requeue,serial_requeue,240]
  --grid-benchmark {on,off}
                        Benchmark gridable tasks 
                        [default: on]
  --grid-options GRID_OPTIONS
                        Grid specific options that will be applied to each grid task
  --grid-submit-sleep GRID_SUBMIT_SLEEP
                        Number of seconds to wait between job submissions on grid 
                        [default: 5]
  --grid-environment GRID_ENVIRONMENT
                        Commands that will be run before each grid task to set up environment
  --grid-scratch GRID_SCRATCH
                        The folder to write intermediate scratch files for grid jobs
  --grid-time-max GRID_TIME_MAX
                        The max time allowed for a grid task (in minutes)
  --grid-mem-max GRID_MEM_MAX
                        The max memory allowed for a grid task (in MB)
  --dry-run             Print tasks to be run but don't execute their actions 
  --skip-nothing        Run all tasks. Rerun tasks that have already been run.
  --quit-early          Stop if a task fails. By default,
                        all tasks (except sub-tasks of failed tasks) will run.
  --until-task UNTIL_TASK
                        Stop after running this task. Use task name or number.
  --exclude-task EXCLUDE_TASK
                        Don't run these tasks. Add multiple times to append.
  --target TARGET       Only run tasks that generate these targets.
                        Add multiple times to append.
                        Patterns with ? and * are allowed.
  --exclude-target EXCLUDE_TARGET
                        Don't run tasks that generate these targets.
                        Add multiple times to append.
                        Patterns with ? and * are allowed.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the level of output for the log 
                        [default: INFO]

Any idea what is causing this?
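
A note on the sinfo errors above: they suggest that a Slurm client is installed on this machine and cannot resolve its controller host ("localcluster" in the log). Below is a minimal sketch for checking both conditions. It assumes a Linux system where getent is available. The function names are made up for illustration and are not part of biobakery or Slurm.

```shell
#!/bin/sh
# Report whether the Slurm client command sinfo is on PATH.
check_slurm_client() {
    if command -v sinfo >/dev/null 2>&1; then
        echo "sinfo found: $(command -v sinfo)"
    else
        echo "sinfo not found"
    fi
}

# Report whether a hostname resolves through the system resolver.
# Assumption: getent exists (glibc-based Linux); if it is missing,
# this reports "does not resolve".
check_host_resolves() {
    host="$1"
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "$host resolves"
    else
        echo "$host does not resolve"
    fi
}

check_slurm_client
# "localcluster" is the controller name taken from the error output.
check_host_resolves "localcluster"
```

If sinfo is found and the controller name does not resolve, that would be consistent with the log shown above.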

When I tried to install the wmgx databases, I got this:

biobakery_workflows_databases --install wmgx --location /media/microviable/e/bwfdb
Installing humann utility mapping database
Download URL: http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
CRITICAL ERROR: Unable to download and extract from URL: http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
WARNING: Unable to install database. Error running command: humann_databases --download utility_mapping full /media/microviable/e/bwfdb/humann
Unable to find strainphlan install.

StrainPhlAn is installed:

strainphlan --version
Mon Dec 18 17:39:20 2023: StrainPhlAn version 4.0.6 (1 Mar 2023)

I removed Slurm with "sudo apt purge slurm" and the first warning/error disappeared. But I still cannot install the wmgx databases, even without the --location parameter:

biobakery_workflows_databases --install wmgx
Installing humann utility mapping database
Download URL: http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
CRITICAL ERROR: Unable to download and extract from URL: http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
WARNING: Unable to install database. Error running command: humann_databases --download utility_mapping full /home/microviable/biobakery_workflows_databases/humann
Unable to find strainphlan install.

Trying to install the databases manually, I received this error:

kneaddata_database --download mouse_C57BL bowtie2 /media/microviable/e/bwfdb/kneaddata_db_mouse_genome
Download URL: http://huttenhower.sph.harvard.edu/kneadData_databases/mouse_C57BL_6NJ_Bowtie2_v0.1.tar.gz
CRITICAL ERROR: Unable to download and extract from URL: http://huttenhower.sph.harvard.edu/kneadData_databases/mouse_C57BL_6NJ_Bowtie2_v0.1.tar.gz

Going to the URL gives:

Forbidden

You don’t have permission to access this resource.

Apache/2.4.52 (Ubuntu) Server at huttenhower.sph.harvard.edu Port 443

The same happens with the humann databases:

humann_databases --download uniref uniref90_ec_filtered_diamond /media/microviable/e/bwfdb/
Creating subdirectory to install database: /media/microviable/e/bwfdb/uniref
Download URL: http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz
CRITICAL ERROR: Unable to download and extract from URL: http://huttenhower.sph.harvard.edu/humann_data/uniprot/uniref_ec_filtered/uniref90_ec_filtered_201901b_subset.tar.gz

Forbidden

You don’t have permission to access this resource.

Apache/2.4.52 (Ubuntu) Server at huttenhower.sph.harvard.edu Port 443
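
The Forbidden response can also be confirmed from the command line without downloading the archive. This is a rough sketch, assuming curl is installed. The two function names are hypothetical helpers, not part of humann or kneaddata.

```shell
#!/bin/sh
# Map an HTTP status code to a short diagnosis.
describe_status() {
    case "$1" in
        2??) echo "ok" ;;
        3??) echo "redirect" ;;
        403) echo "forbidden (server-side restriction)" ;;
        404) echo "not found" ;;
        5??) echo "server error" ;;
        000) echo "no response (DNS, connection, or timeout problem)" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Request only the headers and print the final status code.
# -s silent, -I HEAD request, -L follow redirects, -o discard headers,
# -w print the status code, --max-time cap the total time in seconds.
http_status() {
    curl -sIL -o /dev/null -w '%{http_code}' --max-time 30 "$1"
}
```

Usage, with one of the URLs from the output above:

describe_status "$(http_status "http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz")"

A 403 here would point to the server rather than to the local installation.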

Hello, thank you for the detailed posts. Our database hosting server was down recently for maintenance. If you try the downloads again, the errors should be resolved. Please post if you have any other issues!

Thanks!
Lauren

I am having a similar problem attempting to install biobakery databases. When running the command:

biobakery_workflows_databases --install wmgx

It prints:
Installing humann utility mapping database
Download URL: http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz

and then hangs. Nothing ever happens, nothing is downloaded. The process never completes (until I kill it).
I tried to wget the file:
wget http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
with a similar result:
--2024-01-26 11:39:02-- http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
Resolving huttenhower.sph.harvard.edu (huttenhower.sph.harvard.edu)... 199.94.60.28
Connecting to huttenhower.sph.harvard.edu (huttenhower.sph.harvard.edu)|199.94.60.28|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz [following]
--2024-01-26 11:39:02-- https://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz
Connecting to huttenhower.sph.harvard.edu (huttenhower.sph.harvard.edu)|199.94.60.28|:443... connected.
Then it hangs until I kill it.
Any assistance with this would be appreciated.
Sincerely,
Cicada Dennis
Research Technologies
Indiana University
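
For the hang described above, one option is to run wget with explicit timeouts, so a stalled connection should fail with an error instead of waiting indefinitely. This is only a sketch: the function name and the 30-second and 3-try values are arbitrary choices for illustration, and it does not address why the server stops responding after the redirect to HTTPS.

```shell
#!/bin/sh
# Download a URL with wget, giving up instead of hanging.
# --timeout sets the DNS, connect, and read timeouts (seconds),
# --tries limits the number of attempts,
# --continue resumes a partially downloaded file,
# --directory-prefix chooses the destination directory.
download_with_timeout() {
    url="$1"
    dest_dir="${2:-.}"
    wget --timeout=30 --tries=3 --continue \
         --directory-prefix="$dest_dir" "$url"
}
```

Usage, with the URL from the post above:

download_with_timeout "http://huttenhower.sph.harvard.edu/humann_data/full_mapping_v201901b.tar.gz" .

If this times out as well, the wget error message is worth including in a follow-up post.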