Metaphlan step not working on cluster: CRITICAL ERROR

Hello,

I installed humann3 with conda on our cluster. I tested it on the login server and it worked fine. However, when I try to run the job on the cluster, it fails at the metaphlan step:


12/08/2020 12:20:41 AM - humann.utilities - INFO: Execute command: /trinity/home/miniconda3/envs/metagenomes/bin/metaphlan /mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/tmpfp_bnaq_/tmpgoz6i7lw -t rel_ab -o /mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/BAB01a_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/BAB01a_metaphlan_bowtie2.txt --nproc 25
12/08/2020 12:22:49 AM - humann.utilities - CRITICAL: Error executing: /trinity/home/miniconda3/envs/metagenomes/bin/metaphlan /mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/tmpfp_bnaq_/tmpgoz6i7lw -t rel_ab -o /mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/BAB01a_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/BAB01a_metaphlan_bowtie2.txt --nproc 25

Error message returned from metaphlan :
Traceback (most recent call last):
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 1350, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/http/client.py", line 1262, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/http/client.py", line 1308, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/http/client.py", line 1257, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/http/client.py", line 1028, in _send_output
self.send(msg)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/http/client.py", line 968, in send
self.connect()
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/http/client.py", line 940, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/trinity/home/miniconda3/envs/metagenomes/bin/metaphlan", line 10, in <module>
sys.exit(main())
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/site-packages/metaphlan/metaphlan.py", line 925, in main
pars['index'] = check_and_install_database(pars['index'], pars['bowtie2db'], pars['bowtie2_build'], pars['nproc'], pars['force_download'])
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/site-packages/metaphlan/__init__.py", line 258, in check_and_install_database
if urllib.request.urlopen("http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_latest").getcode() != 200:
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 1378, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/urllib/request.py", line 1352, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

12/08/2020 12:22:49 AM - humann.utilities - CRITICAL: TRACEBACK:
Traceback (most recent call last):
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/site-packages/humann/utilities.py", line 744, in execute_command
p_out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/trinity/home/miniconda3/envs/metagenomes/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/trinity/home/miniconda3/envs/metagenomes/bin/metaphlan', '/mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/tmpfp_bnaq_/tmpgoz6i7lw', '-t', 'rel_ab', '-o', '/mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/BAB01a_metaphlan_bugs_list.tsv', '--input_type', 'fastq', '--bowtie2out', '/mnt/beegfs/PhlanHumann3_out/BAB01a_humann_temp/BAB01a_metaphlan_bowtie2.txt', '--nproc', '25']' returned non-zero exit status 1.

The same files work perfectly well on the login server. I also tried running the exact command from the log file there, and it worked. But it fails on the cluster, and I do not know how to deal with this issue.

Thanks in advance for your help,
G

If you already have the MetaPhlAn database downloaded and built, you can run HUMAnN with --metaphlan-options set to "--index mpa_v30_CHOCOPhlAn_201901"; this will skip the check for a new version.
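
For anyone launching HUMAnN from a script, here is a minimal sketch of how that option could be assembled. It only builds the argument list and does not run anything; the helper name `humann_command` and the sample file names are hypothetical, while `--input`, `--output`, and `--metaphlan-options` are the flags that appear in the commands in this thread.

```python
import shlex


def humann_command(input_fastq, output_dir, index_name):
    """Build a humann argument list that pins the MetaPhlAn database name.

    Hypothetical helper: it only assembles arguments and does not check
    that the database is actually installed.
    """
    # MetaPhlAn options are passed to humann as one quoted string.
    metaphlan_options = " ".join(shlex.quote(part) for part in ["--index", index_name])
    return [
        "humann",
        "--input", input_fastq,
        "--output", output_dir,
        "--metaphlan-options", metaphlan_options,
    ]
```

The resulting list can be handed to `subprocess.run` without a shell, so the quoted MetaPhlAn options reach humann as a single argument.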

Thanks for your help, it works now :).
However, I still don’t understand why it works locally but not on the cluster? There is no need to specify the database when I run it locally.

The downloaded files are only called chocophlan, without any version. So without you telling me the name of the database, where can I see the version that I’m using? Thanks a lot.

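On the login server, MetaPhlAn has probably already downloaded and built the database. When --index is not given, the database name is resolved automatically by querying the download server for the latest version. The traceback shows that request timing out, which suggests the cluster nodes cannot reach that server.
The files named only chocophlan are the HUMAnN databases. The MetaPhlAn one is named mpa_v30_CHOCOPhlAn_201901; you can find it under the metaphlan_databases folder located in /trinity/home/miniconda3/envs/metagenomes/lib/python3.7/site-packages/metaphlan/.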
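
One way to test whether a given node can reach the database server is to repeat the request from the traceback with a short timeout. This is only a sketch: the URL is the one shown in the traceback, and the function name `can_reach` and the 10-second default are my own assumptions.

```python
import urllib.error
import urllib.request

# URL copied from the traceback in the first post.
MPA_LATEST_URL = "http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_latest"


def can_reach(url=MPA_LATEST_URL, timeout=10):
    """Return True if the URL answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode() == 200
    except (urllib.error.URLError, OSError):
        # Covers timeouts, refused connections, DNS failures and HTTP errors.
        return False
```

Calling `can_reach()` on the login server and again inside a cluster job should show whether the two differ in network access.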
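
To see which MetaPhlAn database names are present in that folder, something like the sketch below could help. It assumes each installed database has a `<name>.pkl` file next to its Bowtie2 index files, which is how MetaPhlAn 3 installs usually look but is not confirmed in this thread; the function name is hypothetical.

```python
from pathlib import Path


def installed_metaphlan_indexes(db_dir):
    """List database names found in a metaphlan_databases folder.

    Assumption: every installed database ships a `<name>.pkl` file,
    so the file stem is the value to pass to --index.
    """
    return sorted({path.stem for path in Path(db_dir).glob("mpa_*.pkl")})
```

Pointing it at the metaphlan_databases folder mentioned above should return the names that can be passed to --index.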

I experience the same error using HUMAnN 3 and MetaPhlAn 3. It appears that --metaphlan-options with --index doesn't skip the version check. How can I avoid the error?

$ humann --metaphlan-options '--index /verona/biostat/databases/bacteria/mpa_v30_CHOCOPhlAn_201901' --input /verona//biostat/datasets/HeadNeckCancer/DNAseq/Oral_1-N_unmapped_R1.fastq.gz --output /tmp/ --taxonomic-profile /verona/biostat/datasets/HeadNeckCancer/DNAseq/Oral_1-Nmetagenome.txt
Output files will be written to: /tmp
Decompressing gzipped file ...
CRITICAL ERROR: The directory provided for ChocoPhlAn contains files ( mpa_latest ) that are not of the expected version. Please install the latest version of the database: 201901b

Why does HUMAnN 3 demand database version 201901b? If most users will run HUMAnN after MetaPhlAn, should the MetaPhlAn user guide recommend downloading the full 16 GB database the first time, so that it works in both scenarios?

--index should give just the name of the database; its location should be specified using --bowtie2db.
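
As an illustration, the full path from the command above can be split into the two options. This is a sketch under the assumption that the last path component is the database name and everything before it is the folder; the helper name `metaphlan_db_options` is hypothetical.

```python
import posixpath
import shlex


def metaphlan_db_options(index_path):
    """Split a full database path into --bowtie2db (folder) and --index (name).

    Assumption: the last path component is the database name.
    """
    folder, name = posixpath.split(index_path.rstrip("/"))
    return "--bowtie2db {} --index {}".format(shlex.quote(folder), shlex.quote(name))
```

The returned string is what would go inside the quotes after --metaphlan-options.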

201901b is an updated version of the 201901 database, maybe @franzosa can add more here, but I don’t see the point of recommending downloading the HUMAnN database in the MetaPhlAn user guide.

Agreed with @fbeghini - many users likely wish to use MetaPhlAn but not HUMAnN, so I would not put HUMAnN recommendations in the MetaPhlAn documentation (whereas HUMAnN is more dependent on MetaPhlAn). The database version check is important because changes in the HUMAnN software are tied to changes in the underlying database, and you could experience unexpected behaviors if the software and databases were mismatched.