Error in metaphlan 2.96.1 -- ValueError: invalid literal for int() with base 10: 'Traceback (most recent call last)

We have been using metaphlan 2.96.1 successfully on most samples, but on some samples it fails with the following error message:

Traceback (most recent call last):
  File "/data/balajiak/conda/envs/metaphlan2/bin/metaphlan2.py", line 1442, in <module>
    metaphlan2()
  File "/data/balajiak/conda/envs/metaphlan2/bin/metaphlan2.py", line 1254, in metaphlan2
    markers2reads, n_metagenome_reads = map2bbh(pars['inp'], pars['min_mapq_val'], pars['input_type'], pars['min_alignment_len'])
  File "/data/balajiak/conda/envs/metaphlan2/bin/metaphlan2.py", line 1051, in map2bbh
    n_metagenoges_reads = int(c)
ValueError: invalid literal for int() with base 10: 'Traceback (most recent call last):'

The bowtie result does get generated, and trying to use this as input (rather than the fastq) using --input_type bowtie2out results in the same error. Any ideas on how to resolve this?

Thanks,
Jonathan Badger

Hi Jonathan,
Can you run tail on your bowtie2out file and post the output?

SRR9033751.24350043_SN1035:399:CBK49ACXX:3:2113:4686:11747_length=101 823__A0A174VHT1__DHP27_04775
SRR9033751.24350093_SN1035:399:CBK49ACXX:3:2113:6506:11615_length=101 216816__C2GSX7__DD678_03555
SRR9033751.24350116_SN1035:399:CBK49ACXX:3:2113:7260:11543_length=101 39492__D4JVC7__yfgF
SRR9033751.24350165_SN1035:399:CBK49ACXX:3:2113:8881:11512_length=101 1150298__A0A174HYM9__ERS852406_01664
SRR9033751.24350202_SN1035:399:CBK49ACXX:3:2113:10154:11708_length=101 160404__R6PW10__B5G26_08785
SRR9033751.24350267_SN1035:399:CBK49ACXX:3:2113:12318:11691_length=101 239935__A0A2N8ITW6__CXT91_06265
SRR9033751.24350340_SN1035:399:CBK49ACXX:3:2113:15288:11524_length=101 214856__A0A174A1J3__ERS852447_01000
SRR9033751.24350393_SN1035:399:CBK49ACXX:3:2113:17440:11521_length=101 853__A8S882__C4N23_07600
SRR9033751.24350397_SN1035:399:CBK49ACXX:3:2113:17443:11608_length=101 39492__D4JVZ6__truB
#nreads Traceback (most recent call last):

(there is nothing after that last line)
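
For anyone who wants to check the last line of a bowtie2out file themselves, here is a minimal sketch in Python. It assumes the file ends with a line made of `#nreads` followed by an integer, which is what the output above suggests. `check_nreads_line` is a hypothetical helper written for this post, not part of MetaPhlAn.

```python
def check_nreads_line(last_line):
    """Return the read count from a '#nreads' trailer line, or None if it is invalid."""
    # Assumed layout: '#nreads', whitespace, then an integer read count.
    parts = last_line.strip().split(None, 1)
    if len(parts) != 2 or parts[0] != "#nreads":
        return None
    try:
        return int(parts[1])
    except ValueError:
        # The value is not a number, e.g. an error message was written in its place.
        return None


print(check_nreads_line("#nreads\t24350397"))
print(check_nreads_line("#nreads Traceback (most recent call last):"))
```

The first call prints the count and the second prints None, which matches the broken file shown above: the text of an error was written where the read count belongs.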

I have the same problem.

n_metagenoges_reads = int(c)
ValueError: invalid literal for int() with base 10: 'Traceback (most recent call last):'

When I change int(c) to float(c), it gives me another error:

 n_metagenoges_reads = float(c)
ValueError: could not convert string to float: 'Traceback (most recent call last):'

Have you checked whether the input files are corrupted? This error can mean that read_fastx.py failed to parse the files.
Can you try running read_fastx.py -l 70 <metagenomes> | tail?

Is the bowtie2out file non-empty?

I used the demo example files, CCMD34381688ST-21-0.fastq and CCMD34381688ST-21-0.sam.bz2.
It creates a bowtie2 output file, but there is no content in it:

$ ls -lhrt
-rw-r--r-- 1 ckzhu ckzhu 80 Mar 13 20:18 CCMD34381688ST-21-0.bowtie2.bz2

Where did you download the file from? Is it a metagenome from the ZellerG dataset?

From here, in the StrainPhlAn2 tutorial:
https://bitbucket.org/biobakery/biobakery/src/default/demos/biobakery_demos/data/strainphlan2/reads/

I used the following commands:
Step 1: Run MetaPhlAn2

The first step is to run MetaPhlAn2 to obtain the sam output files. The sam files contain the alignment information from mapping the reads of each sample against the MetaPhlAn2 marker database. To do that, run the following script:

mkdir -p sams
mkdir -p bowtie2
mkdir -p profiles
for f in fastq/*
do
    echo "Running metaphlan2 on ${f}"
    bn=$(basename ${f})
    python metaphlan2.py $f --index mpa_v294_CHOCOPhlAn_201901 --input_type fastq -s sams/${bn}.sam.bz2 --bowtie2out bowtie2/${bn}.bowtie2.bz2 -o profiles/profiled_${bn}.txt
done

$ read_fastx.py -l 70 CCMD34381688ST-21-0.fastq | tail
52455+
ADDDDDBBDDECEEFFFFHHFHJJIHHHGJIJJIJJJJJJJJIJJJJJJJJJIJJIIGHIJJJJJJHFJJJJJIJJJIIJJJJJJJJJJJIH
@ERR480964.1133440_D3FCO8P1:1:2316:6376:92785#0_length=70
GGACAGAAAATCGCTGTGGTCTTGTCTATATGCCTATAAGTTTGATAATAACGATATAGGATATGCGCGT
+
CCDDCCDDBDBDDDDDDCCDDDDEEECECDEDFDB?=77GGJIIJIIIHIHHCEGIGIIIIJJHFGIHEC
@ERR480964.1135330_D3FCO8P1:1:2316:15917:97105#0_length=87
CGCACAGATAGGGGCAAGCTATTATACAAGCACAGATTTCTTCAATCAGCAGCTGAAGTATGAGCCGTATTCTCACTATGGCATCGG
+
HIIIIIIHIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGGGIIIIHHEHHHHEFFCCCECDFCECC@>>CBCCCC?

I do not have any problem running that command:

$ metaphlan2.py --input_type fastq CCMD34381688ST-21-0.fastq.bz2 --index mpa_v294_CHOCOPhlAn_201901 --bowtie2db /shares/CIBIO-Storage/CM/scratch/users/francesco.beghini/hg/chocophlan/export_201901/metaphlan2/mpa_v294_CHOCOPhlAn_201901 --force > CCMD34381688ST-21-0_profile.tsv
Elapsed time to run MetaPhlAn2: 79.95703434944153 s

How have you installed MetaPhlAn? Are you using Python 3?

I’m very sorry, I was careless: my directory was wrong.
Thank you for solving my problem!!!

$ metaphlan2.py CCMD34381688ST-21-0.fastq --index mpa_v296_CHOCOPhlAn_201901 --input_type fastq -s sams/CCMD34381688ST-21-0.sam.bz2 --bowtie2out bowtie2/CCMD34381688ST-21-0.bowtie2.bz2 -o profiles/profiled_CCMD34381688ST-21-0.txt
Traceback (most recent call last):
  File "/miniconda3/envs/metaphlan2_v296/bin/metaphlan2.py", line 1442, in <module>
    metaphlan2()
  File "/miniconda3/envs/metaphlan2_v296/bin/metaphlan2.py", line 1254, in metaphlan2
    markers2reads, n_metagenome_reads = map2bbh(pars['inp'], pars['min_mapq_val'], pars['input_type'], pars['min_alignment_len'])
  File "/miniconda3/envs/metaphlan2_v296/bin/metaphlan2.py", line 1051, in map2bbh
    n_metagenoges_reads = int(c)
ValueError: could not convert string to float: 'Traceback (most recent call last):'

$ metaphlan2.py reads/CCMD34381688ST-21-0.fastq --index mpa_v296_CHOCOPhlAn_201901 --input_type fastq -s sams/CCMD34381688ST-21-0.sam.bz2 --bowtie2out bowtie2/CCMD34381688ST-21-0.bowtie2.bz2 -o profiles/profiled_CCMD34381688ST-21-0.txt
Elapsed time to run MetaPhlAn2: 66.39937257766724 s
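
The two runs above differ only in the input path (CCMD34381688ST-21-0.fastq versus reads/CCMD34381688ST-21-0.fastq). A small pre-flight check along these lines would catch a wrong path before the profiler starts. This is only a sketch; `check_input` is a hypothetical helper, not something MetaPhlAn provides.

```python
import os


def check_input(path):
    """Return a short problem description for an input file, or None if it looks usable."""
    if not os.path.isfile(path):
        return "not found: " + path
    if os.path.getsize(path) == 0:
        return "empty file: " + path
    return None


# Example with the path from the failing run; prints a message unless the file exists here.
problem = check_input("CCMD34381688ST-21-0.fastq")
if problem is not None:
    print(problem)
```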



Yes, I think this is the problem. read_fastx.py returns:

Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/bin/read_fastx.py", line 155, in <module>
    nreads += read_and_write_raw(f, opened=False, min_len=min_len)
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/bin/read_fastx.py", line 119, in read_and_write_raw
    nreads = read_and_write_raw_int(inf, min_len=min_len)
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/bin/read_fastx.py", line 89, in read_and_write_raw_int
    for idx, record in enumerate(parser(fd),2):
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 938, in FastqGeneralIterator
    raise ValueError("End of file without quality information.")
ValueError: End of file without quality information.

and looking at the fastq, it does seem to be truncated mid-record.

Probably there’s a sequence in the fastq file that does not have the corresponding quality line. If you run read_fastx.py | tail, the last line returned should be the sequence without its QUAL line.
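
If you want to locate the broken record without scrolling through the file, a check along these lines might help. It is a rough sketch that assumes plain four-line FASTQ records (header, sequence, `+` line, quality) with no line wrapping. `find_incomplete_record` is a hypothetical helper, not part of read_fastx.py or Biopython.

```python
def find_incomplete_record(lines):
    """Return the 1-based line number where the first malformed record starts, or None."""
    lines = [ln.rstrip("\n") for ln in lines]
    # Ignore empty lines at the very end of the file.
    while lines and lines[-1] == "":
        lines.pop()
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            # The file ends in the middle of a record.
            return start + 1
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            return start + 1
        if len(seq) != len(qual):
            # Sequence and quality strings must have the same length.
            return start + 1
    return None


complete = ["@r1", "ACGT", "+", "IIII"]
truncated = complete + ["@r2", "ACG"]
print(find_incomplete_record(complete))
print(find_incomplete_record(truncated))
```

For a real file you could pass an open file handle, for example `find_incomplete_record(open(path))`. Note that this sketch reads the whole file into memory, so it is only practical for files that fit in RAM.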

The error message invalid literal for int() with base 10 indicates that you are passing a string that is not an integer to the int() function. In other words, the string is either empty or contains characters that cannot be part of an integer.

You can avoid this error by using Python's str.isdigit() method to check whether the value is a number before converting it. It returns True if the string is non-empty and all of its characters are digits, otherwise False.

if val.isdigit():
    num = int(val)

The other way to overcome this issue is to wrap the conversion in a Python try…except block and handle the ValueError.
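
Both approaches can be sketched as below. The function names are made up for illustration. Keep in mind that in the MetaPhlAn case discussed in this thread, such a guard would only replace the crash with a clearer message; the actual fix was a correct and complete input file.

```python
def parse_with_isdigit(text):
    """Convert text to int only if it consists of ASCII digits; otherwise return None."""
    text = text.strip()
    # isascii() (Python 3.7+) excludes characters such as superscript digits,
    # which isdigit() accepts but int() rejects.
    if text.isascii() and text.isdigit():
        return int(text)
    return None


def parse_with_try(text):
    """Convert text to int, returning None when int() raises ValueError."""
    try:
        return int(text)
    except ValueError:
        return None


print(parse_with_isdigit("42"))
print(parse_with_try("Traceback (most recent call last):"))
print(parse_with_isdigit("-7"), parse_with_try("-7"))
```

The last line shows the main difference: isdigit() rejects a leading minus sign, so the first function returns None for "-7", while the try…except version returns -7.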

Python2.x and Python3.x

Sometimes a difference between Python 2.x and Python 3.x leads to this ValueError: invalid literal for int() with base 10.

With Python 2.x, 3/2 is integer division and evaluates to 1, so int(str(3/2)) gives you 1. With Python 3.x, 3/2 evaluates to 1.5, so the same expression calls int("1.5") and raises ValueError: invalid literal for int() with base 10: '1.5'.
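
Under Python 3 the difference can be reproduced with the short snippet below, where `//` gives the old floor-division behaviour. The variable names are only for illustration.

```python
floor_result = 3 // 2   # floor division, the Python 2 meaning of 3/2 for ints
true_result = 3 / 2     # true division in Python 3

try:
    int(str(true_result))          # int("1.5") is not a valid integer literal
    error_message = None
except ValueError as err:
    error_message = str(err)

# Going through float first truncates toward zero instead of raising.
truncated = int(float(str(true_result)))

print(floor_result, true_result, truncated)
print(error_message)
```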