The Bowtie2 output does get generated, and using it as input instead of the FASTQ (with --input_type bowtie2out) produces the same error. Any ideas on how to resolve this?
Have you checked whether the input files are corrupted? This error can mean that read_fastx.py failed to parse them.
Can you try running read_fastx.py -l 70 <metagenomes> | tail
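To check whether a file ends mid-record independently of read_fastx.py, one option is to walk it four lines at a time. A minimal Python sketch, assuming unwrapped four-line FASTQ records and either plain or .bz2-compressed input; check_fastq and open_text are hypothetical names, not part of MetaPhlAn2:

```python
import bz2
import sys


def open_text(path):
    # Open bz2-compressed files (by .bz2 extension) or plain files as text.
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def check_fastq(lines):
    """Count complete 4-line FASTQ records.

    Returns (n_records, problem), where problem is None if the input is
    well formed, or a short description of the first defect found.
    Assumes unwrapped records (one sequence line, one quality line).
    """
    n_records = 0
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not record and not line.strip():
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                return n_records, f"malformed record after {n_records} complete records"
            if len(seq) != len(qual):
                return n_records, f"sequence/quality length mismatch in record {n_records + 1}"
            n_records += 1
            record = []
    if record:
        return n_records, f"file ends mid-record ({len(record)} of 4 lines present)"
    return n_records, None


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            with open_text(path) as fh:
                n, problem = check_fastq(fh)
        except EOFError:
            # The bz2 module raises EOFError when the compressed stream is cut off.
            print(f"{path}: compressed stream is truncated")
            continue
        print(f"{path}: {n} complete records; {problem or 'OK'}")
```

Run it as, for example, python check_fastq.py sample.fastq.bz2 (file name hypothetical). It only checks record structure, not the read content itself.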
I use the following commands:
Step 1: Run MetaPhlAn2
The first step is to run MetaPhlAn2 to obtain the SAM output files. The SAM files contain the alignment information from mapping each sample's reads against the MetaPhlAn2 marker database. To do that, run the following script:
mkdir -p sams
mkdir -p bowtie2
mkdir -p profiles
for f in fastq/*
do
    echo "Running metaphlan2 on ${f}"
    bn=$(basename "${f}")
    python metaphlan2.py "${f}" --index mpa_v294_CHOCOPhlAn_201901 --input_type fastq -s "sams/${bn}.sam.bz2" --bowtie2out "bowtie2/${bn}.bowtie2.bz2" -o "profiles/profiled_${bn}.txt"
done
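The same per-sample loop can also be written in Python, which makes it easy to print every command first (a dry run) and to stop at the first sample whose run fails. A sketch assuming the directory layout and flags from the script above; build_command and main are hypothetical helper names:

```python
import shlex
import subprocess
import sys
from pathlib import Path

INDEX = "mpa_v294_CHOCOPhlAn_201901"


def build_command(fastq: Path) -> list:
    # Mirror the shell loop: output names are derived from the input basename.
    bn = fastq.name
    return [
        "python", "metaphlan2.py", str(fastq),
        "--index", INDEX,
        "--input_type", "fastq",
        "-s", f"sams/{bn}.sam.bz2",
        "--bowtie2out", f"bowtie2/{bn}.bowtie2.bz2",
        "-o", f"profiles/profiled_{bn}.txt",
    ]


def main(dry_run=True):
    fastq_dir = Path("fastq")
    if not fastq_dir.is_dir():
        print("no fastq/ directory found")
        return
    if not dry_run:
        for d in ("sams", "bowtie2", "profiles"):
            Path(d).mkdir(parents=True, exist_ok=True)
    # Skip hidden files, as the shell glob fastq/* does.
    for fastq in sorted(p for p in fastq_dir.iterdir() if not p.name.startswith(".")):
        print(f"Running metaphlan2 on {fastq}")
        cmd = build_command(fastq)
        if dry_run:
            print(" ".join(shlex.quote(c) for c in cmd))
        else:
            # check=True raises CalledProcessError if metaphlan2.py exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run="--run" not in sys.argv)
```

By default this only prints the commands; passing --run (a flag of this sketch, not of MetaPhlAn2) executes them.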
$ metaphlan2.py --input_type fastq CCMD34381688ST-21-0.fastq.bz2 --index mpa_v294_CHOCOPhlAn_201901 --bowtie2db /shares/CIBIO-Storage/CM/scratch/users/francesco.beghini/hg/chocophlan/export_201901/metaphlan2/mpa_v294_CHOCOPhlAn_201901 --force > CCMD34381688ST-21-0_profile.tsv
Elapsed time to run MetaPhlAn2: 79.95703434944153 s
How have you installed MetaPhlAn? Are you using Python 3?
Yes, I think this is the problem. read_fastx.py returns:
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/bin/read_fastx.py", line 155, in <module>
    nreads += read_and_write_raw(f, opened=False, min_len=min_len)
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/bin/read_fastx.py", line 119, in read_and_write_raw
    nreads = read_and_write_raw_int(inf, min_len=min_len)
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/bin/read_fastx.py", line 89, in read_and_write_raw_int
    for idx, record in enumerate(parser(fd),2):
  File "/usr/local/Caskroom/miniconda/base/envs/metaphlan2/lib/python3.7/site-packages/Bio/SeqIO/QualityIO.py", line 938, in FastqGeneralIterator
    raise ValueError("End of file without quality information.")
ValueError: End of file without quality information.
and, looking at the FASTQ file, it does seem to be truncated mid-record.
Probably there's a sequence in the FASTQ file that does not have the corresponding quality line. If you run read_fastx.py | tail, the last line returned should be the sequence without its QUAL line.
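If only the end of the file is damaged, one workaround is to keep the complete records, drop the partial one at the end, and rerun MetaPhlAn2 on the trimmed file; regenerating or re-downloading the original is still preferable, since the trimmed file simply loses the reads that were cut off. A minimal sketch, assuming unwrapped four-line records; keep_complete_records, open_any and trim_file are hypothetical names:

```python
import bz2
import sys


def open_any(path, mode):
    # bz2-compressed if the name ends in .bz2, plain text otherwise.
    return bz2.open(path, mode) if path.endswith(".bz2") else open(path, mode)


def keep_complete_records(lines):
    """Yield the lines of each complete, well-formed 4-line FASTQ record.

    Stops at the first record that is malformed or whose quality line is
    shorter/longer than its sequence (e.g. a truncated last line). A trailing
    partial record is dropped.
    """
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            header, seq, plus, qual = (x.rstrip("\r\n") for x in record)
            if not (header.startswith("@") and plus.startswith("+")
                    and len(seq) == len(qual)):
                return  # record alignment lost or last line cut short
            yield from record
            record = []
    # Anything left in `record` is an incomplete final record; it is skipped.


def trim_file(src, dst):
    n_lines = 0
    try:
        with open_any(src, "rt") as fin, open_any(dst, "wt") as fout:
            for line in keep_complete_records(fin):
                fout.write(line)
                n_lines += 1
    except EOFError:
        # Raised by the bz2 module when the compressed stream itself is cut off.
        print(f"{src}: compressed stream ended early")
    print(f"wrote {n_lines // 4} complete records to {dst}")
    return n_lines // 4


if __name__ == "__main__":
    if len(sys.argv) == 3:
        trim_file(sys.argv[1], sys.argv[2])
    else:
        print("usage: trim_fastq.py INPUT OUTPUT")
```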
The error message invalid literal for int() with base 10 indicates that you are passing a string that is not a valid integer literal to the int() function. In other words, it's either empty or contains characters that int() cannot parse, such as letters or a decimal point.
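You can avoid this error by using Python's str.isdigit() method to check whether the value is a number before converting it. It returns True if the string is non-empty and all of its characters are digits, otherwise False. Note that it also returns False for negative numbers such as "-5".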
if val.isdigit():
    num = int(val)
The other way to handle this is to wrap the conversion in a Python try/except ValueError block.
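The two approaches do not accept the same inputs. A small comparison, where parse_int is a hypothetical helper that uses the try/except approach:

```python
def parse_int(val):
    """Return int(val), or None if val cannot be converted."""
    try:
        return int(val)
    except (TypeError, ValueError):
        # ValueError: strings such as "", "abc", "3.0"; TypeError: e.g. None.
        return None


# Compare str.isdigit() with what int() actually accepts.
for s in ["42", " 7 ", "-5", "", "3.0", "²"]:
    print(repr(s), s.isdigit(), parse_int(s))
```

For "-5" and " 7 ", isdigit() returns False even though int() succeeds, and for the superscript "²" it returns True even though int() fails, which is why the try/except version is usually the more robust of the two.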