Humann3 out of range error reading Metaphlan3 output

I ran metaphlan3 without errors. The output looked different from what I expected, but it appears to be correct for the new output format, as described in the thread "Unexpected output (format)".

However, running humann3 gives the following error:

$ humann --input $reads_cat --output output_dir/humann/ --output-basename $name --threads 2 --taxonomic-profile output_dir/metaphlan/${name}.metaphlan.txt

Output files will be written to: /ceph/projects/006_MiMens/fastq_process/stag-mwc/output_dir/humann
Decompressing gzipped file ...

WARNING: Can not call software version for bowtie2

Traceback (most recent call last):
  File "/ceph/home/luhugerth/.conda/envs/humann3/bin/humann", line 10, in <module>
    sys.exit(main())
  File "/ceph/home/luhugerth/.conda/envs/humann3/lib/python3.6/site-packages/humann/humann.py", line 975, in main
    custom_database = prescreen.create_custom_database(config.nucleotide_database, bug_file)
  File "/ceph/home/luhugerth/.conda/envs/humann3/lib/python3.6/site-packages/humann/search/prescreen.py", line 102, in create_custom_database
    read_percent=float(data[-2])
IndexError: list index out of range

This seems to come from these lines in the prescreen.py code:

if re.search("s__", line):
    # check threshold
    try:
        data=line.split("\t")
        if data[-1].replace(".","").replace("e-","").isdigit():
            read_percent=float(data[-1])
        else:
            read_percent=float(data[-2])

I can't see why index -2 would be out of range, though, since the output file looks correct. I even checked the field count with awk (NF): apart from the 3 header lines beginning with #, every row has 4 tab-separated fields.

At this point it might just be easier for me to awk away the final column, but surely there’s something I’m missing?
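
For anyone who wants to repeat the field-count check without awk, here is a rough Python sketch. The function name field_counts and the example lines are made up for illustration and are not taken from a real metaphlan run. It counts tab-separated fields per line and keeps the # header lines separate from the data rows, so a header with an unexpected shape stands out.

```python
from collections import Counter


def field_counts(lines):
    """Tally the number of tab-separated fields per line.

    Lines starting with '#' are tallied under "header", all other
    non-empty lines under "data".
    """
    counts = {"header": Counter(), "data": Counter()}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        kind = "header" if line.startswith("#") else "data"
        counts[kind][len(line.split("\t"))] += 1
    return counts


# Made-up profile lines, for illustration only.
example = [
    "#hypothetical_database_name\n",
    "#hypothetical command line with MiMens__sampleA in it\n",
    "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species\n",
    "k__Bacteria\t2\t100.0\t\n",
    "k__Bacteria|s__Hypothetical_species\t2|0\t100.0\t\n",
]

counts = field_counts(example)
print("header lines:", dict(counts["header"]))
print("data lines:", dict(counts["data"]))
```

To run it on a real profile, pass an open file handle instead of the example list, e.g. field_counts(open(path)).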

Hi again, I just realised what the problem is. All my filenames contain the name of my project, followed by __ and then the name of the sample. The project is called MiMens, so the search for "s__" matches "MiMens__" in a header line and humann treats that line as a species line. That line does not have the tab-separated columns of a real species row, so data[-2] does not exist. It would probably be better to explicitly skip lines starting with # in future versions!
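
In case it helps others, here is a small sketch of the failure and of the suggested guard. It assumes the header line that carries the file name contains no tabs. The function names original_logic and species_percent and both example lines are hypothetical; original_logic only paraphrases the snippet quoted above and is not the actual humann code.

```python
import re


def original_logic(line):
    """Paraphrase of the quoted check: any line containing 's__' is parsed."""
    if re.search("s__", line):
        data = line.split("\t")
        if data[-1].replace(".", "").replace("e-", "").isdigit():
            return float(data[-1])
        return float(data[-2])
    return None


def species_percent(line):
    """Same parsing, but comment lines starting with '#' are skipped first."""
    if line.startswith("#"):
        return None
    return original_logic(line)


# Made-up lines, for illustration only.
header = "#hypothetical command line: metaphlan MiMens__sampleA.fq.gz\n"
species = "k__Bacteria|s__Hypothetical_species\t2|0\t45.6\t\n"

# "MiMens__" ends in "s__", so the header passes the species test, and
# with no tabs in the line, split() returns a single-element list.
try:
    original_logic(header)
    header_failed = False
except IndexError:
    header_failed = True

print("header raises IndexError without the guard:", header_failed)
print("header with the guard:", species_percent(header))
print("species line with the guard:", species_percent(species))
```

A stricter alternative would be to test only the first column for "s__" rather than the whole line, but skipping # lines is the smaller change.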

Thanks for the follow up post! Glad to hear you solved it!