Humann3: tax profile parsing error is a bit misleading

The docs at GitHub (biobakery/humann) state that:

g__Bacteroides|s__Bacteroides_thetaiotaomicron	12.16326
g__Bacteroides|s__Bacteroides_cellulosilyticus	12.02768
g__Bacteroides|s__Bacteroides_caccae	11.43394
g__Dialister|s__Dialister_invisus	10.52286
g__Bacteroides|s__Bacteroides_stercoris	10.42227

is an example of a valid taxonomic profile.

If one includes a blank line at the end of the profile (or makes a similarly slight departure from the prescribed format), then the error generated by humann3 is:

ERROR: The MetaPhlAn taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn to at least v3.0.

The code is:

def get_abundance(line):
    """
    Read in the abundance value from the taxonomy file
    """
    try:
        data=line.split("\t")
        if data[-1].replace(".","").replace("e-","").isdigit():
            read_percent=float(data[-1])
        else:
            read_percent=float(data[-2])
    except ValueError:
        message="The MetaPhlAn taxonomic profile provided was not generated with the expected database version. Please update your version of MetaPhlAn to at least v3.0."
        logger.error(message)
        sys.exit("\n\nERROR: "+message)

It would help to include a clearer error message (e.g., one acknowledging that the profile might not originate from metaphlan).
It would also help to be more flexible with the formatting (e.g., skipping blank lines while parsing the file).
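To make the two suggestions concrete, here is a minimal sketch of a more tolerant parser. It is not the HUMAnN implementation: the name parse_abundance and the wording of the error message are my own, and I am assuming that skipping blank and "#" lines at this level is acceptable.

```python
def parse_abundance(line):
    """Return the abundance from one profile line, or None for blank/comment lines.

    Hypothetical replacement for the quoted get_abundance; not part of HUMAnN.
    """
    stripped = line.strip()  # removes the trailing newline and surrounding whitespace
    if not stripped or stripped.startswith("#"):
        return None  # blank lines and header lines carry no abundance

    fields = stripped.split("\t")
    # Try the last column first, then the second-to-last,
    # mirroring the fallback order of the quoted code.
    for candidate in reversed(fields[-2:]):
        try:
            return float(candidate)
        except ValueError:
            continue

    # Report what actually went wrong instead of blaming the database version.
    raise ValueError(
        "Could not read an abundance value from line %r. Check that the "
        "taxonomic profile is tab-delimited and follows the MetaPhlAn format." % line
    )
```

With this sketch, a two-column line, a three-column line and a trailing blank line are all handled, and a malformed line produces a message that points at the line itself.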

It appears that there is no rstrip() to remove the trailing newline, so the isdigit() check on the last column fails and the code falls back to the taxon name. If I print the data object, I get:

['g__Bacteroides|s__Bacteroides_thetaiotaomicron', '12.16326\n']

I’m guessing that metaphlan output normally includes more columns, so the trailing newline does not end up on the abundance column and goes unnoticed… unless one uses a custom tax_profile table with only 2 columns.
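The failure can be reproduced outside humann with a few lines. This is only a sketch: looks_numeric is a name I made up to wrap the test copied from the quoted get_abundance.

```python
line = "g__Bacteroides|s__Bacteroides_thetaiotaomicron\t12.16326\n"
data = line.split("\t")


def looks_numeric(field):
    # Same test as in the quoted get_abundance (hypothetical wrapper name).
    return field.replace(".", "").replace("e-", "").isdigit()


# The newline is still attached, so "1216326\n".isdigit() is False.
raw_is_numeric = looks_numeric(data[-1])

# Stripping the newline first makes the same check pass.
stripped_is_numeric = looks_numeric(data[-1].rstrip())

# Because the check failed, the quoted code falls back to data[-2],
# which is the taxon name here, and float() raises ValueError.
try:
    float(data[-2])
    fallback_failed = False
except ValueError:
    fallback_failed = True

print(raw_is_numeric, stripped_is_numeric, fallback_failed)
```

That ValueError is what gets reported as the database version error.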

If I add an extra column to the TSV, such as:

g__Bacteroides|s__Bacteroides_thetaiotaomicron	12.16326	X
g__Bacteroides|s__Bacteroides_cellulosilyticus	12.02768	X

…then the file is parsed correctly.

It appears that the tax_profile table must also include a header line with the metaphlan version:

if line.startswith("#") and (config.metaphlan_v3_db_version in line or config.metaphlan_v4_db_version in line):
    version_found = True

Apparently, one must use either of these values:

metaphlan_v3_db_version="v3"
metaphlan_v4_db_version="vOct22"
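Putting the observations together, a custom profile could be written along these lines. This is a workaround sketch based only on what I found above, not a documented format: the function names and the header text "#custom_profile_v3" are made up, the "X" padding column is the trick from earlier in this post, and I have not checked whether humann inspects the header in any other way.

```python
V3_MARKER = "v3"      # value of metaphlan_v3_db_version quoted above
V4_MARKER = "vOct22"  # value of metaphlan_v4_db_version quoted above


def format_custom_profile(abundances, header="#custom_profile_v3"):
    """Build profile text from a {taxon: abundance} dict (hypothetical helper).

    The header only needs to start with "#" and contain a version marker;
    the trailing "X" column keeps the newline off the abundance column.
    """
    lines = [header]
    for taxon, value in abundances.items():
        lines.append("%s\t%s\tX" % (taxon, value))
    return "\n".join(lines) + "\n"


def has_version_header(text):
    """Re-implementation of the quoted header check, for local testing only."""
    return any(
        line.startswith("#") and (V3_MARKER in line or V4_MARKER in line)
        for line in text.splitlines()
    )


profile = format_custom_profile({
    "g__Bacteroides|s__Bacteroides_thetaiotaomicron": 12.16326,
    "g__Dialister|s__Dialister_invisus": 10.52286,
})
print(profile, end="")
print(has_version_header(profile))
```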

This info would also be good to include in the biobakery/humann docs on GitHub.