Discrepancies in prescreen.py script

Hi,

I installed HUMAnN version 3.9 yesterday using the following command:

pip install humann==3.9 --no-binary :all:

However, I noticed that the prescreen.py script under the search folder differs from the version in the Git repo (according to the commit history from May 23, 2023).

In the pip-installed version, the script appears to return the value from the last column. When I use a MetaPhlAn output generated with the -t rel_ab_w_read_stats parameter, it retrieves the value from the estimated_number_of_reads_from_the_clade column.

In contrast, according to the May 23, 2023 commit in the repo, the script should return the relative_abundance value instead.

Could you please confirm whether this discrepancy affects HUMAnN results when the input is a MetaPhlAn profile generated with the -t rel_ab_w_read_stats parameter?

Thank you

This is related to an expected difference between the HUMAnN v3 and v4 lineages. HUMAnN v3 calls MetaPhlAn with its default options and uses the last column of the resulting output, which contains relative abundance information, for species selection. HUMAnN v4 calls MetaPhlAn with the additional read stats option and then uses the coverage column for species selection. HUMAnN v3 does not expect MetaPhlAn output that includes read stats.
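
To illustrate why the two formats behave differently, here is a minimal sketch. It is not the actual prescreen.py code. The rows and values are invented, the default layout is simplified so that relative abundance is the last column (as described above), and the column order of the read stats layout is an assumption based on the column names mentioned in this thread. The helper `pick` is hypothetical.

```python
# Illustrative only: hypothetical rows, not real MetaPhlAn output.
# Simplified default layout (assumption): relative abundance is the last column.
DEFAULT_HEADER = ["clade_name", "clade_taxid", "relative_abundance"]
default_row = "s__Example_species\t0\t12.5"

# Read stats layout (assumed column order): extra columns follow relative abundance.
READ_STATS_HEADER = [
    "clade_name",
    "clade_taxid",
    "relative_abundance",
    "coverage",
    "estimated_number_of_reads_from_the_clade",
]
read_stats_row = "s__Example_species\t0\t12.5\t0.8\t4200"


def pick(header, row, column=None):
    """Return (column_name, value) for a named column, or for the last column if none is named."""
    fields = row.rstrip("\n").split("\t")
    index = header.index(column) if column else len(fields) - 1
    return header[index], float(fields[index])


# Last-column lookup on the simplified default layout yields relative abundance.
last_default = pick(DEFAULT_HEADER, default_row)

# The same last-column lookup on the read stats layout yields the read count instead.
last_stats = pick(READ_STATS_HEADER, read_stats_row)

# Looking the column up by name is independent of the layout.
by_name = pick(READ_STATS_HEADER, read_stats_row, "relative_abundance")

print(last_default)
print(last_stats)
print(by_name)
```

In this sketch, a parser that always takes the last column would treat an estimated read count as if it were a percentage when given read stats output, which is the kind of mismatch described in the question.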

That said, it’s not clear to me from your message whether what you’re seeing is an error or the expected difference in structure and behavior between the two code lineages.