Sgb_to_gtdb_profile.py count estimation, my own calculation

Hello,

The sgb_to_gtdb_profile.py script doesn’t provide the count, only the relative abundance. There is a topic about this:

Fortunately, a future update will give us access to the count.

So, while waiting this update, I have modifed the Sgb_to_gtdb_profile.py script to retain the count with the relative abundance. I’m asking if I have done that correctly:

I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column “estimated_number_of_reads_from_the_clade” when a metaphlan results with the “rel_ab_w_read_stats” option is in input.

I do that with a new variable:

abundances_relative_count[‘s’][gtdb_tax] += float(line[4])

This new variable “abundances_relative_count” is like the variable “abundances”

            if gtdb_tax not in abundances['s']:
                abundances['s'][gtdb_tax] = 0
            abundances['s'][gtdb_tax] += float(line[2])
            if gtdb_tax not in abundances_relative_count['s']:
                abundances_relative_count['s'][gtdb_tax] = 0
            abundances_relative_count['s'][gtdb_tax] += float(line[4])	

I add the write of this variable with the “abundances” variable:

wf.write(‘{}\t{}\t{}\n’.format(tax, abundances[tax_level][tax] * 100 / total4level[tax_level],abundances_relative_count[tax_level][tax] ))

Did you think it’s ok to do that ?

Cheers,
Jérémy Tournayre