Hello,
The sgb_to_gtdb_profile.py script doesn’t provide the count, only the relative abundance. There is a topic about this:
Fortunately, a future update will give us access to the count.
So, while waiting this update, I have modifed the Sgb_to_gtdb_profile.py script to retain the count with the relative abundance. I’m asking if I have done that correctly:
I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column “estimated_number_of_reads_from_the_clade” when a metaphlan results with the “rel_ab_w_read_stats” option is in input.
I do that with a new variable:
abundances_relative_count[‘s’][gtdb_tax] += float(line[4])
This new variable “abundances_relative_count” is like the variable “abundances”
if gtdb_tax not in abundances['s']: abundances['s'][gtdb_tax] = 0 abundances['s'][gtdb_tax] += float(line[2]) if gtdb_tax not in abundances_relative_count['s']: abundances_relative_count['s'][gtdb_tax] = 0 abundances_relative_count['s'][gtdb_tax] += float(line[4])
I add the write of this variable with the “abundances” variable:
wf.write(‘{}\t{}\t{}\n’.format(tax, abundances[tax_level][tax] * 100 / total4level[tax_level],abundances_relative_count[tax_level][tax] ))
Did you think it’s ok to do that ?
Cheers,
Jérémy Tournayre