I am running MetaPhlAn4 v. 4.0.6 with addition of the parameter -t rel_ab_w_read_stats to obtain the estimate of the number of reads coming from each clade. I have been testing the sgb_to_gtdb_profile.py script and found that it only preserves the relative abundances, not the read count estimation. Is there currently a way to preserve the read counts or is this something you would consider making an option with this script?
Thank you in advance and thank you for developing these very nice programs!
I think my calculation don’t take into account the length of each marker. Fortunately the metaphlan option “rel_ab_w_read_stats” does.
I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column “estimated_number_of_reads_from_the_clade” when a metaphlan results with the “rel_ab_w_read_stats” option is in input.
This new variable “abundances_relative_count” is like the variable “abundances”
if gtdb_tax not in abundances['s']:
abundances['s'][gtdb_tax] = 0
abundances['s'][gtdb_tax] += float(line[2])
if gtdb_tax not in abundances_relative_count['s']:
abundances_relative_count['s'][gtdb_tax] = 0
abundances_relative_count['s'][gtdb_tax] += float(line[4])
I add the write of this variable with the “abundances” variable: