Sgb_to_gtdb_profile.py keep read count estimation from -t rel_ab_w_read_stats

Dear MetaPhlAn developers,

I am running MetaPhlAn4 v. 4.0.6 with addition of the parameter -t rel_ab_w_read_stats to obtain the estimate of the number of reads coming from each clade. I have been testing the sgb_to_gtdb_profile.py script and found that it only preserves the relative abundances, not the read count estimation. Is there currently a way to preserve the read counts or is this something you would consider making an option with this script?

Thank you in advance and thank you for developing these very nice programs!

Best regards

Dear @sarmol
In the current version of the script this is not possible, but we will make it available in the next version of the code

Dear @aitor.blancomiguez
That sounds great, thank you.

Hello,

Can the count be obtained using the number of reads processed provided by Metaphlan (#***reads processed)?
Example:
From Metaphlan result file:

#51636226 reads processed

From sgb_to_gtdb result file:

#clade_name relative_abundance
d__Bacteria;p__Bacteroidota 45.47

So d__Bacteria;p__Bacteroidota = 45.47 * 51636226 / 100 / 2 (if this is paired end) = 11739495 count

Jeremy Tournayre

Hello,

I think my calculation don’t take into account the length of each marker. Fortunately the metaphlan option “rel_ab_w_read_stats” does.

I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column “estimated_number_of_reads_from_the_clade” when a metaphlan results with the “rel_ab_w_read_stats” option is in input.

I do that with a new variable:

abundances_relative_count[‘s’][gtdb_tax] += float(line[4])

This new variable “abundances_relative_count” is like the variable “abundances”

                if gtdb_tax not in abundances['s']:
                    abundances['s'][gtdb_tax] = 0
                abundances['s'][gtdb_tax] += float(line[2])
                if gtdb_tax not in abundances_relative_count['s']:
                    abundances_relative_count['s'][gtdb_tax] = 0
                abundances_relative_count['s'][gtdb_tax] += float(line[4])	

I add the write of this variable with the “abundances” variable:

wf.write(‘{}\t{}\t{}\n’.format(tax, abundances[tax_level][tax] * 100 / total4level[tax_level],abundances_relative_count[tax_level][tax] ))

Did you think it’s ok to do that ?

Jérémy Tournayre