Sgb_to_gtdb_profile.py keep read count estimation from -t rel_ab_w_read_stats

sarmol · August 4, 2023, 8:54am

Dear MetaPhlAn developers,

I am running MetaPhlAn4 v. 4.0.6 with addition of the parameter -t rel_ab_w_read_stats to obtain the estimate of the number of reads coming from each clade. I have been testing the sgb_to_gtdb_profile.py script and found that it only preserves the relative abundances, not the read count estimation. Is there currently a way to preserve the read counts or is this something you would consider making an option with this script?

Thank you in advance and thank you for developing these very nice programs!

Best regards

aitor.blancomiguez · September 1, 2023, 12:02pm

Dear @sarmol
In the current version of the script this is not possible, but we will make it available in the next version of the code

sarmol · September 18, 2023, 11:50am

Dear @aitor.blancomiguez
That sounds great, thank you.

Jeremy_Tournayre · January 10, 2024, 2:02pm

Hello,

Can the count be obtained using the number of reads processed provided by Metaphlan (#***reads processed)?
Example:
From Metaphlan result file:

#51636226 reads processed

From sgb_to_gtdb result file:

#clade_name relative_abundance
d__Bacteria;p__Bacteroidota 45.47

So d__Bacteria;p__Bacteroidota = 45.47 * 51636226 / 100 / 2 (if this is paired end) = 11739495 count

Jeremy Tournayre

Jeremy_Tournayre · January 11, 2024, 9:42am

Hello,

I think my calculation don’t take into account the length of each marker. Fortunately the metaphlan option “rel_ab_w_read_stats” does.

I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column “estimated_number_of_reads_from_the_clade” when a metaphlan results with the “rel_ab_w_read_stats” option is in input.

I do that with a new variable:

abundances_relative_count[‘s’][gtdb_tax] += float(line[4])

This new variable “abundances_relative_count” is like the variable “abundances”

                if gtdb_tax not in abundances['s']:
                    abundances['s'][gtdb_tax] = 0
                abundances['s'][gtdb_tax] += float(line[2])
                if gtdb_tax not in abundances_relative_count['s']:
                    abundances_relative_count['s'][gtdb_tax] = 0
                abundances_relative_count['s'][gtdb_tax] += float(line[4])

I add the write of this variable with the “abundances” variable:

wf.write(‘{}\t{}\t{}\n’.format(tax, abundances[tax_level][tax] * 100 / total4level[tax_level],abundances_relative_count[tax_level][tax] ))

Did you think it’s ok to do that ?

Jérémy Tournayre

Topic		Replies	Views
Sgb_to_gtdb_profile.py count estimation, my own calculation MetaPhlAn	0	181	February 7, 2024
Sgb_to_gtdb_profile.py bacterial abundances raised to 100% MetaPhlAn	0	225	January 16, 2024
Merge_metaphlan_tables.py with rel_ab_w_read_stats MetaPhlAn	3	1361	December 7, 2022
Metaphlan4 discordance between relative_abundance and estimated_number_of_reads MetaPhlAn	6	734	October 26, 2022
Can we add a column with the number of reads for each taxon next to relative_abundance? MetaPhlAn	6	435	September 26, 2024

Sgb_to_gtdb_profile.py keep read count estimation from -t rel_ab_w_read_stats

Related topics