Sgb_to_gtdb_profile.py keep read count estimation from -t rel_ab_w_read_stats

Dear MetaPhlAn developers,

I am running MetaPhlAn4 v. 4.0.6 with addition of the parameter -t rel_ab_w_read_stats to obtain the estimate of the number of reads coming from each clade. I have been testing the sgb_to_gtdb_profile.py script and found that it only preserves the relative abundances, not the read count estimation. Is there currently a way to preserve the read counts or is this something you would consider making an option with this script?

Thank you in advance and thank you for developing these very nice programs!

Best regards

Dear @sarmol
In the current version of the script this is not possible, but we will make it available in the next version of the code

Dear @aitor.blancomiguez
That sounds great, thank you.

Hello,

Can the count be obtained using the number of reads processed provided by Metaphlan (#***reads processed)?
Example:
From Metaphlan result file:

#51636226 reads processed

From sgb_to_gtdb result file:

#clade_name relative_abundance
d__Bacteria;p__Bacteroidota 45.47

So d__Bacteria;p__Bacteroidota = 45.47 * 51636226 / 100 / 2 (if this is paired end) = 11739495 count

Jeremy Tournayre

Hello,

I think my calculation don’t take into account the length of each marker. Fortunately the metaphlan option “rel_ab_w_read_stats” does.

I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column “estimated_number_of_reads_from_the_clade” when a metaphlan results with the “rel_ab_w_read_stats” option is in input.

I do that with a new variable:

abundances_relative_count[‘s’][gtdb_tax] += float(line[4])

This new variable “abundances_relative_count” is like the variable “abundances”

                if gtdb_tax not in abundances['s']:
                    abundances['s'][gtdb_tax] = 0
                abundances['s'][gtdb_tax] += float(line[2])
                if gtdb_tax not in abundances_relative_count['s']:
                    abundances_relative_count['s'][gtdb_tax] = 0
                abundances_relative_count['s'][gtdb_tax] += float(line[4])	

I add the write of this variable with the “abundances” variable:

wf.write(‘{}\t{}\t{}\n’.format(tax, abundances[tax_level][tax] * 100 / total4level[tax_level],abundances_relative_count[tax_level][tax] ))

Did you think it’s ok to do that ?

Jérémy Tournayre

Dear @aitor.blancomiguez

Any update on this? I would still very much like to be able to use the read counts with the GTDB taxonomy.

Thanks in advance

Sarah

Alternatively, you can run MetaPhlAn to generate the default relative abundance output (rel_ab) and save the Bowtie2 output. Then, using the Bowtie2 output, generate rel_ab_w_read_stats. Next, run sgb_to_gtdb_profile.py using the default output and finally merge it with the read statistics.
A bit of gymnastic can save some time !

Hi, thanks for the suggestion. I am not sure I follow you though, wont I end up with two outputs with a different set of organisms/species (standard/GTDB) which cannot be merged?

Hi, good point. Give it a try !

I already have both those outputs, and the taxa don’t match so it wont work :slight_smile:

1 Like