I am running MetaPhlAn4 v. 4.0.6 with addition of the parameter -t rel_ab_w_read_stats to obtain the estimate of the number of reads coming from each clade. I have been testing the sgb_to_gtdb_profile.py script and found that it only preserves the relative abundances, not the read count estimation. Is there currently a way to preserve the read counts or is this something you would consider making an option with this script?
Thank you in advance and thank you for developing these very nice programs!
I think my calculation donât take into account the length of each marker. Fortunately the metaphlan option ârel_ab_w_read_statsâ does.
I add into the sgb_to_gtdb_profile.py script the faculty to read the 4th column âestimated_number_of_reads_from_the_cladeâ when a metaphlan results with the ârel_ab_w_read_statsâ option is in input.
This new variable âabundances_relative_countâ is like the variable âabundancesâ
if gtdb_tax not in abundances['s']:
abundances['s'][gtdb_tax] = 0
abundances['s'][gtdb_tax] += float(line[2])
if gtdb_tax not in abundances_relative_count['s']:
abundances_relative_count['s'][gtdb_tax] = 0
abundances_relative_count['s'][gtdb_tax] += float(line[4])
I add the write of this variable with the âabundancesâ variable:
Alternatively, you can run MetaPhlAn to generate the default relative abundance output (rel_ab) and save the Bowtie2 output. Then, using the Bowtie2 output, generate rel_ab_w_read_stats. Next, run sgb_to_gtdb_profile.py using the default output and finally merge it with the read statistics.
A bit of gymnastic can save some time !
Hi, thanks for the suggestion. I am not sure I follow you though, wont I end up with two outputs with a different set of organisms/species (standard/GTDB) which cannot be merged?