Metaphlan Read count output instead of relative abundances

DEEPCHANDA7 · July 22, 2020, 6:18am

Hi @fbeghini. I have ran MetaPhlAn 3.0 with -t rel_ab_w_read_stats option to get read counts along with relative abundances. But, I am seeing descrepancy in many cases.

For example: relative abundance of Prevotella_copri is 54.10114. But, total mapped read count is 7569053 and read count for Prevotella copri is 4558937. So, it was supposed to be (4558937/7569053)*100 = 60.231273318.
Why I’m seeing such difference? Is there any normalization or something? If yes, how can I get absolute read counts of each clade?

Thanks,
DC7

fbeghini · July 22, 2020, 8:13am

See Metaphlan3 relative abundance

DEEPCHANDA7 · July 22, 2020, 9:22am

Considering your reply in that link, I think there’s no way to get the absolute read counts of each of the clades. Right?
In that case, is it impossible to calculate alpha diversity or do rarefaction analysis? Or, is there any way out?

Thanks,
DC7

fbeghini · July 22, 2020, 2:45pm

No. Alpha diversity can be calculated in terms of present species, a tally of species with non-zero relative abundance. As mentioned in Rarefy metagenome sequence data before MetaPhlAn3 analysis , rarefaction should be done on the raw reads or the bowtie2out file.

DEEPCHANDA7 · July 22, 2020, 2:51pm

regarding alpha diversity calculation it is not possible with phyloseq. Because it does not accept relative abundance for the calculation. It takes the read counts of the species. So, considering that, can I multiply the relative abundance of each of the species with number of reads in bowtie2out.txt file and then analyse with phyloseq?
Thanks,
DC7

fbeghini · July 22, 2020, 3:34pm

It you want to use the estimated number of reads, you can re-run the analysis using -t rel_ab_w_read_stats. It is not necessary to use phyloseq to calculate the alpha diversity if you use a richness index, just count the species present.

DEEPCHANDA7 · July 22, 2020, 3:38pm

Thanks @fbeghini. I know I am asking a lot of questions that may bother you. Please don’t mind and bear with me as I am facing a lot of confusions.
Actually, I told you regarding the multiplication with bowtie2out mapped read numbers because here Dr. Segata says:

There are however two ways to get an estimate:

just multiply the total number of reads by the relative abundance of each species. This works well, but you are likely overestimating the number of reads in each species because you cannot count the number of reads that would not map against any reference genome

use the “-t rel_ab_w_read_stats” which estimates the number of reads that should come from a given species by considering the coverage of the species’ markers and the length of the genome (taken from reference genomes).

(However, the overestimation stated by Dr. Segata can be avoided with using bowtie2out mapped read number.)

Now, I am confused among the two ways what to follow because, both of them will give different read counts. And also, I want to do all the statistical analysis (e.g. wilcoxon ranksum, correlation, regression, etc.) with the read counts rather than relative abundance. So, what will be better?

Again, I am apologetic for asking too many questions.

Thanks,
DC7

DEEPCHANDA7 · August 3, 2020, 12:35pm

So, are you saying to calculate only the Observed alpha diversity which only takes into account the richness not the evenness? In case of Shannon diversity index both richness and evenness is considered. So, I think the count number is must.
Not?

fbeghini · August 5, 2020, 8:55am

No, you could also calculate the Shannon Index or the Gini Simpson index considering the relative abundances,but without using phyloseq since it requires to use count data.
You can just write a code snippet that performs the calculations for each considered sample.

DEEPCHANDA7 · August 5, 2020, 9:12am

Sir, you did not respond to one of my previous question. Can I multiply the “#nreads” number from “.bowtie2out.txt” file with the relative abundances of each of the taxa to get their respective read count?

Thanks,
DC7

fbeghini · August 5, 2020, 9:40am

Yes, but the number will be the same as the one reported by rel_ab_w_read_stats

DEEPCHANDA7 · August 11, 2020, 5:13am

But, in that case, the merge_metaphlan_tables.py command does not work.

Topic		Replies	Views
Metaphlan3 relative abundance MetaPhlAn	13	6735	July 31, 2023
To get the absolute counts of each taxa MetaPhlAn	3	2678	May 18, 2021
Metaphlan4 discordance between relative_abundance and estimated_number_of_reads MetaPhlAn	6	704	October 26, 2022
Relative clade abundance with low number of reads MetaPhlAn	1	422	April 4, 2022
Calculating Alpha and Beta Diversity with Metaphlan3 Rel Abund MetaPhlAn	1	2125	May 27, 2022

Metaphlan Read count output instead of relative abundances

Related Topics