Thanks @fbeghini. I know I am asking a lot of questions, and I hope they are not a bother. Please bear with me, as I am quite confused about this.
I brought up multiplying by the number of mapped reads from the bowtie2out file because Dr. Segata says here:
There are however two ways to get an estimate:
just multiply the total number of reads by the relative abundance of each species. This works well, but you are likely overestimating the number of reads in each species because you cannot count the number of reads that would not map against any reference genome
use the “-t rel_ab_w_read_stats” which estimates the number of reads that should come from a given species by considering the coverage of the species’ markers and the length of the genome (taken from reference genomes).
(However, the overestimation Dr. Segata mentions could be avoided by multiplying by the number of mapped reads from the bowtie2out file instead of the total number of reads.)
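To make the difference concrete, here is a minimal sketch of the first approach as I understand it. The function name, the species names, and all the numbers are made up for illustration, and I am assuming the relative abundances are percentages that sum to 100. This is not MetaPhlAn code.

```python
def estimate_read_counts(rel_abundance_pct, n_reads):
    """Scale percent relative abundances to approximate per-species read counts.

    rel_abundance_pct: dict of species -> relative abundance in percent (assumed 0-100)
    n_reads: the read total to distribute (total reads or mapped reads)
    """
    return {
        species: n_reads * pct / 100.0
        for species, pct in rel_abundance_pct.items()
    }


# Made-up values, for illustration only
rel_abundance_pct = {"species_A": 60.0, "species_B": 30.0, "species_C": 10.0}
total_reads = 1_000_000   # hypothetical: all reads in the sample
mapped_reads = 50_000     # hypothetical: reads listed in the bowtie2out file

# Approach 1 as quoted: multiply by the total number of reads
from_total = estimate_read_counts(rel_abundance_pct, total_reads)

# My variant: multiply by the number of mapped reads instead
from_mapped = estimate_read_counts(rel_abundance_pct, mapped_reads)

for species in rel_abundance_pct:
    print(species, from_total[species], from_mapped[species])
```

Both versions keep the same proportions between species; only the scale changes, which is why the two sets of counts differ.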
Now I am unsure which of the two approaches to follow, because they will give different read counts. I also want to do all of the statistical analysis (e.g. Wilcoxon rank-sum test, correlation, regression, etc.) on read counts rather than relative abundances. Which approach would be better for that?
Again, I apologize for asking so many questions.