Hello,
I would like some clarification on this topic:
I only use HUMAnN 3, so I don’t have the humann_read_and_species_count_table.tsv file from the bioBakery workflows.
To get raw counts for use with DESeq2, I need to undo the normalisation of the RPK values.
To do this, I need the total read count, as described in the topic quoted above.
From KneadData, run before HUMAnN, I can get the read count here:
04/10/2024 02:46:07 AM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( out_paired_1.fastq ): 18018168.0
According to the bioBakery workflows script, the read count is retrieved from humann.log:
04/10/2024 03:22:43 AM - humann.utilities - DEBUG: b'36036336 reads; of these:
I’m not sure whether I really need to use 36036336; I think it’s better to use 18018168, though it may not change the results much.
36036336 is the read count of out_paired_1.fastq multiplied by two, because the data are paired-end (there is also an out_paired_2.fastq file with the same number of reads).
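For reference, here is a small Python sketch of how I pull that count out of the KneadData log (the log file name is just an example; the "final pair1" line format is the one from my log above):

```python
import re

# Hypothetical log path for one sample; adapt to your own output.
LOG_PATH = "sample.kneaddata.log"

def kneaddata_read_count(log_path):
    """Return the 'final pair1' read count from a KneadData log."""
    # Matches the "READ COUNT: final pair1 ... : <count>" line shown above.
    pattern = re.compile(r"READ COUNT: final pair1 .*: ([\d.]+)\s*$")
    with open(log_path) as handle:
        for line in handle:
            match = pattern.search(line)
            if match:
                # KneadData logs the count as a float (e.g. 18018168.0).
                return int(float(match.group(1)))
    raise ValueError("no 'final pair1' READ COUNT line in " + log_path)

total_reads = kneaddata_read_count(LOG_PATH)  # 18018168 for my sample
```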
For example, with this pathway abundance file in CPM:
Pathway                                                humann_Abundance-CPM
UNMAPPED                                               804106
UNINTEGRATED                                           174839
UNINTEGRATED|unclassified                              168038
UNINTEGRATED|g__Prevotella.s__Prevotella_sp_tc2_28     3679.22
PWY-1042: glycolysis IV                                2030.87
GLUCONEO-PWY: gluconeogenesis I                        814.428
For GLUCONEO-PWY: gluconeogenesis I, the calculation is:
814.428 / 1 000 000 * 18018168 = 14674.5005279
So my count is a decimal; is that OK?
Do you think I should round it?
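In case it is useful, this is the small Python/pandas sketch I would use to apply the conversion to the whole CPM table (the file names are just examples, and the rounding step is exactly what I am unsure about):

```python
import pandas as pd

TOTAL_READS = 18018168  # 'final pair1' count from the KneadData log above

# CPM-renormalised HUMAnN pathway abundance table (tab-separated);
# the file name is just an example.
cpm = pd.read_csv("humann_pathabundance_cpm.tsv", sep="\t", index_col=0)

# Undo the per-million scaling: count = CPM / 1 000 000 * total reads.
counts = cpm / 1_000_000 * TOTAL_READS

# DESeq2 expects integer counts, so round before export
# (this rounding is the step I am asking about).
counts = counts.round().astype(int)

counts.to_csv("humann_pathabundance_counts.tsv", sep="\t")
```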
Cheers,
Jérémy Tournayre