Hello,
I’m trying biobakery (latest docker image).
I would like to understand how the read count information in output/kneaddata/merged/kneaddata_read_count_table.tsv and in output/humann/counts/humann_read_and_species_count_table.tsv compare to each other.
Example from one of my sample:
kneaddata_read_count_table.tsv
|raw pair1|250000|
|raw pair2|250000|
|trimmed pair1|230495|
|trimmed pair2|230495|
|trimmed orphan1|7075|
|trimmed orphan2|8162|
|decontaminated hg37dec_v0.1 pair1|228858|
|decontaminated hg37dec_v0.1 pair2|228858|
|decontaminated hg37dec_v0.1 orphan1|7064|
|decontaminated hg37dec_v0.1 orphan2|8104|
|final pair1|228858|
|final pair2|228858|
|final orphan1|7064|
|final orphan2|8104|
and in humann_read_and_species_count_table.tsv:
|humann_total_reads|470978|
If I do final pair1 + final pair2 + final orphan1 + final orphan2, I get 472884, which is different (higher) than what humann reports. If I exclude orphans, I only get 457716, which is still different (lower) than what humann reports.
Any hint of where this could stem from?
Thanks again,
Theo