Number of Reads Metagenomic Data for Maaslin3

young_doktor · January 28, 2025, 3:55pm

I am wondering how the Huttenhower lab team recommends getting the “number of reads” from metagenomic data for MaAsLin 3. When I search for this, the only suggestion I find is to get it from kneaddata.

I also get a readout of subsample-raw reads from the illumina-basic QC file. But this number of reads also includes human reads, which are eventually removed for bacterial shotgun analysis, so I assume this is not the correct way to obtain the bacterial “number of reads”.

Thoughts?

Thanks!

young_doktor · January 28, 2025, 4:59pm

Would it happen to be the counts from this file in kneaddata? SAMPLENAME.r1_kneaddata_hg37dec_v0.1_bowtie2_unmatched_1_clean.fastq : 406541.0

If so, I think I have my answer.

WillNickols · January 28, 2025, 5:24pm

Assuming you have paired samples, there should be an unmatched_1, unmatched_2, paired_1, and paired_2 for the final cleaned reads. We recommend summing these for each sample and using the sum as the total reads. You could also count the reads from the files directly (and then sum them across files corresponding to the same sample).
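
For the second option (counting reads from the files directly), here is a minimal sketch in Python. It is not a kneaddata utility, just an illustration: the function names `count_fastq_reads` and `total_reads` are made up for this example, and it assumes standard FASTQ with exactly four lines per read (no wrapped sequence lines), optionally gzip-compressed.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in one FASTQ file, assuming exactly 4 lines per record."""
    path = Path(path)
    # Assumption: compressed files end in ".gz"; everything else is plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        # Skip blank lines so a trailing newline does not affect the count.
        n_lines = sum(1 for line in handle if line.strip())
    return n_lines // 4


def total_reads(paths):
    """Sum read counts across all files that belong to the same sample."""
    return sum(count_fastq_reads(p) for p in paths)
```

You would pass `total_reads` the list of final cleaned files for one sample (the paired_1, paired_2, unmatched_1, and unmatched_2 files) and use the result as that sample's total reads.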

young_doktor · January 28, 2025, 6:08pm

@WillNickols I do have paired samples, but I am running only the .r1_kneaddata_fastq file through humann.

Therefore, would you recommend only summing the unmatched_1 and paired_1 files?

That is, the sum of the following two counts (the numbers at the end of each line are the actual counts)?

kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( SAMPLE_NAME.r1_kneaddata_paired_1.fastq ): 71665345.0
+
kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( SAMPLENAME.r1_kneaddata_unmatched_1.fastq ): 2983640.0

Thank you for your help! This is something I have been wondering about for a long time!

WillNickols · January 28, 2025, 7:02pm

Correct - in this case, if you’ve only run the r1 file through functional profiling, you should just use the sum of those reads.
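
If it helps, the same sum can be pulled from the log rather than added by hand. This is only a sketch: the regular expression assumes the log lines look exactly like the two `READ COUNT: final ...` lines quoted above, and `final_read_counts` is a hypothetical helper name, not part of kneaddata.

```python
import re

# Assumed line shape, taken from the log lines quoted in this thread:
#   "... READ COUNT: final <label> : <description> ( <file> ): <count>"
FINAL_COUNT = re.compile(
    r"READ COUNT: final (\w+)\s*:.*\(\s*(\S+)\s*\)\s*:\s*([\d.]+)\s*$"
)


def final_read_counts(log_lines):
    """Return {filename: read count} for every 'final' READ COUNT line."""
    counts = {}
    for line in log_lines:
        match = FINAL_COUNT.search(line)
        if match:
            _label, filename, value = match.groups()
            # The log prints counts as floats (e.g. 2983640.0).
            counts[filename] = int(float(value))
    return counts


log_lines = [
    "kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( SAMPLE_NAME.r1_kneaddata_paired_1.fastq ): 71665345.0",
    "kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( SAMPLENAME.r1_kneaddata_unmatched_1.fastq ): 2983640.0",
]
total = sum(final_read_counts(log_lines).values())
print(total)  # 74648985
```

Keying the result by file name makes it easy to keep only the files you actually ran through functional profiling before summing.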

Will

young_doktor · January 28, 2025, 8:19pm

@WillNickols Awesome! Thank you for the speedy response!

young_doktor · May 15, 2025, 2:47pm

Hi everyone, just a PSA. If you are running the r1.kneaddata_fastq file through humann, you should sum the following for the read count, as Will mentioned: the unmatched_1, unmatched_2, paired_1, and paired_2 counts.
