I am wondering how the Huttenhower lab team recommends getting the “number of reads” from metagenomic data for maaslin3. When I search this, all I see is perhaps getting this in kneaddata.
I also get a readout of subsample-raw reads from the illumina-basic QC file. But this # of reads also includes human reads, which are eventually removed for bacterial shotgun analysis—so I assume this is not the correct way to obtain the bacterial “number of reads”.
Thoughts?
Thanks!
Would it happen to be the counts from this file in kneaddata? SAMPLENAME.r1_kneaddata_hg37dec_v0.1_bowtie2_unmatched_1_clean.fastq : 406541.0
If so, I think i have my answer
Assuming you have paired samples, there should be an unmatched_1, unmatched_2, paired_1, and paired_2 for the final cleaned reads. We recommend summing these for each sample and using the sum as the total reads. You could also count the reads from the files directly (and then sum them across files corresponding to the same sample).
@WillNickols I do have paired samples, but I am running only the .r1_kneaddata_fastq file through humann.
Therefore, would you recommend only summing the unmatched_1 and paired_1 files?
Therefore sum of:
kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( SAMPLE_NAME.r1_kneaddata_paired_1.fastq ): 71665345.0
+
kneaddata.utilities - INFO: READ COUNT: final orphan1 : Total reads after merging results from multiple databases ( SAMPLENAME.r1_kneaddata_unmatched_1.fastq ): 2983640.0
Numbers at the end are the actual counts.
?
Thank you for your help! This is something I have been wondering about for a long time! 
Correct - in this case, if you’ve only run the r1 file through functional profiling, you should just use the sum of those reads.
Will
@WillNickols Awesome! Thank you for the speedy response! 