Hi, I preprocessed raw data with kneaddata for a whole metatranscriptome analysis. Afterwards, I checked the log file to see the read counts at each step.
There is one thing I don't understand. The description of the final pair file reads “Total reads after merging results from multiple databases”.
Does this mean that the final file keeps only the reads that appear in the clean results of both databases?
For example, in the table below, 24801812 reads remain after the human mRNA DB and 840881 reads remain after the rRNA DB. Is 840505 the number of reads shared by these two sets?
Below are the files I identified and their read counts:
| Step | Description (output file) | Reads |
|---|---|---|
| raw pair1 | Initial number of reads | 34378835 |
| trimmed pair1 | Total reads after trimming (`meta_test_R1_kneaddata.trimmed.1.fastq`) | 28814562 |
| decontaminated human_hg38_refMrna pair1 | Total reads after removing those found in reference database (`meta_test_R1_kneaddata_human_hg38_refMrna_bowtie2_paired_clean_1.fastq`) | 24801812 |
| decontaminated SILVA_128_LSUParc_SSUParc_ribosomal_RNA pair1 | Total reads after removing those found in reference database (`meta_test_R1_kneaddata_SILVA_128_LSUParc_SSUParc_ribosomal_RNA_bowtie2_paired_clean_1.fastq`) | 840881 |
| final pair1 | Total reads after merging results from multiple databases (`meta_test_R1_kneaddata_paired_1.fastq`) | 840505 |
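To make my question concrete, here is a minimal sketch of what I *assume* "merging" means: each database is searched independently against the trimmed reads, and the final file keeps only reads whose ID is present in every database's clean output (a set intersection). I have not confirmed this in the kneaddata source; the function names `parse_fastq`, `read_id` and `intersect_clean_fastqs` are hypothetical and not part of kneaddata. If the assumption holds, 840881 − 840505 = 376 reads in the rRNA-clean file would have been removed by the human mRNA database.

```python
from typing import Iterable, List, Set, Tuple

# One FASTQ record: (header, sequence, '+' line, quality)
Record = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> List[Record]:
    """Group FASTQ text into 4-line records (assumes no empty sequence lines)."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("FASTQ text is not a multiple of 4 lines")
    return [tuple(rows[i:i + 4]) for i in range(0, len(rows), 4)]


def read_id(header: str) -> str:
    """Read ID without the leading '@', any description, or a /1 or /2 mate suffix."""
    rid = header[1:].split()[0]
    if rid.endswith("/1") or rid.endswith("/2"):
        rid = rid[:-2]
    return rid


def intersect_clean_fastqs(per_db_records: List[List[Record]]) -> List[Record]:
    """Hypothetical merge: keep reads whose ID is in every database's clean output.

    Output order follows the first database's file.
    """
    common: Set[str] = {read_id(r[0]) for r in per_db_records[0]}
    for records in per_db_records[1:]:
        common &= {read_id(r[0]) for r in records}
    return [r for r in per_db_records[0] if read_id(r[0]) in common]


# Toy data: r1 is rRNA (removed by the rRNA DB), r4 is human (removed by the human DB).
human_clean = parse_fastq([
    "@r1/1", "ACGT", "+", "IIII",
    "@r2/1", "GGCC", "+", "IIII",
    "@r3/1", "TTAA", "+", "IIII",
])
rrna_clean = parse_fastq([
    "@r2/1", "GGCC", "+", "IIII",
    "@r3/1", "TTAA", "+", "IIII",
    "@r4/1", "CCGG", "+", "IIII",
])
final = intersect_clean_fastqs([human_clean, rrna_clean])
print([read_id(r[0]) for r in final])  # ['r2', 'r3']

# Pair1 counts from the table: under the intersection assumption, the
# difference is the number of rRNA-clean reads that the human DB removed.
rrna_clean_count, final_count = 840881, 840505
print(rrna_clean_count - final_count)  # 376
```

With real files, `parse_fastq(open(path))` would replace the toy lists, although holding tens of millions of read IDs in a Python set uses a lot of memory.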
I used two reference databases, as shown in the log below:
```
06/20/2023 02:23:37 PM - kneaddata.knead_data - INFO: Running kneaddata v0.10.0
06/20/2023 02:23:37 PM - kneaddata.knead_data - INFO: Output files will be written to: /data/test/bstest/metatrans_test/kneaddata
06/20/2023 02:23:37 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = False
input = /data/test/bstest/metatrans_test/meta_test_R1.fastq.gz /data/test/bstest/metatrans_test/meta_test_R2.fastq.gz
output_dir = /data/test/bstest/metatrans_test/kneaddata
reference_db = /data/References/bowtie_human_transcriptome/human_hg38_refMrna /data/References/kneaddata_db_ribosomal_RNA/SILVA_128_LSUParc_SSUParc_ribosomal_RNA
```