KneadData for dual-transcriptome RNA-seq data

Hi Biobakery team,

I am using KneadData to process dual-transcriptome RNA-seq data (host transcriptome plus microbial metatranscriptome). The data come from a paired-end (PE) sequencing run, so for each sample I have two FASTQ files, one for the forward reads and one for the reverse reads. Host and microbial reads are mixed together in the same FASTQ files. What is the best way to process such mixed data and extract only the microbial reads?

I am using the following command and have two additional questions:

kneaddata --input sample1.R1.fastq --input sample1.R2.fastq -db human_rna_db --output seq_out

  1. Following the user manual, I passed two --input options in this command, one for the forward FASTQ file and one for the reverse FASTQ file. Are these the same as the first and second mates described in the user manual? If not, how should I pass the forward and reverse files?

  2. As the command shows, I only used the human transcriptome database. Since human_rna_db is the only database I set up, does that mean the reads that do not belong to human_rna_db are the bacterial reads, i.e. I should use files (c) and (d) from the following description? Or is there another way to separate human and microbial reads from the same FASTQ file?
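If the two --input files are treated as mate 1 and mate 2, the N-th record in the R1 file should be the mate of the N-th record in the R2 file. The Bash function below is a hypothetical helper (not part of KneadData) that sketches a quick sanity check for this. It assumes uncompressed FASTQ with exactly four lines per record, and read IDs that differ between mates only by an old-style /1 or /2 suffix or by the comment after the first space (as in typical Illumina headers).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KneadData): check that two FASTQ files are
# mates, i.e. record N in R1 has the same read ID as record N in R2.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
check_mates() {
  local r1="$1" r2="$2"
  # Line 1 of every 4-line record is the header; put R1 and R2 headers side by side.
  paste <(awk 'NR % 4 == 1' "$r1") <(awk 'NR % 4 == 1' "$r2") |
    awk -F '\t' '
      BEGIN { bad = 0 }
      {
        # Keep only the first whitespace-separated token (the read ID) ...
        split($1, a, " "); split($2, b, " ")
        id1 = a[1]; id2 = b[1]
        # ... and drop an old-style /1 or /2 mate suffix.
        sub(/\/[12]$/, "", id1); sub(/\/[12]$/, "", id2)
        # A missing header (one file shorter than the other) also mismatches.
        if (id1 != id2) {
          printf "mismatch at record %d: %s vs %s\n", NR, id1, id2
          bad = 1
          exit
        }
      }
      END { exit bad }'
}
```

Usage: `check_mates sample1.R1.fastq sample1.R2.fastq && echo "files look paired"`. The function stops at the first mismatch and returns a non-zero status, so it can guard a pipeline step before kneaddata is called.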

Output files described in the user manual when only the human_rna_db database is used:
(a). seq_kneaddata_paired_human_rna_db_bowtie2_contam_1.fastq: Reads from the first mate in situation (1) above that were identified as belonging to the human_rna_db database.
(b). seq_kneaddata_paired_human_rna_db_bowtie2_contam_2.fastq: Reads from the second mate in situation (1) above that were identified as belonging to the human_rna_db database.
(c). seq_kneaddata_paired_human_rna_db_bowtie2_clean_1.fastq: Reads from the first mate in situation (1) above that were identified as NOT belonging to the human_rna_db database.
(d). seq_kneaddata_paired_human_rna_db_bowtie2_clean_2.fastq: Reads from the second mate in situation (1) above that were identified as NOT belonging to the human_rna_db database.
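To see how the paired reads split between host and non-host, the records in the contam and clean files can be counted. A minimal sketch, assuming uncompressed four-line FASTQ records and the `<prefix>_<db>_bowtie2_{contam,clean}_1.fastq` naming from the manual excerpt above; `count_reads` and `summarize_kneaddata` are hypothetical names. Only the mate-1 files of (a) and (c) are counted, so each number is a count of read pairs, and any other output files KneadData writes are not included.

```shell
#!/usr/bin/env bash
# Count FASTQ records in a file (assumes exactly 4 lines per record).
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# Hypothetical summary for one database: compare pairs that matched the
# database (contam) with pairs that did not (clean). Only the *_1 files are
# counted, since the _1 and _2 files of the same class hold the same pairs.
summarize_kneaddata() {
  local prefix="$1" db="$2" host clean
  host=$(count_reads "${prefix}_${db}_bowtie2_contam_1.fastq")
  clean=$(count_reads "${prefix}_${db}_bowtie2_clean_1.fastq")
  awk -v h="$host" -v c="$clean" 'BEGIN {
    frac = (h + c > 0) ? c / (h + c) : 0
    printf "host pairs: %d  non-host pairs: %d  non-host fraction: %.3f\n", h, c, frac
  }'
}
```

Usage with the manual's file names: `summarize_kneaddata seq_out/seq_kneaddata_paired human_rna_db`. Adjust the prefix to whatever file names your own KneadData run produced.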

Thank you so much!

Zhaozhong

Hi @hellofuture,

Apologies for the late reply.

  1. Yes, the forward (R1) and reverse (R2) files are the first and second mates. Pass both FASTQ files to kneaddata as two --input parameters, like this: --input sample1.R1.fastq --input sample1.R2.fastq
  2. Yes, the reads that do not belong to human_rna_db, i.e. files (c) and (d) from the description above, are the microbial reads you are interested in for your use case.
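For more than one sample, the command from the answer can be wrapped in a loop. A minimal sketch, assuming each sample has files named `<sample>.R1.fastq` and `<sample>.R2.fastq` in one directory; the naming scheme and the `run_kneaddata_batch` function are hypothetical. It uses the two --input form given above; newer KneadData releases may expect --input1/--input2 instead, so check `kneaddata --help` for your installed version. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: run KneadData on every <sample>.R1.fastq /
# <sample>.R2.fastq pair in a directory, with one output folder per sample.
# Set DRY_RUN=1 to print the commands instead of running them.
run_kneaddata_batch() {
  local dir="$1" db="$2" outdir="$3"
  local r1 r2 sample
  local -a cmd
  for r1 in "$dir"/*.R1.fastq; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    sample=$(basename "$r1" .R1.fastq)
    r2="$dir/$sample.R2.fastq"
    if [ ! -e "$r2" ]; then
      echo "skipping $sample: no $r2" >&2
      continue
    fi
    # Forward file first (mate 1), reverse file second (mate 2).
    cmd=(kneaddata --input "$r1" --input "$r2" -db "$db" --output "$outdir/$sample")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

Usage: `DRY_RUN=1 run_kneaddata_batch raw_fastq human_rna_db seq_out` to review the commands, then the same call without DRY_RUN=1 to run them. Writing each sample to its own output folder keeps the clean and contam files of different samples apart.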

Regards,
Sagun