KneadData for dual-transcriptome RNA-seq data

hellofuture · April 22, 2021, 12:56am

Hi Biobakery team,

I am using KneadData to process dual-transcriptome RNA-seq data (host transcriptome and microbial metatranscriptome). My data come from a paired-end (PE) sequencing run, so each sample has two fastq files, one forward and one reverse. Host and microbial sequences are mixed together in the same fastq files. What is the best way to process such mixed data and extract only the microbial part?

I am using the following command, and I have two additional questions:

kneaddata --input sample1.R1.fastq --input sample1.R2.fastq -db human_rna_db --output seq_out

In this command, I followed the user manual and passed two --input options, one for the forward reads and one for the reverse reads. Are these the same as the first and second mate described in the user manual? If not, how should I pass the forward and reverse fastq files?
As shown in the command, I only used the human transcriptome database. Since I only set up human_rna_db, does that mean the reads that do not belong to human_rna_db are the bacterial reads, i.e., should I use (c) and (d) from the following description? Or is there another way to separate human and microbial sequences from the same fastq file?

Files for just the human_rna_db database (from the user manual):
(a). seq_kneaddata_paired_human_rna_db_bowtie2_contam_1.fastq: Reads from the first mate in situation (1) above that were identified as belonging to the human_rna_db database.
(b). seq_kneaddata_paired_human_rna_db_bowtie2_contam_2.fastq: Reads from the second mate in situation (1) above that were identified as belonging to the human_rna_db database.
(c). seq_kneaddata_paired_human_rna_db_bowtie2_clean_1.fastq: Reads from the first mate in situation (1) above that were identified as NOT belonging to the human_rna_db database.
(d). seq_kneaddata_paired_human_rna_db_bowtie2_clean_2.fastq: Reads from the second mate in situation (1) above that were identified as NOT belonging to the human_rna_db database.

Thank you so much!

Zhaozhong

sagunmaharjann · June 3, 2021, 4:16pm

Hi @hellofuture,

Apologies for the late reply.

The two fastq input files can be passed to kneaddata like this: --input sample1.R1.fastq --input sample1.R2.fastq
Yes, the reads that do not belong to human_rna_db are the microbial reads you are interested in for your use case, i.e., (c) and (d) from the above description.
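
As a quick sanity check after the run, it can help to count how many reads ended up in the contam (human) and clean (non-human) files. The snippet below is only a minimal sketch: it assumes uncompressed FASTQ with exactly four lines per record, and the `prefix` value is hypothetical, built from the file names in the manual excerpt above, so adjust it to whatever names KneadData actually wrote into your output folder. `count_reads` is a helper defined here, not part of KneadData.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record is exactly 4 lines (no wrapped sequences).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical prefix; replace with the names found in your output folder.
prefix="seq_out/seq_kneaddata_paired_human_rna_db_bowtie2"

# Report read counts for the human (contam) and non-human (clean) files,
# for both mates. Files that do not exist are skipped silently.
for kind in contam clean; do
    for mate in 1 2; do
        f="${prefix}_${kind}_${mate}.fastq"
        if [ -f "$f" ]; then
            echo "$f: $(count_reads "$f") reads"
        fi
    done
done
```

For properly paired output, the _1 and _2 files of the same kind should report the same number of reads.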

Regards,
Sagun
