Strange output from paired-end kneaddata input

Hi @lauren.j.mciver
I have run kneaddata with paired-end data, but I am getting some strange output. I have a forward and a reverse read file, each 780 MB in size. When I run kneaddata, it gives two output files of 1.3 GB and 148 MB: one file grew drastically and the other shrank drastically.
I have checked the read counts of the two input files and of the concatenated output file. The total read count in the concatenated output is only a few thousand lower (out of millions) than the combined total from the forward and reverse files.
So, should I trust the data and concatenate the outputs before running HUMAnN, or should I not rely on these outputs?

Thanks,
DC7

Hi DC7, I think it would be best to double-check what is going on with the kneaddata runs and get the outputs to look as expected before continuing on to HUMAnN. From what you describe, I think kneaddata is not tracking the read pairs correctly. This is usually due to the sequence identifier (the first line of each read record in FASTQ format) being in an unexpected format (e.g., having spaces or not having a pair identifier). Would you check the format of the sequence identifiers (just look at the first line of a few of the FASTQ input files) and see whether it falls into any of these cases?
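
If it helps, the identifier check can be scripted. The following is only a rough sketch, not part of kneaddata: the function names are made up for illustration, and it assumes that a "pair identifier" means a trailing /1 or /2 on the identifier line, which may not match every format kneaddata accepts. It prints the first few identifiers of each FASTQ file and flags the two cases mentioned above.

```python
import gzip
import itertools
import re
import sys

# Assumption: the pair identifier is a trailing "/1" or "/2" on the identifier line.
PAIR_SUFFIX = re.compile(r"/[12]$")


def read_headers(path, limit=5):
    """Return the first `limit` sequence identifier lines of a FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Assumes 4 lines per record, so identifiers are lines 0, 4, 8, ...
        header_lines = itertools.islice(handle, 0, limit * 4, 4)
        return [line.rstrip("\r\n") for line in header_lines]


def check_header(header):
    """Return a list of reasons the identifier might confuse pair tracking."""
    problems = []
    if not header.startswith("@"):
        problems.append("does not start with '@'")
    if any(ch.isspace() for ch in header):
        problems.append("contains whitespace")
    if not PAIR_SUFFIX.search(header):
        problems.append("no trailing /1 or /2 pair identifier")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_fastq_headers.py R1.fastq R2.fastq
    for fastq_path in sys.argv[1:]:
        for header in read_headers(fastq_path):
            issues = check_header(header)
            print(fastq_path, header, "; ".join(issues) or "looks OK", sep="\t")
```

Run it on both the forward and the reverse file. If every identifier is flagged, the headers probably need to be reformatted before rerunning kneaddata.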

Thank you,
Lauren

please refer to this query

many thanks