Default minimum read length change

We are analyzing metagenome samples of varying sizes; the sequence yield ranges from 2.68 Mbp to 39 Mbp. So I have a list of questions about processing these samples with MetaPhlAn 3:
• You mentioned that reads shorter than 70 bp are discarded by MetaPhlAn 3 by default. We have a slight issue here: we trimmed our files with a minimum read length of 35 bp. Is there a way to keep those reads in the analysis instead of having them discarded?
• Second, since our metagenome files range from 2.68 Mbp to 39 Mbp per file, what method would you suggest to normalize the samples? Such a large difference will certainly affect the output, so is it a good idea to subsample all the files to the read count of the smallest file, so that every sample has an equal number of reads?

Thanks

Hi,

Yes, you can use the option --read_min_len 34 to allow reads longer than 34 bp to pass the filter.
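
For example, a full MetaPhlAn 3 call with that option could look like this (a minimal sketch; the file names, output paths, and thread count are placeholders, not anything from this thread):

```bash
# Profile one sample while keeping reads of at least ~35 bp instead of
# the default 70 bp minimum; file names and thread count are placeholders.
metaphlan sample_R1.fastq.gz \
    --input_type fastq \
    --read_min_len 34 \
    --nproc 4 \
    --bowtie2out sample.bowtie2.bz2 \
    -o sample_profile.txt
```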

Subsampling everything down to 2.68 Mbp would probably be too much; instead, you could discard the smallest metagenomes and then subsample the remaining ones at a higher depth.
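
If you do decide to subsample, one common tool for this (not part of MetaPhlAn) is seqtk; a sketch, assuming seqtk is installed, with placeholder file names and an arbitrary example depth:

```bash
# Subsample every FASTQ to the same number of reads (1,000,000 here is
# just an example depth). Use the same seed (-s) for R1 and R2 so that
# read pairs stay in sync.
seqtk sample -s 42 sample_R1.fastq.gz 1000000 | gzip > sample_R1.sub.fastq.gz
seqtk sample -s 42 sample_R2.fastq.gz 1000000 | gzip > sample_R2.sub.fastq.gz
```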

  • What if no subsampling is done?

Thank you for the response, but what if we do not subsample at all and keep everything we have in all the samples, ranging from 2.68 Gbp to 39 Gbp of data? How would that affect the diversity results, and to what extent? Would it be a bad idea, or is there a way to justify keeping everything for the analysis so that our results remain reliable?

  • Is there a way to normalize the profiles we have obtained from MetaPhlAn 3?
    Also, I am really confused because I cannot find a proper explanation of this normalization issue anywhere. In general, the FASTQ files from the sequencing machine do not have the same coverage or read depth, so how should such samples be processed? Please help me with this if you understand my problem. For instance, even just counting the reads per file shows how unequal the depths are; see the sketch below.
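
A quick shell sketch for checking per-sample read depth before deciding on any normalization (the file glob is a placeholder):

```bash
# Count reads per gzipped FASTQ (4 lines per record); the glob is a placeholder.
for f in *_R1.fastq.gz; do
    n=$(( $(zcat "$f" | wc -l) / 4 ))
    printf '%s\t%s\n' "$f" "$n"
done
```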

Thank you
Saraswati Awasthi