Sam file & number of reads

Hi !

I am writing this message because I have some questions regarding MetaPhlAn3, and I was hoping you could help me with that.

I am running MetaPhlAn3 on fastq files using the following parameters: --read_min_len 70, --min_mapq_val 5, --add_viruses, to obtain bowtie2 and sam outputs.

Then, I used the bowtie2 output to do other analyses (like “-t reads_map”).

I was first surprised to see that the SAM files still contain reads with MAPQ < 5. Are the parameters not applied to the SAM output ? Yet, it seems that it excluded reads smaller than 70bp.

Secondly, when I discarded manually reads with MAPQ<5 in the SAM, I did not find the same numbers of reads than the one in the “reads_map” file and I am not sure to understand why.

For example, to give you an order of the difference, for one individual, I obtain 149352 reads in the SAM file, 104497 reads​ when filtering on the MAPQ<5, while I have 105108​ reads in the “reads_map” file.

If you have any explanation for that, I would appreciate it !
Thank you in advance for your answer
Anthony

When --add_viruses is added, alignments produced by viral markers are not filtered by their MAPQ value, this is a reason why you still find those alignments in the bowtie2out.
All the alignment are reported in the SAM file, then the filtering is done on the fly internally by MetaPhlAn and the “good” alignments are reported in the bowtie2out file. Reads shorted than 70bp are not present since the FASTQ/A parser would discard them while reading and feeding the input file to Bowtie2.