Is "read_fastx.py" wrong in computing multi-file avg_read_length?

cquxiaoy · December 2, 2022, 3:08am

In “metaphlan3/utils/read_fastx.py” (around line 141), the value “f_avg_read_length” stores the average read length of each file.
“avg_read_length” then equals the sum of the f_avg_read_length values divided by the number of input files.
However, the number of lines (and therefore reads) may differ between files. As a result, I think a read-count-weighted average would be the correct calculation.
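To make the concern concrete, here is a minimal sketch with made-up numbers. It is not the MetaPhlAn code; the function names `mean_of_means` and `weighted_mean` and the `(n_reads, total_bases)` tuples are hypothetical and only illustrate how the two calculations diverge when files contain different numbers of reads.

```python
def mean_of_means(per_file):
    """Average the per-file averages, ignoring how many reads each file has."""
    return sum(total_bases / n_reads for n_reads, total_bases in per_file) / len(per_file)


def weighted_mean(per_file):
    """Divide the total number of bases by the total number of reads."""
    total_reads = sum(n_reads for n_reads, _ in per_file)
    total_bases = sum(total_bases for _, total_bases in per_file)
    return total_bases / total_reads


# Hypothetical input: (n_reads, total_bases) per file.
# File A: 100 reads of 100 bp; file B: 900 reads of 150 bp.
files = [(100, 100 * 100), (900, 900 * 150)]

print(mean_of_means(files))  # 125.0
print(weighted_mean(files))  # 145.0
```

With equal read counts in every file the two results coincide; they differ only when the files are unbalanced.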

If I am wrong, is there some additional consideration behind the current calculation?

cquxiaoy · December 2, 2022, 3:18am

Hi.
I found the same script in MetaPhlAn 4.0.2.

The calculation was changed along the lines I suggested. However, might avg_read_length come out smaller than it should if you don’t multiply by nreads at the highlighted position?

aitor.blancomiguez · December 6, 2022, 8:44am

Hi @cquxiaoy
The f_avg_read_length returned by the read_and_write_raw function is not really the average read length of the file but the sum of the read lengths:
see lines 98 and 106 of MetaPhlAn/read_fastx.py on the master branch of biobakery/MetaPhlAn on GitHub.
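The idea can be sketched as follows. This is a simplified illustration, not the actual read_fastx.py code: the helper `count_reads_and_bases` is hypothetical, and it assumes well-formed four-line FASTQ records. Each file contributes a read count and a summed read length, and the division happens only once, over the totals, so the result is already weighted by read count.

```python
import io


def count_reads_and_bases(handle):
    """Return (number of reads, summed read length) for a four-line FASTQ stream."""
    nreads = 0
    total_len = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            nreads += 1
            total_len += len(line.strip())
    return nreads, total_len


# Two hypothetical in-memory "files" with different numbers of reads.
file_a = "@r1\nACGT\n+\nIIII\n"
file_b = "@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n"

nreads = 0
total_len = 0
for text in (file_a, file_b):
    f_nreads, f_total_len = count_reads_and_bases(io.StringIO(text))
    nreads += f_nreads         # accumulate read counts across files
    total_len += f_total_len   # accumulate summed lengths, not averages

# Single division at the end: total bases / total reads.
avg_read_length = total_len / nreads
print(nreads, total_len, avg_read_length)
```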

cquxiaoy · December 6, 2022, 8:56am

Sorry for making such a mistake!
