Does "read_fastx.py" compute avg_read_length incorrectly for multiple input files?

In “metaphlan3/utils/read_fastx.py” (around line 141), the value “f_avg_read_length” stores the average read length of each file.
“avg_read_length” is then computed as f_avg_read_length divided by the number of input files.
However, the number of reads may differ between files, so I think a weighted average is the correct calculation here.
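To illustrate the difference, here is a minimal sketch comparing a plain mean of per-file averages with a mean weighted by read count. The function names and the numbers are hypothetical and are not taken from read_fastx.py.

```python
def unweighted_mean(per_file_avgs):
    """Plain mean of the per-file averages; ignores how many reads each file has."""
    return sum(per_file_avgs) / len(per_file_avgs)


def weighted_mean(per_file_avgs, per_file_nreads):
    """Mean of the per-file averages, weighted by the number of reads in each file."""
    total_bases = sum(avg * n for avg, n in zip(per_file_avgs, per_file_nreads))
    return total_bases / sum(per_file_nreads)


# Hypothetical example: two files with different read counts.
avgs = [100.0, 150.0]
nreads = [1000, 3000]

print(unweighted_mean(avgs))        # 125.0
print(weighted_mean(avgs, nreads))  # 137.5
```

The two results only agree when every file contains the same number of reads.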

If I am wrong, is there an additional consideration behind the current calculation?

Hi.
I found the same script in MetaPhlAn 4.0.2.

The calculation has been changed to what I suggested above. However, wouldn't avg_read_length come out lower than it should if f_avg_read_length is not multiplied by nreads?

Hi @cquxiaoy
The f_avg_read_length reported by the read_and_write_raw function is not really the average read length of the file but the sum of its read lengths:
MetaPhlAn/read_fastx.py at master · biobakery/MetaPhlAn · GitHub, lines 98 and 106
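As a rough sketch of why no extra multiplication by nreads is needed: if each file contributes its read count and the sum of its read lengths, dividing the total length by the total read count already gives the read-weighted average. The helper names and toy reads below are hypothetical and are not the actual read_fastx.py code.

```python
def read_length_totals(reads):
    """Return (number of reads, summed read length) for one file's reads."""
    nreads = 0
    total_length = 0
    for seq in reads:
        nreads += 1
        total_length += len(seq)
    return nreads, total_length


def overall_average(files):
    """Average read length across all files: total length / total reads."""
    total_reads = 0
    total_length = 0
    for reads in files:
        n, length_sum = read_length_totals(reads)
        total_reads += n
        total_length += length_sum
    return total_length / total_reads


# Hypothetical example: two "files" with different numbers of reads.
files = [["ACGT", "ACGTAC"], ["AC"]]
print(overall_average(files))  # 4.0  (12 bases over 3 reads)
```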

Sorry for making such a mistake! :melting_face: