MetaPhlAn 4.x Support for Long Reads?

Hi! :blush:

I’m trying to use MetaPhlAn 4.1.1 (as part of HUMAnN 3.9) on some very long Oxford Nanopore reads (10,000 to 100,000 bp and longer).

Unfortunately, it is consuming an enormous amount of RAM. A 28 MB file containing reads no longer than 100,000 bp (the smallest fastq.gz in the dataset) consumed 70 GB of RAM to get through the Bowtie2 step. Other files in my dataset (<100 GB) hit out-of-memory (OOM) errors at this stage.

I have seen reports that Bowtie2 performs poorly on long reads. Is there any interest in integrating a long-read aligner such as Minimap2 to circumvent this issue?

Also: is there any other way I can use HUMAnN for my dataset? I am willing to bypass MetaPhlAn entirely, if you have any ideas… :grinning_face_with_smiling_eyes:

Hi @PileOfAmoebas! We are testing MetaPhlAn on long reads with Minimap2, and this will be available in a few weeks with the next MetaPhlAn release. The code we are testing is in the `code_refactor` branch of the MetaPhlAn repo if you are interested.