Using a single "general" MetaPhlAn bugs list for all samples

Camila_Gazolla · July 12, 2023, 12:11am

Hello,

I am interested in running HUMAnN 3.0 on a collection of over 12K metagenomes, so it is a huge run of jobs…

I was wondering how using a single “general” MetaPhlAn bugs list (generated from a sampling in the metagenome collection) would affect the results.

Any comment about this would be appreciated!

franzosa · July 13, 2023, 8:09pm

This is not something we usually do, but other people have tried it in the past. It mostly hurts your computational efficiency since you’re mapping each sample against one large database instead of smaller sample-specific databases.

If you want to do this, I would profile all of the samples with MetaPhlAn first, then combine the resulting profiles into a joint list of “species of interest.” This could be any species that was seen in any sample, or maybe more conservatively any species that was seen at >0.1% abundance in at least one sample. After you have that species list, you would concatenate their pangenomes from the HUMAnN ChocoPhlAn database into one big FASTA file and then index that file for use with bowtie2 inside of HUMAnN. You can then pass that custom index to HUMAnN to profile each sample against.
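For anyone following along, the "species of interest" step could be sketched roughly like this. It is only an illustration, not HUMAnN or MetaPhlAn code. It assumes each MetaPhlAn profile is tab-separated, with the clade name in the first column and the relative abundance (in percent) in the third, which is worth checking against your MetaPhlAn version. The function names are made up for this example.

```python
from typing import Iterable, List, Set


def species_above_threshold(
    profile_lines: Iterable[str],
    min_abundance: float = 0.1,
    abundance_col: int = 2,  # assumption: third column holds relative abundance (%)
) -> Set[str]:
    """Return species-level clades in one profile whose abundance exceeds the threshold."""
    species = set()
    for line in profile_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) <= abundance_col:
            continue  # not a data row in the assumed layout
        last_clade = fields[0].split("|")[-1]
        if not last_clade.startswith("s__"):
            continue  # keep only rows that end at the species level
        if float(fields[abundance_col]) > min_abundance:
            species.add(last_clade)
    return species


def joint_species_list(
    profiles: Iterable[Iterable[str]], min_abundance: float = 0.1
) -> List[str]:
    """Union of species seen above the threshold in at least one sample."""
    joint: Set[str] = set()
    for lines in profiles:
        joint |= species_above_threshold(lines, min_abundance)
    return sorted(joint)
```

Each element of `profiles` can be an open file handle for one sample's profile. Setting `min_abundance=0.0` gives the less conservative "any species seen in any sample" list.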

Note that you just want to build that big index once and use it for all samples. If you instead passed the combined taxonomic profile to HUMAnN for each sample, it would rebuild the big index over and over, which would be very time-consuming.
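A rough sketch of the one-time concatenation step, for illustration only. It assumes the ChocoPhlAn pangenomes are gzipped FASTA files ending in `.ffn.gz` whose names contain the species label followed by `.centroids` (something like `g__Genus.s__Genus_species.centroids.….ffn.gz`). Please check that pattern against the files in your own ChocoPhlAn directory. The function name and paths are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, List


def concatenate_pangenomes(
    chocophlan_dir, species: Iterable[str], out_fasta
) -> List[str]:
    """Write the pangenomes of the selected species into one uncompressed FASTA.

    Returns the names of the pangenome files that were included.
    """
    wanted = set(species)
    used = []
    with open(out_fasta, "wb") as out:
        for path in sorted(Path(chocophlan_dir).glob("*.ffn.gz")):
            # Assumed naming: "...<species label>.centroids..." in the file name.
            if any(f".{sp}.centroids" in path.name for sp in wanted):
                with gzip.open(path, "rb") as handle:
                    data = handle.read()
                out.write(data)
                if not data.endswith(b"\n"):
                    out.write(b"\n")  # keep records from running together
                used.append(path.name)
    return used
```

The resulting FASTA would then be indexed a single time (for example with `bowtie2-build joint.ffn joint_index`) and that same index reused for every sample. How the prebuilt index is handed to HUMAnN depends on the version, so check the HUMAnN manual for the relevant options.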

Camila_Gazolla · July 20, 2023, 2:03am

Dear Eric Franzosa,

Thank you so much for your answer!

I was wondering if you have any other general advice on how to lower the computational time for HUMAnN 3.0.

Best,

franzosa · July 20, 2023, 7:56pm

Most of the runtime is spent in translated search. You can bypass translated search to speed up the runs, but that’s really only advisable for very well-characterized environment types (where you expect most of the taxa to be known), and even then you could be losing out on important unclassified signals.

Another option is to use the EC-filtered translated search database instead of the full database. This restricts translated search to proteins with an EC annotation (~10% of the total). This is faster and still allows you to explore unclassified enzyme and pathway contributions, but you lose coverage of less-well-annotated proteins.
