Validating with Critical Assessment of Metagenome Interpretation (CAMI) challenge datasets

Nazmul_Huda · June 28, 2022, 8:31pm

I have installed MetaPhlAn 3, and now I want to validate it with the Critical Assessment of Metagenome Interpretation (CAMI) challenge datasets. The format of the CAMI gold standard results differs from that of MetaPhlAn 3. My understanding is that CAMI includes plasmids and other sequences in the final relative abundances. I was wondering whether removing the unidentified portion from the strain-level data and then renormalizing the rest to 100% would make the CAMI results equivalent to MetaPhlAn 3.

aitor.blancomiguez · June 29, 2022, 8:48am

Hi @Nazmul_Huda
You can use the --CAMI_format_output parameter to produce CAMI-like output from MetaPhlAn 3.
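To compare a --CAMI_format_output profile with a CAMI gold standard, both files first need to be read into the same structure. The sketch below assumes the CAMI (Bioboxes) profiling layout as I understand it: header lines starting with `@` (e.g. `@SampleID`, `@Version`, `@Ranks`), a column line starting with `@@` (typically `TAXID RANK TAXPATH TAXPATHSN PERCENTAGE`), then one tab-separated row per taxon. The function name `parse_cami_profile` and the fallback column list are my own assumptions, not part of MetaPhlAn or CAMI tooling.

```python
from collections import defaultdict

# Assumed default column order if a file has no "@@" column line.
DEFAULT_COLUMNS = ["TAXID", "RANK", "TAXPATH", "TAXPATHSN", "PERCENTAGE"]


def parse_cami_profile(text: str) -> dict[str, dict[str, float]]:
    """Parse a CAMI-format profile into {rank: {taxpath: percentage}}."""
    profile: dict[str, dict[str, float]] = defaultdict(dict)
    columns = list(DEFAULT_COLUMNS)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@@"):
            # Column header line, e.g. "@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE"
            columns = [c.strip().upper() for c in line[2:].split("\t")]
            continue
        if line.startswith("@"):
            continue  # other metadata headers (@SampleID, @Version, @Ranks, ...)
        fields = [f.strip() for f in line.split("\t")]
        row = dict(zip(columns, fields))
        profile[row["RANK"]][row["TAXPATH"]] = float(row["PERCENTAGE"])
    return dict(profile)
```

Keying rows by RANK keeps each rank separate, which matters because CAMI percentages are reported per rank rather than as one flat list.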

Nazmul_Huda · June 29, 2022, 7:47pm

Hello @aitor.blancomiguez,

Thank you so much for your reply. Using --CAMI_format_output produces a taxonomy column similar to the one in the CAMI gold standard. However, the percentages are different. Please see the attached files “goldstandard_high_1.profile.txt” (CAMI gold standard) and “RH_S001__insert_270.profile.txt” (MetaPhlAn 3). My understanding is that CAMI includes plasmids and other sequences in its relative abundance calculation, whereas MetaPhlAn does not. Therefore, I was wondering whether removing the unidentified portion from the strain-level data, renormalizing the rest to 100%, and agglomerating at the species level would make the CAMI results equivalent to MetaPhlAn 3.

Regards,
Nazmul
RH_S001__insert_270.profile.txt (98.5 KB)
goldstandard_high_1.profile.txt (253.3 KB)
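The proposed conversion (drop the unidentified portion, renormalize the strain-level entries to 100%, then sum strains into species) can be sketched as below. This is a hypothetical helper, not a MetaPhlAn or CAMI function, and it assumes that removing the last `|`-separated element of a strain TAXPATH gives the species TAXPATH; check that against the actual gold standard before relying on it.

```python
from collections import defaultdict


def strain_to_species_renormalized(strains: dict[str, float]) -> dict[str, float]:
    """Renormalize strain-level percentages to 100% and sum them per species.

    `strains` maps a strain-level TAXPATH to its PERCENTAGE. Whatever is
    missing from 100% (the unidentified portion) is implicitly discarded.
    Assumption: dropping the last '|' element of a strain TAXPATH yields
    the species TAXPATH.
    """
    total = sum(strains.values())
    if total <= 0:
        raise ValueError("no strain-level abundance to renormalize")
    species: dict[str, float] = defaultdict(float)
    for taxpath, pct in strains.items():
        species_path = taxpath.rsplit("|", 1)[0]  # strip the strain element
        species[species_path] += pct * 100.0 / total
    return dict(species)


# Toy example: the strains cover 80%, so the remaining 20% is dropped.
toy = {"k|sp1|st1": 30.0, "k|sp1|st2": 10.0, "k|sp2|st3": 40.0}
print(strain_to_species_renormalized(toy))
```

Because renormalization is just a rescaling, doing it before or after agglomerating at the species level gives the same species percentages; here both species end up at 50%.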

aitor.blancomiguez · June 30, 2022, 9:55am

Hi @Nazmul_Huda, if CAMI includes plasmids and other sequences in the relative abundance calculation, you are right: the percentages cannot be directly compared with MetaPhlAn 3. However, I think your proposal would work!

Nazmul_Huda · June 30, 2022, 3:44pm

Thank you so much for confirming.

Nazmul_Huda · August 6, 2022, 1:11am

The Open-community Profiling Assessment tool (OPAL) has all the tools required to evaluate profiles against CAMI gold standards. Additionally, I suggest using CAMI2 sequence data instead of CAMI1, since CAMI1 is several years old. I used the mouse gut dataset, which is available here and here.
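Among the metrics OPAL reports is an L1 norm error between the gold standard and a predicted profile. As a quick sanity check before running OPAL, a plain L1 distance over species-level percentages can be computed as sketched below. This is my own simplified version, not OPAL's implementation; OPAL may normalize differently (e.g. working with fractions rather than percentages).

```python
def l1_distance(truth: dict[str, float], predicted: dict[str, float]) -> float:
    """Sum of absolute percentage differences over the union of taxa.

    Taxa missing from one profile count as 0%. With both profiles summing
    to 100%, the result ranges from 0 (identical) to 200 (no shared taxa).
    """
    taxa = set(truth) | set(predicted)
    return sum(abs(truth.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)


gold = {"sp1": 50.0, "sp2": 50.0}
pred = {"sp1": 70.0, "sp3": 30.0}
print(l1_distance(gold, pred))
```

Here the difference is 20 for sp1, 50 for sp2 (missed), and 30 for sp3 (false positive), for a total of 100.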
