Validating with Critical Assessment of Metagenome Interpretation (CAMI) challenge datasets

I have installed MetaPhlAn 3 and now want to validate it with the Critical Assessment of Metagenome Interpretation (CAMI) challenge datasets. The format of the CAMI gold standard profile differs from MetaPhlAn 3's output. My understanding is that the CAMI gold standard includes plasmids and other sequences in its final relative abundances. Would removing the unidentified portion from the strain-level data and then renormalizing the remainder to 100% make the CAMI result equivalent to MetaPhlAn 3's?

Hi @Nazmul_Huda
You can use the --CAMI_format_output parameter to produce CAMI-like results with MetaPhlAn 3.
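As a rough illustration, the call could be assembled as in the sketch below. The file names `sample.fastq` and `sample.profile.txt` and the helper `build_metaphlan_cmd` are hypothetical placeholders, and the thread count is arbitrary; only `--CAMI_format_output` comes from this thread, so check the other options against `metaphlan --help` for your installed version.

```python
import shlex


def build_metaphlan_cmd(reads, out_profile, nproc=4):
    """Return an argument list for a MetaPhlAn 3 run with CAMI-style output.

    `reads` and `out_profile` are placeholder paths chosen for this example.
    """
    return [
        "metaphlan", reads,
        "--input_type", "fastq",   # assumes raw reads in FASTQ format
        "--nproc", str(nproc),     # number of CPU threads to use
        "--CAMI_format_output",    # write the profile in CAMI-like format
        "-o", out_profile,
    ]


cmd = build_metaphlan_cmd("sample.fastq", "sample.profile.txt")
# Print the command so it can be checked before running it,
# e.g. with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```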

Hello @aitor.blancomiguez,

Thank you so much for your reply. Using "--CAMI_format_output" produces a taxonomy column similar to the CAMI gold standard. However, the percentages are different. Please see the attached files "goldstandard_high_1.profile.txt" (CAMI gold standard) and "RH_S001__insert_270.profile.txt" (MetaPhlAn 3). My understanding is that CAMI includes plasmids and other sequences in the relative abundance calculation, whereas MetaPhlAn does not. Therefore, I was wondering whether removing the unidentified portion from the strain-level data, renormalizing the remainder to 100%, and agglomerating at the species level would make the CAMI result equivalent to MetaPhlAn 3's.

Regards,
Nazmul
RH_S001__insert_270.profile.txt (98.5 KB)
goldstandard_high_1.profile.txt (253.3 KB)

Hi @Nazmul_Huda, if CAMI includes plasmids and other sequences in the relative abundance calculation, then you are right: the percentages cannot be compared directly with MetaPhlAn 3's. However, I think your proposal would work!
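The proposed conversion (drop the unidentified portion, agglomerate strains to species, rescale to 100%) could look roughly like the sketch below. It assumes the tab-separated CAMI profiling layout (TAXID, RANK, TAXPATH, TAXPATHSN, PERCENTAGE, with an `@Ranks:` header line) and that "unidentified" strain rows are the ones with an empty species field in TAXPATH; that second point is an assumption, so inspect the gold standard file before relying on it. The function name `species_profile_from_cami` is made up for this example.

```python
from collections import defaultdict

DEFAULT_RANKS = ["superkingdom", "phylum", "class", "order",
                 "family", "genus", "species", "strain"]


def species_profile_from_cami(text):
    """Collapse strain rows of a CAMI profile to species, rescaled to 100%."""
    ranks = DEFAULT_RANKS
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line.strip() or line.startswith("@@"):
            continue  # skip blank lines and the column header
        if line.startswith("@"):
            if line.lower().startswith("@ranks:"):
                ranks = line.split(":", 1)[1].strip().split("|")
            continue  # other metadata headers are ignored
        # Extra columns (if any) after PERCENTAGE are discarded.
        _taxid, rank, taxpath, _names, pct = line.split("\t")[:5]
        if rank != "strain":
            continue
        path = taxpath.split("|")
        idx = ranks.index("species")
        species = path[idx] if idx < len(path) else ""
        if not species:
            continue  # assumption: no species taxid means "unidentified"
        totals[species] += float(pct)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Rescale the remaining abundances so they sum to 100%.
    return {sp: 100.0 * v / grand_total for sp, v in totals.items()}


demo = "\n".join([
    "@SampleID:demo",
    "@Version:0.9.1",
    "@Ranks:superkingdom|phylum|class|order|family|genus|species|strain",
    "@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE",
    "562.1\tstrain\t2|1224|1236|91347|543|561|562|562.1\tn/a\t20.0",
    "562.2\tstrain\t2|1224|1236|91347|543|561|562|562.2\tn/a\t10.0",
    "1280.1\tstrain\t2|1239|91061|1385|90964|1279|1280|1280.1\tn/a\t30.0",
    "p1\tstrain\t2|||||||p1\tn/a\t40.0",  # made-up unidentified row
])
profile = species_profile_from_cami(demo)
```

The demo rows are invented for illustration and are not taken from the attached files. The resulting dictionary maps species taxids to rescaled percentages, which can then be compared with the species-level rows of the MetaPhlAn 3 CAMI-format output.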

Thank you so much for confirming.

The Open-community Profiling Assessment tool (OPAL) has all the tools required to evaluate CAMI results. Additionally, CAMI2 sequence data is recommended over CAMI1, since CAMI1 is several years old. I used the mouse gut dataset, which is available here and here.