I have installed MetaPhlAn 3 and now want to validate it with the Critical Assessment of Metagenome Interpretation (CAMI) challenge datasets. The format of the CAMI gold standard result is different from that of MetaPhlAn 3. My understanding is that the CAMI data includes plasmids and other sequences in the final relative abundance. I was wondering whether removing the unidentified portion from the strain-level data and then rescaling the remainder to 100% would make the CAMI result equivalent to MetaPhlAn 3.
You can use the --CAMI_format_output parameter to produce CAMI-like results with MetaPhlAn 3.
Thank you so much for your reply. Using --CAMI_format_output produces a taxonomy column similar to the CAMI gold standard. However, the percentages are different. Please see the attached files "goldstandard_high_1.profile.txt" (CAMI gold standard) and "RH_S001__insert_270.profile.txt" (MetaPhlAn 3). My understanding is that CAMI includes plasmids and other sequences in the relative abundance calculation, whereas MetaPhlAn does not. Therefore, I was wondering whether removing the unidentified portion from the strain-level data, rescaling the remainder to 100%, and agglomerating at the species level would make the CAMI result equivalent to MetaPhlAn 3.
Hi @Nazmul_Huda, if CAMI includes plasmids and other sequences in the relative abundance calculation, you are right that the percentages cannot be directly compared to MetaPhlAn 3. However, I think your proposal would work!
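For anyone who wants to try the proposal above, here is a minimal Python sketch of the three steps (drop the unidentified portion, agglomerate strains to species, rescale to 100%). It is not part of MetaPhlAn or the CAMI tooling. It assumes a tab-separated CAMI-style profile with an @Ranks header and the columns TAXID, RANK, TAXPATH, TAXPATHSN, PERCENTAGE, and it assumes that plasmid or otherwise unidentified entries can be recognized by an empty species slot in TAXPATH. The function name renormalize_to_species and the toy profile are hypothetical, so please check how your gold standard file actually encodes those entries before relying on it.

```python
from collections import defaultdict


def renormalize_to_species(profile_text, source_rank="strain", target_rank="species"):
    """Sum source-rank rows per target-rank taxon and rescale the result to 100%.

    Rows whose TAXPATH has no entry at the target rank are dropped
    (assumption: this is how unidentified/plasmid rows appear).
    """
    ranks = []
    totals = defaultdict(float)
    for line in profile_text.splitlines():
        if not line.strip() or line.startswith("#") or line.startswith("@@"):
            continue  # skip blanks, comments, and the column header
        if line.startswith("@"):
            key, _, value = line[1:].partition(":")
            if key.strip().lower() == "ranks":
                ranks = value.strip().split("|")
            continue
        fields = line.split("\t")
        if len(fields) < 5 or fields[1] != source_rank:
            continue
        if target_rank not in ranks:
            raise ValueError("target rank not listed in the @Ranks header")
        path = fields[2].split("|")
        idx = ranks.index(target_rank)
        if idx >= len(path) or not path[idx]:
            continue  # no species assigned: treated as unidentified
        totals[path[idx]] += float(fields[4])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxid: 100.0 * value / grand_total for taxid, value in totals.items()}


# Hypothetical toy profile: two strains of one species, one strain of
# another, and one plasmid-like row with no species in its TAXPATH.
TOY_PROFILE = "\n".join([
    "@SampleID:toy",
    "@Ranks:superkingdom|phylum|class|order|family|genus|species|strain",
    "\t".join(["@@TAXID", "RANK", "TAXPATH", "TAXPATHSN", "PERCENTAGE"]),
    "\t".join(["562.1", "strain", "2|1224|1236|91347|543|561|562|562.1", "", "20.0"]),
    "\t".join(["562.2", "strain", "2|1224|1236|91347|543|561|562|562.2", "", "10.0"]),
    "\t".join(["1280.1", "strain", "2|1239|91061|1385|90964|1279|1280|1280.1", "", "30.0"]),
    "\t".join(["p1", "strain", "|||||||p1", "", "40.0"]),
])

if __name__ == "__main__":
    for taxid, pct in sorted(renormalize_to_species(TOY_PROFILE).items()):
        print(taxid, round(pct, 4))
```

In the toy input the plasmid-like row holds 40% of the total, so the remaining 60% is rescaled to 100% and the two species come out at 50% each. The same function can be run on the gold standard file by passing the file contents as a string.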
Thank you so much for confirming.