Compare Metaphlan4 output with CuratedMetagenomicData?

Minuzzi · July 13, 2023, 3:24pm

Hello everyone,
I am analysing stool samples from a non-westernized population and would like to compare my profiles with those reported in the curatedMetagenomicData dataset.
I ran MetaPhlAn 4 on my samples and on other samples from NCBI, with default parameters, and merged the profiles of my samples into a single table. These are the commands I used:

metaphlan SAMPLE --input_type fastq --nproc 3 --bowtie2out Metaphlan4/Sample.bowtie2.bz2 --bt2_ps very-sensitive --read_min_len 30 > Sample_profile.txt

merge_metaphlan_tables.py Metaphlan4/*.txt > DatasetMERGED_Metaphlan4.txt

Then I started working with curatedMetagenomicData, selecting stool samples from healthy adults in this way:

cs <- sampleMetadata |>
  filter(disease == "healthy") |>
  filter(body_site == "stool") |>
  filter(age_category == "adult") |>
  # %in% tests membership; == would recycle the vector row by row
  filter(country %in% c("ITA", "GBR", "IRL", "DNK", "TZA", "CMR", "ETH", "IDN", "PER", "FJI")) |>
  returnSamples("relative_abundance", rownames = "short")
assayNames(cs) <- "counts"
altExps(cs) <- splitByRanks(cs)
cs.ps <- makePhyloseqFromTreeSummarizedExperiment(cs, abund_values = "relative_abundance")

cs.species <- tax_glom(cs.ps, taxrank = "species")
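
On filtering by several countries: in R, comparing a column to a vector with `==` recycles the vector element by element instead of testing membership, so rows that should match can be silently dropped, whereas `%in%` checks each row against the whole set. A minimal base-R sketch, using a made-up `country` vector (hypothetical values, not cMD metadata):

```r
# Hypothetical country column, for illustration only (not cMD data)
country <- c("GBR", "ITA", "TZA", "ITA", "PER", "GBR")
wanted  <- c("ITA", "GBR")

# `==` recycles `wanted` along `country` (ITA, GBR, ITA, GBR, ...),
# so each row is compared with only one of the two codes
eq_hits <- country == wanted

# `%in%` asks, for every row, whether its value occurs anywhere in `wanted`
in_hits <- country %in% wanted

sum(eq_hits)  # 1 row kept by ==
sum(in_hits)  # 4 rows kept by %in%
```

With real metadata the number of rows lost this way depends on row order, so it is worth checking the sample count after filtering.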

Then I imported my dataset into R and phyloseq using this function (Import a table of MetaPhlAn taxonomic abundances into phyloseq · GitHub)

mphlanin <- read.csv("DatasetMERGED_Metaphlan4.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE, row.names = 1)
metadata <- read.delim("MappingDataset.txt", header = TRUE, sep = "\t")
metadatadf <- data.frame(metadata)
row.names(metadatadf) <- metadatadf$X
sample <- sample_data(metadatadf)
ps <- metaphlanToPhyloseq(mphlanin)
ps.species <- tax_glom(ps, taxrank = "species")
pseq <- transform(ps.species, "compositional")
ps.all <- merge_phyloseq(pseq, cs.ps)

But when I perform beta-diversity analysis, all the samples I ran with MetaPhlAn 4 form a separate cluster from the curatedMetagenomicData samples. I did not expect this, since I am analysing populations close to those present in the database, so I suspect there is some inherent bias in the way the two datasets are merged. Has anyone experienced the same issue, and how can I solve it? What is the right way to compare my data with that of curatedMetagenomicData?
Thanks to everyone

franzosa · July 13, 2023, 8:12pm

I believe curatedMetagenomicData (cMD) currently contains profiles from MetaPhlAn 3 (but an update to MetaPhlAn 4 is in progress). Since MetaPhlAn 4 outputs profiles using a totally different (SGB-based) taxonomy, it makes sense that those profiles would spuriously look totally different from the cMD profiles. If you did the comparison at the genus level you would probably see something more sensible, although there would still be a big batch effect from combining v3 and v4 profiles.
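
To illustrate the genus-level comparison suggested here, a minimal base-R sketch follows. It assumes both tables carry MetaPhlAn-style clade names as row names (`k__...|g__...|s__...`). The two toy matrices, their abbreviated lineages, the `GGB9999` label and the helper `genus_table()` are all hypothetical stand-ins for a MetaPhlAn 3 and a MetaPhlAn 4 profile, not real output. A batch effect between versions would still be expected after this step.

```r
# Hypothetical helper: keep only genus-level rows of a MetaPhlAn-style
# table and rename them by genus (the text after "g__").
genus_table <- function(tab) {
  clades <- rownames(tab)
  # genus-level rows end in "|g__<name>", with no deeper rank after it
  is_genus <- grepl("\\|g__[^|]+$", clades)
  out <- tab[is_genus, , drop = FALSE]
  rownames(out) <- sub("^.*\\|g__", "", clades[is_genus])
  out
}

# Toy MetaPhlAn 3-like profile (lineages abbreviated, values made up)
v3 <- matrix(
  c(70, 70, 30, 30), ncol = 1,
  dimnames = list(c(
    "k__Bacteria|p__Bacteroidetes|g__Prevotella",
    "k__Bacteria|p__Bacteroidetes|g__Prevotella|s__Prevotella_copri",
    "k__Bacteria|p__Firmicutes|g__Faecalibacterium",
    "k__Bacteria|p__Firmicutes|g__Faecalibacterium|s__Faecalibacterium_prausnitzii"
  ), "sample_v3")
)

# Toy MetaPhlAn 4-like profile with an SGB-style placeholder genus
v4 <- matrix(
  c(55, 55, 45), ncol = 1,
  dimnames = list(c(
    "k__Bacteria|p__Bacteroidetes|g__Prevotella",
    "k__Bacteria|p__Bacteroidetes|g__Prevotella|s__Prevotella_copri",
    "k__Bacteria|p__Firmicutes|g__GGB9999"
  ), "sample_v4")
)

g3 <- genus_table(v3)
g4 <- genus_table(v4)

# Genera named identically in both versions, and those seen only in v4
shared  <- intersect(rownames(g3), rownames(g4))
only_v4 <- setdiff(rownames(g4), rownames(g3))

# Side-by-side table restricted to the shared genera
combined <- cbind(g3[shared, , drop = FALSE], g4[shared, , drop = FALSE])
```

Listing `only_v4` (and the reverse `setdiff`) before any ordination gives a quick sense of how much of each profile has no counterpart in the other, which is one way to gauge how far the naming difference alone drives the separation.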

Minuzzi · July 17, 2023, 9:06am

Yes, indeed, I have also tried at the genus level and it is more or less the same: the batch effect is still very visible. OK, so I have to wait for the update.
Thank you very much for your reply

Asier_Fernandez · February 27, 2024, 12:24pm

Hi,

I would like to ask if the Metaphlan4 profiles of curatedMetagenomicData (cMD) are already available.

Thanks a lot in advance!
Asier
