Constructing a phylogenetic tree

I am trying to construct a phylogenetic tree from the output of MetaPhlAn.
I extracted the mapped SGB IDs and successfully retrieved the sequences corresponding to those IDs.
At this point, I am wondering whether it is OK to build a phylogenetic tree from these sequences, since their lengths differ a lot.
I suspect the sequences would not align well.
Is this fine, or should I find another way to build the tree?

I am not sure if I understand your question completely. What would you like to achieve with such a phylogenetic tree? Depending on your goal, you could try different strategies. However, in almost every case using the clade-specific marker genes used by MetaPhlAn to profile the taxa in a sample won’t work for building a tree.

If you would like to have a tree of life of all taxa identified in your sample, e.g. for visualisation or UniFrac analyses, you could build such a tree with PhyloPhlAn3, which uses 400 universal marker genes. This guarantees sufficient overlap across taxa in the multiple sequence alignment, so that you can build a tree with high confidence.

I just want to create input for phyloseq in R for further analysis, not for any special purpose,
and I understand why it is not reliable to build a tree from the SGB sequences.

So I tried another approach. As a reference tree, I used the phylogenetic tree generated with PhyloPhlAn and provided by the MetaPhlAn team.
Then, using Biopython, I extracted from it the SGB IDs that were mapped in my MetaPhlAn results.

From my point of view, this is not a wrong way to build a tree, because I only remove the branches and nodes that are not in my results. But I am not sure it is the right way.

If you want a tree for generating PhyloSeq objects, that’s the way to go. In case you are running MetaPhlAn3, there is such a readily available tree in the R package curatedMetagenomicData that contains the vast majority of the taxa that you can detect in human-associated microbiome samples.

For MetaPhlAn4, there is no such tree available, yet.

Thank you:)
It was enough to solve my current issue.

Hi @Jilim97,
I was wondering how you generated the phylogenetic tree with PhyloPhlAn, using the MetaPhlAn tree as a reference. I am using MetaPhlAn 4.1, which already comes with PhyloPhlAn installed.

Thanks!

Hello, I used the tree that the MetaPhlAn team already made.
You can find this tree, which contains all of the SGB IDs, on their GitHub.
I simply removed the SGB IDs that are not found in my samples. In other words, I kept the original tree and eliminated the IDs not present in my samples using Python code.
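
For anyone who wants to see what this pruning step looks like, here is a minimal sketch in plain Python. It is not the code I actually ran; in practice a library such as Biopython or ete3 would be the more robust choice. The toy tree, the tip labels (`SGB1` and so on) and the helper names `parse_newick`, `prune`, `to_newick` and `leaf_names` are all made up for illustration, and the parser assumes simple Newick without quoted labels or comments. The tip labels in the real MetaPhlAn tree may be formatted differently.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (no quoted labels or comments)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # skip ")"
                break
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label, _, length = text[start:pos].partition(":")
        return {"name": label,
                "length": float(length) if length else None,
                "children": children}

    return parse_node()


def prune(node, keep):
    """Return a copy of the tree restricted to tips in `keep`, or None if none remain."""
    if not node["children"]:
        return dict(node) if node["name"] in keep else None
    kept = [c for c in (prune(child, keep) for child in node["children"]) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse an internal node left with one child and add up the branch lengths.
        only = kept[0]
        if node["length"] is not None or only["length"] is not None:
            only["length"] = (node["length"] or 0.0) + (only["length"] or 0.0)
        return only
    return {"name": node["name"], "length": node["length"], "children": kept}


def to_newick(node):
    """Serialize a nested-dict tree back to Newick (without the trailing semicolon)."""
    text = ""
    if node["children"]:
        text = "(" + ",".join(to_newick(c) for c in node["children"]) + ")"
    text += node["name"]
    if node["length"] is not None:
        text += f":{node['length']:g}"
    return text


def leaf_names(node):
    """List tip labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


# Toy reference tree and a hypothetical set of SGB IDs detected in the samples.
reference = "((SGB1:0.1,SGB2:0.2):0.3,(SGB3:0.4,SGB4:0.5):0.6);"
detected = {"SGB1", "SGB3", "SGB4"}

pruned = prune(parse_newick(reference), detected)
pruned_newick = to_newick(pruned) + ";"
print(pruned_newick)
```

Summing the branch lengths when a node is collapsed keeps the tip-to-tip distances of the original tree, which matters if the tree is later used for UniFrac. The parser is recursive, so a very deep tree could hit Python's recursion limit; that is one more reason to prefer an established library for the full reference tree.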

Hello,
Thanks @Jilim97!
I guess you are talking about this one -

I guess you used the ete3 Python package to remove the SGB IDs that are not found in your samples. In my dataset, I also found some SGB IDs in my samples that are not present in the MetaPhlAn nwk tree. Did this happen in your case?
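
In case it helps others compare notes, this is roughly how such a check could be done with the standard library only. It is a sketch under assumptions: the toy tree, the example IDs and the helper names `tree_leaf_labels` and `normalize` are made up, and the label formats (profile entries like `t__SGB1234` or `t__SGB1234_group`, tree tips as bare numbers) are a guess that should be verified against the actual files. If the formats differ, an apparent "missing" ID may just be a naming mismatch.

```python
import re


def tree_leaf_labels(newick):
    """Collect tip labels: names that directly follow "(" or "," in the Newick text."""
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def normalize(label):
    """Hypothetical normalization: reduce a profile label to a bare ID.

    Takes the last "|"-separated field, then strips an optional "t__" and "SGB"
    prefix and an optional "_group" suffix. Adjust to the real label formats.
    """
    label = label.split("|")[-1]
    label = re.sub(r"^(t__)?(SGB)?", "", label)
    return re.sub(r"_group$", "", label)


# Toy tree with bare numeric tips and made-up IDs from a profile.
newick = "((1:0.1,2:0.2):0.3,3:0.4);"
detected = ["t__SGB1", "t__SGB3_group", "t__SGB99"]

leaves = tree_leaf_labels(newick)
missing = [d for d in detected if normalize(d) not in leaves]
print("Detected but absent from the tree:", missing)
```

Internal node labels follow a closing parenthesis, so the regular expression skips them and only tips are compared.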