PhyloPhlAn version 3.0.67 (24 August 2022)
I am using 222 genomes to build a phylogenetic tree with PhyloPhlAn 3. I wanted to get the marker DNA for each genome (to use in other analyses), but I am only getting 202 marker files, so 20 genomes are missing. On checking, the final tree also contains only those 202 genomes. I then checked the intermediate files: only clean_dna and map_dna have files for all 222 genomes; from the next step onward, only 202 genomes are considered. Also, since all my genomes come from the same genus, I expected them to have a similar number of markers. Please help me with this. In the meantime, I will rerun PhyloPhlAn 3 on the same inputs.
Hi @Adarsh_Singh, thank you for using PhyloPhlAn. Do you have the log output from your PhyloPhlAn analysis? Which database of markers did you use?
I think in your case you meant that 20 of your 222 genomes were discarded (not markers, right?).
If you used -d phylophlan as the database, that automatically sets --min_num_markers 100, so you can check in the PhyloPhlAn log output whether your 20 genomes were discarded because they did not contain at least 100 of the 400 universal markers. If that's the case, you might want to specify the --min_num_markers parameter on the command line to lower the minimum number of markers each input genome must map against. I would suggest not lowering it too much, though: you might end up with a very gappy alignment for some genomes and not be confident in their phylogenetic placement, which would be driven by only a small subset of positions.