Hello Mick, thank you for your message.
We do have a part of the wiki trying to explain the output here: Home · biobakery/phylophlan Wiki · GitHub
But let me try to explain it again here as this could also be found by others in the future.
Your output file should resemble the following:
my_bin (k|u)SGB_ID:taxa_level:taxonomy:average_mash_distance [(k|u)SGB_ID:taxa_level:taxonomy:average_mash_distance]
And you’ll find as many columns as specified by the -n/--how_many param (10 is the default).
These columns are sorted by their average_mash_distance, so the first one will be the closest.
Now, for the other parts:
my_bin will be the name of your input genome/MAG
(k|u)SGB_ID: will tell you the SGB ID, where the k or u prefix indicates whether it is a known (k) or an unknown (u) SGB. This follows the rule we used in this work, where kSGBs are those that contain a reference genome deposited in public databases (filtered a bit, not just all genomes in NCBI) and uSGBs contain only MAGs.
taxa_level: can be either Species, Genus, Family, or Other, depending on the taxonomic level at which the SGB has been assigned.
Species will in practice only be used for kSGBs, because only those have a taxonomic label assigned at the species level within the SGB.
Genus, Family, and Other are for the uSGBs, to indicate (to put it very simply) how “far” a reference genome is found.
Genus means that the GGB that the SGB belongs to contains a reference genome, and hence its taxonomic label is used up to the genus level.
Family is similar, but the reasoning is done at the FGB level, up to the family taxonomic level.
Other means that the GGB and FGB assigned to that SGB are both unknown (hence a uGGB and a uFGB, respectively). In this case, we report it as Other and the taxonomic label is retrieved by taking the one assigned to the closest reference genome.
taxonomy: is the full taxonomic label assigned to the SGB
average_mash_distance: is the average Mash distance of the input bin w.r.t. all the genomes in the SGB. For your bin, for example, that is about 2.5% from all the MAGs in uSGB_48157.
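If it helps to see the fields above pulled apart programmatically, here is a minimal Python sketch. It is not part of PhyloPhlAn: the names SgbHit and parse_line are made up for illustration, and it assumes that the columns are tab-separated and that the distance is written as a fraction (e.g. 0.025 for 2.5%). The example line, including its SGB IDs and taxonomy, is invented.

```python
from typing import NamedTuple


class SgbHit(NamedTuple):
    """One (k|u)SGB_ID:taxa_level:taxonomy:average_mash_distance column."""
    sgb_id: str
    taxa_level: str
    taxonomy: str
    avg_mash_distance: float


def parse_line(line: str) -> tuple[str, list[SgbHit]]:
    """Split one output line into the bin name and its list of SGB hits.

    Assumption: columns are separated by tabs.
    """
    bin_name, *fields = line.rstrip("\n").split("\t")
    hits = []
    for field in fields:
        # The first two colons delimit the SGB ID and the taxa level;
        # the last colon delimits the distance. Everything in between
        # is kept as the taxonomy label.
        sgb_id, taxa_level, rest = field.split(":", 2)
        taxonomy, distance = rest.rsplit(":", 1)
        hits.append(SgbHit(sgb_id, taxa_level, taxonomy, float(distance)))
    return bin_name, hits


# Invented example line (IDs, taxonomy and distances are placeholders).
example = (
    "my_bin"
    "\tuSGB_1:Other:k__Bacteria|p__Firmicutes:0.025"
    "\tkSGB_2:Species:k__Bacteria|p__Firmicutes|s__Example_species:0.08"
)
bin_name, hits = parse_line(example)
```

Since the columns are already sorted by distance, hits[0] would be the closest SGB for that bin.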
In your case, both uSGB_48156 and uSGB_48155 are too distant to consider your MAG a potential new member of either, because their average Mash distance is >5%. Your MAG instead appears to be a new member of uSGB_48157, with an average Mash distance of ~2.5%.
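The 5% reasoning above can be written down as a small check. This is only a sketch of the rule of thumb described here, not PhyloPhlAn code: the function name closest_sgb_within_cutoff is hypothetical, distances are assumed to be fractions (0.05 for 5%), and the IDs and values in the example are placeholders rather than your real numbers.

```python
from typing import Optional

MAX_AVG_MASH_DISTANCE = 0.05  # the 5% cutoff, assuming distances are fractions


def closest_sgb_within_cutoff(
    hits: list[tuple[str, float]],
    cutoff: float = MAX_AVG_MASH_DISTANCE,
) -> Optional[str]:
    """Return the ID of the closest SGB if it is within the cutoff, else None.

    hits is a list of (sgb_id, average_mash_distance) pairs in any order.
    """
    if not hits:
        return None
    sgb_id, distance = min(hits, key=lambda hit: hit[1])
    # An SGB whose average distance is above the cutoff is too distant
    # for the bin to be considered a member of it.
    return sgb_id if distance <= cutoff else None


# Placeholder values for illustration only.
example_hits = [("uSGB_A", 0.025), ("uSGB_B", 0.07), ("uSGB_C", 0.09)]
assigned = closest_sgb_within_cutoff(example_hits)

# When every SGB is too distant, nothing is returned.
unassigned = closest_sgb_within_cutoff([("uSGB_B", 0.07), ("uSGB_C", 0.09)])
```

Here assigned is "uSGB_A" and unassigned is None, mirroring the case where one SGB sits at about 2.5% and the others are above 5%.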
Sorry for the long message, but I hope this helps.