I’ve been trying to use phylophlan_metagenomic on my non-human sample bins, but I’m finding that it struggles to categorise most of them at the species/genus level (because SGB20 is a human dataset). What I couldn’t figure out is whether, and how, it would be possible to use an alternative database for taxonomic assignment.
Hi @jo240 and thanks for using PhyloPhlAn. Which database are you using?
The latest one available, SGB.Sep20, should contain several SGBs coming from non-human hosts, but maybe your non-human samples are not covered. Which non-human samples do you have? (so that I can check if we do have genomes and MAGs covering that host type).
Also, I’ll soon make SGB.Jan21 available. It will contain more SGBs and should better cover non-human hosts.
Hi @jo240,
the Jan20 version of the database definitely contains SGBs coming from sheep rumen samples. Consequently, Sep20 also contains the same SGBs (plus additional ones, because the database is incremental).
Could you please share with us some more info about your MAGs so we can try to better understand what is happening here?
Could you maybe also share the output of phylophlan metagenomic?
Sure. There were 6 samples from the rumen of approximately 30-week-old sheep, from a gnotobiotic trial of three strains of bacteria and archaea.
The MAGs were generated from a co-assembly using metabat2, which gave us 99 bins. We then removed 53 of these during completeness/contamination QC (bins with less than 50% completeness or more than 10% contamination were removed). 25 of the remaining 46 were then identified to the species level by phylophlan.
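For anyone reproducing this, the QC step described above can be sketched as a simple threshold filter. This is a minimal sketch, assuming the usual reading that a bin is dropped if it fails either threshold (below 50% completeness or above 10% contamination); the thread does not say which tool produced the estimates, and the `BinQuality` record and bin names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    """Completeness/contamination estimates (in percent) for one bin.

    Hypothetical record: the values would come from whatever QC tool was used.
    """
    name: str
    completeness: float
    contamination: float


def passes_qc(b: BinQuality,
              min_completeness: float = 50.0,
              max_contamination: float = 10.0) -> bool:
    # Keep a bin only if it is at least 50% complete AND at most 10% contaminated;
    # failing either threshold removes it.
    return b.completeness >= min_completeness and b.contamination <= max_contamination


def filter_bins(bins):
    """Split bins into (kept, removed) lists according to passes_qc."""
    kept = [b for b in bins if passes_qc(b)]
    removed = [b for b in bins if not passes_qc(b)]
    return kept, removed


if __name__ == "__main__":
    # Hypothetical example values, not the bins discussed in this thread.
    bins = [
        BinQuality("bin.1", 92.3, 1.2),
        BinQuality("bin.2", 48.0, 0.5),   # removed: less than 50% complete
        BinQuality("bin.3", 75.1, 12.4),  # removed: more than 10% contaminated
    ]
    kept, removed = filter_bins(bins)
    print([b.name for b in kept], [b.name for b in removed])
```

Applied to the real table, the kept list should have the same size as the set of bins that were passed on to phylophlan_metagenomic (46 here).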
Sure, see attached.
Hi @jo240,
I mapped the closest SGBs (Sep20) in your phylophlan metagenomic output table to the same SGBs in Jan21.
In your table, 21 bins were closest to unknown SGBs (uSGBs). In Jan21, 10 of these 21 uSGBs are now known SGBs (kSGBs).
Additionally, 12 of these 21 uSGBs are defined as Other in your table, meaning that their taxonomy is only defined down to the phylum level. In Jan21, instead, the remaining 11 uSGBs are all defined at least down to the family level, except for uSGBs 14124 and 36962, which are still defined as Other.
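To make the difference between "defined only down to the phylum level" and "defined at least down to the family level" concrete, here is a small helper that reports the deepest labelled rank of a lineage string. It is a sketch under stated assumptions: the `k__...|p__...|...` lineage format and treating an empty label or `Other` as an undefined rank are assumptions about how placeholders are written, not a description of the exact phylophlan_metagenomic output.

```python
from typing import Optional

# Assumed rank prefixes in a "k__...|p__...|c__..." lineage string.
RANK_NAMES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def deepest_defined_rank(lineage: str) -> Optional[str]:
    """Return the most specific rank that carries a real label, or None.

    Assumption: a rank is undefined if it is malformed, has an empty label,
    or is labelled "Other"; scanning stops at the first undefined rank.
    """
    deepest = None
    for field in lineage.split("|"):
        prefix, sep, label = field.partition("__")
        if not sep or prefix not in RANK_NAMES or label in ("", "Other"):
            break  # first rank without a usable label ends the lineage
        deepest = RANK_NAMES[prefix]
    return deepest


if __name__ == "__main__":
    # Hypothetical lineages, for illustration only.
    print(deepest_defined_rank("k__Bacteria|p__Firmicutes|c__Other"))
    print(deepest_defined_rank(
        "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Prevotellaceae"))
```

The first example stops at the phylum (the situation of the uSGBs marked Other), while the second resolves to the family level.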
Also, uSGBs 47046 and 53414 no longer exist in Jan21: they have been merged into kSGBs 6939 and 5765, respectively.
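The effect of the two merges on a per-bin table can be illustrated with a lookup that redirects the old IDs and leaves everything else alone. This is only an illustrative sketch: it assumes SGB IDs that are not involved in a merge keep the same ID between Sep20 and Jan21, it covers only the two merges reported here, and the bin names are hypothetical. Re-running phylophlan_metagenomic against Jan21 remains the reliable way to get updated assignments.

```python
from typing import Dict, Mapping

# The two merges reported in this thread: Sep20 uSGB ID -> Jan21 kSGB ID.
MERGED_SEP20_TO_JAN21: Mapping[int, int] = {47046: 6939, 53414: 5765}


def map_to_jan21(sgb_id: int,
                 merges: Mapping[int, int] = MERGED_SEP20_TO_JAN21) -> int:
    # Assumption: SGBs not involved in a merge keep their ID across versions.
    return merges.get(sgb_id, sgb_id)


def remap_bins(closest_sgb: Mapping[str, int]) -> Dict[str, int]:
    """Translate each bin's closest Sep20 SGB ID into its Jan21 ID."""
    return {name: map_to_jan21(sgb) for name, sgb in closest_sgb.items()}


if __name__ == "__main__":
    # Hypothetical bin names; the IDs are the ones mentioned above.
    print(remap_bins({"bin.7": 47046, "bin.12": 14124}))
```

This only shows what a merge means for the closest-SGB column. It does not capture uSGBs that became kSGBs or changes in taxonomy labels, which is another reason to re-run the tool.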
So, as you can see, the overall situation would substantially improve with the Jan21 version.
However, I would strongly suggest not using this mapping for your analysis, and instead running phylophlan metagenomic again as soon as @f.asnicar makes Jan21 available, because the results could change.
Hope this will help and thanks again for using PhyloPhlAn!