Hi everyone,
I was wondering if anyone could help clarify a potential concern I have about using the bt2 preset very-sensitive-local, particularly for canine gut samples.
I am aware that recent versions of the MetaPhlAn4 database have been updated to include a large number of new MAGs from non-human animals (I am currently using vJan25_2025_03). However, with default settings, about 35% of these canine samples have about 20% unclassified reads, with some as high as 75%.
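For anyone wanting to compare runs in the same way, here is a minimal sketch that summarizes per-sample unclassified percentages. It assumes you have already pulled the UNCLASSIFIED percentage for each sample out of the MetaPhlAn profiles into a plain dict. The function name `summarize_unclassified` and the threshold parameter are hypothetical, not part of MetaPhlAn.

```python
from __future__ import annotations


def summarize_unclassified(unclassified_pct: dict[str, float], threshold: float = 20.0) -> dict[str, float]:
    """Summarize per-sample unclassified read percentages (hypothetical helper).

    unclassified_pct maps sample name -> % unclassified (0-100), assumed to be
    extracted beforehand from each sample's MetaPhlAn profile.
    """
    if not unclassified_pct:
        raise ValueError("no samples given")
    values = list(unclassified_pct.values())
    n = len(values)
    return {
        # Fraction of samples with any unclassified reads at all
        "fraction_any": sum(1 for v in values if v > 0) / n,
        # Fraction of samples at or above the chosen threshold
        "fraction_above": sum(1 for v in values if v >= threshold) / n,
        # Worst single sample
        "max_pct": max(values),
    }
```

Running this on the default-preset and very-sensitive-local profiles side by side gives directly comparable numbers for the two settings.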
Based on discussions in other posts, I was able to greatly reduce the number of unclassified reads by changing the bt2 preset to very-sensitive-local. This reduced the proportion of samples with any unclassified reads to 4%, with most samples having none. It is important to note that this is a relatively shallow dataset, with ~3,400,000 ± 1,400,000 (2x150) reads per sample.
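As a rough illustration of the preset change, the sketch below only builds (and does not run) a MetaPhlAn command line with `--bt2_ps` set. It assumes the paired-end FASTQ files are passed as one comma-separated argument with `--input_type fastq`; the file names, thread count, and the helper `build_metaphlan_cmd` are hypothetical, and the flags should be checked against `metaphlan --help` for your installed version.

```python
import shlex

# Bowtie2 presets accepted by MetaPhlAn's --bt2_ps option (default: very-sensitive)
BT2_PRESETS = {"very-sensitive", "sensitive", "very-sensitive-local", "sensitive-local"}


def build_metaphlan_cmd(r1: str, r2: str, out_profile: str,
                        bt2_preset: str = "very-sensitive-local", nproc: int = 8) -> list[str]:
    """Return an argument list for a MetaPhlAn run (hypothetical helper, not executed)."""
    if bt2_preset not in BT2_PRESETS:
        raise ValueError(f"unknown bt2 preset: {bt2_preset}")
    return [
        "metaphlan",
        f"{r1},{r2}",            # paired-end reads as one comma-separated input (assumed)
        "--input_type", "fastq",
        "--bt2_ps", bt2_preset,  # local alignment allows soft-clipping of read ends
        "--nproc", str(nproc),
        "-o", out_profile,
    ]


if __name__ == "__main__":
    print(shlex.join(build_metaphlan_cmd("dog01_R1.fastq.gz", "dog01_R2.fastq.gz", "dog01_profile.txt")))
```

Because local mode lets Bowtie2 soft-clip mismatched read ends, more reads can reach the marker database, which is consistent with the drop in unclassified reads but is also the reason false positives are a reasonable worry.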
To reduce the chance of false positives, I applied a prevalence and abundance filter (keeping taxa present at ≥0.01% relative abundance in at least 10% of samples), and the resulting profiles look plausible.
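The filter described above can be sketched as follows, assuming a taxa × samples table of MetaPhlAn relative abundances stored as a nested dict. MetaPhlAn reports abundances in percent, so 0.01 here means 0.01%. The function name and the use of `>=` for both cutoffs are assumptions rather than a description of the exact filter used.

```python
from __future__ import annotations


def prevalence_filter(table: dict[str, dict[str, float]],
                      min_abundance: float = 0.01,
                      min_prevalence: float = 0.10) -> dict[str, dict[str, float]]:
    """Keep taxa reaching min_abundance (%) in at least min_prevalence of samples.

    table maps taxon -> {sample: relative abundance in percent}; a sample
    missing from a taxon's row is treated as 0 (hypothetical layout).
    """
    samples = {s for row in table.values() for s in row}
    if not samples:
        return {}
    kept = {}
    for taxon, row in table.items():
        # Count samples where this taxon passes the abundance cutoff
        n_present = sum(1 for s in samples if row.get(s, 0.0) >= min_abundance)
        if n_present / len(samples) >= min_prevalence:
            kept[taxon] = row
    return kept
```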
Has anyone observed similar proportions of unclassified reads in non-human, non-mouse gut metagenomes? I noticed in a prior thread that use of the very-sensitive-local preset was discouraged for mouse samples. Is there any general guidance or consensus on the use of the very-sensitive-local preset in these contexts, and how concerned should one be about increased false positives after conservative filtering?
Thanks so much for all the work on these amazing tools!