Dear MetaPhlAn Developers and Community,
First, thank you for developing and maintaining MetaPhlAn4; it has been incredibly useful for our research.
I have recently been using MetaPhlAn4 for taxonomic profiling of our metagenomic cohorts and have identified several SGBs (including uncharacterized uSGBs) that show potential as diagnostic biomarkers for a disease of interest. To move toward clinical validation, I plan to develop qPCR assays that specifically detect and quantify these target SGBs.
Since these targets are defined at the SGB level, I am looking into using the MetaPhlAn4 marker database (mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2) as the source for my target sequences. I have a few questions about best practices for this approach:
1. Suitability of the .fna database: Can I directly extract the sequences associated with my target SGBs from the mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2 file to design my qPCR primers?
2. Nature of the sequences: Do the nucleotide sequences in this .fna file represent actual genomic contigs/fragments of the corresponding SGBs from which they were derived, or are there any artificial concatenations I should be aware of before running them through primer design software like Primer3?
3. Marker selection strategy: For any given SGB in the database, there are often hundreds of associated marker sequences (typically labeled with UniRef90 or UNK identifiers). Since a multiplex qPCR assay only requires 1 to 2 highly specific targets per species, what would be your recommended strategy to select the most robust and specific marker(s) out of these hundreds? Are there specific criteria (e.g., sequence length, coreness, lack of horizontal gene transfer risk) that I should prioritize to ensure the qPCR assay mimics the specificity of the MetaPhlAn4 algorithm?
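To make the questions above concrete, here is a minimal sketch of how I was planning to pull candidate sequences out of the .fna.bz2 file. It rests on an assumption I have not confirmed: that each FASTA header contains the SGB ID (e.g. `SGB1234`) as a distinct token. If the marker-to-SGB mapping is only available in accompanying metadata, the matching step would need to change. The function name `extract_sgb_markers` and the `min_length` cutoff are my own placeholders, not anything from MetaPhlAn.

```python
import bz2
import re


def extract_sgb_markers(fasta_bz2_path, sgb_id, min_length=0):
    """Yield (header, sequence) for FASTA records whose header mentions sgb_id.

    ASSUMPTION: the SGB ID appears in the header as a whole token, so that
    'SGB1234' does not also match 'SGB12345'. min_length is a hypothetical
    filter for dropping markers too short to design an amplicon on.
    """
    token = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(sgb_id) + r"(?![A-Za-z0-9])"
    )

    def keep(header, chunks):
        # Keep a record only if the header matches and the sequence is long enough.
        return (
            header is not None
            and token.search(header) is not None
            and sum(len(c) for c in chunks) >= min_length
        )

    header, chunks = None, []
    # Stream the compressed file line by line instead of loading it into memory.
    with bz2.open(fasta_bz2_path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if keep(header, chunks):
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    # Emit the final record, which has no following header to trigger it.
    if keep(header, chunks):
        yield header, "".join(chunks)
```

I would then call something like `list(extract_sgb_markers("mpa_vOct22_CHOCOPhlAnSGB_202212_SGB.fna.bz2", "SGB1234", min_length=150))` (the SGB ID and the 150 bp cutoff are arbitrary examples) and pass the resulting sequences to Primer3. Please let me know if this kind of direct extraction is a reasonable starting point.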
Any guidance, insights, or recommended pipelines for translating MetaPhlAn4 SGB markers into wet-lab qPCR targets would be highly appreciated.
Thank you very much for your time and assistance!