Hello there,
First, I’d like to thank the bioBakery team for their workflows. I used MetaPhlAn and succeeded in everything I wanted to do, mainly thanks to the tutorials and forums available.
Here is my issue in brief: I ran MetaPhlAn on clean shotgun sequencing FASTQ files, exported the absolute abundance table to phyloseq, and ran alpha diversity, beta diversity, and differential abundance analyses. MetaPhlAn allowed me to identify micro-organisms that are differentially abundant between my control and case groups.
The next step in my research is to investigate whether those micro-organisms can also be detected by qPCR. This is what I did:
- I downloaded the database used by MetaPhlAn containing all the marker genes (MetaPhlAn 3 | Zenodo).
- I extracted the 150 marker genes used to identify the bacterium Oscillibacter_sp_57_20.
- I looked up some of the marker genes on UniProt (BHW41_05790, for instance).
- I landed on Ensembl, where I found the gene sequence, for instance:
 https://bacteria.ensembl.org/Oscillibacter_sp_57_20_gca_001916835/Gene/Sequence?g=BHW41_03065;r=Ley3_66761_scaffold_296:32041-33135;t=OLA40997
- Then, I BLASTed the obtained sequence to check whether this gene is specific to the bacterium of interest, Oscillibacter_sp_57_20.
- For one of the 150 marker genes, I found a very low e-value for Oscillibacter sp. For another gene, the sequence hit multiple bacteria. For a third gene, the hits were only viruses, with no bacterium associated with the sequence!
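
For reference, the marker-extraction step in the list above can be sketched roughly as follows. This is only a minimal sketch, assuming (from my reading of the MetaPhlAn documentation) that the database pickle exposes a `markers` dictionary whose entries carry a `clade` field and an `ext` field listing external genomes that also contain the marker. The function names, the toy marker names and the accession are hypothetical, and the file name in the comment is my assumption about the downloaded database.

```python
from typing import Dict, List


def markers_for_clade(markers: Dict[str, dict], clade: str) -> List[str]:
    """Return the sorted names of the markers assigned to the given clade."""
    return sorted(
        name for name, info in markers.items() if info.get("clade") == clade
    )


def markers_with_external_hits(
    markers: Dict[str, dict], clade: str
) -> Dict[str, List[str]]:
    """For the given clade, return only the markers whose 'ext' field is
    non-empty, mapped to the sorted list of external genomes."""
    return {
        name: sorted(markers[name]["ext"])
        for name in markers_for_clade(markers, clade)
        if markers[name].get("ext")
    }


# Toy stand-in for the real database (hypothetical entries, for illustration).
toy_markers = {
    "marker_A": {"clade": "s__Oscillibacter_sp_57_20", "ext": set(), "len": 900},
    "marker_B": {
        "clade": "s__Oscillibacter_sp_57_20",
        "ext": {"GCA_000000001"},
        "len": 1095,
    },
    "marker_C": {"clade": "s__Some_other_species", "ext": set(), "len": 500},
}

# With the real database, the dictionary would presumably be loaded like this
# (file name and layout are assumptions, not checked here):
#   import bz2, pickle
#   with bz2.open("mpa_v30_CHOCOPhlAn_201901.pkl", "rb") as fh:
#       markers = pickle.load(fh)["markers"]

print(markers_for_clade(toy_markers, "s__Oscillibacter_sp_57_20"))
print(markers_with_external_hits(toy_markers, "s__Oscillibacter_sp_57_20"))
```

If the `ext` field works as I assume, listing the markers with a non-empty `ext` would show which of the 150 markers are already known to occur in other genomes, which might explain the BLAST hit on multiple bacteria.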
Thus, I am wondering whether I misunderstood something in my protocol, or whether I understood it correctly. In the latter case, does it imply that there are multiple false positives when aligning against the MetaPhlAn database?
Any insight would be helpful.