Hi, while using the new MetaPhlAn 3 on my samples with a high Bacillus amyloliquefaciens presence, I noticed that the program didn't report this species at all. I checked the mpa_v30 database https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAlyQITZuUCtBUJxpxhIroIa/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1 and all I could find were some marker genes of Lentibacillus amyloliquefaciens. When I checked the mpa_v20 database (MetaPhlAn 2) https://www.dropbox.com/s/nhhx7i7glwdahru/mpa_v20_m200_marker_info.txt.bz2?dl=1 there were some Bacillus amyloliquefaciens marker genes.
Could you shed some light on why it is missing in the new database?
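For anyone who wants to repeat this check on the two marker-info files, here is a minimal sketch. It assumes each line of the decompressed file has the form marker_id, a tab, then a Python-style dictionary containing a 'clade' entry; that layout is an assumption and should be checked against the actual file. The function name count_markers_by_clade is made up for this example.

```shell
# Count marker genes per clade for lines matching a pattern (case-insensitive).
# ASSUMPTION: each input line looks like
#   marker_id<TAB>{'clade': 's__Some_species', 'len': 123, ...}
# Reads decompressed marker-info text on stdin.
count_markers_by_clade() {
    pattern="$1"
    grep -i -- "$pattern" \
        | sed -n "s/.*'clade': '\([^']*\)'.*/\1/p" \
        | sort | uniq -c \
        | awk '{ print $2 "\t" $1 }'   # clade name, then number of markers
}

# Usage (file names as downloaded from the links above):
#   bzcat mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2 | count_markers_by_clade amyloliquefaciens
#   bzcat mpa_v20_m200_marker_info.txt.bz2 | count_markers_by_clade amyloliquefaciens
```

Running it on both files gives a side-by-side view of which clades carry matching markers in each database version.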
Best,
Adrian
Hi,
Bacillus amyloliquefaciens is profiled using the markers for the Bacillus subtilis group. Most of the species in that group did not have a sufficient number of species-specific markers, and this led to the spurious identification of most of the species in the B. subtilis group.
Hi,
I have a similar problem. Bacillus subtilis cannot be detected by MetaPhlAn 3 with the mpa_v30_CHOCOPhlAn_201901 database. I downloaded the B. subtilis genome (NC_000964.3) from NCBI and generated simulated reads. MetaPhlAn 2 with mpa_v20_m200 works well, but MetaPhlAn 3 with mpa_v30_CHOCOPhlAn_201901 incorrectly reports Bacillus murimartini and Bacillus intestinalis, neither of which is present in the simulated sample.
curl -o Bacillus_subtilis.fna.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/045/GCF_000009045.1_ASM904v1/GCF_000009045.1_ASM904v1_genomic.fna.gz
gzip -d Bacillus_subtilis.fna.gz
#conda install -c bioconda insilicoseq
iss generate -g Bacillus_subtilis.fna --n_reads 1M -m Hiseq --output B.subtilis --cpus 4
metaphlan B.subtilis_R1.fastq,B.subtilis_R2.fastq --bowtie2db /ldfssz1/ST_META/share/User/tianliu/database/metaphlan --index mpa_v30_CHOCOPhlAn_201901 --nproc 4 --input_type fastq --bowtie2out B.subtilis.bowtie2.bz2 -t rel_ab_w_read_stats > B.subtilis.mp3.profile
The results in B.subtilis.mp3.profile are as follows:
#clade_name clade_taxid relative_abundance coverage estimated_number_of_reads_from_the_clade
s__Bacillus_murimartini 2|1239|91061|1385|186817|1386|171685 84.94604 0.12687 528925
s__Bacillus_intestinalis 2|1239|91061|1385|186817|1386|1963032 15.05396 0.02248 90901
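Both reported species sit under the genus Bacillus, so it can help to total the species rows per genus when judging how far off the profile is. Below is a rough sketch that assumes the five whitespace-separated columns shown in the header above; summarise_genus is a made-up helper name, not part of MetaPhlAn.

```shell
# Sum relative abundance and estimated reads over species rows of one genus.
# ASSUMPTION: columns are clade_name, clade_taxid, relative_abundance,
# coverage, estimated_number_of_reads_from_the_clade (as in the header above).
# Reads a profile on stdin; prints: species_count<TAB>abundance<TAB>reads
summarise_genus() {
    genus="$1"
    awk -v g="s__${genus}_" '
        /^#/ { next }                          # skip header/comment lines
        {
            n = split($1, parts, "[|]")        # clade_name may be a full lineage
            if (index(parts[n], g) == 1) {     # last level is a species of this genus
                ab += $3; reads += $5; k++
            }
        }
        END { printf "%d\t%.5f\t%d\n", k, ab, reads }
    '
}

# Usage:
#   summarise_genus Bacillus < B.subtilis.mp3.profile
```

On the two rows above this reports 2 species, 100.00000 relative abundance and 619826 estimated reads, so the reads are assigned within the genus but to the wrong species.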
I checked the database file and the MD5 values showed that it is complete.
1a342b73df3ff8e534775557b0d4924b mpa_v30_CHOCOPhlAn_201901.tar
B. subtilis is a common bacterium, so I worry that this will affect the estimated abundance of other species. Could you please check why it is missing?
Best,
Liu
Hi,
I’d like to second TianLiu’s request. I did pretty much the same test before I stumbled upon this thread and got exactly the same result.
While I understand that species-specific markers are scarce, the results as they stand are misleading. It would be really helpful if the markers for that clade could be revisited.
Thank you in advance
Hey -
Would be great to get an update on this, especially with regard to MetaPhlAn 4. We are seeing a similar issue when analysing the ZymoBIOMICS Microbial Community Standard, which has Bacillus subtilis at a theoretical abundance of 12% (gDNA). Kraken2 gave us exactly 12% Bacillus (genus level), whereas MetaPhlAn 4 gave us Bacillus vallismortis at 0.01909%.
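To put a number on that gap, here is a small sketch of the arithmetic. It compares the 12% theoretical abundance with the 0.01909% reported; the function name fold_difference is made up for illustration, and the comparison is only indicative because the Kraken2 figure is at genus level while the MetaPhlAn 4 figure is for a single species.

```shell
# Fold difference between an expected and an observed relative abundance.
# Both arguments are percentages; prints "inf" if the observed value is not positive.
fold_difference() {
    awk -v e="$1" -v o="$2" 'BEGIN {
        if (o <= 0) { print "inf"; exit }
        printf "%.1f\n", e / o
    }'
}

fold_difference 12 0.01909   # prints 628.6
```

So the reported value is roughly 630 times lower than the theoretical abundance.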
All the best
Tom
I have the same issue as Thomas_Sewell. I really like MetaPhlAn 4, but it is a little disappointing that it dramatically underestimates or misses Bacillus subtilis. Bacillus subtilis is a species with diverse members, and I wonder if that is the reason. If so, would other “super-species” have a similar issue? It would be really appreciated if someone could look into this.
Thanks!
Shuiquan
Hi @Thomas_Sewell @shuiquan_tang
In the current version of the MetaPhlAn 4 database, we have multiple SGBs labelled as B. subtilis (e.g. SGB7788 and SGB82812). Unfortunately, the strain used in the Zymo mock belongs to one of the B. subtilis SGBs for which we were not able to find enough marker genes.
Great, thank you for addressing that.