The bioBakery help forum

Missing Bacillus species in MetaPhlAn 3

Hi, while using the new MetaPhlAn 3 on my samples with a high abundance of Bacillus amyloliquefaciens, I noticed that the program did not report this species at all. I checked the mpa_v30 database https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAlyQITZuUCtBUJxpxhIroIa/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1 and all I could find were some marker genes of Lentibacillus amyloliquefaciens. When I checked the mpa_v20 database (MetaPhlAn 2) https://www.dropbox.com/s/nhhx7i7glwdahru/mpa_v20_m200_marker_info.txt.bz2?dl=1 there were some Bacillus amyloliquefaciens marker genes.
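
For anyone who wants to repeat this check, here is a minimal sketch. It assumes the marker info file from the first link has been saved locally under its original name, and that species names are written with underscores in that file (for example Bacillus_amyloliquefaciens). The function name count_mentions is only illustrative.

```shell
# Count the lines on stdin that contain a fixed string.
# Prints 0 (instead of failing) when nothing matches.
count_mentions() {
    grep -c -F -- "$1" || true
}

# Assumed local file name, taken from the download link above.
db=mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2

if [ -f "$db" ]; then
    # The search is case-sensitive, so "Bacillus_amyloliquefaciens" does not
    # match "Lentibacillus_amyloliquefaciens".
    bzip2 -dc "$db" | count_mentions "Bacillus_amyloliquefaciens"
fi
```

Running the same function against the mpa_v20 file gives a direct comparison between the two database versions.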

Could you shed some light on why it is missing from the new database?

Best,
Adrian

Hi,
Bacillus amyloliquefaciens is profiled using the markers for the Bacillus subtilis group. Most of the species in that group did not have a sufficient number of species-specific markers, and this led to the spurious identification of most of the species in the B. subtilis group.

Hi,
I have a similar problem: Bacillus subtilis is not detected by MetaPhlAn 3 with the mpa_v30_CHOCOPhlAn_201901 database. I downloaded the B. subtilis genome (NC_000964.3) from NCBI and generated simulated reads. MetaPhlAn 2 with mpa_v20_m200 works well, but MetaPhlAn 3 with mpa_v30_CHOCOPhlAn_201901 incorrectly reports Bacillus murimartini and Bacillus intestinalis, neither of which is present in the simulated sample.

# Download and decompress the B. subtilis reference genome from NCBI
curl -o Bacillus_subtilis.fna.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/009/045/GCF_000009045.1_ASM904v1/GCF_000009045.1_ASM904v1_genomic.fna.gz
gzip -d Bacillus_subtilis.fna.gz

# Simulate 1M reads with InSilicoSeq (install: conda install -c bioconda insilicoseq)
iss generate -g Bacillus_subtilis.fna --n_reads 1M -m Hiseq --output B.subtilis --cpus 4

# Profile the simulated paired-end reads with MetaPhlAn 3
metaphlan B.subtilis_R1.fastq,B.subtilis_R2.fastq --bowtie2db /ldfssz1/ST_META/share/User/tianliu/database/metaphlan --index mpa_v30_CHOCOPhlAn_201901 --nproc 4 --input_type fastq --bowtie2out B.subtilis.bowtie2.bz2 -t rel_ab_w_read_stats > B.subtilis.mp3.profile

The results in B.subtilis.mp3.profile are as follows:
#clade_name clade_taxid relative_abundance coverage estimated_number_of_reads_from_the_clade
s__Bacillus_murimartini 2|1239|91061|1385|186817|1386|171685 84.94604 0.12687 528925
s__Bacillus_intestinalis 2|1239|91061|1385|186817|1386|1963032 15.05396 0.02248 90901
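
To make this check easy to repeat, the species-level rows can be pulled out of the profile with a small awk filter. This is only a sketch: it assumes the profile columns are separated by tabs or spaces in the order shown in the header above, and the function name list_species is my own.

```shell
# Print "species<TAB>relative_abundance" for every species-level row of a
# MetaPhlAn profile read from stdin. Header and comment lines start with "#".
list_species() {
    awk '!/^#/ {
        n = split($1, lineage, "[|]")   # clade_name may be a full lineage
        if (lineage[n] ~ /^s__/) print lineage[n] "\t" $3
    }'
}

# Profile written by the metaphlan command above.
profile=B.subtilis.mp3.profile

if [ -f "$profile" ]; then
    list_species < "$profile"
    # Report whether the expected species appears at all (exact name match).
    if list_species < "$profile" | cut -f1 | grep -qx "s__Bacillus_subtilis"; then
        echo "B. subtilis reported"
    else
        echo "B. subtilis NOT reported"
    fi
fi
```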

I checked the database file, and its MD5 checksum shows that it is complete:
1a342b73df3ff8e534775557b0d4924b mpa_v30_CHOCOPhlAn_201901.tar
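
The comparison can also be scripted. The sketch below assumes GNU md5sum is available (on macOS the equivalent tool is md5) and that the expected digest comes from a source you trust. The helper name verify_md5 is hypothetical.

```shell
# Succeed only if the MD5 digest of file $1 equals the expected hex string $2.
verify_md5() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# File name and digest as listed above.
db_tar=mpa_v30_CHOCOPhlAn_201901.tar

if [ -f "$db_tar" ]; then
    if verify_md5 "$db_tar" 1a342b73df3ff8e534775557b0d4924b; then
        echo "checksum matches"
    else
        echo "checksum differs"
    fi
fi
```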

B. subtilis is a common bacterium, so I worry that this will affect the abundance estimates of other species. Could you please help check why it is missing?

Best,
Liu

Hi,
I’d like to second TianLiu’s request. I did pretty much the same test before I stumbled upon this thread and got exactly the same result.
While I understand that species-specific markers are scarce, the results as they stand are misleading. It would be really helpful if the markers for that clade could be revisited.
Thank you in advance.