Cannot reproduce results of MetaPhlAn 4.0 tutorial

Our results for the MetaPhlAn 4.0 tutorial, both the bowtie2out and the profile, differ from the sample output files; the abundances do not even sum to 100. Please help me identify what I did wrong. We use the biobakery/metaphlan:4.0.2 Docker image and created the database using the --bowtie2db flag to specify the folder.

metaphlan --install --bowtie2db /mnt/share/metaphlan_bowtie2db/vOct22

Then I ran metaphlan on the sample SRS014476-Supragingival_plaque.fasta.gz file, which completed without error.

root@c0f5e1794384:/mnt/share/metaphlan_analysis# metaphlan SRS014476-Supragingival_plaque.fasta.gz --input_type fasta > SRS014476-Supragingival_plaque_profile.txt --bowtie2db /mnt/share/metaphlan_bowtie2db/vOct22
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.

Here are the first 12 lines of the profile result. The number of reads processed (19048) is the same, but the abundances differ from the sample.

#mpa_vOct22_CHOCOPhlAnSGB_202212
#/usr/local/bin/metaphlan SRS014476-Supragingival_plaque.fasta.gz --input_type fasta --bowtie2db /mnt/share/metaphlan_bowtie2db/vOct22
#19048 reads processed
#SampleID	Metaphlan_Analysis
#clade_name	NCBI_tax_id	relative_abundance	additional_species
k__Bacteria	2	100.0	
k__Bacteria|p__Actinobacteria	2|201174	55.36506	
k__Bacteria|p__Firmicutes	2|1239	44.63494	
k__Bacteria|p__Actinobacteria|c__Actinomycetia	2|201174|1760	55.36506	
k__Bacteria|p__Firmicutes|c__Bacilli	2|1239|91061	44.63494	
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales	2|1239|91061|186826	44.63494	
k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Corynebacteriales|f__Corynebacteriaceae	2|201174|1760|85007|1653	55.36506

Sample result:

#mpa_vJan21_CHOCOPhlAnSGB_202103
#<metaphlan command>
#19048 reads processed
#SampleID	Metaphlan_Analysis
#clade_name	NCBI_tax_id	relative_abundance	additional_species
k__Bacteria	2	100.0	
k__Bacteria|p__Actinobacteria	2|201174	94.8922	
k__Bacteria|p__Proteobacteria	2|1224	5.1078	
k__Bacteria|p__Actinobacteria|c__Actinobacteria	2|201174|1760	94.8922	
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria	2|1224|28216	5.1078	
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales	2|201174|1760|85007	53.56955	
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales	2|201174|1760|85006	40.15913	
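
One difference visible in the two listings above is the first header line: our profile names mpa_vOct22_CHOCOPhlAnSGB_202212, while the sample names mpa_vJan21_CHOCOPhlAnSGB_202103. A minimal sketch for checking this before comparing abundances, assuming the database name is the first `#mpa_` header line as in these listings (`database_version` is a hypothetical helper, not part of MetaPhlAn):

```python
def database_version(profile_text):
    """Return the database name from the '#mpa_' header line, or None if absent."""
    for line in profile_text.splitlines():
        if line.startswith("#mpa_"):
            return line[1:].strip()
        if not line.startswith("#"):
            # Header is over once the first data row appears.
            break
    return None


# Header lines copied from the two listings above.
ours = "#mpa_vOct22_CHOCOPhlAnSGB_202212\n#19048 reads processed\n"
sample = "#mpa_vJan21_CHOCOPhlAnSGB_202103\n#19048 reads processed\n"

print("ours:  ", database_version(ours))
print("sample:", database_version(sample))
print("same database:", database_version(ours) == database_version(sample))
```

If the two names differ, the profiles were built against different marker databases and are not expected to match line for line.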

I attached our bowtie2out and profile result files.

SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt (53.6 KB)
SRS014476-Supragingival_plaque_profile.txt (2.9 KB)

The compiled database contains these files. They appear to be correctly installed.

 -rw-r--r-- 1 root   root         32 Aug 23 10:29 mpa_latest
 -rw-rw-r-- 1 root   root 3978538203 Aug 23 12:36 mpa_vOct22_CHOCOPhlAnSGB_202212.1.bt2l
 -rw-rw-r-- 1 root   root 5311606700 Aug 23 12:36 mpa_vOct22_CHOCOPhlAnSGB_202212.2.bt2l
 -rw-rw-r-- 1 root   root  100610041 Aug 23 10:59 mpa_vOct22_CHOCOPhlAnSGB_202212.3.bt2l
 -rw-rw-r-- 1 root   root 2655803348 Aug 23 10:59 mpa_vOct22_CHOCOPhlAnSGB_202212.4.bt2l
 -rw-r--r-- 1 root   root         70 Aug 23 10:50 mpa_vOct22_CHOCOPhlAnSGB_202212.md5
 -rw-rw-r-- 1 196515 9930   73781868 Feb 27 03:45 mpa_vOct22_CHOCOPhlAnSGB_202212.pkl
 -rw-rw-r-- 1 root   root 3978538203 Aug 23 14:12 mpa_vOct22_CHOCOPhlAnSGB_202212.rev.1.bt2l
 -rw-rw-r-- 1 root   root 5311606700 Aug 23 14:12 mpa_vOct22_CHOCOPhlAnSGB_202212.rev.2.bt2l
 -rw-r--r-- 1 root   root 3025049600 Aug 23 10:50 mpa_vOct22_CHOCOPhlAnSGB_202212.tar
 -rw-rw-r-- 1 196515 9930      44092 Feb 22  2023 mpa_vOct22_CHOCOPhlAnSGB_202212_VINFO.csv
 -rw-r--r-- 1 root   root  881484571 Aug 23 10:50 mpa_vOct22_CHOCOPhlAnSGB_202212_VSG.fna
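
The listing shows a .md5 file next to the downloaded .tar, so the download itself can be checked. This is only a sketch: it assumes the .md5 file uses the usual md5sum layout (hex digest, then the file name), which the 70-byte size is consistent with but does not prove. `md5_of` and `tar_matches_md5` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tar_matches_md5(db_dir, index):
    """Compare <index>.tar against the digest stored in <index>.md5.

    Assumes the .md5 file starts with the hex digest (md5sum layout).
    """
    db_dir = Path(db_dir)
    expected = (db_dir / f"{index}.md5").read_text().split()[0].lower()
    return md5_of(db_dir / f"{index}.tar") == expected
```

For the folder above this would be called as `tar_matches_md5("/mnt/share/metaphlan_bowtie2db/vOct22", "mpa_vOct22_CHOCOPhlAnSGB_202212")`. It checks only the downloaded archive, not the .bt2l index files built from it.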

Hi @Makio, it seems that I have encountered the same issue. My output from MetaPhlAn version 4.0.6 (1 Mar 2023) returns the same profile as your example. Only o__Lactobacillales was reported at the order level and not the others such as o__Corynebacteriales and o__Micrococcales. Order abundances for SRS014464-Anterior_nares also do not sum to 100.
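
The per-rank totals can be checked with a short script. A minimal sketch, assuming a tab-separated profile with clade_name in the first column and relative_abundance in the third, as in the `#clade_name` header earlier in this thread (`sum_by_rank` is a hypothetical helper, not part of MetaPhlAn):

```python
from collections import defaultdict


def sum_by_rank(profile_text):
    """Sum relative_abundance per taxonomic rank (k, p, c, o, f, g, s, t)."""
    totals = defaultdict(float)
    for line in profile_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row
        # The rank is the prefix of the last clade in the lineage, e.g. "o".
        rank = fields[0].split("|")[-1].split("__")[0]
        totals[rank] += float(fields[2])
    return dict(totals)


# Rows copied from the vOct22 profile excerpt earlier in this thread.
excerpt = "\n".join([
    "k__Bacteria\t2\t100.0\t",
    "k__Bacteria|p__Actinobacteria\t2|201174\t55.36506\t",
    "k__Bacteria|p__Firmicutes\t2|1239\t44.63494\t",
    "k__Bacteria|p__Actinobacteria|c__Actinomycetia\t2|201174|1760\t55.36506\t",
    "k__Bacteria|p__Firmicutes|c__Bacilli\t2|1239|91061\t44.63494\t",
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales\t2|1239|91061|186826\t44.63494\t",
])

for rank, total in sum_by_rank(excerpt).items():
    print(rank, round(total, 5))
```

On a full profile, any rank whose total is well below 100 points to clades that are missing at that level, which is what the excerpt shows for the order level.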

Hi @chx, as you mentioned in another thread, I also find that running on vJan21 produces the same result as the example. However, I had to manually edit mpa_latest in the database folder, changing it from mpa_vOct22_CHOCOPhlAnSGB_202212 to mpa_vJan21_CHOCOPhlAnSGB_202103, even though I created the database with the --index mpa_vJan21_CHOCOPhlAnSGB_202103 flag.
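
For anyone repeating the manual edit of mpa_latest, a small sketch that inspects and rewrites the file. It assumes mpa_latest is a plain-text file holding the index name on a single line (consistent with its 32-byte size in the listing above) and that each installed index has a matching .pkl file; both are assumptions, and `read_latest` and `set_latest` are hypothetical helpers, not MetaPhlAn functions.

```python
from pathlib import Path


def read_latest(db_dir):
    """Return the index name currently stored in mpa_latest."""
    return (Path(db_dir) / "mpa_latest").read_text().strip()


def set_latest(db_dir, index):
    """Point mpa_latest at `index`, refusing if that index is not installed.

    Assumption: an installed index has an <index>.pkl file in db_dir.
    """
    db_dir = Path(db_dir)
    if not (db_dir / f"{index}.pkl").exists():
        raise FileNotFoundError(f"{index}.pkl not found in {db_dir}")
    (db_dir / "mpa_latest").write_text(index + "\n")
```

Keeping a copy of the original mpa_latest before overwriting it makes the change easy to undo.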

The issue of the abundances not summing to 100 on vOct22 still suggests that something is wrong.

Hi @Makio, the problem of the abundances not summing to 100% is related to this issue: Metaphlan genus level relative abundance not summing up to 100% and possible database problem - #2 by Michal_Puncochar
We are fixing it in the next database release!