Metaphlan4 merge_metaphlan_tables.py fails

The merge_metaphlan_tables.py script that is packaged with the metaphlan4 conda environment does not work. It fails with the error below.

Command I used:
merge_metaphlan_tables.py output_merged_sam/metaphlan/*.txt

error:
merge_metaphlan_tables: wrong header format for “output_merged_sam/metaphlan/MSA-1003_CI_S201_profile.txt”, please check your profiles.

However, if I use merge_metaphlan_tables.py from the metaphlan3 version, it works perfectly fine. I just wanted to point out that there is a bug.

Thanks!
Hena

Hi @Hena
Could you share the beginning of the output_merged_sam/metaphlan/MSA-1003_CI_S201_profile.txt file? Could it be that the file is corrupted?

I have tried this on multiple files and the error was the same.
Here are the first few lines of the profile.txt:

#mpa_vJan21_CHOCOPhlAnSGB_202103
#.snakemake/conda/87013b619fc67c1a9da33c332c64e376/bin/metaphlan -t rel_ab_w_read_stats --unclassified_estimation output_merged_sam/merged_data/MSA-1003_CI_S201.fastq --add_viruses --input_type fastq --bowtie2db /bulk/IMCshared_bulk/shared/dbs/metaphlan4 --bowtie2out output_merged_sam/metaphlan/MSA-1003_CI_S201_bowtie2.bz2 --nproc 8 -o output_merged_sam/metaphlan/MSA-1003_CI_S201_profile.txt
#SampleID Metaphlan_Analysis
#estimated_reads_mapped_to_known_clades:29764056
#clade_name clade_taxid relative_abundance coverage estimated_number_of_reads_from_the_clade
UNCLASSIFIED -1 8.76663 - 0
k__Bacteria 2 91.23337 1.16585 29764056
k__Bacteria|p__Firmicutes 2|1239 47.26269 0.60396 10760680
k__Bacteria|p__Proteobacteria 2|1224 25.96954 0.33186 11517786
k__Bacteria|p__Bacteroidetes 2|976 17.89849 0.22872 3741850
k__Bacteria|p__Actinobacteria 2|201174 0.10265 0.00131 23233
k__Bacteria|p__Firmicutes|c__Bacilli 2|1239|91061 44.97135 0.57468 8154456
k__Bacteria|p__Bacteroidetes|c__Bacteroidia 2|976|200643 17.89849 0.22872 3207300
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria 2|1224|1236 15.67728 0.20034 6230076
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria 2|1224|28211 10.00891 0.1279 3602244
k__Bacteria|p__Firmicutes|c__Clostridia 2|1239|186801 2.29134 0.02928 1068984
k__Bacteria|p__Proteobacteria|c__Epsilonproteobacteria 2|1224|29547 0.16809 0.00215 21126
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria 2|1224|28216 0.11527 0.00147 18942

Hi @Hena
I think the problem is related to the metaphlan analysis type (-t parameter). The merge_metaphlan_tables.py script is designed to be used with the default analysis type (-t rel_ab).
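If profiles were already generated with a non-default -t, one possible workaround is to merge the relative_abundance column by hand. The following is only a minimal sketch, not part of MetaPhlAn: it assumes the profiles are tab-separated, that the last "#" line before the data holds the column names (as in the profile posted above), and the function names read_profile and merge_profiles are hypothetical.

```python
import csv
import sys
from pathlib import Path


def read_profile(path, value_column="relative_abundance"):
    """Return {clade_name: value} for one tab-separated MetaPhlAn profile."""
    columns = None
    values = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            if line.startswith("#"):
                # Assumption: the last "#" line before the data names the columns.
                columns = line.lstrip("#").split("\t")
                continue
            if columns is None or value_column not in columns:
                raise ValueError(f"{path}: no '{value_column}' column found")
            fields = line.split("\t")
            # Assumption: the first column is the clade name.
            values[fields[0]] = fields[columns.index(value_column)]
    return values


def merge_profiles(paths, value_column="relative_abundance"):
    """Build one table with a row per clade and a column per input file."""
    samples = {Path(p).name: read_profile(p, value_column) for p in paths}
    clades = sorted({clade for profile in samples.values() for clade in profile})
    header = ["clade_name"] + list(samples)
    # Clades missing from a sample are reported as "0".
    rows = [[c] + [p.get(c, "0") for p in samples.values()] for c in clades]
    return header, rows


if __name__ == "__main__":
    # Usage: python merge_by_hand.py output_merged_sam/metaphlan/*_profile.txt
    header, rows = merge_profiles(sys.argv[1:])
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
```

Columns are labelled with the input file names, so files from different directories that share a name would collide; the sketch does not handle that case.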

Thanks @aitor.blancomiguez
Previously, in metaphlan3, it worked when the -t rel_ab_w_read_stats option was used, so I assumed it would work this time too. Not a big issue though :slight_smile:
Hena

Hi @Hena
Yes, for version 4 we decided to check the “#” headers in the output profile in order to avoid merging profiles built with different “-t” parameters. As a result, only profiles generated with the default “-t” analysis can be merged.
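To see in advance which profiles share the same header, something like the sketch below can help. This is not the actual check implemented in merge_metaphlan_tables.py; it only assumes that the last "#" line before the first data row is the column header, and the names column_header and group_by_header are hypothetical.

```python
from collections import defaultdict


def column_header(path):
    """Return the last '#' line that precedes the first data row, or None."""
    last_comment = None
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                last_comment = line.rstrip("\n")
            elif line.strip():
                # First non-empty, non-comment line: the data starts here.
                break
    return last_comment


def group_by_header(paths):
    """Group profile paths by their column header line."""
    groups = defaultdict(list)
    for path in paths:
        groups[column_header(path)].append(path)
    return dict(groups)
```

If group_by_header returns more than one key for a set of profiles, those profiles were most likely produced with different “-t” values and should not be merged together.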
