MetaPhlAn 4 not working with .gz input for the sample2markers.py step

Hi

I am trying to use MetaPhlAn 4 to generate the input for StrainPhlAn 4.

My inputs are the .gz files from the MetaPhlAn tutorial (metaphlan4 · biobakery/biobakery Wiki · GitHub). Does anyone know how to get the .sam.bz2 files from MetaPhlAn 4 that are needed for this step?

sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -nproc 14
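Before running sample2markers.py, it may be worth confirming that the MetaPhlAn step actually wrote SAM files. A minimal sketch in plain sh (bash or dash), assuming the `sams/` directory used in this post; `list_sam_inputs` is a hypothetical helper, not part of MetaPhlAn or StrainPhlAn:

```shell
# Hypothetical helper, not part of MetaPhlAn/StrainPhlAn: list the non-empty
# .sam.bz2 files in a directory and fail if there are none, so that
# sample2markers.py is never handed an empty or unmatched glob.
list_sam_inputs() {
  local dir="$1" f found=0
  for f in "$dir"/*.sam.bz2; do
    # With no matches the glob stays as the literal pattern; skip it.
    [ -e "$f" ] || continue
    if [ -s "$f" ]; then            # -s: file exists and has size > 0
      printf '%s\n' "$f"
      found=1
    else
      printf 'empty: %s\n' "$f" >&2
    fi
  done
  if [ "$found" -eq 0 ]; then
    printf 'no non-empty .sam.bz2 files in %s\n' "$dir" >&2
    return 1
  fi
}
```

For example, `list_sam_inputs sams && sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -nproc 14` only runs sample2markers.py when at least one non-empty file exists. A non-empty .sam.bz2 can still contain no alignments, so this is only a first check.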

I tried this loop, but it didn’t work:

for i in *.fasta.gz ; do metaphlan $i --input_type fasta --nproc 14 -s sams/${i%.fasta.gz}.sam.bz2 --bowtie2out bowtie2/${i%.fasta.gz}.bowtie2.txt -o profiles/${i%.fasta.gz}_profiled.tsv ; done

If I use genuine .bz2 files as input with this loop, it works:

for i in *.fastq.bz2 ; do metaphlan $i --input_type fastq --nproc 14 -s sams/${i%.fastq.bz2}.sam.bz2 --bowtie2out bowtie2/${i%.fastq.bz2}.bowtie2.bz2 -o profiles/${i%.fastq.bz2}_profiled.tsv ; done
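Both loops use the same options, so one sketch can cover either naming scheme. Assuming file names end in .fasta or .fastq plus an optional .gz or .bz2, the hypothetical helpers below (not part of MetaPhlAn) derive a sample name and the matching `--input_type`, then run `metaphlan` with the flags from the loops above. Whether MetaPhlAn 4 reads .gz input directly is not settled in this thread; the sketch only keeps file names and input types consistent.

```shell
# Hypothetical helpers, not part of MetaPhlAn: derive a sample name and the
# matching --input_type value from a read-file name. Assumes names end in
# .fasta or .fastq, optionally followed by .gz or .bz2.
sample_name() {
  local name="${1##*/}"          # strip any leading directory
  name="${name%.gz}"             # drop a compression suffix, if present
  name="${name%.bz2}"
  name="${name%.fasta}"          # drop the sequence-format suffix
  name="${name%.fastq}"
  printf '%s\n' "$name"
}

input_type_for() {
  case "$1" in
    *.fasta|*.fasta.gz|*.fasta.bz2) echo fasta ;;
    *.fastq|*.fastq.gz|*.fastq.bz2) echo fastq ;;
    *) echo "unrecognised extension: $1" >&2; return 1 ;;
  esac
}

# Run MetaPhlAn once per input file with the same options as the loops above.
profile_all() {
  local f s t
  mkdir -p sams bowtie2 profiles
  for f in "$@"; do
    [ -e "$f" ] || continue                # skip unmatched glob patterns
    s=$(sample_name "$f")
    t=$(input_type_for "$f") || continue   # skip unknown extensions
    metaphlan "$f" --input_type "$t" --nproc 14 \
      -s "sams/$s.sam.bz2" \
      --bowtie2out "bowtie2/$s.bowtie2.bz2" \
      -o "profiles/${s}_profiled.tsv"
  done
}
```

Usage: `profile_all *.fasta.gz *.fastq.bz2`.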

Any suggestions would be welcome. Or do I need to convert all my .gz files to .bz2?
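If converting does turn out to be necessary, the .gz files can be recompressed as .bz2 with standard tools. A minimal sketch, assuming `gzip` and `bzip2` are installed; `gz_to_bz2` is a hypothetical helper, and nothing in this thread confirms that .bz2 is required:

```shell
# Hypothetical helper: recompress each .gz file as .bz2 next to the original.
gz_to_bz2() {
  local f out
  for f in "$@"; do
    out="${f%.gz}.bz2"
    if [ -e "$out" ]; then
      echo "skipping $f: $out already exists" >&2
      continue
    fi
    # Test the .gz first: in a plain sh pipeline only bzip2's exit status is
    # visible, so a gzip error mid-stream would otherwise go unnoticed.
    # The new .bz2 is then tested as well.
    if gzip -t -- "$f" && gzip -dc -- "$f" | bzip2 -c > "$out" &&
       bzip2 -t -- "$out"; then
      echo "wrote $out"
    else
      echo "failed: $f" >&2
      rm -f -- "$out"              # do not leave a partial .bz2 behind
      return 1
    fi
  done
}
```

Usage: `gz_to_bz2 *.fasta.gz`. The original .gz files are left in place.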

cheers
Julian

Hi Guys

Stand down: I think the issue is that the files supplied in the tutorial (metaphlan4 · biobakery/biobakery Wiki · GitHub) are corrupted, because when I run them through MetaPhlAn 4 I get this:

for i in *.fasta.bz2 ; do metaphlan $i --input_type fasta --nproc 14 -s sams/${i%.fasta.bz2}.sam.bz2 --bowtie2out bowtie2/${i%.fasta.bz2}.bowtie2.bz2 -o profiles/${i%.fasta.bz2}_profiled.tsv ; done
WARNING: MetaPhlAn did not detect any microbial taxa in the sample.
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
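To test the corruption hypothesis directly, both compressors have a built-in integrity check (`gzip -t`, `bzip2 -t`). A sketch, where `check_archives` is a hypothetical helper name:

```shell
# Hypothetical helper: test downloaded .gz/.bz2 files for corruption before
# profiling. gzip -t and bzip2 -t decompress in memory and report errors
# without writing any output files.
check_archives() {
  local f ok bad=0
  for f in "$@"; do
    case "$f" in
      *.gz)  if gzip -t -- "$f" 2>/dev/null; then ok=1; else ok=0; fi ;;
      *.bz2) if bzip2 -t -- "$f" 2>/dev/null; then ok=1; else ok=0; fi ;;
      *)     echo "skipped (not .gz/.bz2): $f" >&2; continue ;;
    esac
    if [ "$ok" -eq 1 ]; then
      echo "ok: $f"
    else
      echo "CORRUPT: $f"
      bad=1
    fi
  done
  return "$bad"                    # non-zero if any archive failed
}
```

Usage: `check_archives *.fasta.gz *.fasta.bz2`. Clean results would suggest that the warnings come from something other than damaged files, for instance the database version the reads were profiled against.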

cheers

Hi @Julian-marchesi
Which version of the MetaPhlAn database are you using? Currently, the StrainPhlAn tutorial files only work with database version vJan21 (the tutorial files are not real communities; they were made to run quickly with the vJan21 database).
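One way to check which database each profile was made with is to read the profile header. This sketch assumes that the first line of a MetaPhlAn profile is the database name prefixed by `#` (for example a name containing `vJan21`); `profile_db` and `check_profiles_db` are hypothetical helpers:

```shell
# Hypothetical helpers: report the database name stored in the first line of
# each MetaPhlAn profile. Assumes that line looks like '#<database name>';
# adjust if your profiles are laid out differently.
profile_db() {
  local line=
  # read fails on a missing file; on a file without a trailing newline it
  # fails but still fills $line, so accept a non-empty $line as success.
  IFS= read -r line < "$1" || [ -n "$line" ] || return 1
  case "$line" in
    '#'*) printf '%s\n' "${line#\#}" ;;   # strip the leading '#'
    *) return 1 ;;                        # first line is not a header
  esac
}

# Compare each profile's database name against an expected substring.
check_profiles_db() {
  local want="$1" f db bad=0
  shift
  for f in "$@"; do
    if ! db=$(profile_db "$f" 2>/dev/null); then
      echo "no database header: $f"
      bad=1
      continue
    fi
    case "$db" in
      *"$want"*) echo "ok ($db): $f" ;;
      *) echo "MISMATCH ($db): $f"; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Usage: `check_profiles_db vJan21 profiles/*_profiled.tsv`. `metaphlan --version` reports the installed MetaPhlAn version, and the `--index` option selects a database version; see `metaphlan --help` for the index names your installation accepts.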