Confusing Instructions About Strain Analysis Using MetaPhlAn

I have noticed that some reference genomes have names like Fusobacterium nucleatum subsp. vincentii and Fusobacterium nucleatum subsp. animalis. How can I analyse them?

MetaPhlAn’s --help has a paragraph like:

  • Finally, to obtain all markers present for a specific clade and all its subclades, the
    -t clade_specific_strain_tracker should be used. For example, the following command
    is reporting the presence/absence of the markers for the B. fragilis species and its strains
    the optional argument --min_ab specifies the minimum clade abundance for reporting the markers

$ metaphlan -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt

However, further down, clade_specific_strain_tracker is missing from the list of valid values for -t. Why is it missing?

-t ANALYSIS TYPE Type of analysis to perform:
* rel_ab: profiling a metagenomes in terms of relative abundances
* rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads coming from each clade.
* reads_map: mapping from reads to clades (only reads hitting a marker)
* clade_profiles: normalized marker counts for clades with at least a non-null marker
* marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
* marker_counts: non-normalized marker counts [use with extreme caution]
* marker_pres_table: list of markers present in the sample (threshold at 1.0 if not differently specified with --pres_th
[default ‘rel_ab’]

Nonetheless, the analysis works, but I see strange IDs. How do I relate them back to familiar names such as subsp. animalis? Why is the first column heading SampleID? What does 1 mean? Is it a Boolean value of True?

#mpa_v30_CHOCOPhlAn_201901
#metaphlan -t clade_specific_strain_tracker --clade s__Fusobacterium_nucleatum OSCC_1-Pintermediate.bz2 --input_type bowtie2out --nproc 8 --bowtie2db databases/bacteriaMarkers/ --output_file OSCC_1-PstrainsMetagenome.txt
#SampleID       Metaphlan_Analysis
851__Q7P5X0__RN95_03310 1
851__Q8R657__RO08_04045 1
851__R9R952__CI111_08490        1
851__Q7P4W1__cmk2       1
851__C7XQ46__H848_00740 1

Can you create a step-by-step tutorial for GitHub which demonstrates this analysis, please?

Two years later, and I also couldn't find any explanation of the output from this command.

This is what I have:

#mpa_vOct22_CHOCOPhlAnSGB_202212
#/home/danielsg/miniconda3/envs/metaphlan4/bin/metaphlan /scr1/users/danielsg/carsten_skarke_run_1/sunbeam_output/classify/metaphlan4/bowtie2out/ACB.RecSw.60.bowtie2.bz2 --nproc 8 -t clade_specific_strain_tracker --clade s__Escherichia_coli --input_type bowtie2out --bowtie2db /mnt/isilon/microbiome/analysis/biodata/metaphlan_databases/v4 --index mpa_vOct22_CHOCOPhlAnSGB_202212 -o /scr1/users/danielsg/carsten_skarke_run_1/sunbeam_output/classify/metaphlan4/s__Escherichia_coli/ACB.RecSw.60.strain.txt
#36418052 reads processed
#SampleID	Metaphlan_Analysis
UniRef90_P30847|2__5|SGB10068	1
UniRef90_P76108|3__6|SGB10068	1
UniRef90_P77730|6__9|SGB10068	1
UniRef90_P25718|1__4|SGB10068	1
UniRef90_P76213|1__4|SGB10068	1
UniRef90_P25798|4__8|SGB10068	1
UniRef90_P76146|2__6|SGB10068	1
UniRef90_P77334|10__13|SGB10068	1

I assume the UniRef… parts are the gene clusters and the SGB parts are the markers, but I have no idea what’s in between. Nor do I have any idea what to do with this for downstream analysis. Any help or advice would be appreciated. Thanks!

Hi @scottdaniel, -t clade_specific_strain_tracker just reports the markers of the specified species (specified with --clade) that are present in the sample. In my opinion, it is not really useful for any downstream analyses.
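
For anyone who still wants to inspect these tables, here is a minimal Python sketch of reading one. It is not part of MetaPhlAn: the function names read_marker_table and annotate are made up for illustration, and the only thing taken from this thread is the layout of the output (comment lines starting with #, then one marker ID and one value per line). The annotate step assumes you can supply a dict that maps each marker ID to a record with 'clade' and 'taxon' keys. My understanding is that the 'markers' entry of the MetaPhlAn database pickle has that shape, but treat that as an assumption and check it against your own database.

```python
import io


def read_marker_table(handle):
    """Split a tracker output into its '#' comment lines and a
    {marker_id: value} dict. Layout is assumed from the outputs
    pasted in this thread, not from any official specification."""
    comments, markers = [], {}
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            comments.append(line)  # database version, command, column header
            continue
        # Marker ID and value are separated by whitespace (tab or spaces);
        # the value is always the last field on the line.
        marker_id, value = line.rsplit(None, 1)
        markers[marker_id] = float(value)
    return comments, markers


def annotate(markers, db_markers):
    """Return (marker_id, clade, taxon) tuples.
    db_markers is a hypothetical lookup: {marker_id: {'clade': ..., 'taxon': ...}}.
    Markers missing from the lookup get None for both fields."""
    rows = []
    for marker_id in markers:
        info = db_markers.get(marker_id, {})
        rows.append((marker_id, info.get("clade"), info.get("taxon")))
    return rows


# Two records copied from the E. coli output above.
example = io.StringIO(
    "#mpa_vOct22_CHOCOPhlAnSGB_202212\n"
    "#SampleID\tMetaphlan_Analysis\n"
    "UniRef90_P30847|2__5|SGB10068\t1\n"
    "UniRef90_P76108|3__6|SGB10068\t1\n"
)
comments, markers = read_marker_table(example)

# Placeholder lookup with invented values, only to show the shape.
fake_db_markers = {
    "UniRef90_P30847|2__5|SGB10068": {
        "clade": "clade-placeholder",
        "taxon": "taxon-placeholder",
    }
}
for row in annotate(markers, fake_db_markers):
    print(row)
```

To use it on a real file, open the tracker output with open(path) and pass the handle to read_marker_table. If the assumption about the database holds, the real lookup would come from loading the .pkl file that ships with the index using bz2 and pickle and taking its 'markers' entry. The taxon string there is what should let you map a marker back to a familiar name.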
