I have noticed that some reference genomes have names like Fusobacterium nucleatum subsp. vincentii and Fusobacterium nucleatum subsp. animalis. How can I analyse them?
MetaPhlAn’s --help has a paragraph like:
- Finally, to obtain all markers present for a specific clade and all its subclades, the -t clade_specific_strain_tracker should be used. For example, the following command is reporting the presence/absence of the markers for the B. fragilis species and its strains the optional argument --min_ab specifies the minimum clade abundance for reporting the markers
$ metaphlan -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
But below I see that clade_specific_strain_tracker is missing from the list of valid values of -t. Why is it missing?
-t ANALYSIS TYPE
Type of analysis to perform:
* rel_ab: profiling a metagenomes in terms of relative abundances
* rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads coming from each clade.
* reads_map: mapping from reads to clades (only reads hitting a marker)
* clade_profiles: normalized marker counts for clades with at least a non-null marker
* marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
* marker_counts: non-normalized marker counts [use with extreme caution]
* marker_pres_table: list of markers present in the sample (threshold at 1.0 if not differently specified with --pres_th
[default ‘rel_ab’]
Nonetheless, the analysis works, but I see strange IDs in the output. How do I relate them back to familiar names such as subsp. animalis? Why is the first column heading SampleID? What does 1 mean? Is it a Boolean value of True?
#mpa_v30_CHOCOPhlAn_201901
#metaphlan -t clade_specific_strain_tracker --clade s__Fusobacterium_nucleatum OSCC_1-Pintermediate.bz2 --input_type bowtie2out --nproc 8 --bowtie2db databases/bacteriaMarkers/ --output_file OSCC_1-PstrainsMetagenome.txt
#SampleID Metaphlan_Analysis
851__Q7P5X0__RN95_03310 1
851__Q8R657__RO08_04045 1
851__R9R952__CI111_08490 1
851__Q7P4W1__cmk2 1
851__C7XQ46__H848_00740 1
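To make the question concrete, here is a minimal Python sketch of how I am currently reading this table. It only splits each ID on the double underscore. The field names (prefix, accession, gene) are my own guesses, not something taken from the MetaPhlAn documentation: I am assuming the leading 851 is a taxonomy ID shared by all of these markers and that the middle field is a protein accession, and I may be wrong on both counts. The function name parse_marker_table and the SAMPLE text are hypothetical and just reuse the rows above.

```python
from typing import List, NamedTuple


class Marker(NamedTuple):
    prefix: str     # leading field of the ID, e.g. "851" (assumed to be a taxonomy ID)
    accession: str  # middle field, e.g. "Q7P5X0" (assumed to be a protein accession)
    gene: str       # trailing field, e.g. "RN95_03310" (assumed to be a gene or locus tag)
    value: str      # second column, kept as text because its meaning is unclear to me


def parse_marker_table(text: str) -> List[Marker]:
    """Parse the two-column marker table, skipping '#' header lines."""
    markers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        fields = line.split()
        if len(fields) != 2:
            continue  # not a "marker value" row
        marker_id, value = fields
        # Split on the first two double underscores only, so that single
        # underscores inside the last field (e.g. RN95_03310) are preserved.
        parts = marker_id.split("__", 2)
        if len(parts) != 3:
            continue  # ID does not follow the three-field pattern assumed here
        markers.append(Marker(parts[0], parts[1], parts[2], value))
    return markers


# Rows copied from the output above.
SAMPLE = """#mpa_v30_CHOCOPhlAn_201901
#SampleID Metaphlan_Analysis
851__Q7P5X0__RN95_03310 1
851__Q8R657__RO08_04045 1
851__R9R952__CI111_08490 1
851__Q7P4W1__cmk2 1
851__C7XQ46__H848_00740 1
"""

markers = parse_marker_table(SAMPLE)
for m in markers:
    print(m.prefix, m.accession, m.gene, m.value, sep="\t")
```

This gets me as far as separating the fields, but it does not tell me which subspecies (for example subsp. animalis or subsp. vincentii) each marker belongs to, which is the part I am stuck on.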
Can you create a step-by-step tutorial for GitHub which demonstrates this analysis, please?