Virulence Factor Identification

Dario · August 18, 2021, 11:00am

I am new to microbiome research and have used HUMAnN 3 and MetaPhlAn 3 so far. Another researcher who focuses on microbiome research recommended doing a virulence gene analysis. Is there anything in bioBakery for that purpose? For example, Fusobacterium nucleatum is a commensal oral microbe, but a recent journal article reported that its FadA gene is associated with cancer progression. Could such a finding be easily analysed with some bioBakery tool that I am not aware of? I find that the metabolic pathways are not really what’s needed for a cancer vs. normal tissue comparison.

franzosa · August 27, 2021, 2:47pm

HUMAnN is our general-purpose functional profiling tool. If you are able to associate your genes of interest with UniRef identifiers, you can look them up directly in the genefamilies.tsv output for each sample. Alternatively, if the genes are associated with broader Pfam/KO/eggNOG categories, you can regroup UniRef abundances to those systems and look up the corresponding identifiers.
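As a rough illustration of the direct lookup, here is a minimal Python sketch that scans a genefamilies.tsv-style table for a set of UniRef identifiers. It assumes the usual tab-separated layout, with a `#` header line, the feature name in the first column, and the abundance in the second. The identifiers `UniRef90_EXAMPLE1` and `UniRef90_EXAMPLE2`, the taxon label, and the function name `lookup_gene_families` are hypothetical placeholders, not real accessions or part of HUMAnN.

```python
import csv
import io


def lookup_gene_families(handle, target_ids):
    """Return {feature: abundance} for rows whose UniRef ID is in target_ids."""
    targets = set(target_ids)
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the "# Gene Family ..." header line.
        if not row or row[0].startswith("#"):
            continue
        feature, abundance = row[0], float(row[1])
        # Stratified rows carry a "|taxon" suffix; drop it to get the family ID.
        family = feature.split("|", 1)[0]
        # If the table was renamed, a ": description" may follow the ID.
        uniref_id = family.split(":", 1)[0].strip()
        if uniref_id in targets:
            hits[feature] = abundance
    return hits


# Tiny in-memory stand-in for one sample's genefamilies.tsv (made-up values).
example = io.StringIO(
    "# Gene Family\tsample1_Abundance-RPKs\n"
    "UNMAPPED\t1200.0\n"
    "UniRef90_EXAMPLE1\t35.5\n"
    "UniRef90_EXAMPLE1|g__Fusobacterium.s__Fusobacterium_nucleatum\t35.5\n"
    "UniRef90_EXAMPLE2\t7.0\n"
)

hits = lookup_gene_families(example, ["UniRef90_EXAMPLE1"])
for feature, abundance in hits.items():
    print(feature, abundance)
```

For a real file, you would pass `open("sample_genefamilies.tsv", newline="")` (a hypothetical path) in place of the in-memory example. Keeping the stratified rows shows which species contribute the gene in each sample.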

We also offer ShortBRED as an approach to targeted functional profiling. There, you start with a small set of gene sequences of interest and identify peptide-level markers that are conserved within them but rare in other proteins. These markers can then be used for highly specific, accelerated functional profiling. Compared with HUMAnN, the ShortBRED approach provides more confident presence/absence calls for specific gene families, but requires you to pre-specify those families and do some “indexing” on them (to identify their peptide markers) before analyzing your sample.
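To make the marker idea concrete, here is a toy Python sketch of "conserved within the family, rare elsewhere" using fixed-length peptide k-mers. This is only a conceptual illustration under simplifying assumptions (exact k-mer matching, tiny made-up sequences). It is not ShortBRED's actual procedure, which is more involved, and the function names `kmers` and `candidate_markers` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_markers(family_seqs, background_seqs, k=5):
    """Peptides present in every family member and absent from the background."""
    if not family_seqs:
        return []
    # Conserved: k-mers shared by all sequences in the family of interest.
    shared = set.intersection(*(kmers(s, k) for s in family_seqs))
    # Rare elsewhere: remove any k-mer seen in the reference proteins.
    background = set().union(*(kmers(s, k) for s in background_seqs))
    return sorted(shared - background)


# Made-up sequences, for illustration only.
family = ["MKTAYIAKQR", "GGKTAYIAKQ"]
background = ["MKTAYILLL"]

markers = candidate_markers(family, background, k=5)
print(markers)  # ['AYIAK', 'TAYIA', 'YIAKQ']
```

The peptide `KTAYI` is shared by both family members but also occurs in the background sequence, so it is dropped. This is why the reference catalog matters: the more comprehensive it is, the more confident you can be that a surviving marker is specific to your family.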

Hope this helps!

Dario · August 28, 2021, 12:00am

ShortBRED sounds great for my use case. From its journal article in 2015:

ShortBRED-Identify takes two inputs: (i) a FASTA file of proteins of interest and (ii) a comprehensive catalog of reference protein sequences (as a FASTA file or preformatted BLAST database). As of this writing, IMG is no longer available for download, and we recommend using UniRef100 or UniRef90 as alternative comprehensive protein reference datasets.

Can you provide modern-day recommendations for a good reference database?

franzosa · August 30, 2021, 9:32pm

UniRef90 is still around and remains our go-to for a non-redundant representation of the known protein universe.
