Normalize ShortBRED Reads

Joan_Slonczewski · February 11, 2020, 10:49pm

How do we normalize hit numbers to the sample?
Is it reasonable to normalize hits to the number of reads in the FASTA file?

franzosa · February 12, 2020, 3:24pm

The “Count” column in the ShortBRED quantify output is in units of RPKMs, which are already adjusted for sequencing depth (number of reads in the sample). Hence you should be able to analyze those directly.

Joan_Slonczewski · February 13, 2020, 1:52am

Thanks very much. I believe RPKM is best for comparing different types of marker within a sample, whereas Hits would work best across samples.
I’m wondering why Counts (RPKM) and Hits give such different results. Why would some of my top-scoring Hits markers show zero RPKMs?

Family	Count	Hits	TotMarkerLength
gb	AAB58447_1	ARO_3002662	APH(9)-Ia__Legionella_pneumophila_130b_
gb	AHA41505_1	ARO_3002930	vanRO__Rhodococcus_hoagii_
gb	AAD51345_1	ARO_3003052	smeB__Stenotrophomonas_maltophilia_
gb	AAF86691_1	ARO_3001816	ACC-2__Hafnia_alvei_
gb	CAC14596_1	ARO_3003057	smeF__Stenotrophomonas_maltophilia_

Joan_Slonczewski · February 13, 2020, 1:54am

Family	Count	Hits	TotMarkerLength
gb	AAB58447_1	ARO_3002662	APH(9)-Ia__Legionella_pneumophila_130b_
gb	AHA41505_1	ARO_3002930	vanRO__Rhodococcus_hoagii_
gb	AAD51345_1	ARO_3003052	smeB__Stenotrophomonas_maltophilia_
gb	AAF86691_1	ARO_3001816	ACC-2__Hafnia_alvei_
gb	CAC14596_1	ARO_3003057	smeF__Stenotrophomonas_maltophilia_

Joan_Slonczewski · February 13, 2020, 1:55am

That didn’t work; here are the counts|hits.

Count	Hits
0	127
14.68017047	63
0.192048279	17
0	15
2.095746364	13

franzosa · February 13, 2020, 2:16am

RPKMs are good for both within- and between-sample comparison. The “K” part provides gene-length normalization, which allows you to compare between genes of different lengths (e.g. in the same sample). The “M” part provides read-depth normalization, which lets you compare between samples of different read depths.

The “Hits” column is the integer count of reads that hit one of the markers for a particular family. If a family has multiple markers, but <50% of markers recruited reads, then ShortBRED doesn’t consider the family to be confidently detected and reports Count = 0.

johnlee · February 5, 2021, 7:15pm

Thanks for this thread. Is there any way to change the parameter of <50% of marker recruited reads so that the count report is reflective of hits? Also, is there a downside to changing this parameter if you can? I don’t understand how if you have some hits, that it would be listed as not present? Is this a false positive?

Also, one other question. Would having 250 base pair (miseq) reads affect the number of hits?

Thank you for your time.

John

Topic		Replies	Views
ShortBRED yields zero hits for ALL samples ShortBRED	0	124	February 22, 2024
ShortBRED for Nanopore data ShortBRED	3	256	October 20, 2023
Shortbred returns all zero count ShortBRED	1	473	November 5, 2021
Reverse "normalize by number of reads in the sample" ShortBRED	0	386	March 19, 2021
RPKM or other gene-length normalized feature value for MMUPhin MMUPHin	1	272	July 6, 2022

Normalize ShortBRED Reads

Related topics