Discrepancy in relative abundance and raw abundance files

SciLiciumTheo · October 6, 2022, 2:59pm

Hello there,

I just ran the wmgx workflow (with default parameters), and the number of entries in “relative abundance” output files is systematically lower than in the “raw” abundance files:

wc output/humann/merged/*.tsv
9975 49877 588863 output/humann/merged/ecs.tsv
9914 49572 671235 output/humann/merged/ecs_relab.tsv
160259 801297 9380879 output/humann/merged/genefamilies.tsv
160258 801292 9105224 output/humann/merged/genefamilies_relab.tsv
789 7078 83367 output/humann/merged/pathabundance.tsv
757 6918 77632 output/humann/merged/pathabundance_relab.tsv

From my understanding, we should have the same number of lines in raw and relative abundance files. I suspect some sort of filtering, but could not find any info on that.

Could you please explain the reason for this discrepancy?

Thanks a lot!

franzosa · October 6, 2022, 3:13pm

My guess is that you are removing special features (UNMAPPED, UNGROUPED, etc.) during your normalization.
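
One way to check this guess is to count the special features in the raw table and compare that count with the line difference reported by wc. The following is a minimal sketch on a toy table: the file name and values are made up for illustration, and it assumes that special features are the rows whose name starts with UNMAPPED, UNINTEGRATED, or UNGROUPED.

```shell
# Toy raw table (hypothetical values), tab-separated like the merged HUMAnN tables.
tmpdir=$(mktemp -d)
{
  printf '# Pathway\tS1\tS2\n'
  printf 'UNMAPPED\t10\t20\n'
  printf 'UNINTEGRATED\t5\t5\n'
  printf 'UNINTEGRATED|g__A.s__A_b\t5\t0\n'
  printf 'PWY-1: toy pathway\t3\t4\n'
  printf 'PWY-1: toy pathway|g__A.s__A_b\t3\t0\n'
} > "$tmpdir/raw.tsv"

# Count rows whose feature name starts with a special-feature prefix.
n_special=$(grep -c -E '^(UNMAPPED|UNINTEGRATED|UNGROUPED)' "$tmpdir/raw.tsv")
n_total=$(wc -l < "$tmpdir/raw.tsv")

# If only special features are dropped, "remaining" should match the relab line count.
echo "special: $n_special, remaining: $((n_total - n_special))"
```

If you run the normalization yourself rather than through the workflow, humann_renorm_table should expose a --special option (y/n) that controls this behavior; please check humann_renorm_table --help for your installed version before relying on it.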

SciLiciumTheo · October 6, 2022, 3:30pm

Indeed, thanks for the quick and spot-on reply!

awk '{print $1}' tmp/humann/merged/pathabundance_relab.tsv > tmp/humann/merged/pathabundance_relab_names.list
grep -f tmp/humann/merged/pathabundance_relab_names.list -v tmp/humann/merged/pathabundance.tsv

UNMAPPED 29967.1437145868 115981.3770280872 102861.3404898724 100500.2299600781
UNINTEGRATED 42684.0053677411 107902.3717791706 110796.7696967103 129155.9278670251
UNINTEGRATED|g__Acidaminococcus.s__Acidaminococcus_intestini 0 5083.1125600959 0 0
UNINTEGRATED|g__Acidaminococcus.s__Acidaminococcus_intestini_CAG_325 0 2847.0181364410 0 0
UNINTEGRATED|g__Alistipes.s__Alistipes_putredinis 0 6579.9207693948 0 0
UNINTEGRATED|g__Alistipes.s__Alistipes_putredinis_CAG_67 0 1361.8073637478 0 0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_caccae 0 1978.2316749417 0 0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_cellulosilyticus 0 0 1622.6973434366 0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_coprocola_CAG_162 1366.6531819421 0 0 0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_dorei 0 2849.6180497356 26822.4802909961 5163.9951951313
UNINTEGRATED|g__Bacteroides.s__Bacteroides_eggerthii 0 0 0 10326.9369524682
UNINTEGRATED|g__Bacteroides.s__Bacteroides_faecis 0 0 1129.4210672802 0
UNINTEGRATED|g__Bacteroides.s__Bacteroides_fluxus 0 0 0 7654.4874354997
UNINTEGRATED|g__Bacteroides.s__Bacteroides_massiliensis 0 0 0 3754.9005810025
UNINTEGRATED|g__Bacteroides.s__Bacteroides_ovatus 0 1340.7765518493 0 6540.0046354173
UNINTEGRATED|g__Bacteroides.s__Bacteroides_plebeius 0 0 0 1995.6522102630
UNINTEGRATED|g__Bacteroides.s__Bacteroides_plebeius_CAG_211 0 0 0 2059.2582283209
UNINTEGRATED|g__Bacteroides.s__Bacteroides_thetaiotaomicron 0 0 0 655.8485785403
UNINTEGRATED|g__Bacteroides.s__Bacteroides_uniformis 0 6251.8385039334 11774.4850171448 12279.0012881862
UNINTEGRATED|g__Bacteroides.s__Bacteroides_uniformis_CAG_3 0 1134.2728263491 4359.1739231876 4395.3539403961
UNINTEGRATED|g__Bacteroides.s__Bacteroides_vulgatus 0 8920.3810815709 3297.2798884464 9724.9721189971
UNINTEGRATED|g__Bacteroides.s__Bacteroides_vulgatus_CAG_6 0 2725.4170554521 562.7256560792 3030.4270121272
UNINTEGRATED|g__Bacteroides.s__Bacteroides_xylanisolvens 0 3399.6100801369 1692.4641674226 3307.6268810915
UNINTEGRATED|g__Dialister.s__Dialister_invisus 0 2247.3598830230 0 0
UNINTEGRATED|g__Dialister.s__Dialister_invisus_CAG_218 0 4996.5301415372 0 0
UNINTEGRATED|g__Eubacterium.s__Eubacterium_eligens 0 1574.2077156635 0 0
UNINTEGRATED|g__Eubacterium.s__Eubacterium_eligens_CAG_72 0 1351.8872697552 0 0
UNINTEGRATED|g__Faecalibacterium.s__Faecalibacterium_prausnitzii 11138.7293161285 11796.5848588853 17861.9183857442 0
UNINTEGRATED|g__Lachnospira.s__Lachnospira_pectinoschiza 0 5085.6496177987 0 0
UNINTEGRATED|g__Prevotella.s__Prevotella_copri 12502.6280418438 0 0 0
UNINTEGRATED|g__Prevotella.s__Prevotella_copri_CAG_164 4266.5038170823 0 0 0
UNINTEGRATED|unclassified 11601.6028965991 22488.9431624300 24055.6542762881 17258.5426240774
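
Because pathway names can contain spaces and grep -f treats each name as a pattern, an exact comparison on the first tab-separated column may be more robust than the awk/grep pair above. This is a minimal sketch on toy files (file names and values are hypothetical), assuming both tables are tab-separated with the feature name in the first column.

```shell
# Toy raw and relab tables (hypothetical values), tab-separated.
tmpdir=$(mktemp -d)
printf '# Pathway\tS1\nUNMAPPED\t10\nUNINTEGRATED\t5\nPWY-1: toy pathway\t3\n' > "$tmpdir/raw.tsv"
printf '# Pathway\tS1\nPWY-1: toy pathway\t1\n' > "$tmpdir/relab.tsv"

# First file: remember every feature name found in the relab table.
# Second file: print raw rows whose feature name was never seen.
missing=$(awk -F'\t' 'NR == FNR { seen[$1] = 1; next } !($1 in seen)' \
  "$tmpdir/relab.tsv" "$tmpdir/raw.tsv")
printf '%s\n' "$missing"
```

On the toy data this prints only the UNMAPPED and UNINTEGRATED rows, since the header and the pathway row are present in both files.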

May I suggest updating the README (GitHub - biobakery/humann) to read as follows?
“Special” features (such as UNMAPPED) can be included or excluded in the normalization process (excluded by default in the wmgx workflow).
