Hi,
I am new to HUMAnN2 and have some questions about it:
Why doesn't the sum of relative abundances in each of my samples equal 1.0? In fact, the sums are >1.0.
(I joined the pathabundance files of all samples and then calculated the relative abundance, using this command:
humann2_join_tables --input very-sensitive/ --file_name pathabundance --output humann2_pathabundance_SEN.tsv)
How does stratification work, and what does it mean?
(After calculating relative abundance, I stratified the table using the following command:
humann2_split_stratified_table --input humann2_pathabundance_relab_cat.tsv --output stratified/)
I understand that the unstratified table gives only pathways and the stratified table gives pathways + bacteria. However, my stratified values sum to <1.0, my unstratified values sum to ~1.0, and the sum of both is close to the original per-sample relative abundance total I mentioned in my first question.
Am I doing something wrong? If not, and I want to report pathways + microbes, which file should I use: the stratified file, or the original relative abundance file I got in my first step (described in my first question)?
Kindly help me with these questions.
Thanks,
Khem
The default normalization procedure is to normalize to the sum of functions' community totals. For functions that are computed strictly by adding up gene abundances (e.g. outputs from regroup_table), the sum of the stratifications will be the same as the sum of the community totals (such that a normalized sample/column for these data will sum to 2.0).
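To make the 2.0 arithmetic concrete, here is a minimal Python sketch on a hypothetical one-sample gene-family table (the feature names and values are invented for illustration). It assumes, as described above, that every row, whether a community total or a species stratum, is divided by the sum of the community totals only. It is not HUMAnN's own code.

```python
# Hypothetical one-sample stratified gene-family table (values are invented).
# Rows without "|" are community totals; rows with "|" are per-species strata.
table = {
    "GeneA": 3.0,
    "GeneA|g__Bacteroides.s__Bacteroides_ovatus": 2.0,
    "GeneA|g__Escherichia.s__Escherichia_coli": 1.0,
    "GeneB": 1.0,
    "GeneB|g__Escherichia.s__Escherichia_coli": 1.0,
}


def normalize_to_community_totals(table):
    """Divide every row (total or stratum) by the sum of the community totals only."""
    total = sum(v for k, v in table.items() if "|" not in k)
    return {k: v / total for k, v in table.items()}


relab = normalize_to_community_totals(table)
community_sum = sum(v for k, v in relab.items() if "|" not in k)
stratified_sum = sum(v for k, v in relab.items() if "|" in k)
print(community_sum, stratified_sum, community_sum + stratified_sum)  # 1.0 1.0 2.0
```

Because gene-family strata add up exactly to their community totals, each half of the column sums to 1.0 and the whole column sums to 2.0.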
Pathways, however, are not a simple sum over genes, since the pathway must be "complete" to be counted. Hence, it's possible for species A to encode half the pathway (0 complete copies), species B to encode the other half of the pathway (0 complete copies), and the community total to have 1+ complete copies. (Here, "copies" is a proxy for abundance.) What you are seeing is a less extreme example of this phenomenon, i.e. the species-stratified pathway abundances summing to less than the community total pathway abundances.
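The species A / species B case can be sketched with a deliberately simplified Python model. The assumption here is that a two-reaction pathway's abundance is capped by its scarcer reaction; this is only a toy stand-in for "completeness" and not HUMAnN2's actual pathway-abundance calculation. Species and reaction names are hypothetical.

```python
# Toy model (an assumption for illustration, not HUMAnN2's algorithm):
# a pathway needs reactions R1 and R2, and its abundance ("complete copies")
# is limited by the scarcer of the two.
PATHWAY_REACTIONS = ["R1", "R2"]

# Hypothetical reaction abundances contributed by each species.
species_reactions = {
    "s__species_A": {"R1": 1.0, "R2": 0.0},  # encodes only the first half
    "s__species_B": {"R1": 0.0, "R2": 1.0},  # encodes only the second half
}


def complete_copies(reaction_abundances):
    """Number of complete pathway copies = abundance of the limiting reaction."""
    return min(reaction_abundances.get(r, 0.0) for r in PATHWAY_REACTIONS)


# Stratified values: each species is scored on its own reactions only.
stratified = {sp: complete_copies(rx) for sp, rx in species_reactions.items()}

# Community total: reactions are pooled across species before scoring.
pooled = {
    r: sum(rx.get(r, 0.0) for rx in species_reactions.values())
    for r in PATHWAY_REACTIONS
}
community_total = complete_copies(pooled)

print(stratified, community_total)
# {'s__species_A': 0.0, 's__species_B': 0.0} 1.0
```

The strata sum to 0.0 while the community total is 1.0, which is the extreme version of stratified pathway values summing to less than the community total.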
The split_stratified_table script is simply dividing a stratified table into two new tables: one with only community totals and another with all the stratifications. This is intended for use with external tools that are not able to accommodate the multi-level structure of the default HUMAnN output.
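Conceptually, that split amounts to sorting rows by whether their feature name contains the "|" separator. The Python sketch below assumes exactly that rule and uses a hypothetical two-sample relative-abundance pathway table (names and values invented); it is an illustration, not the script's actual implementation. It also reproduces the pattern from the question: unstratified columns sum to ~1.0, stratified columns sum to <1.0, and the unsplit table sums to >1.0. Keeping only the unstratified half gives community-level pathway values without species.

```python
# Hypothetical joined, relative-abundance pathway table: feature -> per-sample values.
table = {
    "PWY-A": [0.50, 0.25],
    "PWY-A|g__Genus1.s__species1": [0.25, 0.125],
    "PWY-B": [0.50, 0.75],
    "PWY-B|g__Genus2.s__species2": [0.25, 0.5],
}


def split_stratified(table):
    """Split rows into (unstratified, stratified) by the '|' in the feature name."""
    unstratified = {k: v for k, v in table.items() if "|" not in k}
    stratified = {k: v for k, v in table.items() if "|" in k}
    return unstratified, stratified


def column_sums(rows):
    """Per-sample totals: sum down each column."""
    return [sum(col) for col in zip(*rows.values())]


unstratified, stratified = split_stratified(table)
print(column_sums(unstratified))  # [1.0, 1.0]
print(column_sums(stratified))    # [0.5, 0.625]
print(column_sums(table))         # [1.5, 1.625]
```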
I would not merge species abundance and functional abundance into a single table, if that's what you're asking. They represent different ways of slicing up the composition of a community (equivalent to, say, sorting a set of shapes by color and sorting them separately by size).
Maybe I’m not following your question - when you say “pathways + bacteria” do you mean the stratified pathways? I.e. the default HUMAnN2 output format that looks like:
pathway|g__Genus.s__species
If that is what you mean I would recommend using the above (default) format for organizing and sharing your data.
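If you need the pathway and the taxon as separate columns when working with that format, a small parser is enough. This Python sketch assumes feature names follow the pattern shown above (function, then "|", then "g__Genus.s__species"), and treats a stratum without a species part (for example an "unclassified" stratum) as genus-only. The function name parse_feature is hypothetical.

```python
def parse_feature(name):
    """Split a 'pathway|g__Genus.s__species' feature name into (function, genus, species).

    Assumes the format shown above. Community-total rows (no '|') return
    (name, None, None); strata without a '.' return (function, stratum, None).
    """
    function, sep, taxon = name.partition("|")
    if not sep:
        return function, None, None
    genus, _, species = taxon.partition(".")
    return function, genus, species or None


print(parse_feature("PWY-A|g__Bacteroides.s__Bacteroides_ovatus"))
# ('PWY-A', 'g__Bacteroides', 's__Bacteroides_ovatus')
print(parse_feature("PWY-A"))
# ('PWY-A', None, None)
```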
Yes Eric, I meant the default HUMAnN2 output (pathway|g__Genus.s__species), which you called stratified pathways.
Just to make sure I understood correctly:
I can report the relative abundances from the default HUMAnN2 output files (pathway|g__Genus.s__species), right?
If so, how can I report only pathways, without species?
One more thing: I got confused about stratified pathways. You described the default HUMAnN2 output (pathway|g__Genus.s__species) as stratified pathways, but the humann2_split_stratified_table script gives two output files, stratified.tsv and unstratified.tsv. Should I ignore these files as far as stratified pathways are concerned?
I'm asking many things at once; I hope I explained myself clearly.
Thanks,
Khem