Which database should I use to run the `humann_regroup_table` and `humann_rename_table` commands?

Hi there!!! @franzosa @lauren.j.mciver @stro0070
I have run my data through HUMAnN3 using the UniRef90_ec_filtered database with the following command:
for f in *.fastq.gz; do humann --input $f --output humann_output/ --taxonomic-profile ${f%.fastq.gz}_profile.txt --remove-temp-output; done
and got three final outputs. After that, I renormalised the genefamilies.tsv file to relative abundance. Now,

  1. Should I rename the table first, or regroup it first?
  2. Which database should I use for the above two steps? My utility_mapping database contains the following files:
map_ec_name.txt.gz                       
map_go_name.txt.gz     
map_ko_uniref50.txt.gz     
map_pfam_name.txt.gz   
map_uniref50_uniref90.txt.gz
map_eggnog_name.txt.gz      
map_go_uniref50.txt.gz  
map_ko_uniref90.txt.gz        
map_pfam_uniref50.txt.gz   
map_uniref90_name.txt.bz2
map_eggnog_uniref50.txt.gz  
map_go_uniref90.txt.gz  
map_level4ec_uniref50.txt.gz  
map_pfam_uniref90.txt.gz   
uniref50-tol-lca.dat.bz2
map_eggnog_uniref90.txt.gz  
map_ko_name.txt.gz      
map_level4ec_uniref90.txt.gz  
map_uniref50_name.txt.bz2  
uniref90-tol-lca.dat.bz2

My goal is to find the pathways and gene families that are differentially abundant between two groups (cases and controls).

Hello - The script “humann_rename_table” adds the full names to the terms in the file. You can run it before or after regrouping the table. When you run the regroup script, you want to use a database that includes the terms in your input file (for example, UniRef90 gene families) and also contains the terms you want to regroup to (for example, GO terms or KOs). Instead of providing the mapping file directly, you can use the “--groups” option.
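As a rough illustration of the two steps described above, here is a minimal sketch. The function name and file names are hypothetical, and it assumes that `uniref90_go` (for `--groups`) and `go` (for `--names`) are available choices in your installation; `humann_regroup_table --help` and `humann_rename_table --help` list what your utility_mapping database actually provides.

```shell
#!/usr/bin/env bash
# Sketch only. The function name and file names are hypothetical;
# "uniref90_go" and "go" are assumed to be valid choices in your install.

regroup_then_rename() {
    local in_tsv="$1"      # e.g. a renormalised gene families table
    local out_prefix="$2"  # prefix for the two output files

    # Regroup UniRef90 gene families to GO terms using a built-in mapping.
    humann_regroup_table --input "$in_tsv" \
        --groups uniref90_go \
        --output "${out_prefix}_go.tsv" || return 1

    # Attach the full GO names to the regrouped table.
    humann_rename_table --input "${out_prefix}_go.tsv" \
        --names go \
        --output "${out_prefix}_go_named.tsv"
}
```

Calling `regroup_then_rename genefamilies_relab.tsv sample` would then write `sample_go.tsv` and `sample_go_named.tsv`. The order shown here (regroup, then rename) is just one option, since the rename step can also be run first.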

Thank you,
Lauren


Thanks a lot @lauren.j.mciver for your kind response. I have done the regrouping step. With uniref90_go, I found a combined relative abundance of 0.65-0.7 for UNMAPPED + UNGROUPED: UNMAPPED ranges between 0.45 and 0.55, and UNGROUPED is around 0.2.
Do you think these levels of relative abundance for UNMAPPED and UNGROUPED are fine? If not, what would be a normal range?
I have also noticed that the relative abundances are reported with only up to 3 digits of precision. As a result, I am getting zero abundance for many terms. I have also asked about this in this thread and have yet to get a response. Can you please help with this?

Many thanks,
DC7

Hello - The UNMAPPED number in the output reflects how many reads were not mapped in either the nucleotide or the translated search step of the workflow, while UNGROUPED reflects the abundance that no longer has a group after regrouping from one annotation to another. I am not sure of the exact numbers to expect for each, but reviewing Eric’s prior post that you referenced (thank you), it sounds like 50% unknown reads is possible (which is about what you have with 0.45-0.55 unmapped). I am not sure how many UniRef annotations have a GO annotation, so I am not sure whether an ungrouped value of 0.2 is expected, but it seems possible (referring again to Eric’s post about Pfams and ECs).
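To see the two numbers side by side for each sample, something like the following sketch could help. It assumes a tab-separated regrouped table in relative abundance units, with the sample names on the first line and the community-level rows labelled exactly `UNMAPPED` and `UNGROUPED`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Sums the community-level UNMAPPED and UNGROUPED rows per sample.
# Stratified rows such as "UNGROUPED|g__X.s__Y" are skipped on purpose.

unknown_fraction() {
    awk -F'\t' '
        # First line: remember the sample names.
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        # Exact match on the feature column, so stratified rows do not count.
        $1 == "UNMAPPED" || $1 == "UNGROUPED" {
            for (i = 2; i <= NF; i++) sum[i] += $i
        }
        END {
            for (i = 2; i <= ncol; i++) printf "%s\t%.4f\n", name[i], sum[i]
        }
    ' "$1"
}
```

Running `unknown_fraction` on the regrouped table prints one line per sample with the combined UNMAPPED + UNGROUPED fraction, which is the 0.65-0.7 figure discussed above.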

Yes, I will follow up on your post in the other thread about the precision.

Thank you,
Lauren


Hi @lauren.j.mciver - I have run HUMAnN3 with all default options (and uniref90_ec_filtered). I have now joined all the path abundance files and want to rename the table. As far as I know, the pathway annotation uses MetaCyc, right? That’s why I used --names metacyc-rxn. But it returns: Renamed 0 of 460 entries (0.00%). Why? Which name should I use to rename the path abundance file? --names metacyc-pwy also returns the same message.

EDIT: Is the pathway file already named by default? I ask because I see pathway names even before running the humann_rename_table step, and they are replaced by NO_NAME after it. Here’s a reproducible portion of the default generated files (after joining):

UNINTEGRATED|unclassified	0.0129599	0.00912485
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis	0.000401208	0.000463844
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Acinetobacter.s__Acinetobacter_ursingii	0	0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Akkermansia.s__Akkermansia_muciniphila	0	0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Alistipes.s__Alistipes_finegoldii	0	2.6102E-06
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Alistipes.s__Alistipes_indistinctus	0	0
1CMET2-PWY: N10-formyl-tetrahydrofolate biosynthesis|g__Alistipes.s__Alistipes_onderdonkii	0	0

This is after the rename step:

UNINTEGRATED|unclassified	0.0129599	0.00912485
1CMET2-PWY: NO_NAME	0.000401208	0.000463844
1CMET2-PWY: NO_NAME|g__Acinetobacter.s__Acinetobacter_ursingii	0	0
1CMET2-PWY: NO_NAME|g__Akkermansia.s__Akkermansia_muciniphila	0	0
1CMET2-PWY: NO_NAME|g__Alistipes.s__Alistipes_finegoldii	0	2.6102E-06
1CMET2-PWY: NO_NAME|g__Alistipes.s__Alistipes_indistinctus	0	0
1CMET2-PWY: NO_NAME|g__Alistipes.s__Alistipes_onderdonkii	0	0

Thanks,
DC7

Hi DC7, Thank you for the detailed post and sorry for any confusion with the script names. The “rename” script adds the full names to each annotation group. Your pathways file already has the full names (they are included after the colon). If you would like to change a data file to another annotation set, use the “regroup” script.
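One quick way to check whether a table already carries names before running the rename script is to look for the `ID: name` pattern in the first column. The sketch below assumes a tab-separated table whose header line starts with `#` and whose names follow the ID after a colon and a space, as in the excerpt above; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Counts feature rows whose first column already looks like "ID: name".

count_named_features() {
    awk -F'\t' '
        /^#/ { next }                                     # skip the header line
        $1 ~ /^(UNMAPPED|UNINTEGRATED|UNGROUPED)/ { next } # special rows have no name
        { total++; if (index($1, ": ") > 0) named++ }
        END { printf "%d of %d features already carry a name\n", named, total }
    ' "$1"
}
```

If nearly every feature is reported as already named, as would be expected for a default path abundance file, the rename step can be skipped.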

Thank you,
Lauren