UNMAPPED and UNINTEGRATED combined come to as much as 80%, and the relative abundance of the remaining pathways is very small. Is it right to leave UNMAPPED out of further analysis? If not, please let me know its significance.
And why am I getting such a high abundance of UNMAPPED? What are the possible reasons?
What kind of samples are you analyzing? Which HUMAnN settings are you using?
Generally speaking, since only a minority of read mass can be assigned to pathways, I think it’s fine to perform a normalization within the measured pathways before analysis (excluding UNMAPPED and UNINTEGRATED). Otherwise variation in these “UN” features may dominate your results.
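To make that concrete, here is a minimal Python sketch of the renormalization, not HUMAnN's own implementation. It assumes you have already read one sample's pathway abundances into a dict keyed by feature name. The function name `renormalize_pathways`, the pathway IDs, and the numbers are made up for illustration. Stratified rows (names containing "|") are skipped so that each pathway is counted once at the community level.

```python
# Sketch: renormalize pathway abundances within the measured pathways only.
# Feature names and values below are hypothetical, for illustration.

EXCLUDED = {"UNMAPPED", "UNINTEGRATED"}


def renormalize_pathways(abundances):
    """Return relative abundances over measured pathways only.

    `abundances` maps feature name -> abundance for a single sample.
    UNMAPPED and UNINTEGRATED are dropped, and stratified rows
    (names containing "|") are skipped to avoid double counting.
    """
    kept = {
        name: value
        for name, value in abundances.items()
        if "|" not in name and name not in EXCLUDED
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no pathway abundance left after excluding UN features")
    return {name: value / total for name, value in kept.items()}


sample = {
    "UNMAPPED": 50.0,
    "UNINTEGRATED": 30.0,
    "PWY-A": 15.0,             # hypothetical pathway ID
    "PWY-B": 5.0,              # hypothetical pathway ID
    "PWY-A|g__Example": 10.0,  # stratified row, ignored here
}

result = renormalize_pathways(sample)
print(result)
```

After this step the remaining pathways sum to 1 within each sample, so differences in the "UN" fraction between samples no longer drive the comparison.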
Thank you for your reply.
These are environmental samples. I am using the basic HUMAnN settings with UniRef90 as the reference protein database. This is my first time using it, so I did not change any settings.
Please let me know if I need to modify the settings.
I would recommend using UniRef50 for environmental samples. Otherwise, as long as the samples are of reasonable depth, you should be OK keeping everything else at the default values.
For classification of pathways, I am using the map_metacy-pwy_lineage.tsv file that you shared before.
But for the categorical classification of COGs, I am not able to map all the IDs, because the list also contains arCOG, ENOG, and KOG IDs, which are not included in the main COG categories.
Please let me know how this can be dealt with.
Thank you again.