I’m trying to analyze some metatranscriptomics data from human stool with HUMAnN2 (unfortunately with no paired metagenomics to go along with it), but the results I’m getting seem really ambiguous. I’m removing adapters and low-complexity sequences with bbduk, and removing human sequences and 16S rRNA with kneaddata, but I still end up with 75%+ of my data being “UNMAPPED” in the genefamilies.tsv file. Here are the average proportions across three timepoints:
If I regroup to EC or MetaCyc, then >99% ends up being UNMAPPED or UNINTEGRATED.
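For anyone who wants to check the same proportions on their own output, here is a minimal sketch of one way to compute them. It assumes a single-sample, tab-separated HUMAnN2 table with the feature name in the first column and the abundance in the second, and that stratified rows contain a `|`. The function name `special_fractions` and the example values are made up for illustration.

```python
import csv
import io

# Special features to report; which ones appear depends on the table type.
SPECIAL = ("UNMAPPED", "UNINTEGRATED", "UNGROUPED")


def special_fractions(handle):
    """Return {feature: fraction of the unstratified total} for one table."""
    found = {}
    grand_total = 0.0
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the header line (assumed to start with '#').
        if not row or row[0].startswith("#"):
            continue
        feature, value = row[0], float(row[1])
        # Stratified rows (feature|taxon) repeat the community total,
        # so counting them would double count abundance.
        if "|" in feature:
            continue
        grand_total += value
        if feature in SPECIAL:
            found[feature] = value
    if grand_total == 0:
        return {}
    return {name: value / grand_total for name, value in found.items()}


# Made-up example table, not real data.
example = io.StringIO(
    "# Gene Family\tsample_Abundance-RPKs\n"
    "UNMAPPED\t750.0\n"
    "UniRef90_A\t200.0\n"
    "UniRef90_A|g__Bacteroides.s__Bacteroides_vulgatus\t150.0\n"
    "UniRef90_A|unclassified\t50.0\n"
    "UniRef90_B\t50.0\n"
)
print(special_fractions(example))
```

To run it on a real file, open the file with `open(path, newline="")` and pass the handle to `special_fractions`.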
I inspected the metaphlan_bugs_list.tsv files in the temp directories of a few samples and saw that some samples are absolutely dominated by viruses. Is it possible that these viruses are simultaneously extremely transcriptionally active and also not well represented in the UniRef90 database? I’m not sure what else might be going on. If there are any parameters I could tune to make my results more useful, I would greatly appreciate having them pointed out to me.
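The kingdom-level check described above can be sketched roughly as follows. This assumes the bugs list is tab-separated with the clade name in the first column (e.g. `k__Viruses`) and the relative abundance in the second, and that kingdom-level rows are the ones with no `|` in the clade name. The function name `kingdom_abundances` and the example values are hypothetical.

```python
import io


def kingdom_abundances(handle):
    """Return {kingdom: relative abundance} from a MetaPhlAn-style bugs list."""
    kingdoms = {}
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        clade = fields[0]
        # Keep only top-level rows such as 'k__Bacteria' or 'k__Viruses'.
        if clade.startswith("k__") and "|" not in clade:
            kingdoms[clade[len("k__"):]] = float(fields[1])
    return kingdoms


# Made-up example, not real data.
example = io.StringIO(
    "#SampleID\tMetaphlan2_Analysis\n"
    "k__Viruses\t80.0\n"
    "k__Bacteria\t20.0\n"
    "k__Viruses|p__Viruses_noname\t80.0\n"
)
print(kingdom_abundances(example))
```

Running this across samples makes it easy to see which ones are dominated by viruses.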
Thanks for your time,