I am a beginner in bioinformatics. I have analyzed a metatranscriptome dataset with the HUMAnN2 pipeline, and it generated “bugs_list.txt” as expected. However, this file only shows the relative abundances of the detected taxa. Is there a way to access the number of raw reads mapped to the marker genes of each taxon?
My motivation is to perform a rarefaction analysis to check whether the sequencing depth was sufficient to detect all functionally active taxa. Based on that, I want to choose the sequencing depth for another experiment in which I am only interested in getting the list of active taxa.
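A rarefaction curve of this kind can be sketched by repeatedly subsampling reads without replacement at increasing depths and counting how many distinct taxa are hit at least once. The snippet below is a minimal Python sketch, assuming you already have a dictionary of per-taxon read counts; the function name rarefaction_curve and its parameters are hypothetical and not part of HUMAnN2. It also assumes a taxon counts as "detected" with a single read, which you may want to raise to a stricter threshold.

```python
import random


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Hypothetical helper: mean number of distinct taxa seen at each depth.

    counts: dict mapping taxon name -> number of reads assigned to it.
    depths: iterable of subsampling depths (number of reads drawn).
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    # Expand counts into one label per read (fine for moderate read totals).
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError(f"depth {depth} exceeds total reads {len(pool)}")
        total = 0
        for _ in range(iterations):
            # Subsample reads without replacement and count distinct taxa hit.
            total += len(set(rng.sample(pool, depth)))
        curve.append((depth, total / iterations))
    return curve


if __name__ == "__main__":
    # Illustrative counts only, not real data.
    example = {"taxon_A": 500, "taxon_B": 50, "taxon_C": 5}
    for depth, taxa in rarefaction_curve(example, [10, 50, 100, 300, 555]):
        print(depth, taxa)
```

If the curve flattens well before the full depth, additional sequencing is unlikely to reveal many more taxa at that detection threshold.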
Sorry for my slow reply. The only way to get read counts is to look at the raw mapping files that HUMAnN parses to build its various abundance profiles and directly count the number of reads hitting each sequence (or taxon). All of these files are available in your sample’s temp/ directory.
Thanks, Eric! Are you referring to “sample_diamond_aligned.tsv”? I see the following files in the temp folder.
Correct: that file stores the raw mapping of reads to protein sequences (of unclassified taxonomy), whereas the equivalent “_bowtie2_aligned.tsv” file stores the mapping of reads to pangenome sequences. You could count the number of times each target sequence occurs in that file to get something more count-like, noting that these counts won’t map directly onto HUMAnN’s abundances because they lack HUMAnN’s filtering and normalization.
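Counting target occurrences in one of these alignment files can be sketched in Python as below. This is a minimal sketch under stated assumptions: it assumes the file is tab-delimited with the read ID in column 1 and the target sequence ID in column 2 (as in BLAST-style tabular output); check the column layout of your own files first. The taxon_from_target helper is hypothetical and assumes pangenome sequence IDs are pipe-delimited with one field containing an "s__" species label; adapt it to the IDs you actually see. Each (read, label) pair is counted once, so a read that hits several sequences of the same taxon adds one read to that taxon, while a read hitting two different taxa is counted for both.

```python
import csv
from collections import Counter


def taxon_from_target(target):
    """Hypothetical: pick the pipe-delimited field holding an 's__' species label."""
    for field in target.split("|"):
        if "s__" in field:
            return field
    return "unclassified"


def count_hits(lines, key=None):
    """Count reads per target (or per key(target)) in a tab-delimited alignment.

    lines: any iterable of text lines, e.g. an open file handle.
    key: optional function mapping a target ID to a label such as a taxon.
    Assumes column 1 = read ID and column 2 = target ID.
    """
    counts = Counter()
    seen = set()  # (read, label) pairs already counted
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        read_id = row[0]
        label = key(row[1]) if key else row[1]
        if (read_id, label) in seen:
            continue
        seen.add((read_id, label))
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # File name is illustrative; use your sample's file from the temp/ directory.
    with open("sample_bowtie2_aligned.tsv", newline="") as handle:
        per_taxon = count_hits(handle, key=taxon_from_target)
    for taxon, n in per_taxon.most_common():
        print(taxon, n, sep="\t")
```

The resulting per-taxon counts are raw, unfiltered read tallies, so they are suitable as input for a rarefaction analysis but should not be compared one-to-one with HUMAnN's normalized abundances.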