I was hoping you could give me some general feedback regarding the use of custom humann3 reference databases:
My idea is to use a recently published collection of mouse gastrointestinal bacterial genomes to generate custom nucleotide and translated reference databases for humann3 (see the BenBeresfordJones/MGBC GitHub repository, "The Mouse Gastrointestinal Bacteria Catalogue", and the linked publication for more info). This collection contains ~26,400 high-quality bacterial genomes assembled from mouse gut metagenomes. Using the available kraken/bracken databases built from this collection, I get much more complete species-level classification of my sample reads (~95% of reads classified, compared with 30-60% using the NCBI-based database).
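To make the database-building step more concrete, here is a minimal, hypothetical sketch of the final formatting step for a custom nucleotide database. It assumes each MGBC gene has already been assigned a UniRef90 gene family (for example by a separate DIAMOND search against UniRef90, not shown), and it assumes the `gene_family|gene_length|taxonomy` FASTA header convention as I understand it from the HUMAnN manual's custom-database section; please check that section for your HUMAnN version before relying on it. The function names and input tuples are illustrative only.

```python
"""Hypothetical sketch: write gene sequences with HUMAnN-style FASTA headers.

ASSUMPTION: headers follow a `gene_family|gene_length|taxonomy` convention,
with taxonomy written as `g__<genus>.s__<species>`. Verify against the
HUMAnN manual for your version; names here are illustrative.
"""
from typing import Iterable, Tuple


def humann_header(gene_family: str, sequence: str, genus: str, species: str) -> str:
    # gene_length is taken as the nucleotide length of the sequence itself
    taxonomy = f"g__{genus}.s__{species}"
    return f">{gene_family}|{len(sequence)}|{taxonomy}"


def to_fasta(records: Iterable[Tuple[str, str, str, str]]) -> str:
    # each record: (gene_family, nucleotide_sequence, genus, species)
    lines = []
    for gene_family, seq, genus, species in records:
        lines.append(humann_header(gene_family, seq, genus, species))
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("UniRef90_EXAMPLE", "ATGAAATAG", "Bacteroides", "Bacteroides_dorei")]
    print(to_fasta(demo), end="")
```

The computationally heavy part is the gene-family assignment for ~26,400 genomes, not this formatting step.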
I was thinking that building custom humann3 databases from this collection might similarly enable much more complete mapping of my sample reads to functional units (genes/reactions/pathways), and would make it easier to compare the humann3 outputs with my kraken/bracken taxonomic profiles. I think that once I have the custom nucleotide and translated databases, I could reformat the bracken output and supply it as the taxonomic profile, bypassing metaphlan.
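As a rough illustration of the "adjust the bracken outputs" idea, the sketch below reads standard Bracken output (columns `name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads`, `fraction_total_reads`) and converts species-level read estimates into percent relative abundances, which is the kind of value a MetaPhlAn-style profile reports. It does not produce the full lineage strings that a profile passed to humann3's `--taxonomic-profile` option would normally contain; those would have to come from the MGBC taxonomy, which is an assumption not covered here.

```python
"""Hypothetical sketch: Bracken species output -> percent relative abundance.

Assumes standard Bracken TSV columns. Building full MetaPhlAn-style lineage
strings (needed for a real --taxonomic-profile file) is NOT done here.
"""
import csv
import io
from typing import Dict


def bracken_to_percent(bracken_tsv: str, level: str = "S") -> Dict[str, float]:
    reader = csv.DictReader(io.StringIO(bracken_tsv), delimiter="\t")
    # keep only rows at the requested taxonomic level (S = species)
    counts = {
        row["name"]: int(row["new_est_reads"])
        for row in reader
        if row["taxonomy_lvl"] == level
    }
    total = sum(counts.values())
    if total == 0:
        return {}
    # renormalise so the retained taxa sum to 100%
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Recomputing percentages from `new_est_reads` (rather than copying `fraction_total_reads`) keeps the retained taxa summing to 100%, as MetaPhlAn-style profiles do.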
However, I am very new to computational analysis and am still not 100% sure whether doing this makes sense. I am also a bit overwhelmed trying to figure out the necessary steps, and I don't have a good feel for how big a job this might be in terms of computational resources.
I was hoping to get some feedback on the usefulness (given my goal of more complete read mapping) and the feasibility of building these custom databases. Also, if you have any advice or can point me to relevant resources, I would be very grateful!