Hello!
I am trying to analyze some metagenomic soil samples (Illumina NovaSeq 6000, 250 bp paired-end, ~30 million trimmed reads per sample) using the bioBakery pipeline. Right now I am using MetaPhlAn v4.0.6 and HUMAnN v3.9. I’ve had some success running MetaPhlAn on filtered/trimmed reads with the Jan25 database and relaxed Bowtie2 parameters (--bt2_ps very-sensitive-local; the very-sensitive default classifies only about 1% of reads), and the resulting bacterial taxonomic composition reflects what we have seen with 16S and Kraken analysis. I was thinking about adding markers to include Chlorophyta, as algae are a focus in our lab, and just wanted to clarify some points:
As far as I understand, the Jun23 database can be used with HUMAnN v3.9, but if I want to update to HUMAnN 4.0 I would have to use the Oct22 database. Jun23 still seems to do pretty well (98.77% classification) compared to Jan25 (100% classification) when run with the relaxed bt2 parameter; however, it does provide noticeably less depth.
Would the marker extraction process detailed in this post still be applicable for adding markers to the Jun23 database and updating the HUMAnN ChocoPhlAn database? Would you recommend updating to the latest versions of MetaPhlAn and HUMAnN if the Oct22 database seems to perform comparably to Jun23? I would also like to experiment a bit more with some of the other mapping settings, as we were expecting to get at least some archaeal and eukaryotic hits, whereas all we have seen so far is bacteria. Any suggestions on this front would also be greatly appreciated!
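For reference, here is a minimal sketch of how a run with the relaxed preset might be assembled. The file names, output prefix, thread count, and index name are placeholders (assumptions), not the exact command used, and the helper `build_metaphlan_cmd` is hypothetical.

```python
import shlex


def build_metaphlan_cmd(r1, r2, out_prefix, index,
                        preset="very-sensitive-local", nproc=8):
    """Assemble a MetaPhlAn 4 command line as a list of arguments.

    All paths and the index name are illustrative placeholders.
    """
    return [
        "metaphlan",
        f"{r1},{r2}",                 # paired files passed as a comma-separated list
        "--input_type", "fastq",
        "--bt2_ps", preset,           # Bowtie2 preset; very-sensitive is the default
        "--index", index,             # assumed index name; check what is installed
        "--nproc", str(nproc),
        "--bowtie2out", f"{out_prefix}.bowtie2.bz2",  # keep the mapping for reruns
        "-o", f"{out_prefix}_profile.txt",
    ]


cmd = build_metaphlan_cmd(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample",
    index="mpa_vJun23_CHOCOPhlAnSGB_202307",  # placeholder, not verified here
)
print(shlex.join(cmd))  # copy-pasteable shell command
```

Building the command as a list makes it easy to swap the preset or index between runs and compare the resulting profiles.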
Thanks!