Custom ChocoPhlAn database

How can I generate a custom ChocoPhlAn database with a single command?

07/27/2023 04:13:21 PM - humann.search.prescreen - INFO: Creating custom ChocoPhlAn database …
07/27/2023 04:13:21 PM - humann.utilities - DEBUG: Using software: /usr/bin/gunzip
07/27/2023 04:13:21 PM - humann.utilities - INFO: Execute command: /usr/bin/gunzip -c /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Enterococcus.s__Enterococcus_faecalis.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Klebsiella.s__Klebsiella_pneumoniae.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Haemophilus.s__Haemophilus_influenzae.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Bifidobacterium.s__Bifidobacterium_longum.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Veillonella.s__Veillonella_infantium.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Rothia.s__Rothia_mucilaginosa.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Klebsiella.s__Klebsiella_aerogenes.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Streptococcus.s__Streptococcus_peroris.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Klebsiella.s__Klebsiella_oxytoca.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Streptococcus.s__Streptococcus_parasanguinis.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Haemophilus.s__Haemophilus_parainfluenzae.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Streptococcus.s__Streptococcus_salivarius.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Streptococcus.s__Streptococcus_mitis.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Veillonella.s__Veillonella_seminalis.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Lactobacillus.s__Lactobacillus_gasseri.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Streptococcus.s__Streptococcus_infantis.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Veillonella.s__Veillonella_parvula.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Veillonella.s__Veillonella_dispar.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Streptococcus.s__Streptococcus_pneumoniae.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Lactobacillus.s__Lactobacillus_paragasseri.centroids.v201901_v31.ffn.gz /Users/hyunwookkoh/Desktop/HUMANN/HUMANN_DB/chocophlan/g__Bifidobacterium.s__Bifidobacterium_bifidum.centroids.v201901_v31.ffn.gz
07/27/2023 04:13:22 PM - humann.humann - INFO: TIMESTAMP: Completed custom database creation : 4 seconds
07/27/2023 04:13:22 PM - humann.search.nucleotide - INFO: Running bowtie2-build …
07/27/2023 04:13:22 PM - humann.utilities - DEBUG: Using software: /opt/homebrew/anaconda3/envs/mpa/bin/bowtie2-build
07/27/2023 04:13:22 PM - humann.utilities - INFO: Execute command: /opt/homebrew/anaconda3/envs/mpa/bin/bowtie2-build -f /Users/hyunwookkoh/Desktop/Data/SRR4052022_fastq/pathway/SRR4052022_humann_temp/SRR4052022_custom_chocophlan_database.ffn /Users/hyunwookkoh/Desktop/Data/SRR4052022_fastq/pathway/SRR4052022_humann_temp/SRR4052022_bowtie2_index
07/27/2023 04:19:43 PM - humann.humann - INFO: TIMESTAMP: Completed
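Since the prescreen picks pangenomes per sample, one way to reuse its choice is to recover the pangenome file list from the sample's log. The following is a minimal Python sketch, assuming the log format shown above, where the `gunzip -c` command lists every `*.ffn.gz` pangenome path; `pangenomes_from_log` is a hypothetical helper, not part of HUMAnN.

```python
import re

# Matches pangenome paths such as
# .../chocophlan/g__Enterococcus.s__Enterococcus_faecalis.centroids.v201901_v31.ffn.gz
# (assumes the paths contain no spaces, as in the log above)
PANGENOME_RE = re.compile(r"\S+\.ffn\.gz")


def pangenomes_from_log(log_text: str) -> list[str]:
    """Return each gzipped pangenome path mentioned in the log once, in first-seen order."""
    # dict.fromkeys drops duplicates while keeping insertion order
    return list(dict.fromkeys(m.group(0) for m in PANGENOME_RE.finditer(log_text)))
```

For example, `pangenomes_from_log(Path("SRR4052022_humann_temp/SRR4052022.log").read_text())`, where the log file name is an assumption. The union of these lists across samples gives a single set of pangenomes for a shared index.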

I want to do this step separately so that I can reuse this custom database for all of the other samples in my pipeline, skipping this step in every HUMAnN run.

You can take a set of pangenomes, concatenate them into a single FASTA file, and then index that FASTA as a Bowtie 2 database with the bowtie2-build command:

https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer
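The concatenate-then-index step can be sketched in Python as below. It mirrors the two commands in the log above (`gunzip -c ... > custom.ffn`, then `bowtie2-build -f`); the helper names and output file names are hypothetical, and `bowtie2-build` must be on `PATH` for `build_index` to run.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def concatenate_pangenomes(ffn_gz_paths: list[Path], out_fasta: Path) -> Path:
    """Decompress each gzipped pangenome and append it to one FASTA file
    (same effect as `gunzip -c a.ffn.gz b.ffn.gz ... > out.ffn`)."""
    with open(out_fasta, "wb") as out:
        for path in ffn_gz_paths:
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_fasta


def bowtie2_build_command(fasta: Path, index_prefix: Path, threads: int = 1) -> list[str]:
    """argv for `bowtie2-build -f <fasta> <index_prefix>` (-f: input is FASTA)."""
    return ["bowtie2-build", "--threads", str(threads), "-f", str(fasta), str(index_prefix)]


def build_index(ffn_gz_paths: list[Path], out_dir: Path,
                name: str = "custom_chocophlan", threads: int = 1) -> Path:
    """Concatenate the pangenomes once and index them once; returns the index prefix."""
    out_dir.mkdir(parents=True, exist_ok=True)
    fasta = concatenate_pangenomes(ffn_gz_paths, out_dir / f"{name}.ffn")
    prefix = out_dir / f"{name}_bowtie2_index"
    subprocess.run(bowtie2_build_command(fasta, prefix, threads), check=True)
    return prefix
```

Because the FASTA is written and indexed only once, the resulting index can be shared by every sample instead of being rebuilt per run.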

You can then provide that index to HUMAnN to use for nucleotide search (bypassing the normal prescreen-and-index process).
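A per-sample run against a prebuilt index might look like the sketch below. It assumes HUMAnN 3's `--bypass-nucleotide-index` option together with `--nucleotide-database` pointing at the folder that holds the `.bt2` index files; check `humann --help` for your version before relying on these flags. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def humann_command(reads: Path, out_dir: Path, index_dir: Path, threads: int = 1) -> list[str]:
    """argv for one HUMAnN run that searches a prebuilt Bowtie 2 index
    instead of building a per-sample custom ChocoPhlAn index (assumed flags)."""
    return [
        "humann",
        "--input", str(reads),
        "--output", str(out_dir),
        "--nucleotide-database", str(index_dir),  # folder holding the .bt2 files
        "--bypass-nucleotide-index",  # assumed: skip the prescreen-and-index step
        "--threads", str(threads),
    ]


def run_all(samples: list[Path], out_dir: Path, index_dir: Path, threads: int = 1) -> None:
    """Run HUMAnN on each sample in turn, reusing the same index."""
    for reads in samples:
        subprocess.run(humann_command(reads, out_dir, index_dir, threads), check=True)
```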

Thank you for your reply :) The documentation you linked helped me a lot in understanding how to build the index.

But I have another problem: some of the species detected by MetaPhlAn 4 don’t exist in the ChocoPhlAn database. Do I need to find those species in the ChocoPhlAn database using their SGB IDs, or is it OK to just skip them?

For more detail: I extracted 315 distinct species from my MetaPhlAn 4 merged table (excluding the rows that end at the t__ level, e.g. ‘s__GGB4587_SGB6346|t__SGB6346’). When running HUMAnN 3, I found 90 species (species .ffn files used for the index) in the log file. When I compared the two lists, only 56 species overlapped. This means that if I build the index from the species in my MetaPhlAn 4 merged table, the reference sequences for some organisms would be dropped.
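The comparison described above can be sketched in Python as follows. It assumes the first column of the MetaPhlAn 4 merged table holds the `|`-separated clade name, and that ChocoPhlAn v3 file names follow the `g__<genus>.s__<species>.centroids.<version>.ffn.gz` pattern seen in the log; all function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed file-name pattern: g__Genus.s__Species.centroids.v201901_v31.ffn.gz
SPECIES_IN_FILENAME = re.compile(r"\.(s__[^.]+)\.centroids\.")


def metaphlan_species(merged_table_text: str) -> set[str]:
    """Species-level names (rows whose last rank is s__, not t__) from a merged table."""
    species = set()
    for line in merged_table_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the #mpa_... version comment
        last_rank = line.split("\t", 1)[0].split("|")[-1]
        if last_rank.startswith("s__"):
            species.add(last_rank)
    return species


def chocophlan_species(ffn_paths: list[str]) -> set[str]:
    """Species names parsed from ChocoPhlAn pangenome file names."""
    found = set()
    for p in ffn_paths:
        m = SPECIES_IN_FILENAME.search(Path(p).name)
        if m:
            found.add(m.group(1))
    return found


def compare(mpa: set[str], choco: set[str]) -> dict[str, set[str]]:
    """Split the two species sets into shared and one-sided groups."""
    return {"both": mpa & choco, "metaphlan_only": mpa - choco, "chocophlan_only": choco - mpa}
```

Matching by species name alone can miss taxa whose names differ between database versions (for example GGB/SGB-style placeholder names), so a low overlap from this check does not necessarily mean the pangenomes are absent.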

Could you please suggest an option or method to find all of the prescreened species without running HUMAnN 3?

I’m not sure I follow the question. Are you trying to find the set of HUMAnN 3 pangenomes that cover the total set of SGBs found across all your MetaPhlAn 4 profiles? If so, there is a mapping file included with HUMAnN that indicates which v4 SGBs are paired with which v3 pangenomes. Your observation is right, though, that some SGBs have no v3 pangenome equivalent (e.g. all uSGBs, which were observed only in MAGs and never in isolate genomes).
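Using that mapping file, the union of covering pangenomes across all profiles could be computed roughly as below. This is a sketch only: the mapping file's name and exact layout are not given here, so it assumes a tab-separated file whose first column is an SGB ID written like `SGB7962` and whose second column names a v3 pangenome; adjust the parsing to the real file. All function names are hypothetical.

```python
def sgbs_from_table(merged_table_text: str) -> set[str]:
    """SGB IDs taken from t__SGB... rows of a MetaPhlAn 4 merged table."""
    sgbs = set()
    for line in merged_table_text.splitlines():
        if not line or line.startswith("#"):
            continue
        last_rank = line.split("\t", 1)[0].split("|")[-1]
        if last_rank.startswith("t__SGB"):
            sgbs.add(last_rank[len("t__"):])  # "t__SGB7962" -> "SGB7962"
    return sgbs


def load_mapping(tsv_text: str) -> dict[str, set[str]]:
    """Assumed layout: SGB ID <tab> v3 pangenome; one SGB may appear on several lines."""
    mapping: dict[str, set[str]] = {}
    for line in tsv_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            mapping.setdefault(fields[0], set()).add(fields[1])
    return mapping


def pangenomes_for_sgbs(sgbs: set[str], mapping: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (v3 pangenomes covering the SGBs, SGBs with no v3 equivalent)."""
    covered: set[str] = set()
    missing: set[str] = set()
    for sgb in sgbs:
        if sgb in mapping:
            covered |= mapping[sgb]
        else:
            missing.add(sgb)  # e.g. uSGBs known only from MAGs
    return covered, missing
```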