Running humann from biobakery wmgx pipeline reruns metaphlan

Hi all,

I’m running into a problem using the biobakery wmgx pipeline to run metaphlan (v4.0.6) and humann (v3.6).

I initially ran the pipeline to perform only kneaddata and metaphlan taxonomic profiling (with --bypass-functional-profiling). I later decided I wanted to run the functional profiling, so I used the same command with --bypass-functional-profiling removed. The pipeline successfully skips the kneaddata steps, but begins re-running the taxonomic profiling with metaphlan.

If I use --bypass-taxonomic-profiling, the pipeline does not work, because the input folder still contains the original fastq files (not taxonomic profiles). It gives the error “ERROR: Bypassing taxonomic profiling but all of the tsv taxonomy profile files are not found in the input folder. Expecting the following input files:”

However, changing the input folder to the location of the metaphlan-generated tsv taxonomic profiles gives the error “ERROR: No files were found in the folder with extension fastq.gz”

Is it possible to continue with the pipeline from this point without re-running metaphlan? I know I could run humann on the files without the pipeline, but it’s convenient to run this way with the slurm scheduling, as I have many files.

Thanks for the support!

Hi, yes, there are two options. The first option is to place the MetaPhlAn taxonomic profiles (tsv files) in your input folder and use the --bypass-taxonomic-profiling option. The second option is to re-run the exact same command but remove the --bypass-functional-profiling option, like you did. This should not re-run MetaPhlAn if the output files (both tsv and sam) are all in the same folder and their timestamps have not changed. Hopefully one of these two options will work for you. Feel free to reply with any questions!
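
Before trying the first option, it may help to check that every sample has a profile in the folder you plan to use. Below is a minimal Python sketch, not part of the workflow itself. The helper name `missing_profiles`, the `_taxonomic_profile.tsv` suffix, and the one-fastq-file-per-sample naming are all assumptions for illustration; adjust them to match the file names listed in the error message.

```python
from pathlib import Path


def missing_profiles(fastq_dir, profile_dir,
                     fastq_ext=".fastq.gz",
                     profile_suffix="_taxonomic_profile.tsv"):
    """Return sample names that have a fastq file but no matching profile.

    Assumptions (hypothetical, adjust to your data):
      - one fastq file per sample, named <sample><fastq_ext>
      - profiles are named <sample><profile_suffix>
    """
    fastq_dir = Path(fastq_dir)
    profile_dir = Path(profile_dir)

    # Derive sample names by stripping the extension from each fastq file.
    samples = sorted(
        p.name[: -len(fastq_ext)] for p in fastq_dir.glob(f"*{fastq_ext}")
    )

    # Keep only the samples whose expected profile file does not exist.
    return [
        s for s in samples
        if not (profile_dir / f"{s}{profile_suffix}").is_file()
    ]
```

Calling `missing_profiles("path/to/fastq", "path/to/profiles")` with your own two folders returns a list of sample names. An empty list means every sample has a profile under the assumed naming. Paired-end files with a pair identifier in the name would need that identifier stripped as well.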

Thanks!
Lauren