Using multiple files as input for Metaphlan

MichaelVMgh · July 4, 2023, 1:55pm

Hi,

I would like to ask a few questions regarding the execution of the tool:

1- First, I am using paired-end reads (L1 and L2 fastq.gz files) for each sample. For each patient (patient ID), however, I have multiple samples, one per timepoint, e.g. 100-3mo-L1.fq.gz, 100-3mo-L2.fq.gz, 100-5mo-L1.fq.gz, 100-5mo-L2.fq.gz, etc.
I know that, even though MetaPhlAn does not use paired-end information, we can still pass both paired-end files as input. However, is it possible to pass several pairs of files (multiple samples) in a single run? I am trying to run the following command:

metaphlan {subset_reads_dir}/mo3_100_EKDN34700-1A_{sample_id}L1_1.fq.gz,
{subset_reads_dir}/mo3_100_EKDN34700-1A{sample_id}_L1_2.fq.gz
{subset_reads_dir}/mo5_100_EKDN34701-1A_HGD53_L1_1.fq.gz,
{subset_reads_dir}/mo5_100_EKDN34701-1A_HGD53_L1_2.fq.gz
--bowtie2out metagenome.bowtie2.bz2 --bowtie2db {metaphlan_db_dir} -x
mpa_v31_CHOCOPhlAn_201901 -t rel_ab_w_read_stats --nproc 24
--input_type fastq > {metaphlan_profile_dir}/{sample_id}.profile.txt;

Unfortunately, I am getting an error stating that it is receiving {subset_reads_dir}/mo5_100_EKDN34701-1A_HGD53_L1_1.fq.gz,\ and
{subset_reads_dir}/mo5_100_EKDN34701-1A_HGD53_L1_2.fq.gz\ as unexpected arguments. When I remove these files from the input and use only the two mo3 files (L1 and L2), the run works. Is this normal?

2- Is it recommended to use one MetaPhlAn run for all the samples of each patient (multiple paired-end files) and then merge the abundance output tables together? Or should we use one MetaPhlAn run for each single pair of paired-end files (one L1 file and one L2 file)?

Many thanks in advance.
Best,
Michael

aitor.blancomiguez · July 28, 2023, 2:38pm

Hi @MichaelVMgh
It is important that you run one execution of MetaPhlAn for each metagenomic sample, e.g.
metaphlan {subset_reads_dir}/mo3_100_EKDN34700-1A_{sample_id}L1_1.fq.gz,
{subset_reads_dir}/mo3_100_EKDN34700-1A{sample_id}_L1_2.fq.gz
--bowtie2out metagenome.bowtie2.bz2 --bowtie2db {metaphlan_db_dir} -x
mpa_v31_CHOCOPhlAn_201901 -t rel_ab_w_read_stats --nproc 24
--input_type fastq > {metaphlan_profile_dir}/{sample_id}.profile.txt

metaphlan {subset_reads_dir}/mo5_100_EKDN34701-1A_HGD53_L1_1.fq.gz,
{subset_reads_dir}/mo5_100_EKDN34701-1A_HGD53_L1_2.fq.gz
--bowtie2out metagenome.bowtie2.bz2 --bowtie2db {metaphlan_db_dir} -x
mpa_v31_CHOCOPhlAn_201901 -t rel_ab_w_read_stats --nproc 24
--input_type fastq > {metaphlan_profile_dir}/{sample_id}.profile.txt;
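
If you have many timepoints, the per-sample runs can be generated with a small loop. Below is a minimal sketch, not a tested pipeline. It assumes (hypothetically) that every sample has files named <sample>_L1_1.fq.gz and <sample>_L1_2.fq.gz in one directory, and READS_DIR, DB_DIR and OUT_DIR are placeholder names of my own. The two mates are joined by a comma with no space, because anything after a space is read as a separate positional argument, which is probably why the second pair was rejected. Each sample also gets its own --bowtie2out file, since as far as I know MetaPhlAn stops when that file already exists.

```shell
#!/usr/bin/env bash
# Sketch: one MetaPhlAn run per sample (patient + timepoint), not one run per patient.
# Assumptions (hypothetical): reads are named <sample>_L1_1.fq.gz / <sample>_L1_2.fq.gz,
# paths contain no spaces, and READS_DIR, DB_DIR, OUT_DIR are placeholder directories.
READS_DIR="${READS_DIR:-reads}"
DB_DIR="${DB_DIR:-metaphlan_db}"
OUT_DIR="${OUT_DIR:-profiles}"

# Print the MetaPhlAn command for one sample. The two mates are joined by a comma
# (no space), and each sample gets its own bowtie2out and profile file.
metaphlan_cmd() {
    sample="$1"
    printf 'metaphlan %s,%s --bowtie2out %s --bowtie2db %s -x mpa_v31_CHOCOPhlAn_201901 -t rel_ab_w_read_stats --nproc 24 --input_type fastq > %s\n' \
        "$READS_DIR/${sample}_L1_1.fq.gz" \
        "$READS_DIR/${sample}_L1_2.fq.gz" \
        "$OUT_DIR/${sample}.bowtie2.bz2" \
        "$DB_DIR" \
        "$OUT_DIR/${sample}.profile.txt"
}

# Derive sample names from the forward-read files and print one command per sample.
list_cmds() {
    for fwd in "$READS_DIR"/*_L1_1.fq.gz; do
        [ -e "$fwd" ] || continue   # the glob matched nothing
        metaphlan_cmd "$(basename "$fwd" _L1_1.fq.gz)"
    done
}

# Dry run: only prints the commands so they can be checked first.
list_cmds
# To execute them:            list_cmds | bash
# To merge the profiles after: merge_metaphlan_tables.py "$OUT_DIR"/*.profile.txt > merged_abundance_table.txt
```

The script only prints the commands by default, so you can check the file pairing before anything runs. The merge step uses the merge_metaphlan_tables.py utility that ships with MetaPhlAn and gives one table with a column per sample, which covers the second question: profile each timepoint separately, then merge.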
