Gzip fastq as input

farmer2020 · July 27, 2022, 10:04am

Hi, I am using MetaPhlAn version 3.0.14, and the help does not mention the gzip file format. Can gzipped FASTQ files be used directly as input for MetaPhlAn? If so, should --input_type be set to “fastq”? For example, part of the command is:
metaphlan xxx_1.fastq.gz,xxx_2.fastq.gz --input_type fastq

When the run completes successfully, some samples produce this warning: “WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.” Is this normal? Can it be ignored, or is there something I should do?

Thanks
Wang

aitor.blancomiguez · July 28, 2022, 7:44am

Hi @farmer2020
Yes, it is possible to run MetaPhlAn 3.0.14 with FASTQ files compressed with gzip, and --input_type should still be fastq.
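Since a .fastq.gz file is simply a gzip-compressed FASTQ stream, it can be read transparently without unpacking it first. As an illustration only (MetaPhlAn does its own input handling; the helper names `open_fastq` and `count_fastq_records` are hypothetical), here is a minimal Python sketch that opens a FASTQ file whether or not it is gzipped and counts its reads, assuming standard 4-line records:

```python
import gzip
from typing import Iterable, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing on the fly if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count reads, assuming standard 4-line FASTQ records (no line-wrapped sequences)."""
    n_lines = sum(1 for line in lines if line.strip())
    return n_lines // 4
```

For example, `with open_fastq("xxx_1.fastq.gz") as fh: print(count_fastq_records(fh))` would print the number of forward reads, which is handy for comparing against the “reads processed” line of a profile.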
The warning you are describing is normal. MetaPhlAn 3 includes markers describing species groups (covering 1328 species that are unlikely to be distinguishable in metagenomic samples; see Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3, eLife). When some of these species groups are detected, MetaPhlAn reports that warning so the user is aware of it.

farmer2020 · August 10, 2022, 1:16am

Hi Aitor, many thanks for your fast reply.

farmer2020 · October 15, 2022, 1:18pm

Hi Aitor,
I am moving from MetaPhlAn 3.0.14 to version 4.0.2. Is the same warning still produced? If so, can it still be ignored without any influence on the MetaPhlAn output? In addition, can the compressed forward and reverse fastq.gz files still be used as input files in version 4.0.2?
Thanks

aitor.blancomiguez · October 18, 2022, 11:53am

Hi @farmer2020
Yes, in MetaPhlAn 4 the input files are handled in the same way as in version 3.

farmer2020 · October 18, 2022, 12:43pm

Hi Aitor,
Thanks for your reply. So the warning, the same one as in MetaPhlAn 3, can still be ignored with no influence on the taxonomic results, right?
Thanks again.

aitor.blancomiguez · October 18, 2022, 1:05pm

Hi @farmer2020
Yes, exactly!

farmer2020 · October 19, 2022, 3:57pm

Hi Aitor, thank you very much for your confirmation.

farmer2020 · November 11, 2022, 3:01am

Hi Aitor,
I noticed that in the output profile from MetaPhlAn version 4.0.2, the number of reads processed is sometimes lower than the total number of paired reads. For example, a profile reports “#4807601 reads processed” for FASTQ files with 3104148 reads in each of the forward and reverse files. Is this normal, and what is the reason?
The read headers in the forward and reverse FASTQ files are exactly the same. Is that fine for MetaPhlAn version 4.0.2?
The command used is:
metaphlan fastq_forward,fastq_reverse --input_type fastq --bowtie2db bowtie2db/ --index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2out bowtie2.bz2 --unclassified_estimation --nproc 6 --output_file profile.txt
Thanks

aitor.blancomiguez · November 11, 2022, 9:01am

Hi @farmer2020
Yes, this is expected. Before mapping, MetaPhlAn performs a quality filtering of the input reads, removing short reads (<70 bp in length).

aitor.blancomiguez · November 11, 2022, 9:02am

MetaPhlAn does not use the pairing information when given paired-end reads; it counts each read of a pair as a single-end read.
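To see why the count can be lower than twice the number of pairs, here is a rough Python sketch that mimics the two behaviours described in the replies above: reads from both files are pooled as single-end reads, and reads shorter than 70 bp are dropped. It assumes the length filter is the only one applied, which may not match MetaPhlAn's internal read parsing exactly; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_READ_LEN = 70  # threshold quoted in the reply above; assumed to be the only filter


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) from 4-line FASTQ records; a truncated last record is dropped."""
    it = iter(lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n")


def estimate_reads_processed(*fastq_inputs: Iterable[str], min_len: int = MIN_READ_LEN) -> int:
    """Pool reads from all inputs as single-end reads and count those of at least min_len bp."""
    return sum(
        1
        for lines in fastq_inputs
        for _header, seq in fastq_records(lines)
        if len(seq) >= min_len
    )


def fake_fastq(lengths: List[int]) -> List[str]:
    """Build toy FASTQ lines with the given read lengths (illustration only)."""
    lines: List[str] = []
    for i, n in enumerate(lengths):
        lines += [f"@read{i}\n", "A" * n + "\n", "+\n", "I" * n + "\n"]
    return lines


forward = fake_fastq([80, 50])  # one read below 70 bp
reverse = fake_fastq([75, 69])  # one read below 70 bp
print(estimate_reads_processed(forward, reverse))  # 2
```

With real data you would pass open file handles (e.g. `gzip.open(path, "rt")` for .fastq.gz files) instead of lists. Under the length-only assumption, the example in the question (2 × 3104148 = 6208296 pooled reads, 4807601 processed) would mean that 1400695 reads fell below 70 bp.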

farmer2020 · November 11, 2022, 11:59am

Hi Aitor, thanks for your replies to both questions; they make sense to me. So, for the second question, whether or not the headers of the forward and reverse FASTQ files are the same makes no difference to the MetaPhlAn output, am I right?

aitor.blancomiguez · November 11, 2022, 12:40pm

It will not, as internally MetaPhlAn appends an auto-incremented number to the end of each read ID.

farmer2020 · November 11, 2022, 12:53pm

So it will not produce any difference?

aitor.blancomiguez · November 11, 2022, 1:15pm

Whether they are the same or not will not make any difference to the output; they will still be counted as independent reads.
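A small Python sketch of the idea behind the auto-incremented suffix described above: once a running counter is appended, identical IDs coming from the forward and reverse files no longer collide, so every read is treated as independent. The `__N` suffix format and the `pooled_unique_ids` name are hypothetical; MetaPhlAn's actual internal format may differ.

```python
from itertools import chain, count
from typing import Iterable, Iterator


def pooled_unique_ids(*id_lists: Iterable[str], sep: str = "__") -> Iterator[str]:
    """Pool read IDs from several files and append a running counter to each.

    The '__N' suffix is a hypothetical format chosen for illustration.
    """
    counter = count()
    for read_id in chain.from_iterable(id_lists):
        yield f"{read_id}{sep}{next(counter)}"


forward_ids = ["read1", "read2"]
reverse_ids = ["read1", "read2"]  # identical headers in the reverse file
pooled = list(pooled_unique_ids(forward_ids, reverse_ids))
# pooled == ["read1__0", "read2__1", "read1__2", "read2__3"]
```

Because the counter keeps running across files, no two pooled IDs can be equal, regardless of what the original headers were.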

farmer2020 · November 11, 2022, 1:28pm

It’s great to learn more about MetaPhlAn. Thank you very much.

MalbertR · January 24, 2023, 1:29pm

Hi Aitor,

So, to continue on this (somewhat): if I want to know how many reads actually mapped to the database (with the idea of multiplying the relative abundances by this number to get absolute counts), which number should I look at? I’m guessing not the “reads processed”, but the ‘#estimated_reads_mapped_to_known_clades’?

aitor.blancomiguez · January 31, 2023, 8:46am

Hi @MalbertR
You can extract that information from either the SAM output file or the bowtie2out file.
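As a rough Python sketch of how that count could be pulled out of either file: it assumes each non-comment line of the bowtie2out file is `read_id<TAB>marker` (lines starting with “#” are treated as metadata and skipped), and, for SAM, uses the standard FLAG bit 0x4 to skip unmapped records. The function names are hypothetical, and the bowtie2out layout is an assumption rather than a documented format.

```python
import bz2
from typing import Iterable, Set, TextIO


def open_text(path: str) -> TextIO:
    """Open a text file, decompressing bz2 on the fly if the name ends in .bz2."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def mapped_reads_from_bowtie2out(lines: Iterable[str]) -> int:
    """Count distinct read IDs, assuming 'read_id<TAB>marker' lines; '#' lines are skipped."""
    reads: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        reads.add(line.split("\t", 1)[0])
    return len(reads)


def mapped_reads_from_sam(lines: Iterable[str]) -> int:
    """Count distinct QNAMEs of mapped records (FLAG bit 0x4 unset); '@' header lines skipped."""
    reads: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:  # unmapped record
            continue
        reads.add(fields[0])  # QNAME; secondary hits of the same read are deduplicated
    return len(reads)
```

With the command from earlier in this thread, something like `with open_text("bowtie2.bz2") as fh: print(mapped_reads_from_bowtie2out(fh))` would give the number of distinct reads that hit at least one marker, under the format assumption above.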
