Should I concatenate all the files before running MetaPhlAn and HUMAnN?

DEEPCHANDA7 · November 24, 2020, 10:08am

Hi @franzosa @fbeghini,
I found a previously submitted dataset (PRJEB2054). Each subject (alias) has two or more runs, and the read length varies between subjects: some are 88 bp and others are 150 bp.
Given the differences in file count and read length between subjects, can I simply concatenate the files for each subject and run them through MetaPhlAn and HUMAnN?
Or should I concatenate only two files per subject, and only those with the same read length?

Thanks,
DC7

franzosa · November 24, 2020, 4:34pm

Both tools will be robust to differences in read length within a file (including differences resulting from concatenation of files). Without knowing more about the dataset it’s hard to say if it’s safe to concatenate the files. If (e.g.) the same sample was run on two sequencing lanes, then it would be very safe to merge the resulting reads. If they were sequenced at two different centers or on two different platforms then merging could get weird (I’d be more inclined to profile them separately and compare the resulting profiles for technical differences).
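
For the case where merging is appropriate (e.g. the same sample run on several lanes), the per-sample concatenation could be sketched as below. This is only an illustration under assumptions: the `(alias, fastq_path)` table layout and the function names `group_runs_by_sample` and `concatenate_runs` are hypothetical, and nothing here comes from the MetaPhlAn or HUMAnN code. Byte-wise concatenation is valid for plain FASTQ files that end in a newline and for gzipped files, since concatenated gzip members form a valid gzip stream.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_runs_by_sample(run_table):
    """Group run files by sample alias.

    run_table is an iterable of (alias, fastq_path) pairs. This layout is
    an assumption for illustration, not the dataset's real metadata format.
    """
    groups = defaultdict(list)
    for alias, path in run_table:
        groups[alias].append(Path(path))
    return dict(groups)


def concatenate_runs(groups, out_dir):
    """Write one merged FASTQ per sample and return {alias: merged_path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for alias, paths in groups.items():
        # Refuse to mix compressed and uncompressed inputs in one output file.
        gz_flags = {p.suffix == ".gz" for p in paths}
        if len(gz_flags) != 1:
            raise ValueError(f"{alias}: mix of gzipped and plain files")
        ext = ".fastq.gz" if gz_flags.pop() else ".fastq"
        target = out_dir / f"{alias}{ext}"
        # Append the raw bytes of each run, in the order given.
        with open(target, "wb") as out:
            for p in paths:
                with open(p, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[alias] = target
    return merged
```

Each merged file would then be profiled as a single sample. If the runs might differ technically (center, platform, read length), keeping them as separate inputs instead leaves the option of comparing their profiles afterwards.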

DEEPCHANDA7 · November 24, 2020, 4:54pm

Thanks a lot, @franzosa. I understand that it is safe to profile the samples separately according to their read length. But what do you mean by

"…compare the resulting profiles for technical differences"?

franzosa · November 24, 2020, 5:26pm

E.g. are certain taxa/functions consistently enriched/depleted in shorter-read replicates compared to longer-read replicates?
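
One rough way to look for that kind of consistent shift is sketched below. It assumes each subject has one shorter-read and one longer-read profile, each already loaded as a plain dict of feature name to relative abundance; the function name `paired_differences` and that input layout are hypothetical, and this is a descriptive summary rather than a statistical test.

```python
def paired_differences(pairs):
    """Summarize per-feature differences between paired profiles.

    pairs is a list of (short_read_profile, long_read_profile) tuples,
    one per subject, where each profile maps feature name -> abundance.
    A feature missing from a profile is treated as abundance 0.0.
    """
    # Collect every feature seen in any profile so all pairs are compared
    # over the same feature set.
    features = set()
    for short_profile, long_profile in pairs:
        features.update(short_profile)
        features.update(long_profile)

    summary = {}
    for feature in sorted(features):
        diffs = [
            short_profile.get(feature, 0.0) - long_profile.get(feature, 0.0)
            for short_profile, long_profile in pairs
        ]
        summary[feature] = {
            "mean_diff": sum(diffs) / len(diffs),
            "n_higher_in_short": sum(d > 0 for d in diffs),
            "n_lower_in_short": sum(d < 0 for d in diffs),
        }
    return summary
```

A feature that is higher (or lower) in the shorter-read profile for nearly every subject would be a candidate technical effect worth checking before merging the runs.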

DEEPCHANDA7 · November 24, 2020, 5:50pm

Ah, OK, I understand. I was confused because I had mixed all the files for each sample, regardless of the number of sample-specific files or their read length. I then found only 1 significant result with MaAsLin2, whereas LEfSe reported around 36 significant taxa. I saw the same discrepancy for pathways as well. So I thought the concatenation was probably the cause.
