Should I concatenate all the files before running MetaPhlAn and HUMAnN?

Hi @franzosa @fbeghini,
I found a previously submitted dataset (PRJEB2054). In it, each subject (alias) has two or more runs, and read length varies between subjects: some runs are 88 bp and others are 150 bp.
Given these differences in file count and read length, can I simply concatenate all files for each subject and run them through MetaPhlAn and HUMAnN?
Or should I concatenate only two files per subject, and only those with the same read length?


Both tools will be robust to differences in read length within a file (including differences resulting from concatenation of files). Without knowing more about the dataset it’s hard to say if it’s safe to concatenate the files. If (e.g.) the same sample was run on two sequencing lanes, then it would be very safe to merge the resulting reads. If they were sequenced at two different centers or on two different platforms then merging could get weird (I’d be more inclined to profile them separately and compare the resulting profiles for technical differences).
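If merging does turn out to be appropriate, a minimal sketch of the bookkeeping might look like the following. It groups run files by subject and read length and then appends them byte-for-byte. The `(subject, read_length, path)` tuple layout and the function names `group_runs` and `concatenate` are hypothetical, not part of MetaPhlAn or HUMAnN; the sketch also assumes that all files in a group share the same compression (plain FASTQ files, or gzip files, which remain valid when concatenated).

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_runs(runs):
    """Group run files by (subject, read_length).

    `runs` is an iterable of (subject, read_length, path) tuples. This
    layout is a hypothetical stand-in for the study's run table.
    """
    groups = defaultdict(list)
    for subject, read_length, path in runs:
        groups[(subject, read_length)].append(Path(path))
    # Sort paths so the concatenation order is reproducible.
    return {key: sorted(paths) for key, paths in groups.items()}


def concatenate(paths, out_path):
    """Append the files in `paths` byte-for-byte into `out_path`."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)
```

Grouping on read length keeps the 88 bp and 150 bp runs apart by default; to merge everything per subject instead, the key could be reduced to the subject alone.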

Thanks a lot @franzosa. I understand that it is safe to profile the samples separately according to their read length. But what do you mean by

"…compare the resulting profiles for technical differences"?

E.g. are certain taxa/functions consistently enriched/depleted in shorter-read replicates compared to longer-read replicates?
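As a rough illustration of that kind of check, the sketch below takes paired profiles of the same samples (one from the shorter-read runs, one from the longer-read runs) and counts, per feature, how often the abundance is higher in one or the other. The function name `consistent_shifts` and the nested-dict input format are assumptions made for this example. It is only a descriptive summary, not a statistical test; a paired test would be needed to call any shift significant.

```python
from collections import defaultdict


def consistent_shifts(short_profiles, long_profiles):
    """Summarize per-feature differences between paired profiles.

    Both arguments map sample -> {feature: relative abundance}
    (a hypothetical format). Only samples present in both are compared.
    Returns feature -> (n_higher_in_short, n_higher_in_long, mean_difference),
    where each difference is short minus long.
    """
    diffs = defaultdict(list)
    for sample in sorted(set(short_profiles) & set(long_profiles)):
        short = short_profiles[sample]
        long_ = long_profiles[sample]
        # A feature missing from one profile is treated as abundance 0.
        for feature in set(short) | set(long_):
            diffs[feature].append(short.get(feature, 0.0) - long_.get(feature, 0.0))

    summary = {}
    for feature, values in diffs.items():
        higher_in_short = sum(v > 0 for v in values)
        higher_in_long = sum(v < 0 for v in values)
        summary[feature] = (higher_in_short, higher_in_long, sum(values) / len(values))
    return summary
```

A feature that is higher in the shorter-read profile for nearly every sample (or lower for nearly every sample) would be a candidate technical difference worth a closer look.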

Ah, OK, I understand. I was confused because I had mixed all types of files for each sample, regardless of the number of sample-specific files and their read lengths. I then found only 1 significant result with MaAsLin2, whereas LEfSe reported around 36 significant taxa. The same discrepancy appeared for the pathways. So I thought the concatenation was probably the cause.