Do genefamilies, pathabundance and pathcoverage work with split reads?

Sesh_1214 · March 14, 2024, 2:33pm

Hi there,

I am running HUMAnN on a number of FASTQ files from a public database. Each sample has R1, R2 and R3. For most of my samples, I concatenate R1, R2 and R3 into a single file, R4, and run HUMAnN on R4. For some of them, however, R1 and R2 are so large that HUMAnN takes too long to run (unfortunately, my HPC job-duration limits do not extend that far). Would the output still make sense if I split R1 and R2 into two equal halves (e.g. R1A and R1B, R2A and R2B), ran HUMAnN on each piece individually, and then combined the output files? I imagine that because the output is in RPKs, combining genefamilies should not be much of an issue, but I am more curious about the other two.

Thanks for your help in advance!

franzosa · April 11, 2024, 6:16pm

This sort of thing can work in principle. You are right that if you divide the file in half and compute gene RPKs separately, the RPKs can then be summed to get a total RPK value (since RPKs behave like sequencing coverage). The one drawback of this approach is that HUMAnN often uses sequence coverage to decide whether a sequence should be considered at all. It's possible that a given sequence would fail to be sufficiently covered in either partial file but would be covered in the full file, making it a false negative under this approach.
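As a rough illustration of the summing step, here is a minimal Python sketch. It assumes each partial run produced a two-column, tab-separated genefamilies table (feature ID, then RPK value) with a `#`-prefixed header line. The function name `sum_rpk_tables`, the sample labels, and the feature IDs `UniRef90_X` and `UniRef90_Y` are hypothetical and used only for the example; this is not a HUMAnN utility.

```python
import csv
import io
from collections import defaultdict


def sum_rpk_tables(tables):
    """Sum per-feature RPK values across several two-column TSV tables.

    `tables` is an iterable of strings, each holding the text of one table:
    feature ID in column 1, RPK value in column 2. Lines whose first field
    starts with '#' are treated as headers and skipped.
    """
    totals = defaultdict(float)
    for text in tables:
        reader = csv.reader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row or row[0].startswith("#"):
                continue
            totals[row[0]] += float(row[1])
    return dict(totals)


# Toy tables standing in for the outputs of two half-file runs (made-up values).
part_a = (
    "# Gene Family\tpartA_Abundance-RPKs\n"
    "UNMAPPED\t10.0\n"
    "UniRef90_X\t2.5\n"
)
part_b = (
    "# Gene Family\tpartB_Abundance-RPKs\n"
    "UNMAPPED\t12.0\n"
    "UniRef90_X\t1.5\n"
    "UniRef90_Y\t3.0\n"
)

combined = sum_rpk_tables([part_a, part_b])
for feature, rpk in sorted(combined.items()):
    print(f"{feature}\t{rpk}")
```

Note that summing cannot recover the false negatives described above: a feature that was filtered out of both partial runs for low coverage is simply absent from both tables. The sketch covers gene families only and says nothing about the pathway tables.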
