High proportion of UNINTEGRATED reads


I have been analyzing mouse gut samples with HUMAnN 3.0. Sample properties: paired-end, read length 100 bp, depth ~20M reads per sample.
Initially, a large share of the reads (~60%) was unmapped. The log file showed that many reads were being discarded by the --translated-subject-coverage-threshold parameter. I lowered it to 30%, which seemed reasonable to me given the sequencing depth. This reduced UNMAPPED to ~20-25%, which seems acceptable. For reference, >95% of reads are still unmapped after the nucleotide alignment step.

The problem is that ~70% of reads are now UNINTEGRATED. I understand that only a fraction of reads map to pathways, but what worries me is that over 65% fall under “UNINTEGRATED|unclassified”. If I understand correctly, even reads that can’t be integrated into a pathway should still be classified at the species level.

Would you expect this kind of behavior?
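
For context, here is a minimal sketch of how percentages like the ones above could be derived from a HUMAnN pathway abundance table. It assumes the usual tab-separated layout (a header line starting with `#`, community-level rows without a `|`, and stratified rows of the form `FEATURE|taxon`) and takes each share relative to the sum of the community-level rows. The function name `summarize_pathabundance` and all values in `example` are hypothetical and made up for illustration; they are not taken from my samples.

```python
import csv
import io

# Rows whose share of the community-level total we want to report.
TRACKED = ("UNMAPPED", "UNINTEGRATED", "UNINTEGRATED|unclassified")


def summarize_pathabundance(tsv_text):
    """Return each tracked row's share of the community-level total.

    Only the first sample column is read. The community-level total is the
    sum of all rows without a '|' (i.e. the unstratified rows).
    """
    community_total = 0.0
    tracked_values = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and the '# Pathway ...' header line.
        if not row or row[0].startswith("#"):
            continue
        feature, abundance = row[0], float(row[1])
        if "|" not in feature:
            community_total += abundance
        if feature in TRACKED:
            tracked_values[feature] = abundance
    return {name: value / community_total for name, value in tracked_values.items()}


# Made-up toy table, NOT real output.
example = "\n".join([
    "# Pathway\ttoy_sample_Abundance",
    "UNMAPPED\t20.0",
    "UNINTEGRATED\t70.0",
    "UNINTEGRATED|g__Bacteroides.s__Bacteroides_caecimuris\t5.0",
    "UNINTEGRATED|unclassified\t65.0",
    "PWY-1042: glycolysis IV\t10.0",
    "PWY-1042: glycolysis IV|unclassified\t10.0",
])

shares = summarize_pathabundance(example)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

To run this on a real file, the text could be read with `open(path).read()` and passed to the same function. Whether summing the community-level rows is the right denominator is itself an assumption.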


