Predicted = if a species is expected to be at 5x coverage (based on the sample read depth and its desired relative abundance), then it contributes 5x expected coverage to every function it encodes at single copy (10x for functions it encodes twice, and so forth).
Sampled = quantify the coverage of each function individually, based on which reads were actually sampled. For example, two functions in the same genome can receive different coverage values through random read sampling. This can have a big impact on sensitivity in weakly covered genomes (if no read was ever drawn from gene X, then you can’t quantify gene X, even if its source species had non-zero coverage).
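To make the two definitions concrete, here is a minimal Python sketch. It is not the code used in the benchmark; the function names, the species and gene identifiers, the function labels (K1, K2, K3), the read counts, and the simple "reads x read length / gene length" coverage formula are all assumptions made up for illustration.

```python
from collections import defaultdict


def predicted_function_coverage(species_coverage, copy_number):
    """Expected coverage per function: species coverage x copy number,
    summed over every species that encodes the function."""
    predicted = defaultdict(float)
    for species, coverage in species_coverage.items():
        for function, copies in copy_number[species].items():
            predicted[function] += coverage * copies
    return dict(predicted)


def sampled_function_coverage(gene_read_counts, gene_length, gene_function, read_length):
    """Observed coverage per function, from the reads actually drawn per gene.
    Assumes coverage = reads * read_length / gene_length (a simplification)."""
    sampled = defaultdict(float)
    for gene, n_reads in gene_read_counts.items():
        sampled[gene_function[gene]] += n_reads * read_length / gene_length[gene]
    return dict(sampled)


# Hypothetical toy community: one well-covered and one weakly covered species.
species_coverage = {"sp_A": 5.0, "sp_B": 0.5}
copy_number = {
    "sp_A": {"K1": 1, "K2": 2},  # K2 is duplicated in sp_A
    "sp_B": {"K1": 1, "K3": 1},
}

# Hypothetical per-gene read counts after random read sampling.
gene_function = {"A_g1": "K1", "A_g2a": "K2", "A_g2b": "K2", "B_g1": "K1", "B_g3": "K3"}
gene_length = {gene: 1000 for gene in gene_function}
gene_read_counts = {"A_g1": 48, "A_g2a": 53, "A_g2b": 50, "B_g1": 9, "B_g3": 0}

predicted = predicted_function_coverage(species_coverage, copy_number)
sampled = sampled_function_coverage(gene_read_counts, gene_length, gene_function, read_length=100)

for function in sorted(predicted):
    print(function, predicted[function], sampled.get(function, 0.0))
```

In this toy case K3 is only encoded by the weakly covered species and happened to receive no reads, so its sampled coverage is zero even though its predicted coverage is 0.5x. No tool could detect it from those reads.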
We use the former (predicted) approach for calculating accuracy stats in all the main-text figures since it gives a LESS rosy view of performance. There is a figure in the supplement where we quantify how much of the shortfall from optimal performance is explained by not accounting for (typically) unknowable variation in read sampling. It was about 0.1 units of Bray-Curtis divergence in that example.
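As a rough illustration of why the predicted truth gives the less rosy number, the sketch below scores one hypothetical tool estimate against both kinds of truth using the standard Bray-Curtis formula, sum|a - b| / sum(a + b). The profiles are invented toy values, the function name `bray_curtis` is my own, and applying the formula to raw coverages rather than relative abundances is an assumption. It will not reproduce the 0.1 figure from the supplement.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis between two {function: abundance} profiles.
    Functions missing from one profile are treated as zero."""
    keys = set(profile_a) | set(profile_b)
    numerator = sum(abs(profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) for k in keys)
    denominator = sum(profile_a.get(k, 0.0) + profile_b.get(k, 0.0) for k in keys)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical truth profiles (function -> coverage).
predicted_truth = {"K1": 5.5, "K2": 10.0, "K3": 0.5}  # from expected species coverage
sampled_truth = {"K1": 5.7, "K2": 10.3, "K3": 0.0}    # from reads actually drawn

# Hypothetical tool output; it cannot see K3 because no read came from it.
tool_estimate = {"K1": 5.6, "K2": 10.4, "K3": 0.0}

error_vs_predicted = bray_curtis(tool_estimate, predicted_truth)
error_vs_sampled = bray_curtis(tool_estimate, sampled_truth)
sampling_only = bray_curtis(sampled_truth, predicted_truth)

print("error vs predicted truth:", round(error_vs_predicted, 4))
print("error vs sampled truth:  ", round(error_vs_sampled, 4))
print("sampling noise alone:    ", round(sampling_only, 4))
```

With these made-up numbers the tool scores about 0.031 against the predicted truth but about 0.006 against the sampled truth. Most of the apparent error comes from read-sampling variation that the tool had no way to recover.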