
HPC CPU Oversubscription

Oversubscription, running multiple "tasks" per "core", carries a risk of CPU overload. Unfortunately, this risk is often only vaguely understood, and vaguely understood risks lead to overly conservative behavior. Translation: missed opportunities and higher compute costs. By transforming this vaguely understood risk into a real-world, actuarial model, we can quantify both the risk and the reward, enabling a rational financial decision-making process.

While a full oversubscription total cost of ownership (TCO) analysis has many components, this paper focuses on one of the most difficult steps:

How to use measured job efficiency data to compute a first-order risk model for CPU overload

Yes. The focus really is just on this single computation. Because the computation can take years to complete if approached directly, much of the material covers the computational algorithms required to make the calculation tractable.
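To make the shape of this computation concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the paper: each task requests one core, a task's measured efficiency is the fraction of that core it actually uses, tasks land on a node independently of one another, and efficiencies are bucketed into tenths of a core. Under those assumptions, a node with `cores` cores running `tasks_per_core` tasks per core is overloaded when the summed demand exceeds `cores`. Brute-force enumeration of every combination of task demands grows exponentially with the number of tasks. Convolving the demand histogram with itself, here by repeated squaring, yields the same distribution with only a logarithmic number of convolutions. All function names are hypothetical, and the algorithms in the paper may differ.

```python
from collections import Counter


def efficiency_pmf(efficiencies, bins_per_core=10):
    """Histogram measured per-task efficiencies (fraction of one core used,
    assumed non-negative) into a PMF indexed in units of 1/bins_per_core core."""
    counts = Counter(round(e * bins_per_core) for e in efficiencies)
    total = len(efficiencies)
    return [counts.get(i, 0) / total for i in range(max(counts) + 1)]


def convolve(a, b):
    """PMF of the sum of two independent discrete random variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        if pa == 0.0:
            continue  # skip empty bins to save work
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def load_pmf(pmf, n_tasks):
    """PMF of the total demand of n_tasks independent tasks.

    Uses exponentiation by squaring, so only O(log n_tasks) convolutions
    are needed instead of enumerating every combination of task demands."""
    result, base, n = [1.0], pmf, n_tasks
    while n > 0:
        if n & 1:
            result = convolve(result, base)
        n >>= 1
        if n:
            base = convolve(base, base)
    return result


def overload_probability(efficiencies, cores, tasks_per_core, bins_per_core=10):
    """Probability that total demand on one node exceeds its core count
    (hypothetical first-order model assuming independent tasks)."""
    pmf = efficiency_pmf(efficiencies, bins_per_core)
    dist = load_pmf(pmf, cores * tasks_per_core)
    capacity = cores * bins_per_core
    return sum(p for load, p in enumerate(dist) if load > capacity)
```

For example, if half of the measured jobs are idle and half keep a full core busy, then two tasks on one core overload it only when both are busy: `overload_probability([0.0, 1.0], cores=1, tasks_per_core=2)` gives 0.25. Real efficiency data would replace that toy list, and correlated jobs would violate the independence assumption.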

Download PDF: CPU Oversubscription in Compute Clouds