The Edgecast Caching Emulator

Posted by Marcel Flores, Harkeerat Bedi

One of the core tasks in operating a Content Delivery Network (CDN) is caching requested content on behalf of content providers. To ensure that our caching systems are well optimized, we are constantly exploring alternative caching techniques that involve a rich variety of adjustments to our caching policies and structures. To rapidly develop and test such improvements, we have built a caching emulator, the Edgecast Caching Emulator (ECE), which allows us to freely experiment with various techniques.

Edgecast provides CDN services for thousands of customers whose content includes live video streaming, static website content, large software updates, and dynamic site content. This diversity of customer profiles, and consequently of client access behaviors, means that caching solutions which work well for some customers may induce pathological behavior for others.

To address these complex interactions while still improving the performance of the Edgecast caching systems, we have developed a mechanism for exploring the impact of new caching algorithms and techniques. In particular, this mechanism consumes real access data and allows us to evaluate the effects of such changes on our core performance metrics before we implement and deploy them in production.

The Caching Emulator

The Edgecast Caching Emulator (ECE) is designed to provide insight into the behavior of arbitrary caching policies that we may want to implement in the production CDN. The core idea of the system is that it reports important cache metrics for a given set of access behaviors, driven by real behavior observed in CDN access logs.

The figure below provides an overview of the emulator. The system takes as input a set of access logs collected from the production CDN. Each request is then “played back” over the cache policy being simulated, where the object is either found in the cache (a cache hit) or must be retrieved from origin (a cache miss). The output of the system is a new cache state (i.e. the set of files in the cache, which depends on the cache policy being examined as well as the input logs) and a set of time series data. This time series data includes the hit rates (in terms of both the number of requests and bytes), the number of bytes read from and written to the cache, the amount fetched from origin, and other extensible properties.
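To make the playback loop concrete, here is a minimal sketch of replaying a log over a single LRU cache. The log format of (object id, size) pairs and the class and metric names are assumptions for illustration, not the ECE's actual interfaces.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal size-aware LRU cache (an illustrative sketch, not the ECE's API)."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # object_id -> size, oldest first

    def lookup(self, obj_id):
        if obj_id in self.items:
            self.items.move_to_end(obj_id)  # refresh recency on a hit
            return True
        return False

    def insert(self, obj_id, size):
        self.items[obj_id] = size
        self.used += size
        while self.used > self.capacity:  # evict least recently used
            _, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size

def replay(log, cache):
    """Play back (object_id, size) requests and compute hit-rate metrics."""
    hits = misses = hit_bytes = miss_bytes = 0
    for obj_id, size in log:
        if cache.lookup(obj_id):
            hits += 1
            hit_bytes += size
        else:
            misses += 1
            miss_bytes += size  # these bytes would be fetched from origin
            cache.insert(obj_id, size)
    return {"request_hit_rate": hits / (hits + misses),
            "byte_hit_rate": hit_bytes / (hit_bytes + miss_bytes)}
```

In the full system, these counters would be emitted periodically to form the time series described above, rather than summarized once at the end.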

Data flow diagram of the emulator. Each client request passes through a chain of arbitrarily configured caches. Output includes periodically computed metrics.

The ECE allows significant flexibility in the cache policies it can evaluate. In particular, it implements the cache as a chain of sub-caches. Each sub-cache may be of arbitrary size, implement an arbitrary admission and eviction policy, and have differing costs associated with read and write operations. While current implementations access the sub-caches linearly (i.e. they try the first cache and, on a miss, try the second, and so on), non-linear paths may be implemented in the future.
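The linear chain behavior can be sketched as follows. The sub-cache interface here (a `lookup` method, an admission callback, and an unbounded store) is a hypothetical simplification for illustration; the real ECE sub-caches also track sizes, eviction, and read/write costs.

```python
class SubCache:
    """Toy sub-cache: an unbounded store guarded by an admission policy."""
    def __init__(self, admit):
        self.admit = admit   # admission policy: obj_id -> bool
        self.store = set()

    def lookup(self, obj_id):
        return obj_id in self.store

def chain_lookup(subcaches, obj_id):
    """Try each sub-cache in order (the linear path); on a full miss,
    offer the object to every tier's admission policy."""
    for tier, cache in enumerate(subcaches):
        if cache.lookup(obj_id):
            return ("hit", tier)
    for cache in subcaches:
        if cache.admit(obj_id):
            cache.store.add(obj_id)
    return ("miss", None)

# Example chain: a 2-Hit-style first tier (admits only once an object
# has missed twice) backed by a tier that admits everything.
seen = {}
def two_hit(obj_id):
    seen[obj_id] = seen.get(obj_id, 0) + 1
    return seen[obj_id] >= 2

chain = [SubCache(two_hit), SubCache(lambda obj_id: True)]
```

With this chain, the first request for an object misses everywhere and is stored only in the backing tier; subsequent requests hit there.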

By taking log data as its primary input, the system is able to assess the impacts on real customer traffic as seen directly by production servers. Doing so avoids many of the complexities and errors introduced by attempting to model customer traffic. Analysis of the time series output can further be subdivided by customer, allowing for explorations of the impacts on a per-customer basis.

Current Policies

Currently, the ECE implements a handful of policies, which provide a strong foundation on which more complex caching systems can be explored. For example, one could easily construct a cache with a small first-tier cache that uses probabilistic admission and FIFO eviction, backed by a much larger cache with 2-Hit admission and LRU eviction. The tables below provide a brief overview of the available policies.

Admission Policies

  Type           Description
  N-Hit          Admit on the Nth request.
  Probabilistic  Admit with a fixed probability.
  Prob-Size      Admit with probability dependent on the file size [1].

Eviction Policies

  Type   Description
  FIFO   A simple First-In-First-Out queue.
  LRU    Least Recently Used.
  COST   An LRU-based method that equally weights file size and recency.
  S4LRU  Quadruply segmented LRU [2].
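As an illustration of how admission and eviction compose, here is a minimal sketch pairing the N-Hit, Probabilistic, and Prob-Size admission policies with a FIFO eviction queue. The interfaces are assumptions for illustration, and the exponential size-decay form of Prob-Size follows the AdaptSize paper (reference 1 below); the ECE's exact implementation may differ.

```python
import math
import random
from collections import deque, defaultdict

# --- Admission policies: each returns a callable (obj_id, size) -> bool ---

def make_n_hit(n):
    """N-Hit: admit an object only on its Nth request."""
    counts = defaultdict(int)
    def admit(obj_id, size):
        counts[obj_id] += 1
        return counts[obj_id] >= n
    return admit

def make_probabilistic(p):
    """Probabilistic: admit with fixed probability p."""
    return lambda obj_id, size: random.random() < p

def make_prob_size(c):
    """Prob-Size: admission probability decays with object size
    (exp(-size / c), as in AdaptSize; an assumed form)."""
    return lambda obj_id, size: random.random() < math.exp(-size / c)

# --- FIFO eviction: evict in insertion order once over capacity ---

class FIFOCache:
    def __init__(self, capacity_bytes, admit):
        self.capacity, self.used, self.admit = capacity_bytes, 0, admit
        self.queue = deque()   # (obj_id, size) in insertion order
        self.sizes = {}

    def lookup(self, obj_id):
        return obj_id in self.sizes

    def insert(self, obj_id, size):
        if not self.admit(obj_id, size) or obj_id in self.sizes:
            return
        self.queue.append((obj_id, size))
        self.sizes[obj_id] = size
        self.used += size
        while self.used > self.capacity:
            old_id, old_size = self.queue.popleft()
            del self.sizes[old_id]
            self.used -= old_size
```

For example, `FIFOCache(200, make_n_hit(2))` only stores an object the second time it is offered, which filters out one-hit-wonder traffic at the cost of an extra origin fetch.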

An Open Source Project

To encourage further study of caching behaviors, we have released the ECE as an open source project. The release includes the emulator itself, instructions for use and for implementing new policies, a small amount of sample data to demonstrate how the system works, and a handful of quickstart analysis scripts. Our hope is that this will be a useful tool for those exploring cache policies. We are especially looking forward to the implementation of additional policies on top of the basic set that we provide here. Happy caching!

  1. AdaptSize: Orchestrating the Hot Object Memory Cache in a Content Delivery Network, Daniel S. Berger et al., Proc. of NSDI 2017.

  2. An analysis of Facebook photo caching, Qi Huang et al., Proc. of SOSP 2013.