
Understanding Application-level Caching in Web Applications: a Comprehensive Introduction and Survey of State-of-the-art Approaches

Jhonny Mertz (ORCID 0000-0002-2522-4700), Universidade Federal do Rio Grande do Sul, Instituto de Informática, Porto Alegre, RS, 91501-970, BR, and Ingrid Nunes, Universidade Federal do Rio Grande do Sul and TU Dortmund, Instituto de Informática, Porto Alegre, RS, 91501-970, BR
Abstract.

A new form of caching, namely application-level caching, has been recently employed in web applications to improve their performance and increase scalability. It consists of inserting caching logic into the application base code to temporarily store processed content in memory, and then decreasing the response time of web requests by reusing this content. However, caching at this level demands knowledge of the domain and application specificities to achieve caching benefits, given that this information supports decisions such as what and when to cache content. Developers thus must manually manage the cache, possibly with the help of existing libraries and frameworks. Given the increasing popularity of application-level caching, we provide a survey of approaches proposed in this context. We provide a comprehensive introduction to web caching and application-level caching, and present state-of-the-art work on designing, implementing and managing application-level caching. Our focus is not only on static solutions but also on approaches that adaptively adjust caching solutions to avoid the gradual performance decay that caching can suffer over time. This survey can be used as a starting point for researchers and developers who aim to improve application-level caching or need guidance in designing application-level caching solutions, possibly with humans out of the loop.

application-level caching, web caching, web application, self-adaptive systems, adaptation
Author’s addresses: J. Mertz and I. Nunes, Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre RS, 91501-970, Brazil; emails: {jmamertz, ingridnunes}@inf.ufrgs.br
Journal: ACM Computing Surveys (CSUR); volume: 9; number: 4; article: 39; year: 2017; publication month: 3; copyright: acmlicensed; doi: 0000001.0000001. CCS concepts: General and reference, Surveys and overviews; Information systems, Web applications; Theory of computation, Caching and paging algorithms; Software and its engineering, Software performance.

1. Introduction

Due to the increasing user demand for web-based applications, many optimization techniques have been proposed over the past years (Ravi et al., 2009) to allow web-based applications to deliver adequate performance. A popular technique to scale and improve the performance of this kind of application is web caching, which consists of keeping data that are expected to be frequently requested readily retrievable, e.g. without the need for recurrent calculations and processing.

Many efficient web caching solutions have been proposed and employed in the web system architecture. However, traditional caching solutions are deployed outside the application boundaries, as a transparent layer, and thus caching decisions do not take into account the particularities of specific web applications. As a result, traditional types of caching are usually less efficient when the web application processes complex logic and personalized content. Therefore, application-level caching can complement other caching solutions and potentially improve the performance and scalability of web-based applications, as well as reduce the workload and communication delays of these systems.

However, application-level caching solutions are usually developed in an ad-hoc way, based on conventional wisdom about web usage patterns (Mertz and Nunes, 2016). This means that the caching design and implementation, which may initially satisfy desired non-functional requirements, may become obsolete due to changing application dynamics. Thus, to ensure that the cache continues contributing to the application performance, frequent adjustment of caching decisions is needed, which implies additional time spent by developers on maintenance (Radhakrishnan, 2004).

Given the issues involved with application-level caching, substantial advances have been made towards supporting developers in developing such caching solutions. In this paper, we provide a comprehensive overview and comparison of existing approaches proposed in this context, so that it is possible to understand what can be put into practice and what remains an open issue. We first provide an introduction to web and application-level caching and then focus on surveying static and adaptive approaches in the literature. Thus, the scope of this survey has two key dimensions: static and adaptive application-level caching approaches. The former refers to non-adaptive solutions that help developers design, implement and maintain an application-level caching solution, typically focusing on raising the abstraction level of caching, with the provision of caching implementation support or the automation of some of the required tasks, thus providing caching management support. The latter is focused on cache management approaches that can adaptively deal with caching issues, usually by monitoring and automatically changing behavior to achieve desired objectives.

Web caching is an extensively studied optimization technique, and previous reviews on caching locations and deployment schemes (Ravi et al., 2009; Zhang et al., 2015b; Abdullahi et al., 2015; Hattab and Qawasmeh, 2015; Tamboli and Patel, 2015), coordination tasks (Podlipnig and Böszörmenyi, 2003; Balamash and Krunz, 2004; Ali et al., 2011), measurement (Domènech et al., 2006), and adaptation (Venketesh and Venkatesan, 2009; Ali et al., 2011; Sulaiman et al., 2011; Ali et al., 2012b, a) have already been published and discussed. However, these reviews mainly address the context of web pages (at the proxy level), which is not always best suited to the application-level or other caching locations. Although our paper may contain some overlapping topics, our survey differs from previous surveys in that it focuses specifically on caching issues applied to application-level caching design, implementation, maintenance, and adaptation, which pose distinct challenges.

The remainder of this paper is organized as follows. Section 2 gives a comprehensive overview of challenges and issues of application-level caching. Section 3 presents static and adaptive application-level caching approaches, focusing on the issues addressed, benefits and techniques employed. Finally, conclusion, open challenges and future directions in the field are presented in Section 4.

2. Introduction to Application-level Caching

As opposed to caching alternatives that are placed outside of the application boundaries, application-level caching allows storing content at a granularity that is possibly best suited to the application. For instance, modern web applications provide customized content and, in this case, caching the final web pages is usually useless. We consider a web-based application any application that contains data and business logic maintained by developers or contains presentation logic to provide content or features to users through the web. Therefore, application-level caching can be used to separate generic from specific content at a fine-grained level.

Application-level caching is mainly characterized by caching techniques employed along with the application code, i.e. business, presentation and data logic. Therefore, it is not tied to a specific caching location (server-side, proxy or client-side), because it can be conceived at the server-side, to speed up a Java-based application that produces HTML pages sent to users, as well as at the client-side, as a JavaScript-based application that executes part of its logic directly in the client’s browser. In both situations, developers can reason about caching and implement a caching logic to satisfy their needs.

Thus, application-level caching has become a popular technique to reduce the workload on content providers, in addition to other caching layers surrounding the application. Such popularity is confirmed by Mertz and Nunes (2016), who analyzed ten web applications and showed that application-level caching represents a significant portion of the lines of code of the investigated applications, as well as a significant number of issues, considering that caching is essentially a non-functional requirement. In addition, Selakovic and Pradel (2016) analyzed 16 JavaScript open source projects and reported that 13% of all performance issues are caused by repeated execution of the same operations, which could be solved by means of application-level caching, resulting in improved performance and decreased user-perceived latency.

We next provide an introduction to application-level caching. For a background on caching and web caching, we refer the reader to Appendix A.

2.1. Overview

To ease the understanding of what application-level caching is, we provide an introductory example in Figure 1, in which the process of using and implementing application-level caching is detailed. First, a web application receives a request (step a), which is eventually processed by a component C1. However, C1 depends on C2, and calling and executing C2 may imply an overhead in terms of computation or bandwidth. Therefore, C1 caches C2’s results and, for every request, verifies whether C2 should be called or whether previously computed results are already in the cache (step b). If the content is cached, a hit occurs, and C1 can avoid calling C2. However, when a miss occurs, computation by C2 is required (step c). Finally, the results of C2’s computation can be cached to process future requests faster. These steps are those commonly performed by any implementation of caching. The key difference is that, in application-level caching, the responsibility of managing the caching logic is entirely left to application developers, possibly with the support of frameworks that provide an implementation of the cache system, such as EhCache (http://www.ehcache.org/) and Memcached (https://memcached.org/).

[Figure 1. Application-level Caching Overview. (a) Process Steps; (b) Code Example.]
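The steps above can be sketched in a few lines of code following the cache-aside pattern; the names below (the cache map, `expensive_c2`, the key scheme) are illustrative and not tied to any particular framework.

```python
# Cache-aside logic mirroring steps (a)-(c): C1 probes the cache before
# delegating to the costly component C2.
cache = {}  # stands in for a cache system such as Memcached or EhCache


def expensive_c2(x):
    """Stands in for C2: a costly computation or remote call."""
    return x * x  # placeholder for real work


def c1_handle_request(x):
    key = f"c2:{x}"           # assign a key for this computation
    if key in cache:          # step (b): hit, C2 is not called
        return cache[key]
    result = expensive_c2(x)  # step (c): miss, compute via C2
    cache[key] = result       # store the result for future requests
    return result
```

In a real application the dictionary would be replaced by a cache client with eviction and expiration support, but the probe-compute-store structure remains the same.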

Given that application-level caching is essentially a caching solution, it shares commonalities, such as coordination issues and metrics, with other caching and web caching solutions. For more information regarding general caching and web caching issues, we refer the reader to Appendix A, where we provide an extensive background on caching and web caching, pointing to other surveys and papers that provide more detailed research on each topic.

2.2. Challenges and Issues

Building an application-level caching solution involves four key challenging issues: choosing what content to cache, defining when the selected content will be inserted into and removed from the cache component, determining where to store cache content efficiently, and deciding how to properly implement caching logic (Mertz and Nunes, 2016). The latter should consider that both the cache and underlying source of data should be handled and linked with each other by the application, as shown in Figure 1(a). Therefore, developers must manually insert and retrieve content, translate between raw data and cache objects, assign keys, and keep consistency between the cache and the source of the cached content. Such logic is placed within the application and is usually tangled with the business logic—see Figure 1(b), in which caching decisions are made explicit (i.e. which objects to get, put or remove from the cache), as opposed to an implicit cache, in which the cache is implemented as a transparent layer.

Caching implementation and maintenance are thus challenging because caching becomes a cross-cutting concern mixed with business logic and spread all over the application base code, resulting in increased complexity and maintenance time (Ports et al., 2010). Moreover, web applications are usually not conceived with caching from the beginning. As the application evolves, complexity or usage analysis is performed and may lead to a demand for performance improvement (Radhakrishnan, 2004). Developers must thus refactor the application business code to insert caching logic into the proper locations. Although it is often impossible to predict whether an application will require application-level caching, introducing caching at a later stage can lead to significant rework due to refactoring, and to an implementation of lower quality than one designed before the application reached advanced stages of development.

Beyond implementation issues, deciding what and when to cache specific content demands significant effort and reasoning from developers. Such decisions involve the choice of cacheable content, for which an admission policy should be adopted, and the definition of rules to maintain consistency between the cache and the source of data, in order to avoid stale content. If these decisions are not made properly, cache memory consumption can increase while the cached content does not lead to hits, which tends to decrease the application performance. Therefore, it is crucial to understand what the typical usage scenarios are, how often the data selected to be cached is going to be requested, how much memory this data consumes, and how often it is going to be updated. Finally, developers must consider where the cached data will be stored. This issue involves the management of the cache system, which consists of several non-trivial decisions, such as establishing a replacement policy and defining the size of the cache (Mertz and Nunes, 2016).
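To make the staleness trade-off concrete, one common lightweight consistency mechanism is to attach a time-to-live (TTL) to each entry, after which it is treated as a miss. The sketch below is illustrative and not taken from any surveyed approach.

```python
import time


class TTLCache:
    """Minimal cache whose entries expire after ttl seconds,
    bounding how stale cached content can become."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}  # key -> (value, expiration timestamp)

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:  # expired: treat as a miss
            del self.store[key]
            return None
        return value
```

Choosing the TTL is itself a design decision: a short TTL limits staleness but reduces hits, while a long TTL improves hit ratios at the risk of serving outdated content.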

These issues require significant manual effort from application developers, given that the design and maintenance of such caching demand extensive knowledge of the application to be properly done. Given these shortcomings, application-level caching development is not trivial and is, therefore, a time-consuming and error-prone task, as well as a common source of bugs (Ports et al., 2010; Gupta et al., 2011; Mertz and Nunes, 2016).

2.3. Static vs. Adaptive Application-level Caching

A fundamental problem of application-level caching is that all issues mentioned above usually demand extensive knowledge of the application to be properly solved. Consequently, developers manually design and implement solutions for all those tasks. To provide solutions that require less intervention, and are thus easier and faster to adopt, static approaches have been proposed to help developers design, implement and maintain an application-level caching solution. Static approaches usually focus on decoupling this non-functional requirement from the base code and on providing ready-to-use caching components with basic functionalities and default configurations, automating some of the required tasks, such as storage management and standard replacement policies. However, even when leveraging static solutions to ease the development of a caching solution, the issues and challenges concerning design and maintenance remain unaddressed, because default configurations do not cover all the design issues, and those provided may not perform well in all contexts.

Therefore, developers must still specify and tune cache configurations and strategies, taking into account application specificities. A common approach to developing a caching solution is to define configurations and strategies according to well-accepted characteristics of access and common-sense assumptions. Then, cache statistics, such as hit and miss ratios, can be observed and used to improve the initial configuration. This tuning process is repeated until a steady performance improvement is achieved. As a result, to achieve caching benefits so that the application performance is improved, it is necessary to tune caching decisions constantly. Despite the effort to do so, eventually an unpredicted or unobserved usage scenario may emerge. As the cache is not tuned for such situations, it would likely perform sub-optimally (Radhakrishnan, 2004).
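The tuning loop described above relies on observing cache statistics. A minimal hit-ratio tracker could look as follows; this is an illustrative sketch, not the API of any specific library.

```python
class CacheStats:
    """Tracks hits and misses so a caching configuration can be
    tuned against the observed hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        """Record the outcome of one cache lookup."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_ratio(self):
        """Fraction of lookups served from the cache (0.0 if none)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In practice, such counters are sampled periodically, and a persistently low hit ratio signals that admission, expiration, or sizing decisions need revisiting.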

This shortcoming motivates the need for adaptive caching solutions, which could overcome these problems by adaptively adjusting caching decisions at run-time to maintain a required performance level. Moreover, an adaptive caching solution minimizes the challenges faced by developers, requiring less effort from them and providing them with a better caching experience. Such adaptive caching solutions aim at optimizing the usage of the infrastructure, in particular of the caching system.

Having introduced application-level caching and provided the background needed to understand the approaches discussed in this paper, we now proceed to the presentation of different ways of providing a caching solution to developers, as well as the static and adaptive approaches that have been proposed in the context of application-level caching.

3. Application-level Caching Approaches

To introduce existing approaches in the context of application-level caching, we outline in Figure 2 a taxonomy of application-level caching solutions, also indicating all the works discussed in our survey. Issues and challenges that appear in this taxonomy were introduced in the previous sections.

[Figure 2. Classification of Application-level Caching Approaches and Representative Examples.]

It is important to mention that our initial intention was to perform a systematic review of application-level caching approaches. However, while specifying appropriate search strings, many alternative searches led to either a small set of papers, with many important papers not being retrieved, or a large set that included a significant amount of unrelated work, which was infeasible to process in a timely fashion. This is due to the investigation of caching in a wide variety of contexts. Given the different types of caching that can be employed throughout the web infrastructure and the lack of a specific nomenclature identifying each web caching technique, it is difficult to retrieve only papers that deal with application-level issues. Furthermore, terms such as adaptive, web application, and cache, associated with our study, are widely used in many other contexts. As a result, proceeding with a systematic approach would result in a poor review of application-level caching. Therefore, we conducted our survey with studies collected using a less rigorous selection process. We collected many relevant papers using alternative query strings, and further searched for papers published by key authors or cited in relevant papers, using a snowball approach, until we reached a fixed point. This gives us confidence that relevant papers are indeed included in our survey.

As presented in Figure 2, approaches are classified into three categories, depending on how the approach helps developers with caching issues. We next discuss these approaches, following such categorization. First, we present work focused on supporting the implementation of a caching solution. Then, static approaches regarding coordination issues of application-level caching are presented and discussed. Finally, we introduce caching approaches that can adapt their behavior, typically by means of a feedback loop.

3.1. Cache Implementation

In order to reduce the effort demanded of developers when implementing application-level caching, implementation-centered approaches have been extensively developed. Such approaches focus on the provision of solutions that raise the level of abstraction and reduce the amount of caching logic that must be added to the base code. Manually controlling and maintaining the caching logic may be error-prone and tedious, because it is often repetitive and not related to the business logic, as presented in Section 2. Therefore, a research challenge in application-level caching is how to ease the implementation of caching logic for developers.

Usually, the idea underlying solutions to implementation issues is that the implementation effort can be reduced by providing a system or library that handles some caching operations, freeing developers to write the most relevant code (i.e. business logic). As shown in Figure 2, approaches related to caching implementation can be categorized into two types, concerning the adopted programming model: programmatic and compositional. The former requires code changes to take advantage of the caching approach and usually results in an application-specific solution. For example, EhCache provides a full-featured caching component, which is responsible for dealing with memory and disk stores; however, it requires coding with the provided classes or annotations to perform the caching operations. The latter has a lower impact on the application, as it does not require introducing code interleaved with the base code.

3.1.1. Programmatic

The simplest programmatic approach to application-level caching consists of the use of abstract solutions such as design and implementation patterns. Recent work by Mertz and Nunes (2016) provided practical guidance to developers in the form of patterns and guidelines to be followed while designing, implementing and maintaining application-level caching, derived from a set of existing web applications that use application-level caching. Although simple, patterns are abstract solutions and require a significant amount of changes in the base code to be adopted, which may not provide an adequate degree of transparency and flexibility to developers (Pohl, 2005). To provide concrete support, libraries and frameworks have been developed, providing useful and ready-to-use cache-related features. Redis (https://redis.io/), Memcached, Spring Caching (https://docs.spring.io/spring/docs/current/spring-framework-reference/html/cache.html), EhCache and Caffeine (https://github.com/ben-manes/caffeine) are popular examples of libraries and frameworks. For a deeper analysis of typical cache systems and libraries, as well as their main techniques regarding data management, we refer the reader elsewhere (Zhang et al., 2015b).

Typically, concrete solutions provide an in-memory storage component along with methods to put and get arbitrary content identified by a key, i.e. a hash table. The key benefit of such implementations is that they are: (a) flexible, because different types of content can be stored, ranging from database query results to entirely generated web pages; (b) simple; and (c) scalable. Besides being storage units, these caching solutions also provide support for basic caching issues, such as memory allocation, eviction policies, and size limitations. Despite these advantages, adopting such solutions still requires a programming effort to manage the cache and couples caching concerns to the base code, which adds complexity and reduces the possibility of reuse within it (Ports et al., 2010; Wang et al., 2014).
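As an illustration of the basic support such libraries offer, the sketch below combines a hash table with a size limit and least-recently-used (LRU) eviction; it is a simplified stand-in for what components such as EhCache or Caffeine provide, not their actual implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Hash-table cache with a size limit and least-recently-used
    eviction, the kind of basic support caching libraries provide."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```

Real libraries add concurrency control, expiration, and overflow to disk on top of this core structure.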

Distributed key-value stores (or caches) have become popular for supporting large-scale web applications because they can easily scale up (they can be distributed across multiple servers and thus scale linearly to handle greater transaction loads) and cache a significant portion of user requests, improving hit ratios and response times. Such distributed caches are an essential component in the infrastructure of large enterprises such as Netflix (http://techblog.netflix.com/2016/03/caching-for-global-netflix.html), Facebook (Nishtala et al., 2013) and Twitter (https://github.com/twitter/twemproxy). Besides design, implementation and maintenance issues of application-level caching, there is research work that proposes and improves cache components in the context of distributed infrastructures (Zhu et al., 2012; Fan et al., 2013; Xu et al., 2014; Li et al., 2015). These approaches are not further discussed in this survey, given that our focus is on issues specific to application-level caching, and not broad caching issues, such as storage, concurrency management, and other topics of distributed systems. Such topics have been widely researched and addressed by other communities (Zhang et al., 2015a; Abdullahi et al., 2015).

Applications can also be developed with business logic at the client-side, typically with the support of JavaScript frameworks. Popular frameworks provide features to develop complex and sophisticated applications. However, caching configurations are usually specified by the content provider (the web server that received the request) in the header of the response, and caching is automatically done by browsers, which are limited to caching entire web pages or request responses. To increase flexibility in customizing caching configurations at the client-side, Huang et al. (2010) proposed a caching framework in which developers can define the appropriate strategy among pre-defined possibilities. Similar to the caching provided by browsers, the proposed framework does not require developers to manage the cache content manually but allows customizations, such as setting cache granularity and eviction policies. Huang et al. (2010) also proposed an approach to adaptively deal with consistency management, which is further explored in Section 3.3.

3.1.2. Compositional

Compositional approaches can partially relieve the developers’ burden by automating tasks through declarative or seamless approaches. The former is based on the provision of knowledge associated with the semantics of application code and data. In this case, annotations, referred to as assertions or contracts, are used to describe application properties, which can be processed at runtime to execute specific tasks without coupling the caching logic with the application base code. Such annotations have become widely adopted in dynamic programming languages (Stulova et al., 2015). CacheGenie (Gupta et al., 2011) is a system that provides a higher level of abstraction for caching frequently observed query patterns in web applications. These abstractions take the form of declarative query objects and, once developers specify them, insertions, deletions, and invalidations are done automatically. CacheGenie relies on triggers inside the database to automatically invalidate the cache, or keep it synchronized with the data source, as expressed by the developer in the application code. Similarly, Ports et al. (2010) proposed TxCache, which offers developers a simple way to mark database query functions as cacheable; TxCache then automatically manages and caches the results of the marked functions. CacheGenie and TxCache also provide automatic consistency management, which is further explored in Section 3.2.

Seamless solutions are those coupled to the application in a transparent way, added, for example, as a surrounding layer, similarly to a database or web proxy, without the need for refactoring application code. Such solutions are especially useful in scenarios in which the application was not conceived with caching from the beginning, and requests for performance and scalability improvements emerged after many releases. In this context, middleware-based approaches that integrate two or more layers of a web application have been proposed as easy-to-use and transparent solutions to deal with caching (Huang et al., 2010; Wang et al., 2014). Furthermore, aspect-oriented programming (AOP) (Kiczales et al., 1997) has been explored to increase modularity through an improved separation of cross-cutting concerns. As caching is fundamentally a cross-cutting concern, Bouchenak et al. (2006) and Surapaneni et al. (2013) demonstrated the use of AOP to develop caching solutions. Such approaches come as an alternative to refactoring the application with the introduction of programmatic approaches. Although faster results may be obtained with transparent solutions, they can be hard to support and tune, because system administrators, or even developers, might need to be specially trained or experienced in particular caching solutions and scenarios to configure them properly.
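The seamless effect of AOP-style "around" advice can be approximated with a caching proxy that intercepts calls to an unmodified component. The sketch below is illustrative; the class and method names are hypothetical and do not come from any surveyed system.

```python
class CachingProxy:
    """Wraps an existing component and transparently caches its method
    results, approximating AOP-style interception without touching
    the component's code (illustrative sketch)."""

    def __init__(self, target):
        self._target = target
        self._cache = {}

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def interceptor(*args):
            key = (name, args)       # method name plus arguments as key
            if key not in self._cache:
                self._cache[key] = attr(*args)
            return self._cache[key]

        return interceptor


class ReportService:
    """An unmodified application component with an expensive method."""

    def __init__(self):
        self.calls = 0

    def build(self, month):
        self.calls += 1  # simulates expensive work
        return f"report-{month}"
```

Callers use the proxy exactly as they would use the original component, which is what makes this style attractive for applications not conceived with caching from the beginning.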

A non-exhaustive survey on database and mid-tier transparent approaches is presented by Ravi et al. (2009), while Ali et al. (2011) surveyed studies of proxy-level caching. We complement such surveys with seamless approaches that consider application-level specificities, which are further explored in Sections 3.2 and 3.3, given that they also provide static or adaptive solutions for the design and maintenance of application-level caching.

Although programmatic and compositional approaches focused solely on implementation issues can raise the level of abstraction and, consequently, reduce a significant amount of the caching logic to be added to the base code, they still require design reasoning, such as deciding whether to cache content and ensuring consistency. Solutions that demand reduced effort and input from developers are therefore desirable, and these are the focus of the approaches described next.

3.2. Static Cache Management

Static application-level approaches are those that address design and maintenance issues of caching focusing on automating required tasks or giving suggestions towards easing the developer reasoning. According to the taxonomy presented in Figure 2, static cache management solutions are classified into three categories, depending on which caching issue they address. The remainder of this section groups static approaches according to these categories.

3.2.1. Content Admission

There are two main ways of helping developers select cacheable content: providing caching recommendations, or providing an automated admission by automatically identifying and caching content. Table 1 summarizes the caching approaches that deal with admission issues.

Table 1. Static Caching Approaches for Selection of Cacheable Content.

Recommendation approaches:
- MemoizeIt (Della Toffola et al., 2015). Based on: iterative profiling. Content: method calls. Data input: time, frequency, and input-output profiling. Analysis: hit ratio, invalidation and size thresholds, and estimations. Output: report with a ranked list of potential opportunities to the user for manual inspection.
- Subsuming Methods (Maplesden et al., 2015). Based on: application profiling. Content: method calls. Data input: calling context tree. Analysis: customized metric based on method distance in the context tree. Output: subset of the methods in the application that are interesting from a performance perspective.
- Xu (2012). Based on: application profiling. Content: data structures. Data input: heap data structures for each allocation site during the execution of the application. Analysis: customized metric to approximate reusability based on three different levels (instance, shape, and data). Output: report with a list of top potentially reusable allocation sites to the user for manual inspection.
- Cachetor (Nguyen and Xu, 2013). Based on: abstract dependency graph analysis. Content: bytecode instructions, data structures, and method calls. Data input: instructions, data structures, and call sites profiling. Analysis: cacheability measurements for each type of content based on frequency. Output: ranked list of potential opportunities.

Automated caching approaches:
- IncPy (Guo and Engler, 2011). Based on: application profiling. Content: method calls. Data input: file accesses, value accesses, and method calls. Analysis: safeness (consistency) and worthwhileness (expensiveness) heuristics. Output: automatically caches and invalidates data files.
- Sqlcache (Scully and Chlipala, 2017). Based on: application profiling. Content: database queries generated by the application. Data input: SQL statements. Analysis: SQL analysis and program instrumentation. Output: automatically caches and invalidates database query results.
- EasyCache (Wang et al., 2014). Based on: database query monitoring. Content: simple database queries. Data input: SQL statements. Analysis: query text analysis. Output: automatically caches and invalidates database query results.
- AutoWebCache (Bouchenak et al., 2006). Based on: web request and database query monitoring. Content: final web pages. Data input: servlet method calls and definition of database queries. Analysis: content dependency analysis. Output: automatically caches and invalidates web pages.
- Surapaneni et al. (2013). Based on: database query monitoring. Content: database sub-queries. Data input: frequency, cost of updates, and selectivity of database queries. Analysis: utility function-based. Output: automatically caches and invalidates sub-queries.

The first way of helping developers admit content into the cache is by recommending improvement opportunities. Approaches in this context are usually based on the analysis of application profiling information, which can capture application-specific details by monitoring the application's execution under different situations. Traditional profiling approaches typically record measurements of method calls. Usually, cost measurements are captured with calling context information, which conveys the hierarchy of active method calls of a request.

Della Toffola et al. (2015) addressed this problem by identifying and suggesting method-caching opportunities. Their approach, called MemoizeIt, is based on comparing inputs and outputs of method calls and dynamically identifying redundant operations, according to specified thresholds. To limit the overhead of comparing all method invocations, MemoizeIt analyzes the collected executions level by level, through iterations. First, it compares input and output objects without tracking dependencies, and then iteratively tracks deeper dependency levels of objects that remain under the specified thresholds. By doing this, MemoizeIt can explore the application traces faster to define a set of methods that would provide benefits if cached. Also by analyzing method calls in an application profile, the work of Maplesden et al. (2015) identifies the entry points to repeated patterns of method calls, called subsuming methods, found by analyzing the smallest parent distance among all the common parents of a method.
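The core idea behind MemoizeIt's first pass, comparing shallow inputs and outputs of repeated calls and ranking methods by how often they are redundant, can be illustrated with a minimal profiler sketch in Python. The names, thresholds, and report format below are illustrative and do not reflect MemoizeIt's actual implementation:

```python
from collections import defaultdict

class MemoizationProfiler:
    """Records method inputs/outputs and ranks methods whose repeated
    calls with identical inputs keep producing identical outputs."""
    def __init__(self):
        self.calls = defaultdict(dict)       # method -> {input repr: output}
        self.redundant = defaultdict(int)    # method -> redundant-call count
        self.total = defaultdict(int)        # method -> total-call count

    def record(self, name, args, result):
        # Shallow comparison only; MemoizeIt deepens it iteratively.
        key = repr(args)
        self.total[name] += 1
        seen = self.calls[name]
        if key in seen and seen[key] == result:
            self.redundant[name] += 1
        seen[key] = result

    def report(self, min_ratio=0.5):
        # Rank candidates by the fraction of redundant (cacheable) calls.
        scored = {n: self.redundant[n] / self.total[n] for n in self.total}
        return sorted((n for n, r in scored.items() if r >= min_ratio),
                      key=lambda n: scored[n], reverse=True)

prof = MemoizationProfiler()
for x in [3, 4, 3, 3, 5, 4]:
    prof.record("square", (x,), x * x)
print(prof.report())  # prints ['square']: 3 of 6 calls were redundant
```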

Xu (2012) focused on a common problem in object-oriented applications, in which an allocation site creates data structures with the same content repeatedly, but in different moments during an execution. He follows the same principle of helping developers with a report, which lists the top allocation sites that create such data structures. Then developers can manually inspect the code and implement the appropriate solution to improve performance. Cachetor, proposed by Nguyen and Xu (2013), addresses repeated computations by suggesting spots of invariant data values that could be cached for later use. They proposed a runtime profiling tool, which uses a combination of dynamic dependency and value profiling to identify and report operations that keep generating identical data values. However, Cachetor makes strong assumptions about the programming language, such as the presence of a specialized type system, which is not valid in dynamically-typed languages. Infante (2014) addressed this type system restriction by proposing a multi-stage profiling technique that uses dynamically collected data to reduce the profiling overhead.

Although these approaches can reduce the complexity and time required by caching design, developers must still review the recommendations, decide whether or not to cache the suggested opportunities, and integrate the appropriate caching logic into the application. Thus, a second way of helping developers deal with content admission is to automatically identify and cache the cacheable content, as opposed to merely reporting potentially cacheable methods. Such approaches require not only ways to analyze the application behavior, but also mechanisms to manage the cache and application at runtime.

Guo and Engler (2011) achieved this by implementing a customized Python interpreter, namely IncPy, which includes an approach to identify and cache repetitive creation or processing of data files stored on disk. Such disk accesses may lead to long-running method calls, resulting in bottlenecks. Thus, at runtime, IncPy monitors and analyzes disk operations based on heuristics regarding safeness and worthwhileness, which are used as reference to cache and evict content. Similarly, Scully and Chlipala (2017) proposed Sqlcache, a compiler optimization that is able to analyze the application code and generate the appropriate caching logic for database queries. However, Sqlcache is specifically designed and implemented for web applications written in the Ur/Web (http://www.impredicative.com/ur/) programming language.
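A rough analogue of IncPy's worthwhileness heuristic can be sketched as a Python decorator that only admits results of calls that proved expensive. The decorator name and cost threshold are ours, and safeness (absence of side effects in the wrapped function) is simply assumed by the caller here, whereas IncPy verifies it automatically inside the interpreter:

```python
import time
import functools

def worthwhile_cache(min_cost=0.05):
    """Cache a function's result only when computing it took at least
    min_cost seconds -- a crude analogue of IncPy's worthwhileness
    heuristic. Assumes the wrapped function is side-effect free."""
    def wrap(fn):
        memo = {}
        @functools.wraps(fn)
        def inner(*args):
            if args in memo:
                return memo[args]
            start = time.perf_counter()
            result = fn(*args)
            if time.perf_counter() - start >= min_cost:
                memo[args] = result   # only expensive calls are admitted
            return result
        return inner
    return wrap

@worthwhile_cache(min_cost=0.05)
def slow_square(x):
    time.sleep(0.06)   # stand-in for an expensive disk-bound computation
    return x * x
```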

EasyCache (Wang et al., 2014) combines properties of database caching with application-level specificities to provide a mid-tier mechanism, which caches database queries and translates them into application-level objects in a seamless way. This approach is built on top of a popular application programming interface for accessing databases, and thus does not require any modification of the application code. As it is delivered as a caching layer over the integration between application and database, it is able to inspect the text of all database queries that come from the application in order to automatically find cacheable queries and maintain consistency. However, to keep the approach feasible regarding memory and network bandwidth, complex queries (e.g. nested and statistics queries) are unsupported, as are large objects. Moreover, because it is implemented at a lower level, as a Java Database Connectivity (JDBC) driver, the approach does not work with object-relational mapping frameworks (although the authors argue that the integration is feasible), which are commonly used when implementing web applications.

Also based on JDBC, Bouchenak et al. (2006) used AOP to implement middleware support for caching, named AutoWebCache. This middleware intercepts web application servlets as well as database queries through JDBC. AutoWebCache then automatically integrates front-end and back-end by analyzing the queries to identify cacheable web pages. Furthermore, this analysis yields the content dependencies, which are used to trigger cache invalidations to maintain consistency with the back-end database. Although built to support only databases as a source of information, this approach is generic enough to be implemented with other types of sources. AOP is also used by Surapaneni et al. (2013), who described an approach to cache the results of join sub-queries, with the goal of caching partial query results rather than whole queries. This approach identifies cacheable sub-queries based on a cache factor, which takes into account monitored information such as frequency, cost of updates, and an estimation of the selectivity of sub-queries (calculated through histograms built from collections of data observed at runtime).

3.2.2. Consistency Maintenance

We can classify static application-level caching consistency approaches as relying on two main techniques: expiration and invalidation. The former provides higher availability by allowing stale data to be returned from the cache. The latter ensures that no stale data is returned by identifying when the underlying data source is modified and evicting related cached items. For more detailed information regarding web caching consistency concepts, please refer to Appendix A. Table 2 summarizes the caching approaches that address these problems of dealing with consistency management.

Table 2. Static Caching Approaches for Consistency Management.
Approach Based on Content Data input Analysis Consistency Technique
TxCache (Ports et al., 2010) Database queries monitoring Results of database queries generated by application Queries designated as cacheable by developers Transactional context Invalidation-based
CacheGenie (Gupta et al., 2011) Database queries monitoring Results of database queries generated by application Queries designated as cacheable by developers Database triggers Invalidation or Expiration-based
Leszczyński and Stencel (2010) Application and database monitoring Database content SQL statements Dependency graph Invalidation-based

Ports et al. (2010) introduced TxCache, a caching approach with transaction support for data-driven web applications. TxCache associates cached content with invalidation tags, which indicate content dependencies. Every query processed in the database server, if not read-only, is returned along with its invalidation tags, which indicate content that is affected by the write operation (e.g. insertions, updates and removals). Write operations trigger invalidation processes that take the invalidation tags and evict all the related content in the cache, based on the presence of such tags. By doing so, TxCache provides a validity interval for cached content, instead of defining an arbitrary expiration time. Unlike TxCache, CacheGenie (Gupta et al., 2011) keeps consistency by performing in-place updates rather than invalidating and recomputing data. CacheGenie is integrated with the object-relational mapping module provided by the web framework Django (https://www.djangoproject.com/) and takes as input caching configurations for each application object that is mapped to the database through a key-value structure. Such a specification contains strategies, dependencies and other meta-data about caching, and is used to automatically create triggers within the database to evict and update cached data. Furthermore, CacheGenie also provides a weak consistency approach based on the definition of an expiration time, considering that many web applications do not require strong consistency.
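The tag-based invalidation scheme used by TxCache can be sketched as follows: each cached entry carries the set of tags it depends on, and a write reports the tags it touches, evicting every dependent entry. The class and method names here are illustrative and do not reflect TxCache's actual interface:

```python
from collections import defaultdict

class TagInvalidatingCache:
    """Minimal sketch of tag-based invalidation: entries declare their
    dependencies as tags; invalidating a tag evicts all dependents."""
    def __init__(self):
        self.store = {}                   # key -> cached value
        self.by_tag = defaultdict(set)    # tag -> keys depending on it

    def put(self, key, value, tags):
        self.store[key] = value
        for t in tags:
            self.by_tag[t].add(key)

    def get(self, key):
        return self.store.get(key)

    def invalidate(self, tags):
        # Called when a write operation reports the tags it affected.
        for t in tags:
            for key in self.by_tag.pop(t, ()):
                self.store.pop(key, None)

cache = TagInvalidatingCache()
cache.put("top_users", ["ana", "bob"], tags={"users"})
cache.invalidate({"users"})    # a write touched the users table
print(cache.get("top_users"))  # prints None
```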

Instead of requiring developer inputs, Leszczyński and Stencel (2010) described an approach that keeps cached data in a consistent state based on a dependency graph, which provides a mapping between update statements in a relational database and cached content. When a database update is performed, the graph allows detection of the cached content that must be invalidated to preserve the consistency between the cache and the data source. Also focused on keeping consistency by using the database as the source of information, Wang et al. (2014) and Bouchenak et al. (2006) both propose solutions in the form of a middleware, which monitors database queries through JDBC and, based on the analysis of queries to track content dependencies, triggers invalidation of all cached content related to a change when a modifying query is executed.

3.2.3. Size-limitation Management

Although in this survey we do not explore approaches that focus on general caching issues such as concurrency, scheduling, and other distributed systems topics, there are infrastructure-related issues that developers usually need to configure, or at least be aware of, when implementing application-level caching (Mertz and Nunes, 2016), cache size being one of them. Because caches are size-limited, two main decisions must be made: (i) the definition of an adequate cache size, and (ii) the choice of a replacement algorithm. Table 3 summarizes the caching approaches that address these problems of dealing with size limitation.

Table 3. Static Caching Approaches for Size-limitation Management.
Approach Based on Data Input Analysis Output Size-limitation Issue
Mimir (Saemundsson et al., 2014) Cache profiling Cache size Simulation-based Hit ratio estimate according to the available cache size Cache size definition
Zaidenberg et al. (2015) Cache profiling Recency and frequency of cached content Heuristic-based Cached content to be evicted Replacement
GD-Wheel (Li and Cox, 2015) Cache profiling Recency and cost to retrieve of cached content Heuristic-based Cached content to be evicted Replacement

Regarding the first decision, Saemundsson et al. (2014) proposed an approach that provides insight into how much memory is necessary to achieve the desired performance. It is based on an online profiler, called Mimir, which estimates how the popular least recently used (LRU) algorithm would perform with different memory allocations, by continually exposing the hit ratio curve as a function of cache size. Thus, the approach allows what-if questions, providing dynamic estimations of the cost and performance of different cache sizes.
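The kind of estimate Mimir exposes, the LRU hit ratio as a function of cache size, can be computed from an access trace using reuse (stack) distances, as in the illustrative sketch below. Mimir itself maintains such a curve online with approximate bookkeeping rather than this exact offline computation:

```python
def hit_ratio_curve(trace, max_size):
    """Compute the LRU hit ratio for every cache size 1..max_size from
    an access trace, using stack distances: an access hits in any cache
    at least as large as the item's depth in the LRU stack."""
    stack, hits = [], [0] * (max_size + 1)
    for key in trace:
        if key in stack:
            depth = stack.index(key) + 1      # LRU stack distance
            if depth <= max_size:
                for size in range(depth, max_size + 1):
                    hits[size] += 1
            stack.remove(key)
        stack.insert(0, key)                  # most recent on top
    total = len(trace)
    return [hits[size] / total for size in range(1, max_size + 1)]

# A cyclic trace of 3 keys: only a cache of size 3 captures the reuse.
curve = hit_ratio_curve(list("abcabcab"), 3)
print(curve)  # prints [0.0, 0.0, 0.625]
```

Exposing the whole curve, rather than a single hit ratio, is what allows the what-if questions mentioned above: the operator can read off the expected hit ratio for any candidate cache size.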

While cache size is key to cache efficiency, replacement algorithms are fundamental as well. Such algorithms are usually based on simple heuristics such as recency, frequency, or even random choice. Popular and well-accepted examples of replacement algorithms are the already mentioned LRU and least frequently used (LFU). Even though replacement seems a solved problem (Podlipnig and Böszörmenyi, 2003), new proposals have been developed. Zaidenberg et al. (2015) presented a quantitative evaluation of alternative replacement strategies that consider both recency and frequency properties. They control data replacement through the management of different lists of cached items and levels of frequency and recency. The authors used LRU as a baseline and, according to benchmarks, the new proposals can increase the hit ratio. Despite this improvement, there are disadvantages, such as sensitivity to cache thrashing.

Recently, Li and Cox (2015) proposed GD-Wheel, a cost-aware replacement policy based on the greedy-dual size (GDS) algorithm. GD-Wheel brings GDS to the context of application-level caches, given that GDS was initially proposed by Young (1994) for web proxy caches. In a nutshell, besides the benefits of LRU, GDS also takes into account the cost to retrieve a piece of content when selecting content to be evicted (Cao and Irani, 1997). Thus, GD-Wheel provides developers with a mechanism to include the cost information on each insertion or update in the cache. This cost is taken into account in replacement decisions. GDS works by maintaining a value of remaining priority Rp for each cache entry. When inserting or reusing a piece of content i, Rp(i) is defined as the cost of retrieving it, c(i). When an eviction is required, the piece of content with the lowest Rp is evicted. After the eviction of i, all Rp values of cached content are decreased by Rp(i). By doing this, the algorithm integrates recency and cost to retrieve, because content with lower cost that was not recently used has lower Rp values, and consequently tends to be evicted sooner.
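The GDS rule described above can be sketched as follows. Instead of decrementing every entry's Rp on eviction, the sketch uses the standard O(1) trick of a global inflation value L (the priority of the last victim), which is equivalent; all names are illustrative:

```python
class GreedyDualSize:
    """Sketch of greedy-dual replacement: each entry keeps a priority
    Rp, reset to L + cost on insertion or reuse; eviction removes the
    lowest-Rp entry and raises the inflation value L to its priority,
    which is equivalent to decreasing every remaining Rp by it."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.rp = {}        # key -> remaining priority Rp
        self.values = {}    # key -> cached value
        self.L = 0.0        # inflation: priority of the last victim

    def put(self, key, value, cost):
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.rp, key=self.rp.get)
            self.L = self.rp[victim]
            del self.rp[victim], self.values[victim]
        self.values[key] = value
        self.rp[key] = self.L + cost

    def get(self, key, cost):
        if key in self.values:
            self.rp[key] = self.L + cost   # refresh priority on a hit
            return self.values[key]
        return None

gds = GreedyDualSize(capacity=2)
gds.put("a", "va", cost=1.0)
gds.put("b", "vb", cost=5.0)
gds.put("c", "vc", cost=3.0)   # cache full: evicts "a", the lowest Rp
```

Note how cheap, cold content ("a") is evicted before expensive content ("b"), which is exactly the recency-and-cost integration the paragraph above describes.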

A key limitation of the approaches presented in this section is that, due to their non-adaptive nature, they do not automatically consider changes in application workload and access patterns, which can lead to poor or sub-optimal performance until caching decisions are revised. Furthermore, such approaches usually do not take application specificities into account and require human intervention to configure, customize and tune the solution to the application requirements and needs.

3.3. Adaptive Cache Management

We now focus on presenting the landscape of adaptive caching at the application level and discuss research challenges in this context. Web caching approaches also face an underlying problem: the stochastic nature of user behavior, which may lead to unexpected or unpredicted workloads and, consequently, affects cache performance, especially in terms of coordination policies. This can cause, for example, cache thrashing, i.e. the eviction of useful data. Therefore, it is desirable that caching solutions, from their conception, conform to dynamically changing user demands and network environments. Traditional cache strategies, such as replacement algorithms, already present a mechanism for adapting themselves over time, based on general heuristics like recency or frequency (Wang, 1999). Although such strategies can in some sense be seen as adaptive caching, they are primarily conceived and implemented statically. Therefore, exploiting concepts of adaptive systems could help design more robust, reliable, and scalable caching strategies.

In the remainder of this section, we discuss adaptive approaches for cache management, following the taxonomy shown in Figure 2. However, due to the additional complexity of these approaches in comparison with static approaches, we first overview existing work and present alternatives used in the different activities of the feedback loop.

3.3.1. Overview

Adaptive approaches address different caching issues and, to do so, they analyze a particular property associated with caching. For example, if an approach focuses on consistency maintenance, it usually analyzes content changeability, because if the source of a cached content changes, the content should be updated or removed from the cache. Based on this analysis, approaches adapt a particular task of cache management, e.g. the definition of the expiration time of a cached item, which we refer to as the adaptation target. Broadly, adaptive caching approaches aim at increasing cache performance as a means of improving application performance. Selected measurements are used to evaluate cache performance, and are thus associated with the expected outcomes of a particular approach. We summarize the different adaptive caching approaches in Table 4, analyzing these characteristics as well as giving examples from the literature.

Table 4. Overview of Adaptation Scenarios for Application-Level Caching.
Caching Issue Analyzed Property Adaptation Target Expected Outcomes Representative Examples
Content admission Web content cacheability – Content selection – Content filtering – Higher hit ratio (Baeza-Yates et al., 2007; Einziger and Friedman, 2014; Chen et al., 2016)
Consistency maintenance Web content changeability – Expiration time definition – Higher hit ratio – Increased content freshness (Huang et al., 2010)
Content eviction Web content evictability – Replacement decision – Proactive eviction – Higher hit ratio – Improved resource usage (Ali Ahmed and Shamsuddin, 2011; Ghandeharizadeh et al., 2014; Negrão et al., 2015)
Cache infrastructure management Cache resource availability – Cache size definition – Improved resource usage (Radhakrishnan, 2004)

Adaptation is usually driven by changes in and the behavior of the application environment and, therefore, this environment must be monitored. This is typically done by capturing execution traces of the application or cache, which are then analyzed to decide which actions to take. Such traces provide input data for the analysis, which can be mainly of two types. The first is stateless data, which depends only on the content that can be cached (e.g. the size of a query result, the frequency of a method call, and the recency of a requested content), and the second is stateful data, which is based on historical information. The most straightforward approach is to use historical access information regarding the observed web component, such as requested URLs or database queries (Baeza-Yates et al., 2007; Ma et al., 2014; Chen et al., 2016). Such monitored data can be easily obtained from web servers or database systems, which automatically store it in the form of weblogs. Moreover, spatial locality can be exploited by analyzing the content context (Negrão et al., 2015). Although such contextual information can be useful, it requires a deeper analysis of the content being managed.

The analysis of monitored data requires characterizing the execution of the application or cache. This task can be done based on simple heuristics (Negrão et al., 2015), or by using sophisticated techniques, such as a mathematical model (Huang et al., 2010; Chen et al., 2016) or machine learning (Sajeev and Sebastian, 2010). Furthermore, static analysis of source code can also be useful (Chen et al., 2016). Such static analysis can reduce the complexity and processing required to reach caching decisions, i.e. shorten the learning phase. For example, while looking for cacheable opportunities, static analysis can provide a limited set of possibilities and places to look at, thus reducing the amount of data that must be analyzed and making the decision process faster and more efficient.

The decision can be implemented also with heuristics (Negrão et al., 2015), utility-functions (Baeza-Yates et al., 2007; Ma et al., 2014; Einziger and Friedman, 2014), or even taking into account the output of a classifier (Sajeev and Sebastian, 2010). Finally, the goals can be achieved by effectors connected directly with the cache (Ma et al., 2014; Einziger and Friedman, 2014), the application (Chen et al., 2016), or even with a middleware component.

The general activities presented and exemplified above (i.e. monitoring, analyzing, deciding and adapting) are seen in self-adaptive systems as a feedback loop mechanism. Based on such a loop, these systems are able to change their behavior in response to observed changes in their environment. A deeper discussion of autonomic computing and self-adaptive systems is given by Huebscher and McCann (2008) and Lalanda et al. (2013). In this survey, we capture characteristics of existing approaches by matching them to the activities of a feedback loop to facilitate understanding and comparison, even if an approach is not explicitly structured in this way. A feedback loop of an adaptive system can be described in terms of six key properties, described below, which are used to compare and describe adaptive approaches in later sections.

Monitored data:

Monitored data are measurable input parameters from which the current state of the system can be characterized.

Analysis:

Based on the monitored data, an analysis process should be employed to characterize the state of the system. Such characterization can be based on a single monitored piece of information or a combination of a set of inputs, such as a utility function, or even a complex model built through machine learning approaches.

Behavior:

The behavior represents how the adaptive component acts over the managed resource or system. It can be reactive, i.e. responding to observed events after they occur, or proactive, i.e. taking actions in advance, before events have a chance to happen.

Operation:

The operation corresponds to when and how the adaptation process can analyze data and take actions. Online operation is when the adaptation process takes place while the application is executing, possibly impacting its functioning; the adaptation process can be seen as part of the application. Offline operation, in contrast, is performed separately, based on information collected from the application for later analysis, and the resulting actions are performed when deemed adequate.

Decision:

Decisions are conclusions or resolutions achieved after the analysis. Such decisions involve reasoning about the current state of the system and defining necessary adaptations (if needed) to achieve the goals.

Goal:

Goals consist of one or more expected properties that should be achieved by the adaptations. They summarize the desired system performance or behavior regarding input parameters.
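To make these properties concrete, a feedback loop for cache management might be skeletonized as below. The policy shown, growing the cache when the hit ratio falls below a goal, is purely illustrative and is not taken from any surveyed approach; the class and attribute names are ours:

```python
from types import SimpleNamespace

class AdaptiveCacheController:
    """Skeleton of the feedback loop described above: monitor the cache,
    analyze its state, decide on an adaptation, and act on the cache.
    Goal: keep the hit ratio at or above a target value."""
    def __init__(self, cache, goal_hit_ratio=0.8):
        self.cache = cache
        self.goal = goal_hit_ratio

    def monitor(self):
        # Monitored data: hit and miss counters exposed by the cache.
        return self.cache.hits / max(1, self.cache.hits + self.cache.misses)

    def analyze(self, hit_ratio):
        # Analysis: characterize the state (is the cache under-performing?).
        return hit_ratio < self.goal

    def decide(self, under_performing):
        # Decision: pick an adaptation of the target (here, cache size).
        return ("grow", self.cache.capacity * 2) if under_performing else None

    def act(self, action):
        # Effector: apply the adaptation to the managed cache.
        if action and action[0] == "grow":
            self.cache.capacity = action[1]

    def loop_once(self):
        self.act(self.decide(self.analyze(self.monitor())))

# Stand-in for a real cache exposing counters and a capacity knob.
cache = SimpleNamespace(hits=1, misses=9, capacity=100)
controller = AdaptiveCacheController(cache)
controller.loop_once()
print(cache.capacity)  # prints 200: hit ratio 0.1 < 0.8, so the cache grew
```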

3.3.2. Content Admission

We now focus on approaches that can dynamically evaluate cacheability towards discovering cacheable data, i.e. whether a particular piece of content should be cached. As previously mentioned, admission approaches help developers select caching opportunities. This task is particularly complex because selected opportunities must be continuously revised, due to changing workload characteristics and access patterns, or even application evolution. This shortcoming motivates the need for adaptive caching solutions, which can minimize the effort required from application developers to maintain caching solutions by automatically adjusting to changes in the application context.

As shown in Figure 2, adaptive admission-focused approaches can be seen from two different perspectives, depending on the purpose of the adaptation: (a) dynamic selection of the best cacheable opportunities through the evaluation of cacheability properties, and (b) reactive filtering of content that was previously identified as cacheable but should no longer be, for instance due to workload changes. Table 5 summarizes the surveyed adaptive caching approaches that deal with admission issues.

Table 5. Comparison of Adaptive Application-level Caching Approaches for Admission.
Reference Monitored Data Analysis Behavior Operation Decision Goal
CacheOptimizer (Chen et al., 2016) Web server and database access logs Static source code analysis and dynamic weblog analysis with colored Petri nets Proactive Offline Content miss ratio threshold Dynamic identification of cacheable content and definition of caching configurations
Mertz and Nunes (2017); Mertz (2017) Method calls Cacheability metrics Proactive Online and offline phases Heuristic-based Dynamic identification and caching of cacheable methods
Baeza-Yates et al. (2007) Query metadata and historic usage information Utility-function based Reactive Online Utility-function based on the probability of a query generate future cache hits Dynamic decision regarding the admission of queries to the cache
TinyLFU (Einziger and Friedman, 2014) Cache size (number of items it can store) and historic usage information Frequency histogram Reactive Online Utility-function based on the potential hit ratio increasing Dynamic decision regarding the admission of content when an eviction is required
LWE-LRU (Sajeev and Sebastian, 2010) Content attributes and traffic parameters Multinomial logistic regression model to compute worthiness of content Reactive Offline and online phase Worthiness thresholds Dynamic admission and eviction decisions

The automatic identification of caching opportunities at the application level is addressed by CacheOptimizer (Chen et al., 2016), which monitors readable weblogs to create mappings between workload and database accesses. The analysis consists of two processes: a static code analysis to identify possible caching configuration spots, and a characterization with colored Petri nets, which models the transitions of application states based on weblogs from the web server. By doing so, it can reach a globally optimal caching decision, instead of focusing on top cache accesses using a greedy approach. Unlike other approaches, CacheOptimizer is not implemented as a new caching framework; instead, it is integrated with existing ones, automating the caching configuration according to the results of its analysis. Although this approach addresses method caching opportunities, it focuses on database-centric web applications; thus, only database-related methods are cached. Focusing on general application methods as cacheable options, Mertz and Nunes (2017) and Mertz (2017) proposed a seamless approach that automates the evaluation of cacheability properties and admits application methods at runtime based on the Cacheability Pattern (Mertz and Nunes, 2016). These approaches were evaluated considering single-shot scenarios, without having their adaptive behavior measured.

Reactive filtering approaches act at runtime by dynamically evaluating the cacheability of content that was previously identified as cacheable, according to the workload and access pattern, to prevent unworthy content from taking place in the cache. This can reduce the amount of cached data, freeing space in the cache, avoiding evictions, and leading to higher hit ratios. Reactive filtering approaches are particularly useful when dealing with search engines, since not all search combinations will be frequently requested, as shown by Baeza-Yates et al. (2007). They proposed an approach to filter infrequent searches, which would not improve application performance if cached. The proposed admission solution monitors query executions and caches queries in a split cache scheme with two parts: controlled and uncontrolled. Both parts internally implement a regular LRU-based cache, but an admission policy is used to distinguish frequent queries, which are placed in the controlled space, from infrequent queries, which are placed in the uncontrolled space. The admission policy is implemented as a function that evaluates each processed search query, taking into consideration the query metadata and past usage information. As a result, the controlled cache is less pollution-prone, and frequent queries tend to remain cached for longer periods, improving hit ratios. Although infrequent queries are more rapidly evicted in the uncontrolled space, it is still an LRU-based cache and can provide good responses in cases of short bursts of infrequent queries.
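The split-cache scheme can be sketched as two LRU segments behind an admission predicate. The frequency test below is a hypothetical stand-in for the utility function of Baeza-Yates et al. (2007), and all names are illustrative:

```python
from collections import OrderedDict

class LRU:
    """Plain LRU cache used as one segment of the split scheme."""
    def __init__(self, capacity):
        self.capacity, self.d = capacity, OrderedDict()

    def get(self, key):
        if key in self.d:
            self.d.move_to_end(key)   # refresh recency
            return self.d[key]
        return None

    def put(self, key, value):
        self.d[key] = value
        self.d.move_to_end(key)
        if len(self.d) > self.capacity:
            self.d.popitem(last=False)   # evict least recently used

class SplitCache:
    """Sketch of split admission: an admission predicate routes
    presumably frequent queries to the 'controlled' segment, keeping
    it free of one-off pollution; everything else goes to the
    'uncontrolled' segment and is evicted quickly."""
    def __init__(self, segment_size, is_frequent):
        self.controlled = LRU(segment_size)
        self.uncontrolled = LRU(segment_size)
        self.is_frequent = is_frequent

    def get(self, query):
        hit = self.controlled.get(query)
        return hit if hit is not None else self.uncontrolled.get(query)

    def put(self, query, result):
        segment = (self.controlled if self.is_frequent(query)
                   else self.uncontrolled)
        segment.put(query, result)
```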

Also focusing on reactively filtering content, Einziger and Friedman (2014) proposed TinyLFU. Although this approach acts only when the cache is full, it admits new content into the cache only when such content is expected to provide more benefit than the content already cached. TinyLFU estimates the worthiness (i.e. contribution to the hit ratio) of the eviction candidate and compares it with the worthiness of the newly accessed content. To achieve this behavior, TinyLFU uses an approximate LFU structure, maintaining a frequency histogram of cached content, which makes it possible to trade off the cost of eviction against the usefulness of the new content. A small variation of this approach is currently available to developers as the default replacement policy of the Caffeine caching framework, due to its high hit rate and low memory footprint.

Admission approaches can also be implemented using sophisticated learning techniques. Sajeev and Sebastian (2010) proposed an admission technique using multinomial logistic regression (MLR) as a classifier. The model built with MLR assigns a worthiness class to the response of a processed request based on application traffic and response content properties. The model is trained with previously collected traces and sanitized logs to classify the worthiness of web cache content. Then, at runtime, when content arrives, its worthiness class is computed and used in a threshold-based admission control mechanism to decide whether the content should be cached.
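Multinomial-logistic inference for such an admission decision can be sketched as below. The feature and weight values are hypothetical; in the actual approach the weights would come from offline training on collected traces:

```python
import math

def worthiness_class(features, weights_by_class):
    """Illustrative multinomial-logistic inference: given content
    features (e.g. size, frequency, latency) and per-class weight
    vectors assumed to be trained offline, return the most likely
    worthiness class via the softmax of the class scores."""
    scores = {cls: math.exp(sum(w * f for w, f in zip(ws, features)))
              for cls, ws in weights_by_class.items()}
    total = sum(scores.values())
    probs = {cls: s / total for cls, s in scores.items()}
    return max(probs, key=probs.get)

# Hypothetical trained weights for two worthiness classes.
weights = {"cache": [0.8, 1.2], "skip": [-0.5, -1.0]}
print(worthiness_class([1.0, 0.5], weights))  # prints cache
```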

3.3.3. Consistency Maintenance

Adaptation in consistency maintenance can be employed in both of the introduced consistency approaches, i.e. expiration-based and invalidation-based. However, unlike static approaches, existing adaptive work focuses only on the former, as can be seen in Figure 2. Consequently, all the approaches assume that stale data is allowed to be returned from the cache, that is, they are based on weak consistency. Table 6 summarizes the surveyed adaptive caching approaches that deal with consistency issues.

Table 6. Comparison of Adaptive Application-level Caching Approaches for Consistency.
Reference Monitored Data Analysis Behavior Operation Decision Goal
Huang et al. (2010) Hit ratio and historic usage information Function approximations Reactive Online Utility-function based on the monitored data Dynamic definition of the expiration time of data
Adaptive, Incremental and Machine-learned TTL (Alici et al., 2012) Query metadata and historic usage information Metric-based Reactive Online Rule-based Dynamic definition of the expiration time of data

Such approaches assume that cached content is accessed irregularly, making the definition of a fixed timeout inadequate. If the cached content has a short expiration time, hit ratios tend to be low outside short bursts of requests; if it has a long expiration time, stale data can be returned to users. Although staleness is expected in weak consistency approaches, the degree to which it affects the system execution should be taken into account by developers. For this reason, Huang et al. (2010) proposed a cache framework that provides transparency for developers by offering an adaptive definition of the expiration time, also called time-to-live (TTL). The proposed strategy is based on the rationale behind the popular simulated annealing algorithm, in which an expiration time function is approximated according to changes in how users access content. Also focused on TTL definition, Alici et al. (2012) proposed query result caching for search engines that considers query-specific TTL values, using a machine learning model to assign TTL values to queries. Their model exploits a query log and result-specific features to predict the best TTL values. The authors showed that this strategy outperforms the fixed TTL strategy, by means of an evaluation against several parameterized functions (average and incremental TTL).
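A crude per-item adaptive TTL policy, in the spirit (but not the letter) of the approaches above, might grow an entry's TTL whenever a refresh finds the value unchanged and shrink it when the value has changed. All names and the doubling/halving rule are illustrative:

```python
import time

class AdaptiveTTLCache:
    """Sketch of per-item adaptive expiration: an entry's TTL doubles
    when a refresh shows the cached value was still valid, and halves
    when the value changed, so rarely-changing content lives longer."""
    def __init__(self, base_ttl=10.0):
        self.base_ttl = base_ttl
        self.entries = {}   # key -> (value, expires_at, ttl)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(key)
        if entry and now < entry[1]:
            return entry[0]
        return None   # missing or expired

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        old = self.entries.get(key)
        ttl = self.base_ttl
        if old is not None:
            # Grow TTL if the value did not change, shrink it otherwise.
            ttl = old[2] * 2 if old[0] == value else max(1.0, old[2] / 2)
        self.entries[key] = (value, now + ttl, ttl)
```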

3.3.4. Size-limitation Management

Cache frameworks and libraries are usually based on a fixed memory size to allocate content, expressed either as a maximum number of entries or as an absolute size. As introduced, there are static approaches that help determine the appropriate cache size and deal with the replacement issues that emerge from a size-limited cache. Adaptive approaches also address these concerns; they are presented and compared in Table 7.

Table 7. Comparison of Adaptive Application-level Caching Approaches for Dealing with Size Limitations.
Reference | Monitored Data | Analysis | Behavior | Operation | Decision | Goal
Radhakrishnan (2004) | Hit ratio and cache size | Linear quadratic estimation | Proactive | Online | Based on pre-defined thresholds | Dynamic increase or decrease of the cache size
ARC (Megiddo and Modha, 2003); ARC-TD (Raigoza and Sun, 2014) | Recency and frequency of requests to cached data | Heuristic-based | Reactive | Online | Likelihood of content being used in the near future | Dynamic selection between recency-based or frequency-based eviction
El Abdouni Khayari et al. (2011) | Cache access logs and metadata, such as requested data sizes, arrival times, hits and misses | Expectation Maximization (EM) algorithm | Reactive | Offline and online phases | Output of EM algorithm | Dynamic reconfiguration of the C-LRU algorithm
CAMP (Ghandeharizadeh et al., 2014) | Cost to retrieve, size and recency of cached web content | Heuristics based on cost-to-size ratio and recency | Reactive | Online | Based on priority value of content | Dynamic selection of the best candidate to evict
SACS (Negrão et al., 2015) | Web content along with temporal information and pre-defined thresholds | Metric of distance between content | Reactive | Offline and online phases | Higher distance | Dynamic selection of the best candidate to evict
ICWCS (Ali Ahmed and Shamsuddin, 2011) | Web access logs | Adaptive neuro-fuzzy inference system | Proactive | Offline | Trained neuro-fuzzy system that models the access probability of web content | Dynamic eviction of unworthy cached web content

Radhakrishnan (2004) addressed the trade-off between cache space allocation and performance improvement by dynamically changing the cache size according to the access patterns and workload. The approach uses a linear quadratic estimation (i.e. Kalman filtering) algorithm that estimates appropriate cache sizes based on runtime information (hit ratio and memory in use) collected over time and on pre-defined thresholds. Given the cache size estimate, the approach decides which action yields more benefits: increasing or decreasing the cache size, or keeping it as it is and replacing content.
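A minimal sketch of this feedback loop, under simplifying assumptions of our own (a scalar Kalman-style filter over the hit ratio only, and hypothetical thresholds and growth factors), could look as follows:

```python
class CacheSizeController:
    """Feedback-driven cache sizing (illustrative sketch)."""

    def __init__(self, size, process_var=1e-4, measure_var=1e-2):
        self.size = size
        self.estimate, self.variance = 0.5, 1.0   # initial belief about hit ratio
        self.q, self.r = process_var, measure_var

    def observe(self, hit_ratio):
        self.variance += self.q                    # predict: uncertainty grows
        gain = self.variance / (self.variance + self.r)
        self.estimate += gain * (hit_ratio - self.estimate)  # update: blend measurement
        self.variance *= (1 - gain)

    def decide(self, low=0.6, high=0.9):
        if self.estimate < low:        # cache likely too small: grow it
            self.size = int(self.size * 1.25)
        elif self.estimate > high:     # diminishing returns: reclaim memory
            self.size = int(self.size * 0.9)
        return self.size               # otherwise keep the size and rely on replacement
```

The published approach additionally estimates memory in use and uses the estimates to choose among growing, shrinking, or replacing content, as described above.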

Regarding replacement policies, popular algorithms such as LRU and LFU have the limitation of considering a single factor, possibly ignoring important properties that can influence replacement efficiency. In realistic applications, access patterns change significantly over time, and simple replacement policies may not have enough information to deal with these situations properly. A common problem is cache pollution, in which the cache is loaded with unnecessary data, resulting in useful data being evicted. This problem can manifest as either cold cache pollution (affecting LRU) or hot cache pollution (affecting LFU and size-based policies) (Ayani et al., 2003).

As shown in Figure 2, adaptive replacement approaches can be classified into three groups: (a) dynamic selection of algorithms, which focuses on exploring the advantages of each algorithm, (b) evolution of traditional replacement policies to achieve adaptation, including the modification of replacement approaches proposed originally for other caching levels (e.g. web proxy caching and database), and (c) the proposal of new and application-level specific approaches.

One of the first initiatives to adapt replacement policies is the Adaptive Replacement Cache (ARC) (Megiddo and Modha, 2003), which combines the merits of different replacement policies by dynamically balancing between recency and frequency components online. The policy maintains: (i) a cache space partitioned, according to defined constraints, into content that was accessed recently and content that was accessed frequently (at least twice); and (ii) a recent eviction history of each partition. It uses a learning rule to continually revise the adaptation parameter in response to the observed workload. A practical implementation by the same authors (Megiddo and Modha, 2004) revealed a better hit ratio with ARC than with LRU across a wide range of workloads, while incurring practically the same low time cost as LRU. Although it became popular and is currently implemented in frameworks available to developers, such as Caffeine, ARC is patented (https://www.google.com/patents/US20040098541) and requires a license agreement with IBM to be used. Furthermore, Raigoza and Sun (2014) proposed the Adaptive Replacement Cache-Temporal Data (ARC-TD), a buffer replacement policy built upon the ARC policy. Both ARC and ARC-TD address the problem of caching priority by trading off between frequently accessed and recently used content. However, the ARC-TD policy favors the retention of content in proportion to its average life span in the buffer, which can achieve a higher cache hit ratio.
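To make the structure concrete, the following is a simplified sketch of the ARC idea, not the published (and patented) algorithm: T1 holds content seen once recently, T2 content seen at least twice, and the "ghost" histories B1/B2 record recent evictions so that hits on them tune the balance parameter p. Several details of the original algorithm are omitted.

```python
from collections import OrderedDict

class SimplifiedARC:
    """Recency/frequency-balancing cache (simplified ARC sketch)."""

    def __init__(self, capacity):
        self.c, self.p = capacity, 0
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()

    def _replace(self, hit_in_b2):
        # Evict from T1 when it exceeds its target size p, else from T2.
        if self.t1 and (len(self.t1) > self.p or (hit_in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def get(self, key):
        if key in self.t1:                 # second access: promote to T2
            self.t2[key] = self.t1.pop(key)
            return self.t2[key]
        if key in self.t2:                 # refresh recency within T2
            self.t2.move_to_end(key)
            return self.t2[key]
        return None

    def put(self, key, value):
        if key in self.t1 or key in self.t2:
            self.get(key)                  # promotes/refreshes; key is now in T2
            self.t2[key] = value
            return
        if key in self.b1:                 # ghost hit: recency helps, grow p
            self.p = min(self.c, self.p + max(1, len(self.b2) // max(1, len(self.b1))))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = value
        elif key in self.b2:               # ghost hit: frequency helps, shrink p
            self.p = max(0, self.p - max(1, len(self.b1) // max(1, len(self.b2))))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = value
        else:                              # brand new key enters T1
            if len(self.t1) + len(self.t2) >= self.c:
                self._replace(False)
            self.t1[key] = value
        while len(self.b1) > self.c:       # bound the ghost histories
            self.b1.popitem(last=False)
        while len(self.b2) > self.c:
            self.b2.popitem(last=False)
```

The self-tuning behavior lies in p: a hit in B1 means a recently evicted "seen once" item was requested again, so T1 deserves more space, and symmetrically for B2.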

El Abdouni Khayari et al. (2011) presented a runtime closed loop for self-reconfiguration of the Class-Based Least Recently Used (C-LRU) strategy. In C-LRU, the cache is divided into portions reserved for content of a specific size, and the content size distribution is modeled by a hyper-exponential distribution fitted with the EM algorithm. The approach continuously analyzes the cache trace (access logs), from which relevant data, such as the requested data sizes, arrival times, hits and misses, are retrieved. It then computes the parameters of a hyper-exponential distribution of the requested content sizes, which are used to determine a new configuration for C-LRU.
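The partitioned structure being reconfigured can be sketched as follows. This is an assumption-laden simplification: capacities count entries rather than bytes, and the class boundaries and shares are plain constructor arguments, whereas the surveyed approach derives them from the EM fit of the observed request-size distribution.

```python
from collections import OrderedDict

class CLRUCache:
    """Per-size-class LRU partitions (illustrative C-LRU sketch)."""

    def __init__(self, boundaries, capacities):
        # boundaries[i] is the upper size bound of class i;
        # the last boundary should be float("inf")
        assert len(boundaries) == len(capacities)
        self.boundaries = boundaries
        self.capacities = list(capacities)
        self.parts = [OrderedDict() for _ in capacities]

    def _class_of(self, size):
        return next(i for i, b in enumerate(self.boundaries) if size <= b)

    def put(self, key, value, size):
        i = self._class_of(size)
        part = self.parts[i]
        part[key] = (value, size)
        part.move_to_end(key)
        while len(part) > self.capacities[i]:  # LRU eviction inside the class
            part.popitem(last=False)

    def get(self, key):
        for part in self.parts:
            if key in part:
                part.move_to_end(key)
                return part[key][0]
        return None

    def reconfigure(self, boundaries, capacities):
        """Re-partition with new parameters, reinserting current entries."""
        entries = [(k, v, s) for part in self.parts for k, (v, s) in part.items()]
        self.__init__(boundaries, capacities)
        for k, v, s in entries:
            self.put(k, v, s)
```

In the closed loop described above, the output of the EM analysis would periodically drive calls to something like `reconfigure`.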

Recently, by using cost as an important feature to guide replacement decisions, Ghandeharizadeh et al. (2014) proposed the Cost Adaptive Multi-Queue Eviction Policy (CAMP), an adaptation of the GDS algorithm (Young, 1994) (described in Section 3.2). CAMP manages multiple LRU queues based on the distribution of cost and size of cached content. Each queue is ordered according to the remaining priority of its items. When a replacement is required, CAMP finds the eviction candidate across all queues by inspecting the priority at the front of each queue, which holds the item with the smallest priority. This approach also has a variation that adopts FIFO queues (Ghandeharizadeh et al., 2015), because they demand fewer content moves.
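The cost-to-size priority idea that CAMP inherits from GDS can be sketched as follows: priority is clock + cost/size, the smallest priority is evicted, and the clock advances to the evicted priority so that long-untouched entries age out. Note the simplification: CAMP approximates this with multiple LRU queues, while this sketch uses a single heap with lazy deletion for brevity.

```python
import heapq

class CostSizeCache:
    """GDS-style cost/size priority eviction (illustrative sketch)."""

    def __init__(self, capacity_bytes):
        self.capacity, self.used, self.clock = capacity_bytes, 0, 0.0
        self.entries = {}  # key -> (priority, size, cost, value)
        self.heap = []     # (priority, key); stale entries skipped lazily

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        _, size, cost, value = entry
        prio = self.clock + cost / size     # refresh priority on a hit
        self.entries[key] = (prio, size, cost, value)
        heapq.heappush(self.heap, (prio, key))
        return value

    def put(self, key, value, size, cost):
        if key in self.entries:
            self.used -= self.entries[key][1]
            del self.entries[key]
        while self.used + size > self.capacity and self.heap:
            prio, victim = heapq.heappop(self.heap)
            if victim in self.entries and self.entries[victim][0] == prio:
                self.clock = prio           # aging: new entries start higher
                self.used -= self.entries[victim][1]
                del self.entries[victim]
        prio = self.clock + cost / size
        self.entries[key] = (prio, size, cost, value)
        heapq.heappush(self.heap, (prio, key))
        self.used += size
```

Expensive-to-recompute content thus survives longer than cheap content of the same size and recency, which is the property that motivates CAMP for application-level caches.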

As shown, most adaptive replacement approaches use temporal locality information to make replacement decisions. However, Negrão et al. (2015) proposed considering the spatial locality of cached content when finding an eviction candidate. Their Semantics-Aware Caching System (SACS) incorporates the assumption that content (in this case, web pages) reachable from recently requested content tends to be requested soon and, thus, should also be kept in the cache. This assumption is captured by a distance metric, calculated as the minimum number of links that users must follow to navigate from one page to the other. When selecting content to evict, SACS takes the most recently requested content (given a threshold) as pivots and calculates the distance of the other cached content from these pivots. Older cached content with higher distance values is removed until there is enough space to insert the new content; thus, cached content that is spatially near recently requested content is kept cached. When two or more entries have the same distance, frequency is used as the tie-breaking criterion.
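The distance computation can be sketched as a multi-source breadth-first search over the link graph, with the farthest cached page chosen as the eviction victim. The `links` mapping is an assumed input (page to outgoing links), and the frequency tie-breaking used by SACS is omitted here.

```python
from collections import deque

def sacs_eviction_victim(cached_pages, links, pivots):
    """Pick the cached page farthest (in link hops) from the pivots."""
    dist = {p: 0 for p in pivots}
    queue = deque(pivots)
    while queue:                          # multi-source BFS over the link graph
        page = queue.popleft()
        for neighbor in links.get(page, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[page] + 1
                queue.append(neighbor)
    # pages unreachable from any pivot get infinite distance: evicted first
    return max(cached_pages, key=lambda p: dist.get(p, float("inf")))
```

For example, with `links = {"home": ["news", "shop"], "news": ["article"]}` and pivot `"home"`, the page `"article"` (two hops away) would be evicted before `"news"` or `"shop"`.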

Finally, focused on the client side, Ali Ahmed and Shamsuddin (2011) proposed a neuro-fuzzy system, called Intelligent Client-side Web Caching Scheme (ICWCS), that predicts which content is likely to be accessed in the future. This approach splits the cache space into short-term and long-term caches: the former is an LRU-based cache that receives content directly, and the latter receives content evicted from the short-term cache. When the long-term cache reaches its full capacity, a trained neuro-fuzzy system classifies its content as cacheable or uncacheable, and content classified as uncacheable is evicted.
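The split-cache structure can be sketched independently of the classifier. In this simplification, the predictor is any callable from key to boolean supplied by the caller, standing in for the trained neuro-fuzzy system of the surveyed work.

```python
from collections import OrderedDict

class TwoTierCache:
    """Short-term/long-term split cache (ICWCS-style sketch)."""

    def __init__(self, short_cap, long_cap, is_cacheable):
        self.short, self.long = OrderedDict(), OrderedDict()
        self.short_cap, self.long_cap = short_cap, long_cap
        self.is_cacheable = is_cacheable  # predictor: key -> bool

    def get(self, key):
        for tier in (self.short, self.long):
            if key in tier:
                tier.move_to_end(key)
                return tier[key]
        return None

    def put(self, key, value):
        self.short[key] = value
        self.short.move_to_end(key)
        if len(self.short) > self.short_cap:
            demoted_key, demoted_val = self.short.popitem(last=False)  # LRU demotion
            self.long[demoted_key] = demoted_val
            # evict entries the predictor classifies as uncacheable first
            for k in [k for k in self.long if not self.is_cacheable(k)]:
                if len(self.long) <= self.long_cap:
                    break
                del self.long[k]
            while len(self.long) > self.long_cap:  # fallback: plain LRU
                self.long.popitem(last=False)
```

The value of the design is that the learned model is only consulted for the long-term tier, keeping the hot path a plain LRU lookup.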

4. Conclusion and Future Directions

With the significant amount of users that web applications currently serve, meeting performance and scalability requirements while delivering services has become a crucial challenge. To address such requirements, various web caching technologies have been made available in different layers of the web infrastructure. Recently, an application-tailored form of caching, referred to as application-level caching, has increased in popularity: developers manually insert caching logic into the application base code to decrease the response time of web requests by temporarily saving frequently requested or expensive-to-compute content in memory. However, such caching raises many issues and challenges for developers, concerning design, implementation, configuration, tuning, and maintenance.

In this paper, we provided a comprehensive introduction to web and application-level caching, as well as surveyed approaches proposed in the context of the latter. We first presented a taxonomy of web caching in general, which helps understand the different associated concerns as well as the drawbacks and merits of available web caching options. We also provided an overview of application-level caching, highlighting its challenges and issues, followed by a big picture of the different approaches proposed to support its development. Application-level caching approaches were described and compared, grouped into three categories: cache implementation, static cache management, and adaptive cache management.

Our survey provides insights into the theory of application-level caching solutions and brings important points to the attention of researchers and practitioners. It can be used by developers to learn about existing caching solutions and thus improve their web applications, and by researchers as a solid foundation on this topic for developing future adaptive caching approaches. We envision that such approaches will require minimal effort and input from developers, as opposed to addressing assumed application bottlenecks through simple key-value systems that require manual management and tuning. We next wrap up our survey by providing reflections on the field and discussing future challenges to be addressed.

4.1. Reflections on the Field

Caching techniques can be found in many locations of computer and software architectures and, in all of them, such solutions provide benefits under the often-valid assumption that part of the content is repeatedly accessed. Web caching, in general, has been extensively studied in the last decade, especially for web pages (at the proxy level), and many issues have been raised, discussed and addressed. Although web caching in its general form seems to be a settled topic, there are open gaps to be filled regarding particular systems and requirements, such as multimedia (with many large objects), wireless (with higher latency and error rates), mobile systems and the Internet of things (with mobility and smaller caches), and modern web applications (with customized content and higher changeability), the last being the focus of this survey.

In these specific caching scenarios, the performance benefits of a caching solution highly depend on exploiting the peculiarities of each scenario. However, while proposing new caching solutions, it is fundamental to learn from different areas. First, we must learn from and leverage what has been done in traditional and older forms of caching, such as in computer architectures. Second, we must complement information about a particular domain with user access patterns, as well as computer and network characteristics. Third, to cope with the evolution of the application and its environment, self-adaptive techniques can be exploited. As discussed, this last point could be explored much further. Although it is possible to interpret many adaptive approaches in terms of the traditional activities of self-adaptive systems, they lack an explicit connection with work in this area, which hampers the exploitation of existing self-adaptive techniques in our context.

The continuously changing web environment is the main driver for the adoption of adaptive web caching approaches, but the availability of web access logs and the existence of specific tools for constantly monitoring resources (application and cache execution traces) have also motivated the development of adaptive techniques. As a consequence, the runtime environment can be instrumented to gather useful information to support dynamic changes in the system behavior, correct misconfigurations, or even achieve optimal configurations. Approaches that monitor the application execution and change its inner workings tend to perform better than a static, pre-defined solution (Ali et al., 2011).

Even though there are approaches that provide application-level caching with adaptive behavior, only a few consider application specificities. Usually, caching decisions are made based solely on access information (size, recency, and frequency) and statistics (hit and miss ratios), thus ignoring explicitly expressed cache metadata, which is application-specific. Application details are in fact the information that developers use while designing and implementing application-level caching. Therefore, the discovery of application-specific details is crucial to achieve improved caching performance.

An important step in the direction of such approaches is to understand and structure the knowledge regarding application-level caching spread across existing caching solutions for web applications. A qualitative study in this direction (Mertz and Nunes, 2016) investigated how developers approach application-level caching through the analysis of ten (open-source and commercial) web applications. Furthermore, other works (Atikoglu et al., 2012; Nishtala et al., 2013; Xu et al., 2014) provided an analysis of the workload of application-level caches, giving characterizations and a detailed picture of these components. This kind of work is needed not only to build a solid understanding of how such caches are currently used but also to give future directions for improving this form of caching.

4.2. Major Challenges and Research Issues

Providing an adaptive approach requires addressing the different issues highlighted in the feedback loop of self-adaptive systems. However, while designing and implementing ways to collect, analyze and make decisions regarding adaptation at the application level, other specific issues emerge.

The first issue is associated with the identification of what information to collect as part of the monitoring process. Although web access logs are the most common source of information, because they provide the history of accesses to a given resource, a previous analysis of static information about the application, such as its source code, is also useful to understand its behavior (Della Toffola et al., 2015; Chen et al., 2016). The combination of static and dynamic analyses can potentially improve adaptation decisions, as they are complementary, or even reduce the amount of information that must be collected at runtime for the dynamic analysis. In addition, it can lead to better initial decisions when limited information is available.

The second issue is how to adequately monitor and acquire information from the application and its environment without compromising its performance. Such monitoring should be as non-intrusive as possible to avoid affecting the performance of the application or cache, which is not always possible (Chen et al., 2016). Adaptive approaches should be designed considering the trade-off between the gain of automating caching concerns and the performance loss due to adaptation overhead. In fact, even the relatively simple analysis of application profiles required by adaptive approaches can already be responsible for a high overhead in terms of memory consumption and processing time. The overhead of the monitoring activity was not explored in the surveyed approaches, although some of them mention it (Mertz and Nunes, 2017; Mertz, 2017). Consequently, it is important to evaluate the practical feasibility of integrating such approaches with web applications at runtime, in terms of the impact they cause to collect data. Moreover, alternatives to reduce the impact of the monitoring activity should be explored.

Another aspect that deserves attention is the consideration of uncertainty in estimating application demands while evaluating application-level caching solutions. Configurations based on misleading information can result in a performance decay and increase the response time of the application. Furthermore, the adoption of standard benchmarks for evaluating and fine-tuning mechanisms of self-adaptation would help researchers and practitioners assess the performance and suitability of different caching algorithms.

Regarding caching algorithms, replacement policies are the most studied techniques because they are a straightforward solution to the problem of a full cache (Podlipnig and Böszörmenyi, 2003). However, size limitations can also be addressed from other perspectives, such as by designing a better admission policy (Baeza-Yates et al., 2007; Einziger and Friedman, 2014). Furthermore, the main issue regarding the development of consistency and admission approaches is that such techniques are highly dependent on the current application characteristics, as well as on the required consistency level. Consequently, providing means of learning these is key to the provision of satisfactory caching solutions.
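To illustrate the admission-policy alternative, the following is a sketch of the frequency-based admission idea behind policies such as TinyLFU (Einziger and Friedman, 2014): a key that would cause an eviction is admitted only if it has been requested more often than the LRU victim it would replace. Real implementations use compact approximate frequency sketches; a plain counter is used here for clarity.

```python
from collections import Counter, OrderedDict

class AdmissionLRUCache:
    """LRU cache guarded by a frequency-based admission policy (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.freq = Counter()

    def get(self, key):
        self.freq[key] += 1
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        self.freq[key] += 1
        if key in self.data or len(self.data) < self.capacity:
            self.data[key] = value
            self.data.move_to_end(key)
            return
        victim = next(iter(self.data))          # LRU candidate for eviction
        if self.freq[key] > self.freq[victim]:  # admit only if "hotter"
            del self.data[victim]
            self.data[key] = value
        # otherwise the new key is rejected and the victim stays cached
```

Rejecting one-off keys at the door addresses cold cache pollution without touching the replacement policy itself.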

In addition to consistency and admission policies, prediction-based admission (prefetching), which anticipates the caching of useful content, has also been proposed at other web caching levels, mainly for web proxy caches (Ali et al., 2011). This topic has not been explored at the application level yet, given that it usually requires a complex learning phase to guide the selection of future content. Moreover, spatial locality, an important feature to be analyzed while prefetching content because it gives the notion of nearness, is not trivial to use, due to the difficulty of identifying and collecting information that represents this property.

Last, but not least, a feature provided by application-level caching approaches is the possibility for developers to configure and tune them (Guo and Engler, 2011; Gupta et al., 2011). Therefore, it is important to give application developers the possibility of enriching the application model at design time with valuable application-specific information regarding the applicability of caching. This metadata can be processed and used by caching approaches, easing the process of analyzing and achieving adaptations. Moreover, it opens the possibility for caching approaches to use application-specific knowledge, which is the key reason why application-level caching exists. However, many issues must be addressed to capture such knowledge, such as understanding what sort of information must be provided, how it is modeled, and how to keep it up to date.

In summary, although significant advances have been made in the context of application-level caching and adaptive caching in web applications, research on these topics still offers several opportunities for future work. The presented issues and shortcomings, especially regarding adaptive solutions, call for new approaches and techniques to reduce the effort demanded from developers to design, implement, and maintain web caching solutions.

Acknowledgements.
We thank the anonymous reviewers who were asked by ACM Computing Surveys to review this article, as well as the editor. They provided constructive feedback that was used to expand our research and improve this article extensively. Jhonny Mertz would like to thank CAPES grant ref. 1681815. Ingrid Nunes thanks for research grants CNPq ref. 303232/2015-3, CAPES ref. 7619-15-4, and Alexander von Humboldt, ref. BRA 1184533 HFSTCAPES-P.

References

  • Abdullahi et al. (2015) Ibrahim Abdullahi, Suki Arif, and Suhaidi Hassan. 2015. Survey on caching approaches in Information Centric Networking. Journal of Network and Computer Applications 56 (2015), 48–59. DOI:http://dx.doi.org/10.1016/j.jnca.2015.06.011 
  • Ali et al. (2011) Waleed Ali, Siti Mariyam Shamsuddin, and Abdul Samad Ismail. 2011. A survey of Web caching and prefetching. International Journal of Advances in Soft Computing and Its Applications 3, 1 (mar 2011), 18–44.
  • Ali et al. (2012a) Waleed Ali, Siti Mariyam Shamsuddin, and Abdul Samad Ismail. 2012a. Intelligent Naïve Bayes-based approaches for Web proxy caching. Knowledge-Based Systems 31 (jul 2012), 162–175. DOI:http://dx.doi.org/10.1016/j.knosys.2012.02.015 
  • Ali et al. (2012b) Waleed Ali, Siti Mariyam Shamsuddin, and Abdul Samad Ismail. 2012b. Intelligent Web proxy caching approaches based on machine learning techniques. Decision Support Systems 53, 3 (jun 2012), 565–579. DOI:http://dx.doi.org/10.1016/j.dss.2012.04.011 
  • Ali Ahmed and Shamsuddin (2011) Waleed Ali Ahmed and Siti Mariyam Shamsuddin. 2011. Neuro-fuzzy system in partitioned client-side Web cache. Expert Systems with Applications 38, 12 (nov 2011), 14715–14725. DOI:http://dx.doi.org/10.1016/j.eswa.2011.05.009 
  • Alici et al. (2012) Sadiye Alici, Ismail Sengor Altingovde, Rifat Ozcan, B. Barla Cambazoglu, and Özgür Ulusoy. 2012. Adaptive time-to-live strategies for query result caching in web search engines. In Proceedings of the 34th European conference on Advances in Information Retrieval (Lecture Notes in Computer Science), Ricardo Baeza-Yates, Arjen P. de Vries, Hugo Zaragoza, B. Barla Cambazoglu, Vanessa Murdock, Ronny Lempel, and Fabrizio Silvestri (Eds.), Vol. 7224. Springer Berlin Heidelberg, Berlin, Heidelberg, 401–412. DOI:http://dx.doi.org/10.1007/978-3-642-28997-2 
  • Amza et al. (2005) Cristiana Amza, Gokul Soundararajan, and Emmanuel Cecchet. 2005. Transparent caching with strong consistency in dynamic content web sites. In Proceedings of the 19th annual international conference on Supercomputing. ACM Press, New York, New York, USA, 264. DOI:http://dx.doi.org/10.1145/1088149.1088185 
  • Atikoglu et al. (2012) Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload analysis of a large-scale key-value store. Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems 40, 1 (jun 2012), 53. DOI:http://dx.doi.org/10.1145/2318857.2254766 
  • Ayani et al. (2003) Rassul Ayani, Yong Meng Teo, and Yean Seen Ng. 2003. Cache pollution in Web proxy servers. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE, Nice, France, 7. DOI:http://dx.doi.org/10.1109/IPDPS.2003.1213450 
  • Baeza-Yates et al. (2007) Ricardo Baeza-Yates, Flavio Junqueira, Vassilis Plachouras, and Hans Witschel. 2007. Admission Policies for Caches of Search Engine Results. In Proceedings of the 14th international conference on String processing and information retrieval, Vol. 4726. Springer-Verlag, Santiago, Chile, 74–85. DOI:http://dx.doi.org/10.1007/978-3-540-75530-2_7 
  • Balamash and Krunz (2004) Abdullah Balamash and Marwan Krunz. 2004. An overview of web caching replacement algorithms. IEEE Communications Surveys & Tutorials 6, 2 (2004), 44–56. DOI:http://dx.doi.org/10.1109/COMST.2004.5342239 
  • Bouchenak et al. (2006) Sara Bouchenak, Alan Cox, Steven Dropsho, Sumit Mittal, and Willy Zwaenepoel. 2006. Caching Dynamic Web Content: Designing and Analysing an Aspect-Oriented Solution. Proceedings of the ACM/IFIP/USENIX International Conference on Middleware 4290 (nov 2006), 1–21. DOI:http://dx.doi.org/10.1007/11925071 
  • Candan et al. (2001) K. Selçuk Candan, Wen-Syan Li, Qiong Luo, Wang-Pin Hsiung, and Divyakant Agrawal. 2001. Enabling dynamic content caching for database-driven web sites. Proceedings of the ACM SIGMOD International Conference on Management of Data 30, 2 (jun 2001), 532–543. DOI:http://dx.doi.org/10.1145/376284.375736 
  • Cao and Irani (1997) P Cao and S Irani. 1997. GreedyDual-Size: A Cost-Aware WWW Proxy Caching Algorithm. In Proceedings of the USENIX Symposium on Internet Technologies and Systems. 18. citeseer.csail.mit.edu/cao97greedydualsize.html
  • Chen et al. (2016) Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora. 2016. CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based Database-centric Web Applications. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2016. ACM Press, New York, New York, USA, 666–677. DOI:http://dx.doi.org/10.1145/2950290.2950303 
  • Della Toffola et al. (2015) Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance problems you can fix: a dynamic analysis of memoization opportunities. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, New York, New York, USA, 607–622. DOI:http://dx.doi.org/10.1145/2858965.2814290 
  • Domènech et al. (2006) Josep Domènech, José A. Gil, Julio Sahuquillo, and Ana Pont. 2006. Web prefetching performance metrics: A survey. Performance Evaluation 63, 9-10 (oct 2006), 988–1004. DOI:http://dx.doi.org/10.1016/j.peva.2005.11.001 
  • Einziger and Friedman (2014) Gil Einziger and Roy Friedman. 2014. TinyLFU : A Highly Efficient Cache Admission Policy. In Proceedings of the 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. IEEE, Torino, 146–153. DOI:http://dx.doi.org/10.1109/PDP.2014.34  arXiv:1512.00727
  • El Abdouni Khayari et al. (2011) Rachid El Abdouni Khayari, Adisa Musovic, Robert Siegfried, and Ingo Zschoch. 2011. Self-reconfiguration approach for web-caches. In International Symposium on Performance Evaluation of Computer & Telecommunication Systems. IEEE, The Hague, Netherlands, 77–83. http://ieeexplore.ieee.org/articleDetails.jsp?arnumber=5984850
  • Fan et al. (2013) Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and concurrent memcache with dumber caching and smarter hashing. In Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 371–385. https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final197.pdf
  • Gao et al. (2005) Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, and Arun Iyengar. 2005. Improving availability and performance with application-specific data replication. IEEE Transactions on Knowledge and Data Engineering 17, 1 (2005), 106–120. DOI:http://dx.doi.org/10.1109/TKDE.2005.10 
  • Gawade and Gupta (2012) Sandhaya Gawade and Hitesh Gupta. 2012. Review of Algorithms for Web Pre-fetching and Caching. International Journal of Advanced Research in Computer and Communication Engineering 1, 2 (2012), 62–65.
  • Ghandeharizadeh et al. (2015) Shahram Ghandeharizadeh, Sandy Irani, and Jenny Lam. 2015. Cache Replacement with Memory Allocation. In Proceedings of the Seventeenth Workshop on Algorithm Engineering and Experiments, Ulrik Brandes and David Eppstein (Eds.). ACM, San Diego, California, USA, 9. http://dx.doi.org/10.1137/1.9781611973754.1
  • Ghandeharizadeh et al. (2014) Shahram Ghandeharizadeh, Sandy Irani, Jenny Lam, and Jason Yap. 2014. Camp. In Proceedings of the 15th International Conference on Middleware. ACM Press, New York, New York, USA, 289–300. DOI:http://dx.doi.org/10.1145/2663165.2663317 
  • Ghandeharizadeh et al. (2012) Shahram Ghandeharizadeh, Jason Yap, and Sumita Barahmand. 2012. COSAR-CQN : An Application Transparent Approach to Cache Consistency. (2012). http://dblab.usc.edu/users/papers/cosarcqntr.pdf
  • Guerrero et al. (2011) Carlos Guerrero, Carlos Juiz, and Ramon Puigjaner. 2011. Improving web cache performance via adaptive content fragmentation design. In Proceedings of the IEEE International Symposium on Network Computing and Applications. IEEE, Cambridge, USA, 310–313. DOI:http://dx.doi.org/10.1109/NCA.2011.55 
  • Guo and Engler (2011) Philip J Guo and Dawson Engler. 2011. Using Automatic Persistent Memoization to Facilitate Data Analysis Scripting. In Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM Press, New York, New York, USA, 287–297. DOI:http://dx.doi.org/10.1145/2001420.2001455 
  • Gupta et al. (2011) Priya Gupta, Nickolai Zeldovich, and Samuel Madden. 2011. A trigger-based middleware cache for ORMs. In Proceedings of the 12th ACM/IFIP/USENIX International Middleware Conference, Vol. 7049 LNCS. Springer Berlin Heidelberg, Lisbon, Portugal, 329–349. DOI:http://dx.doi.org/10.1007/978-3-642-25821-3_17 
  • Hattab and Qawasmeh (2015) Ezz Hattab and Sami Qawasmeh. 2015. A Survey of Replacement Policies for Mobile Web Caching. In Proceedings of the 8th International Conference on Developments of E-Systems Engineering. IEEE, Burj Khalifa, Dubai, UAE, 41–46. DOI:http://dx.doi.org/10.1109/DeSE.2015.13 
  • Huang et al. (2010) Jiyu Huang, Xuanzhe Liu, Qi Zhao, Jianzhu Ma, and Gang Huang. 2010. A browser-based framework for data cache in web-delivered service composition. In Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications. IEEE, Perth, Australia, 1–8. DOI:http://dx.doi.org/10.1109/SOCA.2010.5707138 
  • Huang and Hsu (2008) Yin F. Huang and Jhao M. Hsu. 2008. Mining web logs to improve hit ratios of prefetching and caching. Knowledge-Based Systems 21, 1 (feb 2008), 62–69. DOI:http://dx.doi.org/10.1016/j.knosys.2006.11.004 
  • Huebscher and McCann (2008) Markus C. Huebscher and Julie A. McCann. 2008. A survey of autonomic computing—degrees, models, and applications. Comput. Surveys 40, 3 (aug 2008), 1–28. DOI:http://dx.doi.org/10.1145/1380584.1380585 
  • Infante (2014) Alejandro Infante. 2014. Identifying caching opportunities, effortlessly. In Companion Proceedings of the 36th International Conference on Software Engineering. ACM Press, New York, New York, USA, 730–732. DOI:http://dx.doi.org/10.1145/2591062.2591198 
  • Kiczales et al. (1997) Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and John Irwin. 1997. Aspect-oriented programming. European conference on object-oriented programming 1241/1997 (1997), 220–242. DOI:http://dx.doi.org/10.1007/BFb0053381 
  • Labrinidis (2009) Alexandros Labrinidis. 2009. Caching and Materialization for Web Databases. Foundations and Trends® in Databases 2, 3 (mar 2009), 169–266. DOI:http://dx.doi.org/10.1561/1900000005 
  • Lalanda et al. (2013) Philippe Lalanda, Julie A. McCann, and Ada Diaconescu. 2013. Autonomic Computing. Principles, Design and Implementation (1 ed.). Springer-Verlag London. XV, 288 pages. DOI:http://dx.doi.org/10.1007/978-1-4471-5007-7 
  • Larson et al. (2004) P Larson, Jonathan Goldstein, and Jingren Zhou. 2004. MTCache: Transparent mid-tier database caching in SQL server. Proceedings of the International Conference on Data Engineering 20 (mar 2004), 177–188. DOI:http://dx.doi.org/10.1109/ICDE.2004.1319994 
  • Leszczyński and Stencel (2010) Paweł Leszczyński and Krzysztof Stencel. 2010. Consistent caching of data objects in database driven websites. 14th East European Conference, Advances in Databases and Information Systems 6295 LNCS (sep 2010), 363–377. DOI:http://dx.doi.org/10.1007/978-3-642-15576-5_28 
  • Li and Cox (2015) Conglong Li and Alan L. Cox. 2015. GD-Wheel. In Proceedings of the 10th European Conference on Computer Systems. ACM Press, New York, New York, USA, 1–15. DOI:http://dx.doi.org/10.1145/2741948.2741956 
  • Li et al. (2006) Lei Li, Chunlei Niu, Haoran Zheng, and Jun Wei. 2006. An adaptive caching mechanism for Web services. In Proceedings of the International Conference on Quality Software. IEEE, Beijing, China, 303–310. DOI:http://dx.doi.org/10.1109/QSIC.2006.9 
  • Li et al. (2015) Sheng Li, Pradeep Dubey, Hyeontaek Lim, Victor W. Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, O. Seongil, and Sukhan Lee. 2015. Architecting to achieve a billion requests per second throughput on a single key-value store server platform. In Proceedings of the 42nd Annual International Symposium on Computer Architecture. ACM, Portland, Oregon, USA, 476–488. DOI:http://dx.doi.org/10.1145/2749469.2750416 
  • Ma et al. (2014) Hongyuan Ma, Wei Liu, Bingjie Wei, Liang Shi, Xiuguo Bao, Lihong Wang, and Bin Wang. 2014. Paap. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM Press, New York, New York, USA, 983–986. DOI:http://dx.doi.org/10.1145/2600428.2609490 
  • Maplesden et al. (2015) David Maplesden, Ewan Tempero, John Hosking, and John C. Grundy. 2015. Subsuming Methods. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering - ICPE ’15. ACM Press, New York, New York, USA, 175–186. DOI:http://dx.doi.org/10.1145/2668930.2688040 
  • Megiddo and Modha (2003) Nimrod Megiddo and Dharmendra S. Modha. 2003. ARC: A Self-Tuning, Low Overhead Replacement Cache. Proceedings of the 2nd USENIX conference on File and Storage Technologies 3 (mar 2003), 115–130.
  • Megiddo and Modha (2004) Nimrod Megiddo and Dharmendra S. Modha. 2004. Outperforming LRU with an adaptive replacement cache algorithm. Computer 37, 4 (apr 2004), 58–65. DOI:http://dx.doi.org/10.1109/MC.2004.1297303 
  • Mehrotra et al. (2010) Deepti Mehrotra, Renuka Nagpal, and Pradeep Bhatia. 2010. Designing metrics for caching techniques for dynamic web site. In 2010 International Conference on Computer and Communication Technology, ICCCT-2010. IEEE, 589–595. DOI:http://dx.doi.org/10.1109/ICCCT.2010.5640464 
  • Mertz (2017) Jhonny Mertz. 2017. Understanding and automating application-level caching. Master’s thesis. UFRGS. https://www.lume.ufrgs.br/handle/10183/156813?show=full
  • Mertz and Nunes (2016) Jhonny Mertz and Ingrid Nunes. 2016. A Qualitative Study of Application-level Caching. IEEE Transactions on Software Engineering 43 (2016), 20. DOI:http://dx.doi.org/10.1109/TSE.2016.2633992 
  • Mertz and Nunes (2017) Jhonny Mertz and Ingrid Nunes. 2017. Automation of Application-level Caching in a Seamless Way. Software - Practice and Experience (2017). Submitted.
  • Negrão et al. (2015) André Pessoa Negrão, Carlos Roque, Paulo Ferreira, and Luís Veiga. 2015. An adaptive semantics-aware replacement algorithm for web caching. Journal of Internet Services and Applications 6, 1 (feb 2015), 4. DOI:http://dx.doi.org/10.1186/s13174-015-0018-4 
  • Nguyen and Xu (2013) Khanh Nguyen and Guoqing Xu. 2013. Cachetor: detecting cacheable data to remove bloat. In Proceedings of the 9th Joint Meeting on Foundations of Software Engineering. ACM Press, New York, New York, USA, 268. DOI:http://dx.doi.org/10.1145/2491411.2491416 
  • Nishtala et al. (2013) Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. 2013. Scaling memcache at facebook. In Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation. ACM, Lombard, IL, USA, 385–398. https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf
  • Podlipnig and Böszörmenyi (2003) Stefan Podlipnig and Laszlo Böszörmenyi. 2003. A Survey of Web Cache Replacement Strategies. Comput. Surveys 35, 4 (dec 2003), 374–398. DOI:http://dx.doi.org/10.1145/954339.954341 
  • Pohl (2005) Christoph Pohl. 2005. Adaptive caching of distributed components. Ph.D. Dissertation. Dresden University of Technology.
  • Ports et al. (2010) Dan R. K. Ports, Austin T Clements, Irene Zhang, Samuel Madden, and Barbara Liskov. 2010. Transactional Consistency and Automatic Management in an Application Data Cache. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, CA, USA, 279–292.
  • Radhakrishnan (2004) Ganesan Radhakrishnan. 2004. Adaptive application caching. Bell Labs Technical Journal 9, 1 (may 2004), 165–175. DOI:http://dx.doi.org/10.1002/bltj.20011 
  • Raigoza and Sun (2014) Jaime Raigoza and Junping Sun. 2014. Temporal join processing with the adaptive Replacement Cache - Temporal Data policy. In Proceedings of the 13th IEEE/ACIS International Conference on Computer and Information Science. IEEE, Taiyuan, China, 131–136. DOI:http://dx.doi.org/10.1109/ICIS.2014.6912120 
  • Ramaswamy et al. (2005) Lakshmish Ramaswamy, Ling Liu, Arun Iyengar, and Fred Douglis. 2005. Automatic fragment detection in dynamic web pages and its impact on caching. IEEE Transactions on Knowledge and Data Engineering 17, 6 (jun 2005), 859–874. DOI:http://dx.doi.org/10.1109/TKDE.2005.89 
  • Ravi et al. (2009) Jayashree Ravi, Zhifeng Yu, and Weisong Shi. 2009. A survey on dynamic Web content generation and delivery techniques. Journal of Network and Computer Applications 32, 5 (sep 2009), 943–960. DOI:http://dx.doi.org/10.1016/j.jnca.2009.03.005 
  • Saemundsson et al. (2014) Trausti Saemundsson, Hjortur Bjornsson, Gregory Chockler, and Ymir Vigfusson. 2014. Dynamic Performance Profiling of Cloud Caches. In Proceedings of the ACM Symposium on Cloud Computing - SOCC’14. ACM Press, New York, New York, USA, 3–5. DOI:http://dx.doi.org/10.1145/2523616.2527081 
  • Sajeev and Sebastian (2010) G. P. Sajeev and M. P. Sebastian. 2010. Building a semi intelligent web cache with light weight machine learning. In Proceedings of the IEEE International Conference on Intelligent Systems. IEEE, Xiamen, China, 420–425. DOI:http://dx.doi.org/10.1109/IS.2010.5548373 
  • Scully and Chlipala (2017) Ziv Scully and Adam Chlipala. 2017. A program optimization for automatic database result caching. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages - POPL 2017. ACM Press, New York, New York, USA, 271–284. DOI:http://dx.doi.org/10.1145/3009837.3009891 
  • Selakovic and Pradel (2016) Marija Selakovic and Michael Pradel. 2016. Performance issues and optimizations in JavaScript. In Proceedings of the 38th International Conference on Software Engineering - ICSE ’16. ACM Press, New York, New York, USA, 61–72. DOI:http://dx.doi.org/10.1145/2884781.2884829 
  • Sivasubramanian et al. (2007) Swaminathan Sivasubramanian, Guillaume Pierre, Maarten Steen, and Gustavo Alonso. 2007. Analysis of Caching and Replication Strategies for Web Applications. IEEE Internet Computing 11, 1 (2007), 60–66. DOI:http://dx.doi.org/10.1109/MIC.2007.3 
  • Soundararajan and Amza (2005) Gokul Soundararajan and Cristiana Amza. 2005. Using semantic information to improve transparent query caching for dynamic content web sites. In Proceedings of the International Workshop on Data Engineering Issues in E-Commerce. IEEE, Tokyo, Japan, 132–138. DOI:http://dx.doi.org/10.1109/DEEC.2005.25 
  • Stulova et al. (2015) Nataliia Stulova, José F. Morales, and Manuel V. Hermenegildo. 2015. Practical run-time checking via unobtrusive property caching. Theory and Practice of Logic Programming 15, 4-5 (sep 2015), 726–741. DOI:http://dx.doi.org/10.1017/S1471068415000344 
  • Sulaiman et al. (2013) Sarina Sulaiman, Siti Mariyam Shamsuddin, and Ajith Abraham. 2013. A Survey of Web Caching Architectures or Deployment Schemes. (oct 2013). http://se.fc.utm.my/ijic/index.php/ijic/article/view/33
  • Sulaiman et al. (2011) Sarina Sulaiman, Siti Mariyam Shamsuddin, Ajith Abraham, and Shahida Sulaiman. 2011. Intelligent web caching using machine learning methods. Neural Network World 21, 5 (2011), 429–452. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.261.3179
  • Sulaiman et al. (2009) Sarina Sulaiman, Siti Mariyam Shamsuddin, Fadni Forkan, and Ajith Abraham. 2009. Autonomous SPY: Intelligent web proxy caching detection using neurocomputing and particle swarm optimization. In Proceedings of the 6th International Symposium on Mechatronics and its Applications. IEEE, Sharjah, 1–6. DOI:http://dx.doi.org/10.1109/ISMA.2009.5164846 
  • Surapaneni et al. (2013) Swetha Surapaneni, Venkata Krishna Suhas Nerella, Sanjay K. Madria, and Thomas Weigert. 2013. Exploring optimization and caching for efficient collection operations. International Journal on Automated Software Engineering 21, 1 (nov 2013), 1–38. DOI:http://dx.doi.org/10.1007/s10515-013-0119-x 
  • Tamboli and Patel (2015) Shakil Tamboli and Smita Shukla Patel. 2015. A survey on innovative approach for improvement in efficiency of caching technique for big data application. In Proceedings of the International Conference on Pervasive Computing. IEEE, Pune, India, 1–6. DOI:http://dx.doi.org/10.1109/PERVASIVE.2015.7087016 
  • Veena Singh Bhadauriya, Bhupesh Gour (2013) Veena Singh Bhadauriya, Bhupesh Gour, and Asif Ullah Khan. 2013. Improved Server Response using Web Pre-fetching: A Review. International Journal of Advances in Engineering & Technology 6, 6 (2013), 2508–2513.
  • Venketesh and Venkatesan (2009) P Venketesh and R Venkatesan. 2009. A Survey on Applications of Neural Networks and Evolutionary Techniques in Web Caching. IETE Technical Review 26, 3 (sep 2009), 171. DOI:http://dx.doi.org/10.4103/0256-4602.50701 
  • Wang (1999) Jia Wang. 1999. A survey of web caching schemes for the Internet. ACM SIGCOMM Computer Communication Review 29, 5 (oct 1999), 36. DOI:http://dx.doi.org/10.1145/505696.505701 
  • Wang et al. (2014) Wei Wang, Zhaohui Liu, Yong Jiang, Xinchen Yuan, and Jun Wei. 2014. EasyCache: a transparent in-memory data caching approach for internetware. In Proceedings of the 6th Asia-Pacific Symposium on Internetware on Internetware. ACM Press, New York, New York, USA, 35–44. DOI:http://dx.doi.org/10.1145/2677832.2677837 
  • Wong (2006) Kin-Yeung Wong. 2006. Web cache replacement policies: a pragmatic approach. Network, IEEE 20, 1 (jan 2006), 28–34. DOI:http://dx.doi.org/10.1109/MNET.2006.1580916 
  • Xu (2012) Guoqing Xu. 2012. Finding reusable data structures. Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA ’12 47, 10 (nov 2012), 1017. DOI:http://dx.doi.org/10.1145/2384616.2384690 
  • Xu et al. (2014) Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2014. Characterizing Facebook’s Memcached Workload. IEEE Internet Computing 18, 2 (mar 2014), 41–49. DOI:http://dx.doi.org/10.1109/MIC.2013.80 
  • Young (1994) N. Young. 1994. The k-server dual and loose competitiveness for paging. Algorithmica 11, 6 (1994), 525–541. DOI:http://dx.doi.org/10.1007/BF01189992  arXiv:cs/0205044
  • Zaidenberg et al. (2015) Nezer Zaidenberg, Limor Gavish, and Yuval Meir. 2015. New caching algorithms performance evaluation. In 2015 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS). IEEE, Chicago, IL, USA, 1–7. DOI:http://dx.doi.org/10.1109/SPECTS.2015.7285291 
  • Zhang et al. (2015a) Hao Zhang, Gang Chen, Beng Chin Ooi, Kian Lee Tan, and Meihui Zhang. 2015a. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge and Data Engineering 27, 7 (jul 2015), 1920–1948. DOI:http://dx.doi.org/10.1109/TKDE.2015.2427795 
  • Zhang et al. (2015b) Meng Zhang, Hongbin Luo, and Hongke Zhang. 2015b. A Survey of Caching Mechanisms in Information-Centric Networking. IEEE Communications Surveys & Tutorials PP, 99 (2015), 1–1. DOI:http://dx.doi.org/10.1109/COMST.2015.2420097 
  • Zhu et al. (2012) Timothy Zhu, Anshul Gandhi, Mor Harchol-Balter, and Michael A. Kozuch. 2012. Saving cash by using less cache. In Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing. USENIX Association, Boston, MA, 3. http://dl.acm.org/citation.cfm?id=2342763.2342766

Appendix A Background on Caching and Web Caching

In this section, we provide foundations on caching and web caching, presenting definitions and explanations of associated terms and concepts. This provides the basic knowledge required for understanding the static and adaptive application-level approaches presented in the paper.

To introduce web caching, we use the taxonomy presented in Figure 3, which groups web caching research into three main categories, each split into subcategories. The first category, location, is associated with where the cache is placed in the typical web architecture. The second is concerned with how caching is coordinated, which involves different tasks. Finally, the last is associated with how caching can be measured, so that its effectiveness can be assessed. We next explain each of these categories and their subcategories. Every caching solution can be classified according to these three categories, given that its development involves choosing a location, dealing with coordination issues, and selecting appropriate metrics to estimate the performance of the provided solution.

Refer to caption
Figure 3. Web Caching Taxonomy.

A.1. Location

Given that web caching is closely related to the web infrastructure components, we overview such architecture in Figure 4 along with the main caching locations, which are detailed later. It presents a typical web architecture, which is widely adopted in practice and can roughly be separated into three components: client, Internet, and server (Labrinidis, 2009; Ravi et al., 2009).

Refer to caption
Figure 4. Traditional Web Application Architecture with Associated Caching Locations.

The client component is essentially the client’s computer and web browser, while the Internet component comprises a wide range of interconnected mechanisms that enable the connection between client and server. A Domain Name System (DNS) server is typically used to resolve the name of the web address into an Internet Protocol (IP) address. With this IP address, routers allow the client to establish a connection with the server using the HTTP protocol. The server component can comprise multiple, possibly different servers that are collectively seen as the web server by the client. In this case, a load balancer or similar component routes requests to the different servers in the cluster. A single web server is responsible for handling and responding to HTTP requests, and typically hosts a web application, which provides business services. The web application is connected to back-end systems (i.e. databases, file systems or other applications) and performs read and write operations driven by business rules (Labrinidis, 2009; Ravi et al., 2009). In summary, caching solutions can be deployed at different locations along this architecture, highlighted (in bold and inside brackets) in Figure 4, ranging from the database to the client’s browser (Podlipnig and Böszörmenyi, 2003). We classify web caching locations into three main groups, as follows.


Server:

Server-side caching solutions include all available caching locations within the server component, such as in databases, between database and application, within the application, and in the web server (Ravi et al., 2009).

Proxy:

Located between client and server, a proxy cache can be deployed in a forward (proxy acting on behalf of clients) or reverse (proxy acting on behalf of servers) configuration (Podlipnig and Böszörmenyi, 2003; Ravi et al., 2009).

Client:

At the client level, browser data and client-side computations can be cached near the client, that is, on the machine where the user’s web browser runs (Wang, 1999; Podlipnig and Böszörmenyi, 2003; Balamash and Krunz, 2004).

Usually, caching solutions are designed and implemented as an abstraction layer that is transparent from the perspective of adjacent layers. Database and proxy caching are well known for following this design principle. In contrast, application-specific caches, which can be conceived at the server side or client side, require extra effort from developers to be integrated into the web infrastructure (Gupta et al., 2011; Wang et al., 2014; Ports et al., 2010).
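This extra developer effort typically takes the form of an explicit get-or-compute (cache-aside) idiom in the application code: the application checks the cache before performing the expensive operation. The following minimal Python sketch is illustrative only; the class and key names are not taken from any specific framework.

```python
# Minimal cache-aside sketch: the application explicitly checks the
# cache before performing the expensive computation or back-end call.
# All names here are illustrative.

class CacheAside:
    def __init__(self):
        self._store = {}  # in-memory key-value store

    def get_or_compute(self, key, compute):
        """Return the cached value for `key`, computing and caching it on a miss."""
        if key in self._store:          # cache hit
            return self._store[key]
        value = compute()               # cache miss: run the expensive operation
        self._store[key] = value        # admit the result into the cache
        return value

cache = CacheAside()
# The first call computes; the second is served from the cache,
# so its (expensive) compute function is never invoked.
v1 = cache.get_or_compute("user:42", lambda: "expensive-result")
v2 = cache.get_or_compute("user:42", lambda: "never-called")
```

Because the caching logic lives in the application code itself, decisions such as which keys to use and which operations to wrap must be made by the developer, which is precisely the manual effort discussed above.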

Each caching location has its benefits, challenges, and issues, leading to trade-offs to be resolved when choosing a caching solution. For instance, depending on the selected location, particular forms of content can be cached, such as HTML pages (Candan et al., 2001; Negrão et al., 2015), intermediate HTML or XML fragments (Ramaswamy et al., 2005; Li et al., 2006; Guerrero et al., 2011), application objects (Ports et al., 2010; Gupta et al., 2011), database queries (Soundararajan and Amza, 2005; Amza et al., 2005; Baeza-Yates et al., 2007; Ma et al., 2014), or database tables (Larson et al., 2004). Furthermore, support varies across locations (Mehrotra et al., 2010), as do hit and miss probabilities. Therefore, the design rationale of each caching solution must be carefully adjusted to its circumstances. Given the several available caching locations and their pros and cons, we summarize this information in Table 8.

Table 8. Available Web Caching Techniques based on the Location.
Location Content Granularity Positive Aspects Negative Aspects
Client-side File – Static resources – Pages or fragments + Integrated into the client browser + No development effort required + Effective for static content + A hit provides a large gain + Larger reductions in perceived latency – Harder to maintain consistency – Cached content cannot be shared – Manually configured at server-side – Lower hit ratio
Data – Requests from a single client to many servers – Computations + Larger reductions in perceived latency + Addresses arbitrary content + Flexible implementation + Takes application specificities into account – Manually implemented – Cached content cannot be shared
Proxy-based Forward – Requests from many clients to many servers in the gateway server + Wide area bandwidth savings and improved latency + Increases the availability of static content + Fully transparent to developers – Harder to maintain consistency – Requires a large-sized memory
Reverse – Requests from many clients to one server + Reduces bandwidth requirements + Reduces server-side workload + Fully transparent to developers + Easier to set up – Harder to maintain consistency – Requires a large-sized memory – Lower hit ratio
Server-side Application – Pages or fragments – Service calls – Database queries – Computations + Addresses arbitrary content + Easier to maintain consistency + Saves processing requests + Flexible implementation + Takes application specificities into account + Reduces application workload – Specific implementation – Manually implemented
Mid-tier – Content processed between source of information and application + Automatically admits content + Automatically maintains consistency + Scales up complex components – Specific and inflexible implementation – Limited reusability – Not fully transparent to developers – A hit provides a small gain
Database – Partial or full tables – Queries – Views + Improves performance and scalability of databases + Cached data is provided to multiple applications + Higher hit ratio + Fully transparent to developers + Easier to maintain consistency – Specific implementation – A hit provides a small gain

Due to the variety of application domains, workload characteristics, and access patterns, there is no universal best caching solution, because it is hard, if not impossible, to achieve a deployment scheme that performs best in all environments and maintains its performance over an extended period. Typically, caching at lower levels (i.e. near the source of information, such as at the hardware or database level) tends to achieve higher hit ratios, whereas caching at higher levels (i.e. near where information is used or presented, such as client-side file caching) offers greater gains when a hit occurs, despite the expected lower hit ratios (Amza et al., 2005). Therefore, caching at different locations is complementary, and a combination of different solutions is possible and can bring even more benefits, at the cost of a higher effort invested in caching configuration (Ravi et al., 2009; Sivasubramanian et al., 2007).

Regardless of the cache location, coordination techniques should be employed to allow caching to improve the web-based system performance. Coordination topics are explored next.

A.2. Coordination

As discussed above, caches may be placed at several locations and, regardless of where and how the cache is implemented, issues such as limited cache space must be addressed to ensure cache efficiency. Cache coordination decisions address these issues, which include deciding which content should be cached, defining policies to avoid stale content, and handling situations in which the cache is full and a new item must be added. Such strategies can be applied at any caching location. Coordination issues and associated strategies can be split into three main groups, namely admission, consistency, and replacement, which are detailed next.

Admission:

Admission policies are used for content selection and preventing caching of unimportant content. They can have two key goals: (a) to prevent unnecessary items from being inserted into the cache (Baeza-Yates et al., 2007; Einziger and Friedman, 2014), and (b) to predict content expected to be requested and insert them into the cache in advance (Sulaiman et al., 2009). The former is associated with the adoption of a reactive mechanism to filter content and admitting only valuable items in the cache. The latter proactively prefetches and caches content expected to be requested, which can be seen as a proactive admission policy, improving significantly cache performance and reducing the user perceived latency. Prefetching usually exploits the content spatial locality (e.g. correlated reference between content) and takes advantage of component’s (server, proxy or client) idle time, preventing bandwidth underutilization and hiding part of the latency (Gawade and Gupta, 2012; Veena Singh Bhadauriya, Bhupesh Gour, 2013). Ali et al. (2011) provided a comparison of server, client, and proxy prefetching challenges and issues. They also surveyed prefetching studies at the proxy level, and classified these studies into two broad categories (content-based and history-based), according to the data used for prediction.
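A reactive admission policy of type (a) can be as simple as a frequency threshold that keeps one-time requests out of the cache. The Python sketch below is purely illustrative and not tied to any surveyed approach; all names and the threshold value are assumptions.

```python
from collections import Counter

class FrequencyAdmission:
    """Reactive admission sketch: only admit items that have been requested
    at least `threshold` times, filtering one-hit wonders out of the cache."""
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.frequencies = Counter()   # access counts per key
        self.cache = {}

    def access(self, key, value):
        """Record an access; return (value, hit?) and admit popular items."""
        self.frequencies[key] += 1
        if key in self.cache:
            return self.cache[key], True        # cache hit
        if self.frequencies[key] >= self.threshold:
            self.cache[key] = value             # admit only popular items
        return value, False                     # cache miss

adm = FrequencyAdmission(threshold=2)
adm.access("a", 1)            # first access: seen once, not admitted
_, hit = adm.access("a", 1)   # second access: admitted now, but still a miss
_, hit2 = adm.access("a", 1)  # third access: served from the cache
```

Real admission policies replace the naive counter with compact frequency estimates or cost models, but the filtering structure is the same.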

Consistency:

Approaches that address consistency deal with the fact that cached content is potentially stale, because cached items can be updated at their source. Therefore, if there are freshness requirements, it is necessary to design and implement consistency approaches to avoid the staleness of cached items. There are two main consistency models for caching: (a) weak, which provides higher availability by relaxing consistency (i.e. stale data is allowed to be returned from the cache) (Gupta et al., 2011); and (b) strong, which ensures that stale data is never returned, at a higher processing cost (Ports et al., 2010; Amza et al., 2005). Both models can be achieved through validation or invalidation. With validation, cached items are verified against their source (e.g. the origin server) before being served, so modified content is detected before it is returned. With invalidation, cached content is evicted or updated as soon as changes in the underlying source of information are detected, or when the source server notifies the cache that a cached content has changed; in this case, strong consistency is not guaranteed. Furthermore, if weak consistency is adopted, timeouts (i.e. time to live, TTL) are used to trigger the eviction of expired cached content. Traditionally, consistency (or coherence) is treated as a separate issue from web caching, being widely studied, including in the area of distributed systems (Ghandeharizadeh et al., 2012; Sivasubramanian et al., 2007; Gao et al., 2005).
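Under weak consistency, the TTL mechanism can be sketched in a few lines. The following Python example is illustrative; the class name and the injected `now` parameter (used only to make the example deterministic instead of calling the real clock) are assumptions.

```python
import time

class TTLCache:
    """Weak-consistency sketch: entries expire after `ttl` seconds, so stale
    content may be served within that window but never beyond it."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, insertion_time)

    def put(self, key, value, now=None):
        self._store[key] = (value, now if now is not None else time.time())

    def get(self, key, now=None):
        now = now if now is not None else time.time()
        entry = self._store.get(key)
        if entry is None:
            return None
        value, inserted = entry
        if now - inserted > self.ttl:   # expired: evict and report a miss
            del self._store[key]
            return None
        return value                    # possibly stale, but within the TTL

c = TTLCache(ttl=60.0)
c.put("page", "<html>...</html>", now=0.0)
fresh = c.get("page", now=30.0)   # within the TTL: served from the cache
stale = c.get("page", now=120.0)  # past the TTL: evicted, miss
```

The TTL value directly encodes the freshness requirement: a shorter TTL bounds staleness more tightly at the price of more misses.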

Replacement:

When the cache is full, replacement policies are responsible for deciding which content should be removed so that new content can be added. Such policies ideally remove the content that is least expected to result in hits in the future. To achieve this behavior, most proposed replacement policies associate a value (e.g. recency, frequency, or worthiness) with each cache entry. When the cache reaches its full capacity and a new item must be inserted, entries are sorted according to their associated value, and the entry with the lowest value is removed. Traditional replacement strategies are classically classified into four categories: recency-based, frequency-based, function-based, and randomized (Podlipnig and Böszörmenyi, 2003; Ali et al., 2011). Recently, the availability of monitoring data on workload and user accesses, and the dissemination of mechanisms to store and process such information, have motivated the proposal of intelligent replacement policies, which exploit such data as training data to build a meta-model (Ali et al., 2011). Although the most popular policy is Least Recently Used (LRU), due to its simplicity and relatively good performance, several other replacement policies have been proposed with the aim of performing well in many situations (Wong, 2006), and almost all of them were demonstrated to be superior to others in their original proposals (Podlipnig and Böszörmenyi, 2003; Ali Ahmed and Shamsuddin, 2011; Sulaiman et al., 2013). These results indicate that there is no single best policy for all scenarios, because a policy’s performance depends on workload characteristics. Focusing on these differences, Wong (2006) provides a pragmatic approach to choosing the best policy. In summary, each policy imposes a trade-off between success rate and computational overhead.
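As a concrete instance of a recency-based policy, the classic LRU scheme can be sketched in a few lines of Python (illustrative, not tied to any surveyed system): the recency of each entry is its associated value, and the entry with the lowest recency is evicted when capacity is exceeded.

```python
from collections import OrderedDict

class LRUCache:
    """Recency-based replacement sketch: when the cache is full, the
    least recently used entry (lowest recency value) is evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)   # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the LRU entry

lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" becomes the most recently used entry
lru.put("c", 3)     # cache full: evicts "b", the least recently used
```

A frequency-based or function-based policy would keep the same eviction skeleton but rank entries by a different value (hit count, or a weighted function of recency, frequency, size, and fetch cost).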

Many factors can be observed when choosing or proposing a coordination strategy. Coordination strategies usually evaluate temporal locality properties, such as recency (i.e. the last time a piece of content was requested) and frequency (i.e. how many requests to a piece of content occurred) (Ali Ahmed and Shamsuddin, 2011). In addition to temporal locality, spatial locality can also be exploited by analyzing the content context, which is not always straightforward. Other factors include size, access latency, and time since last modification of the content, or even heuristics, such as the useful time of a cached item. However, taking all factors into account to reach an improved coordination decision is not a trivial task, given the computational overhead involved. Furthermore, there is no well-accepted set of influence factors, because their importance depends on the particularities of the scenario or environment under consideration (Podlipnig and Böszörmenyi, 2003; Wong, 2006).

In summary, all coordination strategies aim to solve caching issues to improve the overall performance of web-based applications. However, caching benefits come at the cost of the effort invested to design, implement, and maintain the cache. The benefits of each caching strategy can be estimated in terms of improvements to the performance of the cache and the system, which are measured with the different metrics described next.

A.3. Measurement

The last category of our taxonomy for web caching is related to measurement techniques, which are used to assess the efficiency of web caching strategies (Podlipnig and Böszörmenyi, 2003; Ali et al., 2011). The most popular and well-accepted metrics are summarized in Table 9 with their names, acronyms, descriptions, and definitions. Similarly to the coordination strategies presented in the previous section, cache performance metrics can be applied in any caching scheme and are not limited to web caching. Furthermore, these metrics are well known and generic enough to evaluate and compare any caching strategy and tuning task.

Table 9. Traditional Cache Evaluation Metrics.
Metric (Acronym) Description Definition
Hit Ratio (HR) The fraction of requests that result in a cache hit. Usually, the higher the hit ratio, the better the technique employed. $\sum_{i \in R} \frac{h_{i}}{f_{i}}$
Byte Hit Ratio (BHR) The fraction of bytes of requests that result in a cache hit over the total number of bytes requested by clients. $\sum_{i \in R} s_{i} \cdot \frac{h_{i}}{f_{i}}$
Latency Savings Ratio (LSR) The approximate response time reduction provided by caching while processing client requests. $\sum_{i \in R} d_{i} \cdot \frac{h_{i}}{f_{i}}$
Throughput (TRP) The number of requests processed per second throughout a period of time. A higher throughput shows the effectiveness of the caching, as more requests can be satisfied within the same period of time. $\sum_{i \in R} \frac{f_{i}}{t_{i}}$
Notation:
$s_{i}$ = size of item $i$
$f_{i}$ = total number of requests for item $i$
$h_{i}$ = total number of hits for item $i$
$t_{i}$ = total fetch delay in seconds of requests for item $i$
$d_{i}$ = mean fetch delay from server for item $i$
$R$ = set of all requests

Note that HR and BHR conflict with each other, given that keeping small popular items in the cache favors HR, whereas holding large popular items increases BHR. Moreover, because it is hard to measure response time precisely (it is affected by many factors independent of the cache, such as network congestion and server stability), LSR and TRP require a well-specified performance test (Wong, 2006).
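The HR/BHR trade-off can be made concrete by computing both metrics from per-item request statistics. The Python sketch below implements the definitions of Table 9 literally (as per-item sums over $R$); the data layout and item names are illustrative.

```python
def hit_ratio(items):
    """Hit Ratio as defined in Table 9: sum over items of h_i / f_i.
    `items` maps an item id to a dict with hits h, requests f, and size s."""
    return sum(it["h"] / it["f"] for it in items.values())

def byte_hit_ratio(items):
    """Byte Hit Ratio as defined in Table 9: sum over items of s_i * h_i / f_i."""
    return sum(it["s"] * it["h"] / it["f"] for it in items.values())

# Two illustrative items: one small and frequently hit, one large and rarely hit.
log = {
    "small": {"h": 9, "f": 10, "s": 1},
    "large": {"h": 1, "f": 10, "s": 100},
}
hr = hit_ratio(log)        # dominated by the small, popular item
bhr = byte_hit_ratio(log)  # dominated by the large item, despite its few hits
```

The example shows the conflict discussed above: a policy keeping the small item maximizes HR, while one keeping the large item maximizes BHR.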

Proactive admission strategies, which focus on predicting web items expected to be requested in the future, require specific metrics to evaluate their efficiency and performance. Domènech et al. (2006) reviewed a representative subset of the prediction algorithms used specifically for web prefetching, proposing a taxonomy based on three categories (prediction-related, resource usage, and latency-related), which allows analogies and differences among the commonly used measurements to be identified. Table 10 presents such prediction-based metrics (Domènech et al., 2006; Huang and Hsu, 2008).

Table 10. Prediction-based Cache Evaluation Metrics.
Metric Description Definition
Precision The fraction of prefetched requests that subsequently result in a cache hit. $\frac{GoodPredictions}{Predictions}$
Byte Precision The fraction of prefetched bytes of requests that subsequently result in a cache hit. $\frac{SizeOfGoodPredictions}{Predictions}$
Recall The fraction of prefetched requests that subsequently result in a cache hit over the total number of requests. $\frac{GoodPredictions}{UserRequests}$
Byte Recall The fraction of prefetched bytes of requests that subsequently result in a cache hit over the total number of bytes requested. $\frac{SizeOfGoodPredictions}{UserRequests}$
Notation:
$GoodPredictions$ = the number of predicted items that are subsequently demanded by the user
$Predictions$ = the number of items predicted by the prediction algorithm
$SizeOfGoodPredictions$ = the total size in bytes of the $GoodPredictions$
$UserRequests$ = the number of items that the user demands
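The precision and recall metrics of Table 10 can be computed directly when the predicted and requested items are represented as sets of identifiers. The Python sketch below shows the count-based variants only (the byte variants would additionally weight items by size); the item names are illustrative.

```python
def prefetch_metrics(predicted, requested):
    """Precision and Recall as in Table 10, over sets of item identifiers.
    Good predictions are predicted items the user subsequently requested."""
    good = predicted & requested
    precision = len(good) / len(predicted) if predicted else 0.0
    recall = len(good) / len(requested) if requested else 0.0
    return precision, recall

# Illustrative prefetcher output vs. the items the user actually requested.
predicted = {"a.html", "b.css", "c.js", "d.png"}
requested = {"a.html", "b.css", "x.html", "y.html"}
p, r = prefetch_metrics(predicted, requested)  # p = 2/4 = 0.5, r = 2/4 = 0.5
```

Low precision indicates wasted prefetch bandwidth, while low recall indicates requests the prefetcher failed to anticipate, mirroring the resource-usage and latency-related categories of the taxonomy.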