Wednesday, June 12, 2013

Full-Page caching with NGinx and PHP

This is part two in my caching series. Part one covered the concept behind full-page caching as well as potential problems to keep in mind. This part focuses on implementing the concept in actual PHP code. By the end of it you’ll have a working implementation that caches full pages and invalidates them intelligently when the underlying data changes.


Requirements

I’ll provide a fully functional framework with the simple application I used to get my benchmark figures. You’ll need the following software to be able to run it.

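Nginx. I’m not sure exactly which version is required, but I generally use and recommend the latest development version. The upstream keepalive directive used in the configuration below needs at least 1.1.4.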

PHP 5.3.0 or later. I recommend at least 5.3.3, the first release to bundle PHP-FPM for FastCGI process management.

MySQL

Memcached

The Framework

You can download the framework here: Evil Genius Framework. I’ll be referencing code in the files instead of pasting it in this post to keep the size down, so you will probably want to download it.

The framework uses a three-tier setup like most of the popular frameworks: controllers, libraries and views.

A controller handles the flow of the request. It parses the input and decides which action to take. Only one controller is ever loaded during a request.

The libraries do the bulk of the work; they are usually the ones that access the database and generate the actual data for the controller to handle. Several libraries might be used during a request.

Views hold the template logic. There is no template language to parse; views use plain PHP for their logic.

The index.php file handles the routing. There are a few settings in it, but nothing really interesting for this blog post. The only thing you need to know if you want to experiment with the sample application is that routing is a direct URI-to-file mapping; routes can’t be defined manually.
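If you haven’t downloaded the framework yet, a rough picture of direct URI-to-file routing may help. The following is a minimal sketch, not the framework’s actual index.php: the function name routeUri(), the controllers/ directory and the convention that the first URI segment picks the controller file, the second the method and the rest become arguments are all my assumptions.

```php
<?php
// Hypothetical sketch of direct URI-to-file routing; the real logic lives in the
// framework's index.php and may differ. Assumed convention:
//   /controller/method/arg1/arg2/  ->  controllers/controller.php, method(arg1, arg2)
function routeUri($uri, $controllerDir = 'controllers')
{
    // Ignore the query string and split the path into non-empty segments.
    $path = (string) parse_url($uri, PHP_URL_PATH);
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    $controller = isset($segments[0]) ? $segments[0] : 'index';
    $method     = isset($segments[1]) ? $segments[1] : 'index';

    // Only accept plain names, so a crafted URI can't point outside $controllerDir.
    $safeName = '/^[A-Za-z0-9_]+$/';
    if (!preg_match($safeName, $controller) || !preg_match($safeName, $method)) {
        return null;
    }

    return array(
        'file'   => $controllerDir . '/' . $controller . '.php',
        'method' => $method,
        'args'   => array_slice($segments, 2),
    );
}

// Usage: which file and method would handle this request?
var_dump(routeUri('/news/view/42/?page=2'));
```

Under these assumptions /news/view/42/?page=2 resolves to controllers/news.php, method view, with the single argument 42, and a segment that isn’t a plain name is rejected rather than turned into a file path.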

The Caching Logic

Just so we’re on the same page: the goal here is to define a way of invalidating cached pages that contain stale data. Since Nginx serves cached pages directly from memcached, without PHP running at all, we have to invalidate them at the moment the data changes. Before we begin the implementation we’ll need a few concepts to help us keep things straight.

Cache keys. This is how pages are identified in the cache. Since the framework maps URIs directly to controllers, it makes sense to use the URI as the cache key, so whenever I refer to the URI or the cached page I mean the key under which it’s cached.
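To make the cache key concrete, here is a minimal sketch that builds the memcached key the same way the Nginx configuration later in this post does (set $memcached_key fw53$request_uri;). The function pageCacheKey() and its length check are hypothetical additions of mine; the 250-byte cap is the key length limit of the memcached protocol.

```php
<?php
// Hypothetical helper mirroring the Nginx line `set $memcached_key fw53$request_uri;`:
// the key is the fixed prefix "fw53" followed by the request URI, query string included.
const PAGE_KEY_PREFIX = 'fw53';

// The memcached protocol limits keys to 250 bytes; longer URIs are simply not cached.
const MEMCACHED_MAX_KEY_LENGTH = 250;

function pageCacheKey($requestUri)
{
    $key = PAGE_KEY_PREFIX . $requestUri;
    return strlen($key) <= MEMCACHED_MAX_KEY_LENGTH ? $key : null;
}

// The query string is part of $request_uri, so these are two separate cache entries.
echo pageCacheKey('/news/view/42/'), "\n";         // fw53/news/view/42/
echo pageCacheKey('/news/view/42/?ref=rss'), "\n"; // fw53/news/view/42/?ref=rss
```

One consequence worth keeping in mind: a page requested with and without a query string ends up in two separate entries, and a controller that only reports the bare URI for invalidation leaves the other one stale.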

DataKeys. These are identifiers for pieces of data, such as news or comments. The goal is to prevent stale data, so we obviously need a way to identify and reference the data we’ll be working with.

With cache keys and dataKeys defined we can now implement the invalidation logic. For this we need to track the data and establish a relation between data and cache keys. As we established in part one, several controllers might use the same data, so we need to map which data every controller uses. Furthermore, each controller needs to report which cache keys it generates so that we can invalidate them.

This is where cachetracker.php, in the core directory, comes in. All caching logic is handled by this file. At the top of it you’ll see an interface called ControllerCacheable, which every controller that handles cached data needs to implement.

ControllerCacheable defines two methods, dataKeyReads() and dataKeyInvalidates(). The former handles mapping data to controllers and the latter handles mapping data to cache keys.

dataKeyReads() should return an array of the dataKeys a controller reads from. This allows us to easily iterate over every controller and generate a dependency mapping of data -> controller.

dataKeyInvalidates() accepts the dataKey to invalidate and an optional payload (an example follows later). Given a dataKey, this method should return an array of the cache keys that use this dataKey. These cache keys will then be invalidated.
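Here is a sketch of the interface, reconstructed from the description above; the real definition at the top of core/cachetracker.php may differ in details such as parameter names. AboutController and its 'settings' dataKey are hypothetical and only there to show an implementation.

```php
<?php
// Sketch of ControllerCacheable as described in the post; not copied from cachetracker.php.
interface ControllerCacheable
{
    /**
     * @return array The dataKeys this controller reads from, e.g. array('news', 'comments').
     */
    public function dataKeyReads();

    /**
     * @param string $dataKey The dataKey whose data just changed.
     * @param mixed  $payload Optional extra information (for example a record ID)
     *                        that narrows down which pages are affected.
     * @return array The cache keys (URIs) that must be invalidated.
     */
    public function dataKeyInvalidates($dataKey, $payload = null);
}

// Hypothetical controller whose single page depends on one dataKey.
class AboutController implements ControllerCacheable
{
    public function dataKeyReads()
    {
        return array('settings');
    }

    public function dataKeyInvalidates($dataKey, $payload = null)
    {
        // Only a change to 'settings' affects the /about/ page.
        return $dataKey === 'settings' ? array('/about/') : array();
    }
}
```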

The CacheTracker generates the dependency mapping in the getDataKeysAccessors() method. It iterates through the controllers directory and calls dataKeyReads() on every controller that implements the ControllerCacheable interface. After covering all the cacheable controllers, the mapping is stored in a file, deplist.txt, in the root directory (relative to index.php). Note that if you change the dataKeys a controller uses, you’ll have to delete this file so that it gets regenerated.
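The core of getDataKeysAccessors() is inverting "this controller reads these dataKeys" into "this dataKey is read by these controllers". A minimal sketch follows, under a few assumptions: controller class names are passed in instead of scanned from the controllers directory, controllers can be constructed without arguments, and the cache file holds serialize()d PHP, which may not be the format deplist.txt really uses. buildDependencyMap(), loadDependencyMap() and the demo controllers are hypothetical names.

```php
<?php
// Sketch of building the dataKey => controllers dependency map and caching it in a file.
interface ControllerCacheable
{
    public function dataKeyReads();
    public function dataKeyInvalidates($dataKey, $payload = null);
}

function buildDependencyMap(array $controllerClasses)
{
    $map = array();
    foreach ($controllerClasses as $class) {
        // Skip controllers that don't take part in caching.
        if (!in_array('ControllerCacheable', class_implements($class))) {
            continue;
        }
        $controller = new $class();
        foreach ($controller->dataKeyReads() as $dataKey) {
            $map[$dataKey][] = $class;
        }
    }
    return $map;
}

function loadDependencyMap($file, array $controllerClasses)
{
    // Reuse the stored map if present; deleting the file forces a rebuild.
    if (is_file($file)) {
        return unserialize(file_get_contents($file));
    }
    $map = buildDependencyMap($controllerClasses);
    file_put_contents($file, serialize($map), LOCK_EX);
    return $map;
}

// Hypothetical controllers used for the demonstration.
class NewsController implements ControllerCacheable
{
    public function dataKeyReads() { return array('news', 'comments'); }
    public function dataKeyInvalidates($dataKey, $payload = null) { return array(); }
}

class FrontpageController implements ControllerCacheable
{
    public function dataKeyReads() { return array('news'); }
    public function dataKeyInvalidates($dataKey, $payload = null) { return array(); }
}

class ContactController
{
    // Not cacheable, so it never appears in the map.
}

print_r(buildDependencyMap(array('NewsController', 'FrontpageController', 'ContactController')));
```

For the demo controllers this yields news => NewsController, FrontpageController and comments => NewsController; ContactController is skipped because it doesn’t implement the interface.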

The second method of interest in the CacheTracker is triggerDataKeyInvalidation(). This is the method to call whenever data has been changed. It checks the dependency mapping and calls dataKeyInvalidates() on the controllers that use the dataKey. At this point we have the cache keys of every page that uses the piece of data we’ve just updated, and can invalidate them. Time to see how this translates into a real-world example.
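triggerDataKeyInvalidation() can be pictured as a lookup in the dependency map followed by one delete per cache key. This is a sketch under stated assumptions: the map is passed in, $cache is any object with a delete($key) method (a Memcached instance fits, since Memcached::delete() takes the key as its first argument), the "fw53" prefix from the Nginx configuration is added here rather than in the controllers, and ArrayCache plus the two controllers are hypothetical stand-ins so the code runs without a memcached server.

```php
<?php
// Sketch of triggerDataKeyInvalidation(): find the controllers that read $dataKey, ask
// each one which cache keys (URIs) are affected and delete those entries.
interface ControllerCacheable
{
    public function dataKeyReads();
    public function dataKeyInvalidates($dataKey, $payload = null);
}

function triggerDataKeyInvalidation(array $dependencyMap, $cache, $dataKey, $payload = null)
{
    if (!isset($dependencyMap[$dataKey])) {
        return array(); // No cacheable controller reads this data.
    }

    $uris = array();
    foreach ($dependencyMap[$dataKey] as $class) {
        $controller = new $class();
        $uris = array_merge($uris, $controller->dataKeyInvalidates($dataKey, $payload));
    }
    // Several controllers may report the same page; delete it only once.
    $uris = array_values(array_unique($uris));

    foreach ($uris as $uri) {
        $cache->delete('fw53' . $uri); // Same prefix as the Nginx $memcached_key.
    }
    return $uris;
}

// Hypothetical in-memory stand-in for memcached so the sketch runs on its own.
class ArrayCache
{
    public $items = array();
    public function set($key, $value) { $this->items[$key] = $value; return true; }
    public function delete($key) { unset($this->items[$key]); return true; }
}

// Hypothetical controllers: the front page lists news, a post page shows news and comments.
class FrontpageController implements ControllerCacheable
{
    public function dataKeyReads() { return array('news'); }
    public function dataKeyInvalidates($dataKey, $payload = null) { return array('/'); }
}

class NewsController implements ControllerCacheable
{
    public function dataKeyReads() { return array('news', 'comments'); }
    public function dataKeyInvalidates($dataKey, $payload = null)
    {
        return array('/news/view/' . (int) $payload . '/');
    }
}

$map = array(
    'news'     => array('NewsController', 'FrontpageController'),
    'comments' => array('NewsController'),
);

$cache = new ArrayCache();
$cache->set('fw53/', '<html>front page</html>');
$cache->set('fw53/news/view/7/', '<html>post 7</html>');
$cache->set('fw53/news/view/8/', '<html>post 8</html>');

// A new comment on post 7 only drops that post's page: returns array('/news/view/7/').
$dropped = triggerDataKeyInvalidation($map, $cache, 'comments', 7);
```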

The Sample Application

The application I’ve included in the download is quite simple, as it’s only intended to showcase the concept; it’s not a valid measurement of how fast a real-world application would be. With that out of the way, have a look at the news.php controller. It’s got everything a news script really requires: news and comments! The news and comments implementation itself is not overly interesting, so scroll to the bottom of the file and check out the methods defined by our ControllerCacheable interface.

dataKeyReads() returns an array with the elements news and comments. These are the dataKeys that this entire controller deals with.

dataKeyInvalidates() converts a dataKey into the cache keys that pages are stored under. The code pretty much speaks for itself, but I do want to point out the use of $payload, as it’s a good example of how the payload can be used to pinpoint the exact cache keys to invalidate. Without it we would have had to invalidate all the news posts.
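Since news.php is only referenced here, not reproduced, this is a sketch of how such a controller might implement the interface. The URI scheme (/news/ for the list, /news/view/<id>/ for a post with its comments) and the payload being the post ID are assumptions for illustration, not necessarily what news.php does.

```php
<?php
// Sketch of a news controller's ControllerCacheable methods; URIs and payload are assumed.
interface ControllerCacheable
{
    public function dataKeyReads();
    public function dataKeyInvalidates($dataKey, $payload = null);
}

class NewsController implements ControllerCacheable
{
    public function dataKeyReads()
    {
        return array('news', 'comments');
    }

    public function dataKeyInvalidates($dataKey, $payload = null)
    {
        switch ($dataKey) {
            case 'news':
                // A new or edited post changes the list and that post's own page.
                return array('/news/', '/news/view/' . $this->postId($payload) . '/');
            case 'comments':
                // A comment only changes the page of the post it belongs to.
                return array('/news/view/' . $this->postId($payload) . '/');
            default:
                return array();
        }
    }

    private function postId($payload)
    {
        // Without the post ID we couldn't tell which page to drop and would have to
        // invalidate every post, so this sketch insists on it.
        if (!is_numeric($payload)) {
            throw new InvalidArgumentException('A news post ID payload is required.');
        }
        return (int) $payload;
    }
}
```

Throwing when the payload is missing is a choice of this sketch: a forgotten ID fails loudly instead of silently leaving a post page stale.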

Next in the sample application is the news library, located in the cachetest folder under libraries. The interesting part here is the call to CacheTracker::triggerDataKeyInvalidation() whenever the library changes data.
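On the library side the pattern is simply "write, then report". Below is a minimal sketch: CacheTracker is a hypothetical stub that only records calls, the comment "table" is an array, and NewsLibrary::addComment() is an invented name. The static call CacheTracker::triggerDataKeyInvalidation() comes from the post, but its ($dataKey, $payload) argument order is assumed.

```php
<?php
// Hypothetical stub standing in for the framework's CacheTracker; it only records calls.
class CacheTracker
{
    public static $calls = array();

    public static function triggerDataKeyInvalidation($dataKey, $payload = null)
    {
        self::$calls[] = array($dataKey, $payload);
    }
}

// Sketch of a library that changes data and then reports the change.
class NewsLibrary
{
    private $comments = array(); // Stand-in for the MySQL comments table.

    public function addComment($newsId, $author, $text)
    {
        // 1. Write the data first (an INSERT in the real library).
        $this->comments[] = array('news_id' => (int) $newsId, 'author' => $author, 'text' => $text);

        // 2. Only then drop the cached pages that show it, passing the post ID as payload.
        CacheTracker::triggerDataKeyInvalidation('comments', (int) $newsId);
    }

    public function commentCount($newsId)
    {
        $count = 0;
        foreach ($this->comments as $comment) {
            if ($comment['news_id'] === (int) $newsId) {
                $count++;
            }
        }
        return $count;
    }
}

$news = new NewsLibrary();
$news->addComment(7, 'alice', 'Nice post!');
```

Invalidating after the write rather than before narrows the window in which a concurrent request could render the old data and store it in the cache again.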

If you want to try out the sample application you need to configure a few things first. Inside the includes directory there is a config.php file; the various configuration options should speak for themselves. There is also a .sql file in the root directory containing the table definitions and some sample data.

The Nginx Configuration

The configuration is as follows:

upstream memcached {
    server 127.0.0.1:11211;
    keepalive 1024 single;
}

upstream backend {
    server 127.0.0.1:9000;
}

server {
    listen 80;
    server_name live.framework.com;

    access_log /var/log/nginx/framework.access.log;
    error_log /var/log/nginx/framework.errors.log notice;

    root /home/framework;

    try_files $uri @missing;

    location @missing {
        rewrite ^(.*[^/])$ $1/ permanent; # Add a trailing slash if none exists.
        rewrite ^ /index.php last;
    }

    # Forbid the system dir, but allow media files.
    location ~* ^/system/.+\.(jpg|png|gif|css|js|swf|flv|ico)$ {
        expires max;
        tcp_nodelay off;
        tcp_nopush on;
    }

    location ~ /system {
        rewrite ^ /index.php last;
    }

    # Check cache and use PHP as fallback.
    location ~* \.php$ {
        if (!-f $request_filename) {
            return 404;
        }

        default_type text/html;
        charset utf-8;

        if ($request_method = GET) {
            set $memcached_key fw53$request_uri;
            memcached_pass memcached;
            error_page 404 = @nocache;
        }

        if ($request_method != GET) {
            fastcgi_pass backend;
        }
    }

    location @nocache {
        fastcgi_pass backend;
    }
}
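The configuration above only reads from memcached; something on the PHP side has to put pages there. A minimal sketch of that step: capture the rendered output, send it, and for GET requests store it under "fw53" plus the request URI, the same key Nginx builds. servePage(), ArrayCache and the one-hour TTL are my own hypothetical choices; $cache is assumed to have a set($key, $value, $ttl) method like Memcached::set().

```php
<?php
// Sketch: render a page, send it, and store GET responses under the key Nginx looks up.
function servePage($cache, $method, $requestUri, $render, $ttl = 3600)
{
    // Capture everything the render callback echoes.
    ob_start();
    call_user_func($render);
    $html = ob_get_clean();

    // Nginx only consults memcached for GET, so only GET responses are worth storing.
    if ($method === 'GET') {
        $cache->set('fw53' . $requestUri, $html, $ttl);
    }
    echo $html;
    return $html;
}

// Hypothetical in-memory stand-in for memcached so the sketch runs on its own.
class ArrayCache
{
    public $items = array();

    public function set($key, $value, $ttl = 0)
    {
        $this->items[$key] = $value;
        return true;
    }
}

$cache = new ArrayCache();
servePage($cache, 'GET', '/news/view/7/', function () {
    echo '<h1>Post 7</h1>';
});
```

In a real request you’d pass $_SERVER['REQUEST_METHOD'] and $_SERVER['REQUEST_URI'] along with a Memcached instance. With PHP’s Memcached extension, keep in mind that it compresses larger values by default (Memcached::OPT_COMPRESSION); Nginx sends the stored bytes straight to the browser, so compression should be turned off for page entries.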


