SquidCacheObject - Parse a Squid cache file for administrative tasks

Introduction

The Squid cache proxy is a popular HTTP caching proxy for Linux-based web servers. It is highly optimized and configurable and, all in all, a great application. Despite its popularity, certain internal aspects remain a mystery, and parts of the programmer documentation are outdated or incorrect. This class, SquidCacheObject, is designed to help administrators and programmers by sparing them from parsing cache files (known as cache objects in Squid land) by hand and from reading the Squid source code. With it they can pull useful information out of a cache file for administrative tasks such as selective purging (you cannot just delete the cache file from the file system) or general cache analysis.

This class will create an object that represents the cache file, with all the meta data (and the actual file data if you choose) conveniently stored as public properties.

Please note

The default permissions on the Squid cache directory are set up so that it is readable by the squid user only. This means you will need to either set the uid of the script to that of the squid user (with Apache suEXEC or similar modules, or the functions provided by PHP) or change the file mode on the Squid cache directory.
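
Before handing a path to the class, it can help to confirm that the script's user can actually read it. The snippet below is a minimal sketch using only standard PHP functions; the helper name canReadCacheObject is made up for this example and is not part of SquidCacheObject.

```php
<?php
// Hypothetical helper (not part of SquidCacheObject): returns true only
// if the path points at a regular file that the current user can read.
function canReadCacheObject($path)
{
    return is_file($path) && is_readable($path);
}
```

If this returns false for a file you know exists, the permissions described above are the most likely cause.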

How to use it

This is a short run-through of how to use the class. For anything not covered here, read the source code; it is heavily commented.
1. First (and most obvious), we need to include the class in the script. That can be achieved using a line similar to the following:
include_once('SquidCacheObject.class.inc.php');
2. Now we instantiate (make) the object and pass the full path to the cache object you want to parse. Remember that the user the script runs as will need read permission on the Squid cache file or Squid cache directory:
$CacheObject = new SquidCacheObject('/path/to/squid/cache/00/00/00/0000001');
3. Now you can start accessing the various properties of the Squid cache object. Remember that not every meta data type will be found, so not all of the properties will be set for each and every cache object. As a general rule of thumb, if you don't find what you need in a property, it will be found in the SquidCacheObject::Headers property (you will need to parse these on your own by splitting the lines). The properties are as follows (the name $CacheObject is used only as an example, for clarity):
// This is the key for the URL
$CacheObject->KeyURL;

// This is a unique SHA hash for the cache file
$CacheObject->KeySHA;

// This is a unique MD5 hash for the cache object
$CacheObject->KeyMD5;

// This is the original URL of the content that this cache object holds
$CacheObject->URL;

// This is the human readable size of the cache file
$CacheObject->Size;

// This is the human readable creation date (dd/mm/yy) of the cache file
$CacheObject->Created;

// This is the human readable modified date (dd/mm/yy) of the cache file
$CacheObject->Modified;

// These are the original HTTP headers from the cached data
$CacheObject->Headers;

// This is the actual cached file data
$CacheObject->Data;
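
Since the Headers property has to be split by hand, here is a minimal sketch of one way to do it. It assumes Headers is a single string with one header per line (this README does not say so explicitly), and the function name parseRawHeaders is made up for this example.

```php
<?php
// Hypothetical helper (not part of SquidCacheObject): turns a raw block
// of HTTP headers into an array keyed by lower-cased header name.
// Assumption: the headers arrive as one string, one header per line.
function parseRawHeaders($rawHeaders)
{
    $parsed = array();

    // Split on CRLF or bare LF line endings
    $lines = preg_split('/\r\n|\n/', trim($rawHeaders));

    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            // No colon: a status line such as "HTTP/1.1 200 OK" or junk
            continue;
        }
        $name  = strtolower(trim(substr($line, 0, $pos)));
        $value = trim(substr($line, $pos + 1));

        // A repeated header overwrites the earlier value in this sketch
        $parsed[$name] = $value;
    }

    return $parsed;
}
```

You would call it as parseRawHeaders($CacheObject->Headers) and then look up, for example, the 'content-type' key of the returned array.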

Optional extras

If you are handling a lot of Squid cache files in an administration tool, you might want this tool to operate a little faster (it is pretty fast already) and be a little more careful with resources to ensure stability and consistency. To do this, you can provide an extra argument when instantiating SquidCacheObject. With it, the cache headers and the actual file data that the cache file holds are not parsed; only the cache file's binary meta data is parsed, along with some general information about the file. Please check the example below for a better understanding.
// Include the SquidCacheObject class
include_once('SquidCacheObject.class.inc.php');

// Instantiate the SquidCacheObject class and provide an extra argument
$CacheObject = new SquidCacheObject('/path/to/squid/cache/00/00/00/0000001', TRUE);

// At this point $CacheObject would not hold anything in the Data and Headers properties
var_dump($CacheObject);

How it works

Although the Squid cache file format might change slightly depending on the version that you are using, it usually looks something like this (everything in brackets is the data type):
meta-data-header  { 0x03, (unsigned int) meta-data-segment-length }
meta-data-segment {

    meta-data-tuple { (byte) meta-data-type, (unsigned int) meta-data-length, (bytes) meta-data-value }
    meta-data-tuple { (byte) meta-data-type, (unsigned int) meta-data-length, (bytes) meta-data-value }
    meta-data-tuple { (byte) meta-data-type, (unsigned int) meta-data-length, (bytes) meta-data-value }
    ...
    0x00, 0x00
}
HTTP headers...
HTTP headers...


Cache object data
EOF
This means you effectively need to parse the file with three parsing regimes: one for the meta data header, one for the actual meta data, and one for the HTTP headers and file data. The meta data can be of the following types:
Void         - This is an empty meta data item
Key URL      - This is the key of the URL
MD5 URL      - This is a unique MD5 hash of the URL
SHA URL      - This is a unique SHA hash of the URL
URL          - This is the URL of the original file we cached
STD          - Unknown to me at this stage
Hit Metering - A value to define hit metering
Valid        - A value to determine the content's validity
End          - This meta data type will signify the end of the meta data segment
There is no guarantee which meta data values will be found in which cache object. For instance, you might not have a URL meta data type, but you might have the URL in the headers. You should test against your specific system configuration and work from that.
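
To make the layout above more concrete, here is a rough sketch of reading the meta data header and the tuples with unpack(). It is not the code used inside SquidCacheObject. It assumes that each "unsigned int" is 4 bytes in the byte order of the machine that wrote the cache, and that the tuple list stops at a type byte of 0x00. The function name parseSquidMetaData is made up, and the type numbers are returned raw because this README does not list them.

```php
<?php
// Sketch only, not the class's own parser.
// Assumptions: 4-byte unsigned ints in machine byte order ('L' in
// unpack), and a type byte of 0x00 marks the end of the tuples.
function parseSquidMetaData($bytes)
{
    // Regime 1: the meta data header (0x03 followed by a length)
    if (strlen($bytes) < 5 || ord($bytes[0]) !== 0x03) {
        return false; // does not look like a cache file
    }
    $header = unpack('Lsegment', substr($bytes, 1, 4));
    $result = array(
        'segmentLength' => $header['segment'],
        'tuples'        => array(),
    );

    // Regime 2: the meta data tuples (type, length, value)
    $offset = 5;
    $total  = strlen($bytes);
    while ($offset < $total) {
        $type = ord($bytes[$offset]);
        if ($type === 0x00) {
            break; // end marker reached
        }
        if ($offset + 5 > $total) {
            break; // truncated tuple header
        }
        $field  = unpack('Llength', substr($bytes, $offset + 1, 4));
        $length = $field['length'];
        $offset += 5;
        if ($offset + $length > $total) {
            break; // truncated tuple value
        }
        $result['tuples'][] = array(
            'type'  => $type,
            'value' => substr($bytes, $offset, $length),
        );
        $offset += $length;
    }

    // Regime 3 (HTTP headers and file data) is left to the caller
    return $result;
}
```

Feeding it the first few kilobytes of a cache file, for example from fread(), should be enough to see which meta data types your Squid version writes.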

About The Author

My name is Warren Smith and if you have anything more useful for me to be doing than making stuff like this, I am available. If you have any bug reports, improvements or cool ideas about this application, feel free to drop me an email. I can be contacted at smythinc@gmail.com

Requirements

PHP >= 4.3.0 (I have not tested on earlier versions but it could possibly work)