Content Security Policy (CSP)

CSP is a proposal being talked about by Mozilla developers. The proposal can be viewed here:
https://wiki.mozilla.org/Security/CSP/Spec

In a nutshell, my interpretation is that the proposal gives client side web browsers the ability to enforce a policy defined by the web master of the web site serving the web page that the client is viewing.

If it's a policy defined by the web master, why can't the web master implement it? In a perfect world he can, but very often in today's world the web master is not really a code professional, but an everyday person running software (such as a blog or CMS or forum) that he did not write and does not fully understand. This often results in XSS (and other) vulnerabilities that go unnoticed and unpatched, allowing a malicious individual to inject scripts into the web site for the purpose of attacking other users who visit that web site.

Sometimes these attacks can remain on the server for quite some time before they are found and dealt with. This is why I am so excited about the Mozilla proposal. With CSP, the web master can define a few simple parameters that alert the web browser that something fishy is going on. Furthermore, the web master can define a URL that the web browser can then use to alert the web master something fishy was attempted.

See the web page (linked above) for proper details and definition.

Disclaimer: I am not a Mozilla Developer and I am not involved in any of the decision making with respect to the CSP proposal.

Server Side CSP Filtering

Since it will be some time before CSP capable browsers have significant market share, I decided to implement CSP server side in a PHP class.

What the class (assuming it properly functions) does is remove any content from the web page that violates the CSP before the web page is even sent to the requesting browser. The beauty of doing it this way is that the user will not have to use a specific browser or install a plugin/add-on in order to benefit from a web master establishing a reasonable CSP policy (is CSP policy redundantly redundant?).

A web master who implements my class should still send the CSP header to the requesting client, because server side CSP can not protect the user from policy violations that result from DHTML modification of the web page. DHTML should only happen via a script that is from an approved domain so the risk is lower, but the risk is still there, especially with the recent scary popularity of web masters using third party hosted JavaScript libraries.

Class Source

You can view the current development source here: cspfilter_class.phps.

To download the source with documentation: Download. Create a new directory and unpack the archive inside that directory; the tarball does not create a directory for you.

Target Audience

Anyone with dynamic content, especially dynamic content that displays data not provided by the page programmer.

I would love to see this class implemented by template developers and web applications commonly used by the general public (e.g. blogging software, content management software, boards, etc.).

License

The class is distributed under the terms of the Common Public License, version 1.0

CSP Implementation

Here is how the script implements CSP:

Image Nodes

If an img node does not have a src attribute that matches a CSP allowed source for images, one of two things will happen:

  1. If an alt attribute exists, the img node is replaced with a text node containing the contents of the alt attribute. Earlier pre-release versions of the class just removed the src attribute, but that broke the W3C specification, causing valid input to return invalid output after filtering.
  2. If an alt attribute does not exist, the img node is simply removed.
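
The two cases above can be sketched with plain DOMDocument calls. This is only an illustration of the idea, not the class's actual code: replaceBlockedImage() is a hypothetical helper, and the sketch assumes every img in the sample has already been judged to violate the policy.

```php
<?php
// Hypothetical helper (not part of cspfilter): swap a blocked img for its
// alt text, or drop the node when it has no alt attribute.
function replaceBlockedImage(DOMElement $img)
{
    $parent = $img->parentNode;
    if ($parent === null) {
        return;
    }
    if ($img->hasAttribute('alt')) {
        $text = $img->ownerDocument->createTextNode($img->getAttribute('alt'));
        $parent->replaceChild($text, $img);
    } else {
        $parent->removeChild($img);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><p><img src="http://evil.example/a.png" alt="Test Image"><img src="http://evil.example/b.png"></p></body></html>');

// Copy the live node list first, because the tree is modified in the loop.
foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
    replaceBlockedImage($img); // assumption: both images are blocked
}

$remainingImages = $doc->getElementsByTagName('img')->length;
$paragraphText   = $doc->getElementsByTagName('p')->item(0)->textContent;
```

Replacing the node rather than stripping its src keeps the output valid, which is the reason given above for abandoning the earlier approach.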

Example

If viewing this page via README.html you will see an image below. If viewing this page via index.php you will only see the alt text.

[Test Image. Either your browser is not displaying images or the image was blocked.]

Media Nodes

Media nodes consist of the audio and video tags that look like they will be part of the (X)HTML 5 specification.

If a media node does not have a src attribute that matches a CSP allowed source for media, the media node is removed.

Script Nodes

Since there are no circumstances where a script node may have children in a CSP, any children of a script node are removed.

If a script node does not have a src attribute that matches a CSP allowed source for scripts, the script node is removed.

Event Attributes

From my understanding of the current CSP recommendation, event attributes are not allowed. All event attributes are therefore removed.
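
As a rough sketch of what removing event attributes can look like with DOMXPath: the helper name stripEventAttributes() is hypothetical, and treating every attribute whose name starts with "on" as an event attribute is an assumption made for brevity. The class keeps its own list of event attributes, which may differ.

```php
<?php
// Hypothetical helper: remove every attribute whose name begins with "on".
// Returns the names of the attributes that were removed.
function stripEventAttributes(DOMDocument $doc)
{
    $removed = array();
    $xpath = new DOMXPath($doc);
    // Take a copy of the result so removals do not disturb the iteration.
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        if (stripos($attr->nodeName, 'on') === 0) {
            $removed[] = $attr->nodeName;
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body onload="alert(1)"><a href="/x" onclick="go()">x</a></body></html>');
$removed = stripEventAttributes($doc);

$link = $doc->getElementsByTagName('a')->item(0);
$body = $doc->getElementsByTagName('body')->item(0);
```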

Example

Either you are viewing the index.php version on this page, or you are client side blocking script execution. If you allow scripts to execute, the README.html version of this page will have replaced this paragraph.

Object / Embed / Applet Nodes

If an object node does not have a src attribute that matches a CSP allowed source for objects, one of two things will happen:

  1. If the object node has children, it is turned into a div node (without any attributes), the children are preserved.
  2. If the object node does not have children, it is removed.
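
A minimal sketch of that demotion, assuming the node has already been found to violate the policy. demoteToDiv() is a hypothetical name, and the sample uses loadXML() only so that the parsed structure is predictable.

```php
<?php
// Hypothetical helper: turn a blocked node into an attribute-free div that
// keeps the children, or remove the node when it has no children.
function demoteToDiv(DOMElement $node)
{
    $parent = $node->parentNode;
    if (!$node->hasChildNodes()) {
        $parent->removeChild($node);
        return;
    }
    $div = $node->ownerDocument->createElement('div');
    while ($node->firstChild !== null) {
        // appendChild() moves the child out of the old node.
        $div->appendChild($node->firstChild);
    }
    $parent->replaceChild($div, $node);
}

$doc = new DOMDocument();
$doc->loadXML('<body><object src="http://evil.example/x.swf"><p>Fallback text</p></object><object src="http://evil.example/y.swf"/></body>');

foreach (iterator_to_array($doc->getElementsByTagName('object')) as $object) {
    demoteToDiv($object); // assumption: both objects are blocked
}

$objectCount = $doc->getElementsByTagName('object')->length;
$divs        = $doc->getElementsByTagName('div');
```

Because the div is created fresh, none of the original attributes carry over, and the fallback content remains visible to the user.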

Embed

The embed element supposedly is a proprietary tag that has been deprecated by the object tag, yet I've seen some references to it being included in HTML 5.0 which makes zero sense to me since it does not do anything that can't be done with object. However, deprecated or not, it is a fairly popular tag, especially for flash.

The embed tag is not official (yet), but it seems like it is not allowed to have any children, so any children of an embed node are removed.

If an embed node does not have a src attribute that matches a CSP allowed source for objects, it is removed.

Applet

The applet tag has been deprecated; you should use object instead.

If an applet node does not have a code attribute that matches a CSP allowed source for objects, one of two things will happen:

  1. If the applet node has children, it is turned into a div node (without any attributes), the children are preserved.
  2. If the applet node does not have children, it is removed.

Example

If viewing this page via README.html you will see the contents of an object instead of this paragraph. If viewing this page via index.php you should see the children of the object node, including this paragraph.

Frame Nodes

A frame node is not allowed to have children by W3C specification, so any children of a frame node are removed.

If a frame node does not have a src attribute that matches a CSP allowed source for frames, it is removed.

If an iframe node does not have a src attribute that matches a CSP allowed source for frames, one of two things will happen:

  1. If the iframe node has children, it is turned into a div node (without any attributes), the children are preserved.
  2. If the iframe node does not have children, it is removed.

Style Sheets

If the href attribute of a link node does not match a CSP allowed host for style sheets, the link node is removed.

Trigger Notification

If the CSP has a report URI specified, and the $policyLogFile variable does not point to a real file, code that violated the policy will be reported to that URI using an XML syntax similar to the one specified in the Mozilla CSP specification. It is up to the web developer to code something that does something useful with the report.

<cspfilter-report>
  <request>$_SERVER['REQUEST_METHOD'] $_SERVER['REQUEST_URI'] $_SERVER['SERVER_PROTOCOL']</request>
  <request-headers>Some headers related to the request</request-headers>
  <blocked-uri directive="policy">blocked resource URI</blocked-uri>
  <misc>violations that are not a URI resource</misc>
</cspfilter-report>

It is very similar to what a CSP aware browser would send, but currently has some differences:

  1. The main node uses the tag name cspfilter-report as opposed to csp-report, to differentiate server side CSP enforcement from what is caught by the requesting client when the report is sent via POST to the report-uri.
  2. The blocked-uri element will have an attribute whose name is the policy directive that caused the block and whose value is the policy host expression.
  3. I've added a misc node for alterations of the source by the filter that do not involve a blocked URI. The element name of that node may change.
  4. All reports for a page will be children of a single cspfilter-report node.

If you have specified a trigger log file, the report will be appended to the log file rather than sent to a report URI. The web server of course must have permission to write to the log file.
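
For anyone writing a receiver or log parser, here is a sketch of how a report of this shape could be assembled with DOMDocument. The element names follow the description in this section; buildReport() and its argument layout are hypothetical and are not the class's real interface.

```php
<?php
// Hypothetical builder for a single cspfilter-report document.
function buildReport($request, $headers, array $blocked, array $misc)
{
    $xml = new DOMDocument('1.0', 'utf-8');
    $xml->formatOutput = true;
    $root = $xml->appendChild($xml->createElement('cspfilter-report'));

    // Text nodes are used throughout so special characters get escaped.
    $node = $root->appendChild($xml->createElement('request'));
    $node->appendChild($xml->createTextNode($request));
    $node = $root->appendChild($xml->createElement('request-headers'));
    $node->appendChild($xml->createTextNode($headers));

    foreach ($blocked as $item) {
        $node = $root->appendChild($xml->createElement('blocked-uri'));
        // Attribute name is the directive, value is the host expression.
        $node->setAttribute($item['directive'], $item['expression']);
        $node->appendChild($xml->createTextNode($item['uri']));
    }
    foreach ($misc as $message) {
        $node = $root->appendChild($xml->createElement('misc'));
        $node->appendChild($xml->createTextNode($message));
    }
    return $xml->saveXML($root);
}

$report = buildReport(
    'GET /index.php HTTP/1.1',
    'Host: www.example.com',
    array(array(
        'directive'  => 'img-src',
        'expression' => '*.example.com',
        'uri'        => 'http://evil.example/a.png',
    )),
    array('removed onclick attribute')
);
```

Appending such a string to a log could then be done with file_put_contents() and the FILE_APPEND flag, though how the class itself writes the log is not shown here.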

Beyond CSP

My class allows for some parameters that are not part of the CSP proposal.

Event Attribute Whitelist

You can define a white list of event attributes that will not be removed if scripting is allowed. However, if those event attributes have any functions with arguments, the argument will be removed. For example, <body onload="alert('Hello');"> would become <body onload="alert();">.
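
One way such argument stripping could be done is with a recursive regular expression that empties each balanced pair of parentheses. This is a sketch under that assumption, not the class's own implementation, and stripHandlerArguments() is a hypothetical name. It would be confused by an unbalanced parenthesis inside a string literal.

```php
<?php
// Hypothetical helper: replace every balanced "(...)" group with "()".
// (?R) recurses into the whole pattern, so nested calls are swallowed too.
function stripHandlerArguments($handler)
{
    return preg_replace('/\((?:[^()]++|(?R))*\)/', '()', $handler);
}

$simple = stripHandlerArguments("alert('Hello');");
$nested = stripHandlerArguments("return validate(this, limit(10));");
// $simple is "alert();" and $nested is "return validate();"
```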

It is my hope that the final CSP recommendation will at least allow event handlers that do not contain any functions with arguments. Client side form validation really needs them. That's why my class allows for them.

Script Only in Head

This allows you to forbid any script nodes regardless of the src attribute that do not appear in the document head. It is useful if you declare third party JavaScript sources for your page to use but want to make sure your users can not inject a script from the same third party script host.
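
The idea can be sketched with a single XPath query that selects script nodes lacking a head ancestor. removeScriptsOutsideHead() is a hypothetical helper used for illustration; the class's own logic may differ.

```php
<?php
// Hypothetical helper: remove every script node that is not inside head,
// regardless of its src attribute. Returns the number of nodes removed.
function removeScriptsOutsideHead(DOMDocument $doc)
{
    $xpath = new DOMXPath($doc);
    $removed = 0;
    $scripts = iterator_to_array($xpath->query('//script[not(ancestor::head)]'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
        $removed++;
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><head><title>t</title><script src="http://cdn.example/lib.js"></script></head>'
    . '<body><p>hi</p><script src="http://cdn.example/evil.js"></script></body></html>');

$removed   = removeScriptsOutsideHead($doc);
$remaining = $doc->getElementsByTagName('script');
```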

Other Checks

Attribute Content

The following are not allowed in any attribute and are removed if they are found:

Redundant Nodes

By W3C specification some nodes, such as title, are only allowed to occur once. The class assumes something fishy is being attempted and removes additional occurrences of those nodes.

Head Section

By W3C specification, some nodes may only occur as children of the head node. For example the meta node. When those nodes are not direct children of the head node, the output filter considers it to be suspicious and they are removed.
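
A sketch of that check follows. The list of head-only element names is an assumption chosen for illustration (only meta is named above), and removeMisplacedHeadNodes() is a hypothetical helper. The sample is loaded with loadXML() and has no namespace; a namespaced XHTML document would need the namespace registered with DOMXPath first.

```php
<?php
// Hypothetical helper: remove head-only nodes that are not direct children
// of head. Returns the number of nodes removed.
function removeMisplacedHeadNodes(DOMDocument $doc, array $headOnly = array('meta', 'title', 'base'))
{
    $xpath = new DOMXPath($doc);
    $removed = 0;
    foreach ($headOnly as $name) {
        $query = '//' . $name . '[not(parent::head)]';
        foreach (iterator_to_array($xpath->query($query)) as $node) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadXML('<html><head><meta name="a" content="1"/><title>t</title></head>'
    . '<body><p><meta http-equiv="refresh" content="0;url=http://evil.example/"/></p></body></html>');

$removed = removeMisplacedHeadNodes($doc);
$metas   = $doc->getElementsByTagName('meta');
```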

Nodes That Should Not Have Children

By W3C specification, some nodes, such as hr, are not allowed to have children. When those nodes are found to have children, the output filter considers it to be suspicious and the child nodes are removed.

Live Testing of Filter Class

You can test the current implementation of the CSP filter class here:
dom_script_test.php

No intentional import filtering of code entered into the textarea is performed. On submit, the textarea content is consumed by DOMDocument loadHTML(), which does do some minimal filtering of its own, but if the code you enter is clean HTML it should be properly imported without modification and you can test how the class filters the input. Filtered input is displayed as XML at the bottom of the page after you hit submit, so that you do not have to continuously view the page HTML source to see how it filtered your input (viewing source can fail in some browsers, which fetch a new copy of the page via GET to show the source).

Known Bugs

IDNA needs testing.
Zero testing has been done with host names and server paths that use IDNA.
Not Fully Compliant
Need to bring into compliance with https://wiki.mozilla.org/Security/CSP/Spec before 1.0
Does not check for existing meta tag
Need to check the DOM for an existing CSP meta tag and yank it if it exists.

Needs Checking

The obfus() function needs some heavy duty checking. I initially borrowed some of the regex in it from another source, and found the regex to be improper. The regex I replaced needs to be brutally tested to make sure it does what it is supposed to do, and the regex I did not replace needs thorough testing to make sure it doesn't need replacing.

I need to make sure I have the scope of the base tag correct, which elements it impacts and which elements it does not impact.

For script attributes, I need to make sure I catch all methods for invoking client side scripting. Right now it checks for javascript: vbscript: mocha: but there may be others it needs to check for?
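
To make the concern concrete, here is a sketch of a scheme check that tolerates mixed case and embedded whitespace or control characters. It is not the obfus() function; usesScriptingScheme() is a hypothetical helper, and the scheme list is just the three named above, passed as a parameter so more can be added.

```php
<?php
// Hypothetical helper: does an attribute value start with a scripting
// scheme once whitespace and control characters are ignored?
function usesScriptingScheme($value, array $schemes = array('javascript', 'vbscript', 'mocha'))
{
    // Remove ASCII control characters and spaces, then lowercase.
    $compact = strtolower(preg_replace('/[\x00-\x20]+/', '', $value));
    foreach ($schemes as $scheme) {
        if (strpos($compact, $scheme . ':') === 0) {
            return true;
        }
    }
    return false;
}

$plain      = usesScriptingScheme('javascript:alert(1)');
$obfuscated = usesScriptingScheme(" JaVa\tScRiPt:alert(1)");
$harmless   = usesScriptingScheme('http://example.com/javascript:');
```

Character entities do not need separate handling in this sketch when the value is read from a DOMDocument, since the parser has already decoded them.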

I need to make sure my list of event attributes is complete, and that I properly understand what CSP wants to do with event handlers.

I need to check whether or not namespaced elements are legal in XHTML; right now I only check for namespaced attributes.

I really need to make sure there are no cases where W3C valid input results in invalid output after filtering.

Questions About the Spec

If style-src does not include the host the page is being served from, should style elements and style attributes be yanked?

Does the host expression list allow wildcards that are not at beginning of word?

Class Usage

Not a Substitute for Input Filter

You still need to filter your input, for a variety of reasons (e.g. you don't want an XSS vector stored in your database). If your input filtering is good and is configured to properly enforce your policy, the class should never be triggered to modify output generated by user input. The class is a second line of defense for situations where input validation failed to catch something.

If you are developing code and need an input filter, you are probably better off using something like HTML Purifier rather than writing a class/function of your own.

Class Operates on Complete Document

To use the class, you need to fully construct your HTML/XHTML document BEFORE using the class to filter it. Most (all?) popular template systems allow for this with relative ease.

If you mix HTML with PHP and/or use the PHP print() and echo() functions to send output to the browser before your PHP has finished running, you can not use this class. Since HTTP is a stateless protocol, sending data before you have finished building the document IMHO is poor design anyway, but doing so is very common. It won't work with this class though.

With respect to template systems, I'm not too familiar with them but it looks like some of them allow the page to be built in chunks and then sent as chunks instead of sent all at once. It would be better to create a buffer and append all your chunks to the buffer before sending them, and then pass the entire buffer through the output class.
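
If your code has no buffer of its own, PHP's output control functions are one possible way to get one. The sketch below is my illustration rather than something the class provides: it captures everything echoed or printed, then hands the complete document to DOMDocument.

```php
<?php
// Start capturing output instead of sending it to the browser.
ob_start();

echo "<html><head><title>Demo</title></head><body>";
print "<p>Chunk one</p>";
echo "<p>Chunk two</p></body></html>";

// Stop capturing and fetch everything that was written so far.
$buffer = ob_get_clean();

$domdoc = new DOMDocument("1.0", "utf-8");
$domdoc->loadHTML($buffer);

// The filter would be applied to $domdoc at this point.
$paragraphs = $domdoc->getElementsByTagName('p')->length;

print $domdoc->saveHTML();
```

Nothing reaches the client until the final print, so the whole document is available for filtering.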

Class requires PHP 5 with XML support

The class requires your document be in the form of a PHP 5 DOMDocument object. If your PHP installation does not support the PHP XML functions, you either need to recompile PHP or install the XML loadable module.

To import your document into a DOMDocument object:

$domdoc = new DOMDocument("1.0","utf-8");
$domdoc->preserveWhiteSpace = false; // optional
$domdoc->formatOutput = true; // optional but makes for prettier output
$domdoc->loadHTML($yourhtmlasbuffer); // use loadXML() for well formed XHTML

Now your HTML/XHTML is ready for filtering.

For an example of this method, see the PHP source of this page.

Be Careful with Tidy

It may be tempting to pass your completed document through tidy before feeding it to a DOMDocument class for filtering. Be aware that tidy will move some tags into the head section that you do not want there. For example, if a malicious user manages to sneak a meta tag past your input filtering, tidy will then help the malicious user by moving that meta tag into the head section where the output filter will merrily allow it.

There may be a tidy option to prevent this; I don't know.

It should be relatively safe to use tidy as part of an import filter, as long as the tidy option show-body-only is specified as true.

Internationalized Domain Names for Applications

This functionality has not yet been tested.

The CSP spec says that the CSP directives for IDNA need to be in punycode. The class does not do that for you. However, it will attempt to convert the source attributes to punycode for you if you install the IDNA Convert class.

You can get the class here: http://www.phpclasses.org/browse/package/1509.html

The class is not included with this distribution. Once you have the class, in order for cspfilter to use it you need to include the class, preferably somewhere in your script before you use the cspfilter class, or alternatively you can modify the cspfilter_class.php file and include it there.

I would be very grateful if someone with more IDNA knowledge than me could do some thorough testing of it. The live test page (hosted on clfsrpm.net) includes the necessary class and should work for IDNA testing (for html source, the CSP policy input fields must be entered as punycode encoded).

Using the Class

Initiate the class and tell it about your document:

$filter = new cspfilter($domdoc);

Specify the class option settings:

$filter->httphost = "www.yourdomain.com";
$filter->csp['allow'] = 'none';
$filter->csp['img-src'] = '*.yourdomain.com *.photobucket.com www.w3.org';

Run the filter:

$filter->processData();

Optionally apply the CSP meta tag or send the CSP header:

$filter->cspHeader = true; // defaults to false which creates meta tag instead of header
$filter->makeCSP(); // creates and applies the meta tag or sends the header

Now send the page to the requesting browser:

print $domdoc->saveHTML(); //for XHTML - use saveXML();

Public Class Variables

CSP Specific Variables

The class has a public array called csp that has nine indexes. These indexes are identical to the names and settings used by the CSP recommendation and take the same syntax for their setting.

$csp['allow']
Default policy. Set to self to default allow sources that originate from same domain as page is being served from, set to none to default deny (recommended, and default if not set), or set to a space delimited list of hosts that resources may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['img-src']
Override default policy for images. Set to self to allow images served from same host as page is being served from, or set to none to forbid images from being served, or set to a space delimited list of hosts images may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['media-src']
Override default policy for media (audio, video tags). Set to self to allow media served from same host as page is being served from, or set to none to forbid media from being served, or set to a space delimited list of hosts media may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['script-src']
Override default policy for scripts. Set to self to allow scripts served from same host as page is being served from, or set to none to forbid all script execution, or set to a space delimited list of hosts scripts may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['object-src']
Override default policy for object, embed, applet. Set to self to allow objects served from same host as page is being served from, or set to none to forbid objects from being served, or set to a space delimited list of hosts objects may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['frame-src']
Override default policy for frames and iframes. Set to self to allow frames served from same host as page is being served from, or set to none to forbid frames from being served, or set to a space delimited list of hosts frames may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['frame-ancestors']
Override default policy for frame ancestors. Note that there is no way to server side filter frame-ancestors; frame-ancestor filtering can only be done by client side CSP.
$csp['style-src']
Override default policy for style sheet source. Set to self to allow style sheets served from same host as page is being served from, or set to none to forbid all style sheets, or set to a space delimited list of hosts style sheets may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['report-uri']
URI that the browser should report policy violations to.
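
To illustrate how a space delimited host list with a leading wild card can be evaluated, here is a small sketch. It is not the class's matching code: hostMatches() and sourceAllowed() are hypothetical helpers, schemes and ports are ignored, relative URLs are treated as coming from the page's own host, and whether *.example.com should also match the bare example.com is left as "no" purely as an assumption.

```php
<?php
// Hypothetical helper: compare a host with one host expression.
function hostMatches($host, $expression)
{
    $host = strtolower($host);
    $expression = strtolower($expression);
    if (strpos($expression, '*.') === 0) {
        $suffix = substr($expression, 1); // e.g. ".example.com"
        return substr($host, -strlen($suffix)) === $suffix;
    }
    return $host === $expression;
}

// Hypothetical helper: is $url allowed by a directive value such as
// 'self', 'none', or a space delimited list of host expressions?
function sourceAllowed($url, $policy, $selfHost)
{
    $policy = trim($policy);
    if ($policy === '' || $policy === 'none') {
        return false;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === false) {
        return false;          // malformed URL
    }
    if ($host === null) {
        $host = $selfHost;     // relative URL: same host as the page
    }
    foreach (preg_split('/\s+/', $policy) as $expression) {
        if ($expression === 'self') {
            $expression = $selfHost;
        }
        if (hostMatches($host, $expression)) {
            return true;
        }
    }
    return false;
}

$policy  = '*.yourdomain.com *.photobucket.com www.w3.org';
$allowed = sourceAllowed('http://img.photobucket.com/a.png', $policy, 'www.yourdomain.com');
$blocked = sourceAllowed('http://evil.example/a.png', $policy, 'www.yourdomain.com');
$local   = sourceAllowed('/local.png', 'self', 'www.yourdomain.com');
```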

NOTE: cspfilter does not use the policy-uri directive. Specify your desired policy using the above array variables in your script. If you want to set site wide policy, you can create a class that extends the cspfilter class and define your defaults there. Why? See the FAQ.

Non CSP Variables

$version
The version of the class. Not used by the class. This variable should be considered read-only, but unfortunately there isn't (yet, appears to be in CVS) a simple way to declare public class variables as read-only. Yes, I've seen the hacks that do it, but since changing that variable will not break anything they don't interest me. Just understand there isn't a need to ever alter it.
$cspHeader
Boolean. Only used by the makeCSP() function. If set to false (default) then makeCSP() creates a CSP meta tag and puts it in the head section of your document. If set to true, makeCSP() sends an HTTP header to notify the client browser of the CSP for the page.
$scriptOnlyInHead
Boolean. If set to true, the script node is only allowed in the document head section.
$httphost
String. The fully qualified host name the page is being sent from. The class does detect this from the $_SERVER["HTTP_HOST"] global, but that global can be impacted by headers the client sends, so it should not be trusted. It is better to specify the variable value.
$eventWhitelist
Array. If scripting is allowed, this array contains the name of event attributes that the filter is not to remove from the document.
$policyLogFile
String. If specified and the file exists, policy violations will be logged to this file. Even if $csp['report-uri'] is set, the $policyLogFile variable takes precedence so that the class does not need to make an HTTP connection.

Public Class Functions

cspfilter()
Constructor function that is run whenever a new instance of the class is created. There is no need to ever call this function directly. Requires a DOMDocument object as argument.
processData()
No Arguments. Applies the filtering to the DOMDocument object specified when the class is initiated according to the rules specified by the public class variables.
makeCSP()
Optional, but recommended. No arguments. Either creates a meta node specifying the CSP or sends an HTTP header specifying the CSP.

Extending The Class

One thing you can do is to create an extension to the class that specifies the base policies you want to enforce. You can still override them on a per page basis. Here is an example:

<?php
require_once('cspfilter_class.php');

class MyCSP extends cspfilter {
   // Site wide defaults; individual pages can still override them.
   public $csp = array('allow'    => 'none',
            'img-src'          => 'self',
            'media-src'        => '',
            'script-src'       => '',
            'object-src'       => '',
            'frame-src'        => '',
            'frame-ancestors'  => '',
            'style-src'        => 'self',
            'report-uri'       => '');
   public $cspHeader     = true;                       // send header, not meta tag
   public $policyLogFile = '/path/to/log/csplog.xml';  // must be writable by the web server
   public $httphost      = 'www.yourdomain.net';
   }
?>

Require the file containing that code and then you can call the class in your page via:

$filter = new MyCSP($domdoc);

The settings specified in the class extension will be used.

FAQ

OK, these are not really frequently asked, but they might be.

Why do you not support the policy-uri directive?
There are several possible sources for policy. One is the meta tag, one is the policy-uri, and one of course is the $csp array that the class uses.
 
I actually started to write code that merged the different possible sources using set intersection, but to do it right it started to get more and more complex. The more complex something is, the more likely it has bugs and the more difficult it is to maintain, so I opted for a KISS philosophy. Ignore any directives set in a policy file or in a meta tag of the DOM.
 
Since the default policy for the class is none, ignoring those other methods is safe to do. Also, the headers / meta tag are not supposed to use the policy-uri unless that is the only directive sent, but if a user wants to use a site wide policy, the user can just extend the class and define the site wide policy there via the $csp variables.
When will the class support International Host Names?
I think it does now, but I have not done any testing whatsoever. The CSP spec states that the CSP directives need to be in punycode. I do not attempt to do that for you (should I??). I do however attempt to convert sources to punycode for checking against the CSP policy.
Do you have a bugzilla?
No. I'm just a regular guy who wrote this class for my own purposes and decided that others could benefit. Feel free to send bug reports to mpeters@mac.com.
Why do I have to have the page fully constructed before sending to use your class?
Because the class is an output filter that operates on a DOMDocument. You can cheat and alter the DOMDocument after running the filter, but then any content after running the filter is not filtered.
Are you going to write a CSP input filter?
From scratch? No. At some point I may try to extend the HTML Purifier class to add some CSP checks on input, but if (and I stress the if) I ever do, it probably won't be for a while.
