Version: @(#) $Id: xml_parser.php,v 1.34 2012/09/05 09:27:25 mlemos Exp $
XML document parser
Manuel Lemos (mlemos@acm.org)
Copyright © Manuel Lemos 1999-2012
This class is meant to parse, validate and extract information from XML documents.
An XML document may be parsed using the Parse function, either by passing it the whole XML document data as a single string or by calling the function multiple times with the XML document split into separate data chunks.
Alternatively, the class can parse an XML document by reading it from a given file using the ParseFile function, or from a previously opened file or stream using the ParseStream function.
The ExtractElementData function can be used to validate and extract data from an XML document after it has been parsed.
It allows validating the XML document tag structure and data elements according to rules specific to common data types.
Custom types of tag element validation can be done by extending the class and implementing the ValidateElementData function in a subclass.
string
''
Store the message of the last error that occurred.
Check this variable to retrieve the reason for the failure if a call to the class failed.
int
0
Store the code of the last error that occurred.
Check this variable to retrieve the number of the error that happened when a call to the class failed. Valid error code numbers are defined by constants:
XML_PARSER_NO_ERROR - no error happened
XML_PARSER_CREATE_PARSER_ERROR - failed to create the XML parser
XML_PARSER_PARSE_DATA_ERROR - failed to parse the XML document
XML_PARSER_READ_INPUT_DATA_ERROR - failed to read XML data from file
XML_PARSER_VALIDATE_DATA_ERROR - failed to validate XML data
int
0
Store the number of the line of the XML document related to the last error that occurred.
Check this variable to retrieve the number of the XML document line of the error that happened when a call to the class failed.
XML document line numbers start at 1.
int
0
Store the number of the column of the XML document related to the last error that occurred.
Check this variable to retrieve the number of the XML document column of the error that happened when a call to the class failed.
XML document column numbers start at 1.
int
0
Store the position of the byte of the XML document related to the last error that occurred, relative to the beginning of the document.
Check this variable to retrieve the byte index of the XML document position of the error that happened when a call to the class failed.
XML document byte indices start at 0.
int
0
XML parser error code number
Check this variable to retrieve the original error code returned by the XML parser when the document parsing fails.
int
4096
Size of the buffer of the chunks of the XML document file to be parsed
Increase this variable if you need to parse an XML document file in larger chunks to reduce document parsing time overhead.
array
array()
Structure of the parsed XML document
This variable is an associative array that has all elements of the parsed XML document.
The indexes are strings that contain numbers separated by commas. The numbers represent the order of the element inside the parent tag element. The order number of the first element inside a tag is 0. The index of the root tag is '0'. The index of the first child element inside the root tag is '0,0', and so on.
The values of each document element may be strings for data elements, or associative arrays for tag elements. The tag element arrays may contain the following entries:
'Tag' - string with the tag name
'Namespace' - string with the namespace of the tag if the extract_namespaces variable is set to 1, otherwise the namespace is included in the 'Tag' entry.
'Elements' - integer with the number of elements inside a tag
'Attributes' - associative array with the tag attributes and their values.
'AttributeNamespaces' - associative array with the namespaces of each of the tag attributes if the extract_namespaces variable is set to 1, otherwise the attribute namespaces are included in the 'Attributes' entries.
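The index scheme described above can be illustrated with a small sketch. The array below is built by hand to follow the described layout and is not output captured from the class; the child_paths helper is hypothetical and only shows how the 'Elements' count can be used to derive the indexes of the children of a tag.

```php
<?php
// Hand-built array that follows the layout described for the structure
// variable (an assumption for illustration, not real parser output).
// It corresponds to: <root><item id="1">first</item><item id="2">second</item></root>
$structure = [
    '0'     => ['Tag' => 'root', 'Elements' => 2, 'Attributes' => []],
    '0,0'   => ['Tag' => 'item', 'Elements' => 1, 'Attributes' => ['id' => '1']],
    '0,0,0' => 'first',
    '0,1'   => ['Tag' => 'item', 'Elements' => 1, 'Attributes' => ['id' => '2']],
    '0,1,0' => 'second',
];

// Hypothetical helper: return the indexes of the elements inside a tag.
// Data elements are plain strings, so they have no children.
function child_paths(array $structure, string $path): array
{
    $paths = [];
    $entry = $structure[$path] ?? null;
    if (!is_array($entry)) {
        return $paths;
    }
    for ($position = 0; $position < $entry['Elements']; $position++) {
        $paths[] = $path . ',' . $position;
    }
    return $paths;
}
```

Calling child_paths($structure, '0') on this array yields the indexes of the two item tags, and each of those in turn has a single data element as its only child.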
array
array()
Positions of each document element
This variable is an associative array that stores the positions of each element of the parsed XML document, if the store_positions variable is set to 1.
The indexes are strings of the respective elements in the structure variable array.
The entry values are associative arrays with the Line, Column and Byte of the respective element.
bool
0
Set the class to keep track of the positions of the parsed XML document elements
Set this variable to 1 if you need to obtain the positions of XML document elements, in particular when a parsing error occurs.
bool
0
Set the class to extract the namespace names from tags and tag attributes.
Set this variable to 1 if you need to separate the namespaces from the tag names and attribute names.
bool
0
Set the class to convert the tag and attribute names to upper case.
Set this variable to 1 if you process the tags and attributes in a case insensitive way.
string
'ISO-8859-1'
Set the class to convert the character encoding of the parsed XML document text
Set this variable to a specific character encoding if you need the parsed text values to be returned in that encoding.
bool
0
Set the class to parse simplified XML documents that use no tag attributes.
Set this variable to 1 to gain some parsing time and spend less memory if the XML document to be parsed is not expected to have tags with attributes. In this case, namespaces are not extracted from tags.
bool
0
Set the class to fail with an error if the document tags have attributes and the simplified_xml variable is set to 1.
Set this variable to 1 if the XML document to parse uses tag attributes but those are not meant to be allowed.
string Parse(
Parse an XML document from data.
Pass the XML document data to the data parameter. The data of the document may be passed all at once or one chunk at a time. The end_of_data parameter should be set to 1 if the current chunk is the last chunk of the XML document.
data - The XML document data to be parsed.
end_of_data - Flag that determines if the data being passed is the last chunk of the XML document.
Message string of an error that occurred when this function was called. An empty string means that no error occurred.
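The chunk-at-a-time calling pattern can be sketched with the XML parser functions built into PHP, which take a similar "last chunk" flag. This is not the class's own code: parse_in_chunks is a hypothetical function, and the sketch only assumes that the standard xml extension is available. It mirrors the convention above of returning an empty error string on success.

```php
<?php
// Sketch of chunked parsing using PHP's built-in xml extension.
// parse_in_chunks is a hypothetical name, not part of the class.
function parse_in_chunks(array $chunks): array
{
    $chunks = array_values($chunks);
    $tags = [];
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // Start tag handler: record each tag name as it is opened.
        function ($parser, $name, $attributes) use (&$tags) {
            $tags[] = $name;
        },
        // End tag handler: nothing to do in this sketch.
        function ($parser, $name) {
        }
    );
    $error = '';
    $last = count($chunks) - 1;
    foreach ($chunks as $index => $chunk) {
        // Tell the parser when the final chunk is being passed.
        $is_final = ($index === $last);
        if (!xml_parse($parser, $chunk, $is_final)) {
            $error = xml_error_string(xml_get_error_code($parser))
                . ' at line ' . xml_get_current_line_number($parser);
            break;
        }
    }
    return ['error' => $error, 'tags' => $tags];
}
```

A chunk boundary may fall in the middle of a tag, for example parse_in_chunks(['<root><it', 'em>a</item></root>']), because the parser keeps its state between calls.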
string ParseStream(
Parse an XML document from an opened file or stream.
Pass an already opened file or stream to the stream parameter.
stream - The file or stream handler from which the XML document data will be retrieved.
Message string of an error that occurred when this function was called. An empty string means that no error occurred.
string ParseFile(
Parse an XML document read from a file.
Pass the name of the file to the file parameter.
file - Name of XML document file to be parsed.
Message string of an error that occurred when this function was called. An empty string means that no error occurred.
string ValidateElementData(
Validate values of parsed XML document elements
This function is called by the ExtractElementData function to perform a custom type of validation of tag data or attribute values.
Extend this class and implement this function in a subclass to provide custom element data validation types.
validation - Name of the custom validation to be performed.
path - Path of the element value being validated. It is the entry index of the element in the structure variable.
value - Value to be validated.
result - Associative array that should return the result of the validation.
This array should have an entry named error set to an error message in case there was a validation error. The error entry should not be set if the value is considered valid.
An optional entry named path should be set to the path of the element to which the eventual validation error is related, in case that path is different from the original element path passed in the path parameter.
This function should return a non-empty string as the error message when an error occurs. Otherwise it should return an empty string, even when the value is considered invalid. Invalid values are reported through the error entry of the result parameter.
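The contract described above, where the return value and the result array carry different kinds of failure, can be sketched as a standalone function. Everything here is an assumption for illustration: the function name validate_element_data, the even_number validation type, the error message wording, and the choice to treat an unknown validation type as the kind of error that is returned as a string. In real use the logic would live in a subclass method instead.

```php
<?php
// Standalone sketch of the described contract (hypothetical names).
// The result array is passed by reference so the caller can read it.
function validate_element_data(
    string $validation,
    string $path,
    string $value,
    array &$result
): string {
    $result = [];
    switch ($validation) {
        case 'even_number':
            // An invalid value is reported through the error entry,
            // while the function itself still returns an empty string.
            if (!preg_match('/^-?[0-9]+$/', $value) || ((int) $value) % 2 !== 0) {
                $result['error'] = 'the value is not an even number';
            }
            return '';
        default:
            // Assumption: an unknown validation type is a failure of the
            // call itself, so it is returned as a non-empty string.
            return 'unsupported validation type ' . $validation;
    }
}
```

With this split, a caller can tell "the value was checked and rejected" apart from "the check could not be performed".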
string ExtractElementData(
Validate and extract values of parsed XML document according to given rules
Call this function after having parsed an XML document to validate document elements and extract element values.
path - Path of the parent of the XML document elements to be processed.
It should be set to '0' if it is intended to process the children elements of the root tag. An empty string should be passed if the processing should start from above the root element.
name - Name of the type of document being processed. It could be anything. It is only used in eventual validation error messages.
types - Associative array with parameters that define rules on how element values should be processed and extracted.
This array should contain entries with the names of each tag element that is allowed inside the element with the path specified by the path argument.
The values of each tag entry should also be associative arrays with the respective tag element rule parameters. The rule parameters should be:
type - (required) Type of the element value. Supported values are: text, integer, decimal, boolean, date, path (for just returning the path of the element), array and hash (to return an array of contained tag elements).
minimum - Minimum number of times a tag element with the same name may appear. The default is 1.
maximum - Maximum number of times a tag element with the same name may appear. The default is 1. '*' means unlimited.
validation - Name of a custom validation type to be applied. The function will call ValidateElementData to perform the specified validation type.
Parameters for specific types:
For hash and array types:
types - Associative array with the rules for all the tags allowed inside the current tag. It is the same as the types argument.
attributes - Associative array with the names and rules of the allowed attributes for the current tag. Each attribute rule must contain a type parameter that is the same as the tag element type parameter except that it cannot be hash or array.
All the type specific parameters are allowed. The validation parameter may also specify a custom validation type. Optional attributes must have the optional parameter set to 1.
For text type:
minimumlength - Minimum length of the value. The default is 0.
maximumlength - Maximum length of the value. The default is unlimited.
For integer type:
minimumvalue - Minimum allowed value. The default is unlimited.
maximumvalue - Maximum allowed value. The default is unlimited.
Special type specific values:
For boolean:
True may be true, t, 1, y, yes
False may be false, f, 0, n, no
For date:
The current date may be specified as now
hash - Flag that determines if the extracted values should be returned as an associative array or as a regular array.
values - Array with the extracted element values.
Message string of an error that occurred when this function was called. An empty string means that no error occurred.
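As an illustration of the rule parameters listed above, a types array for a hypothetical document could look like the sketch below. The tag and attribute names (title, count, published, keyword, author, id, role, name) are invented for the example; only the rule parameter names come from the description above.

```php
<?php
// Example rules for a hypothetical document. Tag names are made up;
// the rule parameter names follow the description of the types argument.
$types = [
    // Required text, between 1 and 80 characters long.
    'title' => ['type' => 'text', 'minimumlength' => 1, 'maximumlength' => 80],
    // Optional non-negative integer (minimum 0 makes the tag optional).
    'count' => ['type' => 'integer', 'minimum' => 0, 'minimumvalue' => 0],
    // Optional boolean, written as true, t, 1, y, yes or false, f, 0, n, no.
    'published' => ['type' => 'boolean', 'minimum' => 0],
    // Text tag that may appear any number of times, including none.
    'keyword' => ['type' => 'text', 'minimum' => 0, 'maximum' => '*'],
    // Nested tag with its own child rules and attribute rules.
    'author' => [
        'type' => 'hash',
        'types' => [
            'name' => ['type' => 'text', 'minimumlength' => 1],
        ],
        'attributes' => [
            'id' => ['type' => 'integer'],
            'role' => ['type' => 'text', 'optional' => 1],
        ],
    ],
];
```

Since minimum and maximum both default to 1, a tag such as title that sets neither must appear exactly once.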