Manipulating XML with Sablotron, PHP and MySQL


XML is very quickly becoming a standard format for storing and exchanging data. The benefits of using XML are huge : separation of style and content for web publishing, better knowledge representation, increased readability, increased automation of machine to machine communications, etc. However handling XML poses a few problems : for example, how do you store and retrieve XML data? how do you transform one XML format into another one? How do you turn XML data into HTML for web publication? The goal of this short article is to answer these common questions using a few examples. It also aims at showing the power of some open source tools for storing data, transforming XML, and publish on the web.

Requirements:

- MySQL

MySQL is a fast and convenient relational database system, available at http://www.mysql.org. It has been recently licensed under the GPL, meaning that it should now develop new functionalities at a fast pace. If MySQL is not installed on your machine, have a look at this Linux Magazine article for a complete tutorial on installing MySQL on Unix environments.
 

- Sablotron

Sablotron is a fast, GPL or MPL licensed XSLT engine fully implemented in C++. It uses the well-known Expat XML parser (included with the source code). It is available at its creator web site :  http://www.gingerall.com. Okay, now you may ask : what is an XSLT engine, and what does it do? Well, it does XSL transformations i.e it transforms some XML data in a particular format into another format. Here is a tutorial on these transformations. Sablotron is not as complete as Xalan-J or XT (for example it didn't work on a docbook stylesheet), but it should do fine (and fast) for simple transformations.
Once installed, the command line for running Sablotron and applying the file.xsl stylesheet to the file.xml containing some XML data is :
sabcmd file.xsl file.xml

sabcmd is also stream-aware, so you can pipe the output of  a 'cat file' into sabcmd and output the result of the transformation to screen. For example:
cat file.xml | sabcmd file.xsl

Add the '-m' flag to output the time taken by the transformation:
sabcmd -m file.xsl file.xml
 

- PHP

PHP is the number one, server-side, web scripting language. We will use the brand new XSLT capabilities of PHP version 4, which are based on Sablotron. This means that PHP has to compiled with Sablotron support (use configure --with-sablot=/path/to/sablotron/library at compile time)

Storing XML data into a MySQL database

As an example, we will store the Slashdot news into a MySQL database. First of all, download the XML file at http://slashdot.org/slashdot.xml. This file looks like this:
<?xml version="1.0"?>

<backslash xmlns:backslash="http://slashdot.org/backslash.dtd">
 <story>
  <title>NASA Proposes Launch Solar Sail Vehicle For 2010</title>
  <url>http://slashdot.org/article.pl?sid=00/05/15/058238</url>
  <time>2000-05-15 07:54:15</time>
  <author>timothy</author>
  <department>ralph-nader-will-have-to-hire-a-chase-car</department>
  <topic>space</topic>
  <comments>99</comments>
  <section>articles</section>
  <image>topicspace.gif</image>
 </story>

 <story>
  <title>Linuxcare Responds To Tim O'Reilly's Article</title>
  <url>http://slashdot.org/article.pl?sid=00/05/15/0254252</url>
  <time>2000-05-15 02:57:07</time>
  <author>timothy</author>
  <department>consider-source-horses-mouth-grain-of-salt</department>
  <topic>linuxbiz</topic>
  <comments>142</comments>
  <section>articles</section>
  <image>topiclinuxbiz.gif</image>
 </story>
.
.
.
</backslash>

The most interesting things are the title, the url, and the time. That is all we will store into the database, so we create a new table, named "stories", that will contain these items. To do so, insert the following SQL command into a file named "stories.sql":

CREATE TABLE stories (

 title varchar(100) not null,

 url varchar(100) not null,

 time datetime not null,

primary key(story_url)

);

As you can see, all the fields are defined as "not null". The URL is defined as the primary key, so two stories with the same URL cannot coexist within the table.

Then, you may want to create a new database. To do so, use the following command:
mysqladmin create newsdb

Finally create the table within newsdb:
mysql newsdb < stories.sql

Next, create a XSL file called "slashtosql.xsl" to transform a Slashdot XML news into SQL commands:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<xsl:apply-templates select="backslash/story"/>
</xsl:template>

<xsl:template match="backslash/story">insert ignore into stories (story_title, story_url, story_date) values ("<xsl:value-of select="title"/>", "<xsl:value-of select="url"/>", "<xsl:value-of select="time"/>");
</xsl:template>

</xsl:stylesheet>
This stylesheet is simple : when a story tag is encountered, an INSERT command is created with the values of the title, url and time elements. The IGNORE option tells MySQL not to ouput any error messages when it refuses to insert already stored stories.
 

Since a XSLT engine is initially designed to transform an XML format into another XML format, Sablotron inserts a line at the top of its output indicating the XML version and the encoding scheme. Since this line is not a valid SQL command, MySQL will reject it. The solution is to add the line <xsl:output omit-xml-declaration="yes"/> to the XSL stylesheet, inside the <xsl:stylesheet> node. The stylesheet then becomes:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output omit-xml-declaration="yes"/>

<xsl:template match="/">

<xsl:apply-templates select="backslash/story"/>

</xsl:template>
<xsl:template match="backslash/story">insert ignore into stories (story_title, story_url, story_date) values ("<xsl:value-of select="title"/>", "<xsl:value-of select="url"/>", "<xsl:value-of select="time"/>");
</xsl:template>

</xsl:stylesheet>

You could also use (and that is the first thing I actually did) the sed Unix program (Stream EDitor) to remove this line:
sabcmd slashtosql.xsl slashdot.xml | sed -e 's/<.*>//g'

The final step is to execute the SQL commands within MySQL. This can be done by piping the output of sabcmd into mysql.
sabcmd slashtosql.xsl slashdot.xml | mysql newsdb
 
 

Turning database content into XML

Okay, the data is stored
<?
class phpSQLProcessor {
 var $doc_element;
 var $row_element;
 var $query;
 var $results;
 var $xml;
 var $doc_attrs;

 /** constructor **/
 function phpSQLProcessor() {
  $this->doc_attrs = array();
 }

 /** connecting to a MySQL database and selecting a datasbase **/
 function connect($host, $user, $pswd, $base) {
  mysql_connect($host, $user, $pswd);
  mysql_select_db($base);
 }

 /** setting the doc element **/
 function set_doc_element($str) {
  $this->doc_element = $str;
 }
 

 /** adding an attribute (name and value) to the doc element **/
 function add_doc_attr($attr, $value) {
  $this->doc_attrs["$attr"] = $value;
 }

 /** setting the row element **/
 function set_row_element($str) {
  $this->row_element = $str;
 }
 

 /** setting the SQL query **/
 function set_query($query) {
  $this->query = $query;
 }

 /** executing the query **/
 function execute() {
  $this->results = mysql_query($this->query);
  if (!$this->results)
   return false;
  else
   return true;
 }

 /** returns the XML data resulting from the SQL query **/
 function get_xml() {
  $rowset = (!empty($this->doc_element)?$this->doc_element:"ROWSET");
  $row = (!empty($this->row_element)?$this->row_element:"ROW");

  $xml .= "<?xml version=\"1.0\"?>\n";
  $xml .= "<$rowset";

  // adding doc attributes
  reset($this->doc_attrs);
  while (list($key, $val) = each($this->doc_attrs)) {
   $xml .= " $key=\"$val\"";
  }
  $xml .= ">\n";
  while ($tab = mysql_fetch_array($this->results, MYSQL_ASSOC)) {
   $xml .= " <$row>\n";
   reset($tab);
   while (list($key, $val) = each($tab)) {
    $xml .= "  <$key>";
    $xml .= $val;

    $xml .= "</$key>\n";
   }
   $xml .= " </$row>\n";
  }
  $xml .= "</$rowset>\n";
  return $xml;
 }
 

 /** get the errors as XML **/
 function get_error() {
  $xmlerr .= "<?xml version=\"1.0\"?>\n";
  $xmlerr .= "<error>" . mysql_error() . "</error>\n";
  return $xmlerr;
 }

}
 

Now let's use this class:

include("phpsqlprocessor.inc.php");
/** create a new processor **/
$p = new phpSQLProcessor();

/** connect to localhost, and select the "test" database
$p->connect("localhost", "olly", "", "test");

$p->set_doc_element("stories");

$p->add_doc_attr("source", "slashdot");

$p->set_row_element("story");

$p->set_query("select * from stories");

if (!$p->execute()) {
 print $p->get_error();
 exit();
}
print $p->get_xml();

<slashdot>
 

Turning XML into HTML with PHP and Sablotron.

$parser = xslt_create();

if (!$parser)

 print xslt_error();
$xsl = join ('', file ('./myslashdot.xsl'));
$xml = $p->get_xml();

$h = xslt_process($xsl, $xml, $result);

//print xslt_error();

print $result;

xslt_free($parser);

?>
If nothing happens, the xslt_error() function can be used to understand where the error may come from (from a syntax error in the XSL stylesheet for example).
 
 

Turing multiple source XML into WML

Now suppose you want to add the linuxtoday.com news, and you want the news to be displayed in WML format. Suppose that, for a good reason, the linuxtoday.com news need to be fetched directly from the web site and cannot be stored within your MySQL database
<linuxtoday>

 <story>

    <title>LinuxProgramming: Blackdown releases Java3D
1.2.1_01 for Linux</title>

    <url>http://linuxtoday.com/news_story.php3?ltsn=2001-05-27-006-20-NW-SW</url>

    <time>May 27, 2001, 16:26:41</time>

    <author>mhall</author>

    <topic>News,Software</topic>

    <comments>0</comments>

  </story>

 <story>

    <title>LinuxProgramming: International PHP conference
2001 Call for Papers</title>

    <url>http://linuxtoday.com/news_story.php3?ltsn=2001-05-27-006-20-PR-CY-SW</url>

    <time>May 27, 2001, 16:00:42</time>

    <author>mhall</author>

    <topic>Press Releases,Community,Software</topic>

    <comments>0</comments>

  </story>

.

.

.

</linuxtoday>

We won't need the date..

These news are not exactly in the right format, so we need to transform them a little bit. Here is the XSL stylesheet to do so:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<stories source="linuxtoday">
<xsl:apply-templates select="linuxtoday/story"/>
</stories>
</xsl:template>

<xsl:template match="linuxtoday/story">
  <story_title><xsl:value-of select="title"/></story_title>
  <story_url><xsl:value-of select="url"/></story_url>
</xsl:template>

</xsl:stylesheet>

Ressources:
http://www.gingerall.com

Détail de l'élement <xsl:output/> http://www.w3schools.com/xsl/el_output.asp
http://www.linux-mag.com/cgi-bin/printer.pl?issue=2001-03&article=mysql  Linux-Magazine Installing, Configuring, and Using MySQL (the next killer app)