XML is very quickly becoming a standard format for storing and exchanging data. The benefits of using XML are huge : separation of style and content for web publishing, better knowledge representation, increased readability, increased automation of machine to machine communications, etc. However handling XML poses a few problems : for example, how do you store and retrieve XML data? how do you transform one XML format into another one? How do you turn XML data into HTML for web publication? The goal of this short article is to answer these common questions using a few examples. It also aims at showing the power of some open source tools for storing data, transforming XML, and publish on the web.
sabcmd is also stream-aware, so you can pipe the output of a 'cat
file' into sabcmd and output the result of the transformation to screen.
For example:
cat file.xml | sabcmd file.xsl
Add the '-m' flag to output the time taken by the transformation:
sabcmd -m file.xsl file.xml
<?xml version="1.0"?> <backslash xmlns:backslash="http://slashdot.org/backslash.dtd"><story>
<story>
<title>Linuxcare Responds To Tim O'Reilly's Article</title>
<url>http://slashdot.org/article.pl?sid=00/05/15/0254252</url>
<time>2000-05-15 02:57:07</time>
<author>timothy</author>
<department>consider-source-horses-mouth-grain-of-salt</department>
<topic>linuxbiz</topic>
<comments>142</comments>
<section>articles</section>
<image>topiclinuxbiz.gif</image>
</story>
.
.
.
</backslash>
The most interesting things are the title, the url, and the time. That is all we will store into the database, so we create a new table, named "stories", that will contain these items. To do so, insert the following SQL command into a file named "stories.sql":
CREATE TABLE stories ( title varchar(100) not null, url varchar(100) not null, time datetime not null, primary key(story_url) );As you can see, all the fields are defined as "not null". The URL is defined as the primary key, so two stories with the same URL cannot coexist within the table.
Then, you may want to create a new database. To do so, use the following
command:
mysqladmin create newsdb
Finally create the table within newsdb:
mysql newsdb < stories.sql
Next, create a XSL file called "slashtosql.xsl" to transform a Slashdot XML news into SQL commands:
<?xml version="1.0"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"><xsl:template match="/">
<xsl:template match="backslash/story">insert ignore into stories
(story_title, story_url, story_date) values ("<xsl:value-of select="title"/>",
"<xsl:value-of select="url"/>", "<xsl:value-of select="time"/>");
</xsl:template>
</xsl:stylesheet>
This stylesheet is simple : when a story tag is encountered, an INSERT
command is created with the values of the title, url and time elements.
The IGNORE option tells MySQL not to ouput any error messages when it refuses
to insert already stored stories.
Since a XSLT engine is initially designed to transform an XML format into another XML format, Sablotron inserts a line at the top of its output indicating the XML version and the encoding scheme. Since this line is not a valid SQL command, MySQL will reject it. The solution is to add the line <xsl:output omit-xml-declaration="yes"/> to the XSL stylesheet, inside the <xsl:stylesheet> node. The stylesheet then becomes:
<?xml version="1.0"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output omit-xml-declaration="yes"/> <xsl:template match="/"> <xsl:apply-templates select="backslash/story"/> </xsl:template><xsl:template match="backslash/story">insert ignore into stories (story_title, story_url, story_date) values ("<xsl:value-of select="title"/>", "<xsl:value-of select="url"/>", "<xsl:value-of select="time"/>");
</xsl:stylesheet>
You could also use (and that is the first thing I actually did) the
sed Unix program (Stream EDitor) to remove this line:
sabcmd slashtosql.xsl slashdot.xml | sed -e 's/<.*>//g'
The final step is to execute the SQL commands within MySQL. This can
be done by piping the output of sabcmd into mysql.
sabcmd slashtosql.xsl slashdot.xml | mysql newsdb
<?class phpSQLProcessor {
/** constructor **/
function phpSQLProcessor() {
$this->doc_attrs = array();
}
/** connecting to a MySQL database and selecting a datasbase **/
function connect($host, $user, $pswd, $base) {
mysql_connect($host, $user, $pswd);
mysql_select_db($base);
}
/** setting the doc element **/
function set_doc_element($str) {
$this->doc_element = $str;
}
/** adding an attribute (name and value) to the doc element **/
function add_doc_attr($attr, $value) {
$this->doc_attrs["$attr"] = $value;
}
/** setting the row element **/
function set_row_element($str) {
$this->row_element = $str;
}
/** setting the SQL query **/
function set_query($query) {
$this->query = $query;
}
/** executing the query **/
function execute() {
$this->results = mysql_query($this->query);
if (!$this->results)
return false;
else
return true;
}
/** returns the XML data resulting from the SQL query **/
function get_xml() {
$rowset = (!empty($this->doc_element)?$this->doc_element:"ROWSET");
$row = (!empty($this->row_element)?$this->row_element:"ROW");
$xml .= "<?xml version=\"1.0\"?>\n";
$xml .= "<$rowset";
// adding doc attributes
reset($this->doc_attrs);
while (list($key, $val) = each($this->doc_attrs)) {
$xml .= " $key=\"$val\"";
}
$xml .= ">\n";
while ($tab = mysql_fetch_array($this->results, MYSQL_ASSOC))
{
$xml .= " <$row>\n";
reset($tab);
while (list($key, $val) = each($tab)) {
$xml .= " <$key>";
$xml .= $val;
$xml .= "</$key>\n";
}
$xml .= " </$row>\n";
}
$xml .= "</$rowset>\n";
return $xml;
}
/** get the errors as XML **/
function get_error() {
$xmlerr .= "<?xml version=\"1.0\"?>\n";
$xmlerr .= "<error>" . mysql_error() . "</error>\n";
return $xmlerr;
}
}
Now let's use this class:
include("phpsqlprocessor.inc.php");/** create a new processor **/
/** connect to localhost, and select the "test" database
$p->connect("localhost", "olly", "", "test");
$p->set_doc_element("stories");
$p->add_doc_attr("source", "slashdot");
$p->set_row_element("story");
$p->set_query("select * from stories");
if (!$p->execute()) {
print $p->get_error();
exit();
}
print $p->get_xml();
<slashdot>
$parser = xslt_create(); if (!$parser) print xslt_error();$xsl = join ('', file ('./myslashdot.xsl'));
$h = xslt_process($xsl, $xml, $result);
//print xslt_error();
print $result;
xslt_free($parser);
?>
If nothing happens, the xslt_error() function can be used to understand
where the error may come from (from a syntax error in the XSL stylesheet
for example).
<linuxtoday> <story> <title>LinuxProgramming: Blackdown releases Java3D 1.2.1_01 for Linux</title> <url>http://linuxtoday.com/news_story.php3?ltsn=2001-05-27-006-20-NW-SW</url> <time>May 27, 2001, 16:26:41</time> <author>mhall</author> <topic>News,Software</topic> <comments>0</comments> </story> <story> <title>LinuxProgramming: International PHP conference 2001 Call for Papers</title> <url>http://linuxtoday.com/news_story.php3?ltsn=2001-05-27-006-20-PR-CY-SW</url> <time>May 27, 2001, 16:00:42</time> <author>mhall</author> <topic>Press Releases,Community,Software</topic> <comments>0</comments> </story> . . . </linuxtoday>We won't need the date..
These news are not exactly in the right format, so we need to transform them a little bit. Here is the XSL stylesheet to do so:
<?xml version="1.0"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output omit-xml-declaration="yes" indent="yes"/><xsl:template match="/">
<xsl:template match="linuxtoday/story">
<story_title><xsl:value-of select="title"/></story_title>
<story_url><xsl:value-of select="url"/></story_url>
</xsl:template>
</xsl:stylesheet>
Ressources:
http://www.gingerall.com
Détail de l'élement <xsl:output/> http://www.w3schools.com/xsl/el_output.asp
http://www.linux-mag.com/cgi-bin/printer.pl?issue=2001-03&article=mysql
Linux-Magazine Installing, Configuring, and Using MySQL (the next killer
app)