JSP Tag Library for Fast and Easy XML Parsing
The following JSP examples process XML documents using:
- Devsphere XML JSP Tag Library (XJTL)
- Apache/Sun JSP Standard Tag Library (JSTL)
XJTL supports the SAX and DOM standards while hiding their complexities from the JSP developer. JSTL supports DOM, XPath and XSLT. XJTL and JSTL process XML differently, but the two JSP tag libraries complement each other and can be used together.
The JSTLParse.jsp , XJTLParse.jsp and XJTLParse2.jsp examples process the sample.xml file whose content is listed below:
<?xml version="1.0" encoding="UTF-8"?>
<person id='js890'>
<name>John Smith</name>
<email>John.Smith@yahoo.com</email>
<phone>650-123-4567</phone>
<address city='Palo Alto' state='CA' zip='94303' country='USA'>
<line1>JS Information Systems, Inc.</line1>
<line2>1001 San Antonio Road</line2>
</address>
</person>
The JSTLTable.jsp , XJTLTable.jsp and XJTLTable2.jsp examples process the big.xml file that contains 10,000 XML fragments with the same structure as sample.xml .
<?xml version='1.0' encoding='US-ASCII'?>
<database>
<person id='000000'>
...
</person>
<person id='000001'>
...
</person>
<person id='000002'>
...
</person>
....................
<person id='009999'>
...
</person>
</database>
We use "c", "x" and "p" as prefixes for the actions of JSTL Core, JSTL XML and XJTL Process, respectively:
<%@ taglib prefix="c" uri="http://java.sun.com/jstl/core" %>
<%@ taglib prefix="x" uri="http://java.sun.com/jstl/xml" %>
<%@ taglib prefix="p" uri="http://devsphere.com/xml/taglib/process" %>
JSTLParse.jsp
http://localhost:8080/xjtldemo/process/JSTLParse.jsp
This JSTL example parses an XML file called sample.xml and queries the obtained DOM tree with XPath.
<c:import var="xml" url="sample.xml"/>
<x:parse varDom="dom" xml="${xml}"/>
<x:set var="attrPersonId" select="string($dom/person/@id)"/>
<x:set var="dataName" select="string($dom/person/name/text())"/>
<x:set var="dataEmail" select="string($dom/person/email/text())"/>
<x:set var="dataPhone" select="string($dom/person/phone/text())"/>
<x:set var="attrAddressCity" select="string($dom/person/address/@city)"/>
<x:set var="attrAddressState" select="string($dom/person/address/@state)"/>
<x:set var="attrAddressZip" select="string($dom/person/address/@zip)"/>
<x:set var="attrAddressCountry" select="string($dom/person/address/@country)"/>
<x:set var="dataAddressLine1" select="string($dom/person/address/line1/text())"/>
<x:set var="dataAddressLine2" select="string($dom/person/address/line2/text())"/>
The <c:import> action loads the sample.xml file and stores its content in a JSP variable called xml. The <x:parse> action parses the XML content and creates a new JSP variable called dom that holds the resulting DOM tree.
There are 10 pieces of information maintained in the DOM tree as attribute values or character data. Each piece can be identified by an XPath expression and stored in a JSP variable with the help of the <x:set> action. For example, the "string($dom/person/name/text())" XPath expression identifies the name of the person whose contact information is stored in our XML document. The <x:set> action stores the person's name in a JSP variable called dataName.
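For readers who want to see the same two steps outside a JSP page, here is a minimal plain-Java sketch of "parse into a DOM tree, then query it with XPath", using the standard JAXP classes. It is not how JSTL is implemented; the class name DomXPathSketch, its helper methods and the inlined copy of sample.xml are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of JSTL or XJTL.
class DomXPathSketch {
    // Inlined copy of sample.xml so that the sketch is self-contained.
    static final String SAMPLE =
        "<person id='js890'>"
        + "<name>John Smith</name>"
        + "<email>John.Smith@yahoo.com</email>"
        + "<phone>650-123-4567</phone>"
        + "<address city='Palo Alto' state='CA' zip='94303' country='USA'>"
        + "<line1>JS Information Systems, Inc.</line1>"
        + "<line2>1001 San Antonio Road</line2>"
        + "</address>"
        + "</person>";

    // Builds the complete DOM tree in memory (the role played by <x:parse>).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Evaluates an XPath expression and returns its string value
    // (the role played by <x:set>).
    static String select(Document dom, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, dom);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse(SAMPLE);
        System.out.println(select(dom, "string(/person/name/text())"));   // John Smith
        System.out.println(select(dom, "string(/person/address/@city)")); // Palo Alto
    }
}
```

Note that the whole document is held in memory as a tree before the first query runs, which is the property the later examples come back to.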
XJTLParse.jsp
http://localhost:8080/xjtldemo/process/XJTLParse.jsp
This is the XJTL version of the previous JSTL example. It doesn't create a DOM tree in memory and it doesn't need XPath to identify the information from the XML document. Instead, it uses the efficient SAX parsing method internally and creates the JSP variables on the fly. This is the XML content that has to be parsed:
<?xml version="1.0" encoding="UTF-8"?>
<person id='js890'>
<name>John Smith</name>
<email>John.Smith@yahoo.com</email>
<phone>650-123-4567</phone>
<address city='Palo Alto' state='CA' zip='94303' country='USA'>
<line1>JS Information Systems, Inc.</line1>
<line2>1001 San Antonio Road</line2>
</address>
</person>
The JSP code uses the XJTL actions to process the above XML content.
<p:parse systemId="sample.xml" ignoreSpaces="true">
<p:element varAttr="attrPerson" testName="person">
<p:element testName="name"><p:data varData="dataName"/></p:element>
<p:element testName="email"><p:data varData="dataEmail"/></p:element>
<p:element testName="phone"><p:data varData="dataPhone"/></p:element>
<p:element varAttr="attrAddress" testName="address">
<p:element testName="line1"><p:data varData="dataAddressLine1"/></p:element>
<p:element testName="line2"><p:data varData="dataAddressLine2"/></p:element>
</p:element>
</p:element>
</p:parse>
The <p:parse> action of XJTL starts the parsing of the XML document using SAX and it iterates over the SAX events generated by the XML parser. Those SAX events are intercepted for processing by the <p:element> and <p:data> actions in the body of <p:parse> .
The ignoreSpaces flag is set to true in order to ignore the irrelevant indenting spaces between the XML elements. This feature of the <p:parse> action can be used to speed up parsing.
There is one <p:element> action for each element of the XML document. Those XJTL actions form a tree structure similar to the hierarchy of the XML elements. The body of a <p:element> action is evaluated only if its testName attribute has the same value as the name of the current element of the SAX parsing. For example, <p:element testName="email"> is executed only for the <email> XML element.
The <p:element> action has additional attributes named testQname and testUri that can be used to test the qualified name and the namespace URI of an XML element. If these tests aren't enough, you may use the test attribute that accepts an arbitrary conditional expression just like the <c:if> action of JSTL.
When an XML element has attributes, the corresponding <p:element> action creates a JSP variable that holds a java.util.Map object containing the name-value attribute pairs. The default name of this JSP variable is attr and it can be changed by adding a varAttr attribute to the <p:element> action. For example, the <p:element> that processes the <address> element has varAttr="attrAddress" . The attributes of the XML element are accessed later in the JSP page using JSTL:
<BR>City: <c:out value="${attrAddress.city}"/>
<BR>State: <c:out value="${attrAddress.state}"/>
<BR>ZIP: <c:out value="${attrAddress.zip}"/>
<BR>Country: <c:out value="${attrAddress.country}"/>
The <p:element> also exports variables that hold the local name, qualified name and namespace URI of the processed XML element, but these aren't used in this example. Also, a very useful feature of XJTL is the ability to export a DOM tree in the middle of the SAX-based processing. The XJTLTable2.jsp example will show how this works.
When an XML element wraps some character data, the corresponding <p:element> contains a <p:data> action, which exports a JSP variable holding the XML data. The default name of this JSP variable is data and it can be changed by adding a varData attribute to the <p:data> action. For example, the <p:element> that processes the <phone> element contains a <p:data> action that has varData="dataPhone" . The character data from the XML element is accessed later in the JSP page using JSTL:
<BR>Phone: <c:out value="${dataPhone}"/>
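As a rough plain-Java analogue of what the text above describes, the following sketch uses a SAX handler to collect each element's attributes into a Map and its character data into a string, without building a DOM tree. It only illustrates the idea; it is not XJTL's implementation, and the class SaxCollectSketch, its fields and the inlined XML are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of XJTL.
class SaxCollectSketch {
    static final String SAMPLE =
        "<person id='js890'><name>John Smith</name>"
        + "<email>John.Smith@yahoo.com</email><phone>650-123-4567</phone>"
        + "<address city='Palo Alto' state='CA' zip='94303' country='USA'>"
        + "<line1>JS Information Systems, Inc.</line1>"
        + "<line2>1001 San Antonio Road</line2></address></person>";

    // Character data keyed by element name (similar in spirit to varData).
    final Map<String, String> data = new LinkedHashMap<>();
    // Attribute maps keyed by element name (similar in spirit to varAttr).
    final Map<String, Map<String, String>> attrs = new LinkedHashMap<>();

    static SaxCollectSketch parse(String xml) {
        SaxCollectSketch result = new SaxCollectSketch();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                text.setLength(0); // start collecting text for this element
                if (a.getLength() > 0) {
                    Map<String, String> map = new LinkedHashMap<>();
                    for (int i = 0; i < a.getLength(); i++) {
                        map.put(a.getQName(i), a.getValue(i));
                    }
                    result.attrs.put(qName, map);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    result.data.put(qName, value);
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        SaxCollectSketch r = parse(SAMPLE);
        System.out.println("Phone: " + r.data.get("phone"));
        System.out.println("City: " + r.attrs.get("address").get("city"));
    }
}
```

The handler has to track state by hand, which is the bookkeeping that the nested <p:element> and <p:data> actions are described as hiding from the JSP developer.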
XJTLParse2.jsp
http://localhost:8080/xjtldemo/process/XJTLParse2.jsp
The previous example shows how easy it is to parse an XML document and retain its information in JSP variables. You can use the XJTL actions (<p:element> and <p:data>) to create JSP variables that hold the attribute values and the character data. XJTL also has two other actions called <p:start> and <p:end> whose bodies are evaluated when the processing of their parent <p:element> starts and ends.
As mentioned earlier, XJTL uses the SAX parsing method internally. The <p:parse> action iterates over the SAX events generated between a startDocument() event and an endDocument() event. The <p:element> action iterates over the SAX events generated between a startElement() event and an endElement() event. Multiple events can be consumed in a single iteration by the sub-actions contained in <p:parse> and <p:element>. You should always keep in mind that these are iteration tags and their bodies may be evaluated multiple times.
The <p:start> and <p:end> actions can be included within <p:parse> to handle the startDocument() and endDocument() SAX events. The body of <p:parse> may be evaluated multiple times, but its <p:start> and <p:end> sub-actions will be executed only once when the parsing starts and ends, respectively.
The <p:start> and <p:end> actions can also be included within <p:element> to handle the startElement() and endElement() SAX events. You'll normally use <p:start> if you want to process the attributes of an element right after they become available as JSP variables. The <p:end> action is executed after the entire content of an XML element has been processed, at which point you have access to all JSP variables created by the sub-actions of the parent <p:element>.
XJTL has default names for the JSP variables that hold the local name, qualified name, namespace URI, attributes and character data of an element. These default names are: name, qname, uri, attr and data . If you want to process the attributes and data of multiple elements as in XJTLParse.jsp , you have to provide your own names for these variables. Otherwise, they overwrite each other and the old values are lost. The XJTLParse.jsp example has unique variable names for each element: attrPerson, dataName, dataEmail, dataPhone, attrAddress, dataAddressLine1, dataAddressLine2 .
The following JSP fragment doesn't provide its own names for the JSP variables, but it handles all SAX events. Each <p:start> , <p:end> and <p:data> action contains a comment that specifies the name of the handled SAX event followed by the JSP variables that are created.
<p:parse systemId="sample.xml" ignoreSpaces="true">
<p:start> <%-- startDocument --%> </p:start>
<p:element testName="person">
<p:start> <%-- startElement name="person", attr.id="js890" --%> </p:start>
<p:element testName="name">
<p:start> <%-- startElement name="name" --%> </p:start>
<p:data> <%-- characters data="John Smith" --%> </p:data>
<p:end> <%-- endElement name="name" --%> </p:end>
</p:element>
<p:element testName="email">
<p:start> <%-- startElement name="email" --%> </p:start>
<p:data> <%-- characters data="John.Smith@yahoo.com" --%> </p:data>
<p:end> <%-- endElement name="email" --%> </p:end>
</p:element>
<p:element testName="phone">
<p:start> <%-- startElement name="phone" --%> </p:start>
<p:data> <%-- characters data="650-123-4567" --%> </p:data>
<p:end> <%-- endElement name="phone" --%> </p:end>
</p:element>
<p:element testName="address">
<p:start> <%-- startElement name="address", attr.city="Palo Alto",
attr.state="CA", attr.zip="94303", attr.country="USA" --%> </p:start>
<p:element testName="line1">
<p:start> <%-- startElement name="line1" --%> </p:start>
<p:data> <%-- characters data="JS Information Systems, Inc." --%> </p:data>
<p:end> <%-- endElement name="line1" --%> </p:end>
</p:element>
<p:element testName="line2">
<p:start> <%-- startElement name="line2" --%> </p:start>
<p:data> <%-- characters data="1001 San Antonio Road" --%> </p:data>
<p:end> <%-- endElement name="line2" --%> </p:end>
</p:element>
<p:end> <%-- endElement name="address" --%> </p:end>
</p:element>
<p:end> <%-- endElement name="person" --%> </p:end>
</p:element>
<p:end> <%-- endDocument --%> </p:end>
</p:parse>
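The order of the events named in the comments above can be observed directly with a small plain-Java SAX handler. The sketch below records the events for a shortened version of sample.xml; the class SaxEventTrace and its trace method are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of XJTL.
class SaxEventTrace {
    // Returns the SAX events in the order in which the parser reports them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                events.add("startElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may split one text node across several calls.
                String s = new String(ch, start, length).trim();
                if (!s.isEmpty()) {
                    events.add("characters " + s);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("endElement " + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<person id='js890'><name>John Smith</name>"
            + "<phone>650-123-4567</phone></person>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```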
Usually, only some of the SAX events must be handled. XJTL allows you to ignore the events that you don't need. The structure of the XJTLParse2.jsp example looks like this:
<p:parse systemId="sample.xml" ignoreSpaces="true">
<p:start> ... </p:start>
<p:element testName="person">
<p:start> ... </p:start>
<p:element testName="name">
<p:data> ... </p:data>
</p:element>
<p:element testName="email">
<p:data> ... </p:data>
</p:element>
<p:element testName="phone">
<p:data> ... </p:data>
</p:element>
<p:element testName="address">
<p:start> ... </p:start>
<p:element testName="line1">
<p:data> ... </p:data>
</p:element>
<p:element testName="line2">
<p:data> ... </p:data>
</p:element>
</p:element>
</p:element>
<p:end> ... </p:end>
</p:parse>
JSTLTable.jsp
http://localhost:8080/xjtldemo/process/JSTLTable.jsp
There are cases when a relational or object database is exported in an XML format and the resulting document must be processed. This happens, for example, when different applications based on incompatible technologies must be integrated. XML is a good choice for solving this type of problem because it is language and platform neutral.
<c:import var="xml" url="big.xml"/>
<x:parse varDom="dom" xml="${xml}"/>
<x:forEach var="person" select="$dom/database/person">
<x:set var="attrPersonId" select="string($person/@id)"/>
<x:set var="dataName" select="string($person/name/text())"/>
<x:set var="dataEmail" select="string($person/email/text())"/>
<x:set var="dataPhone" select="string($person/phone/text())"/>
<x:set var="attrAddressCity" select="string($person/address/@city)"/>
<x:set var="attrAddressState" select="string($person/address/@state)"/>
<x:set var="attrAddressZip" select="string($person/address/@zip)"/>
<x:set var="attrAddressCountry" select="string($person/address/@country)"/>
<x:set var="dataAddressLine1" select="string($person/address/line1/text())"/>
<x:set var="dataAddressLine2" select="string($person/address/line2/text())"/>
...
</x:forEach>
The JSTLTable.jsp example is similar to JSTLParse.jsp , but it processes a file containing 10,000 XML "fragments" with the same structure. If we want to use JSTL, we have to parse this large XML document and create a DOM structure that takes a lot of memory. Then we iterate over the XML fragments using the <x:forEach> action of JSTL.
Obviously, this is not a scalable solution. No matter how much memory we have, we can get an XML document that won't fit into the computer's memory as a DOM tree. Then the application will crash throwing an OutOfMemoryError . The XJTLTable.jsp example will show how to solve the scalability issue.
XJTLTable.jsp
http://localhost:8080/xjtldemo/process/XJTLTable.jsp
The XJTLTable.jsp example solves the scalability problem described earlier and it processes the same big.xml file as JSTLTable.jsp .
<?xml version='1.0' encoding='US-ASCII'?>
<database>
<person id='000000'>
...
</person>
<person id='000001'>
...
</person>
<person id='000002'>
...
</person>
....................
<person id='009999'>
...
</person>
</database>
Processing 10,000 XML fragments with XJTL is as simple and efficient as processing only one small piece of XML. The XJTL actions use the SAX parsing method internally and no data structures like DOM are created in memory.
<p:parse systemId="big.xml" ignoreSpaces="true">
<p:element testName="database">
<p:element varAttr="attrPerson" testName="person">
<p:element testName="name"><p:data varData="dataName"/></p:element>
<p:element testName="email"><p:data varData="dataEmail"/></p:element>
<p:element testName="phone"><p:data varData="dataPhone"/></p:element>
<p:element varAttr="attrAddress" testName="address">
<p:element testName="line1"><p:data varData="dataAddressLine1"/></p:element>
<p:element testName="line2"><p:data varData="dataAddressLine2"/></p:element>
</p:element>
<p:end>
...
</p:end>
</p:element>
</p:element>
</p:parse>
The first <p:element> action of XJTLTable.jsp handles the root element of the XML document, which is <database>. All SAX events that are generated between the <database> start tag and the </database> end tag are handled inside the first <p:element> action, which iterates over those SAX events. Therefore, the second <p:element> action handles all <person> elements that start the XML fragments. Inside the second <p:element> action, there are other XJTL actions that handle the sub-elements of <person>, and so on.
The <p:element> and <p:data> actions of XJTL export JSP variables that contain the information of each XML fragment. Those JSP variables can be used within the <p:end> action that is executed after each fragment is parsed. After processing the information of an XML fragment, the XJTL actions are executed again and the JSP variables are overwritten with the new information from the next XML fragment. The old values of the JSP variables become available to the garbage collector, which will free the memory occupied by the already processed information. This allows us to process any XML document that can be split into reasonably sized fragments.
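The same "one fragment at a time" idea can be sketched in plain Java with a SAX handler that keeps only the current record and hands it to a callback when the <person> element ends. This is an illustration under assumptions, not XJTL's code: the class StreamingRecordsSketch and its forEachPerson method are hypothetical, and the input is built as an in-memory string only to keep the example self-contained (a real page would read big.xml from a stream).

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of XJTL.
class StreamingRecordsSketch {
    // Calls action once per <person> fragment and returns the fragment count.
    static int forEachPerson(String xml, Consumer<Map<String, String>> action) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            private Map<String, String> current; // the only record kept in memory
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                text.setLength(0);
                if ("person".equals(qName)) {
                    current = new HashMap<>(); // the previous record becomes garbage
                    current.put("id", a.getValue("id"));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("person".equals(qName)) {
                    action.accept(current); // comparable to the <p:end> body
                    current = null;
                    count[0]++;
                } else if (current != null) {
                    String value = text.toString().trim();
                    if (!value.isEmpty()) {
                        current.put(qName, value);
                    }
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        StringBuilder xml = new StringBuilder("<database>");
        for (int i = 0; i < 10000; i++) {
            xml.append(String.format("<person id='%06d'><name>Person %d</name></person>", i, i));
        }
        xml.append("</database>");
        int n = forEachPerson(xml.toString(), record -> { /* process one record */ });
        System.out.println(n); // 10000
    }
}
```

Memory use depends on the size of one fragment rather than on the number of fragments, provided the callback does not retain the records it receives.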
XJTLTable2.jsp
http://localhost:8080/xjtldemo/process/XJTLTable2.jsp
XJTL lets you get a DOM tree at any point during the processing of the XML content. You just have to add a varDom attribute to a <p:element> action, which will use the SAX events to build a DOM structure internally, instead of looping over the SAX events. Therefore, a <p:element> action with a varDom attribute may not contain other XJTL actions. The element handled by the <p:element> action will be the root of the constructed DOM tree. All elements, character data, processing instructions and comments that are contained directly or indirectly by the root element will be part of the DOM tree.
<p:parse systemId="big.xml" ignoreSpaces="true">
<p:element testName="database">
<p:element varDom="person" testName="person">
<x:set var="attrPersonId" select="string($person/@id)"/>
<x:set var="dataName" select="string($person/name/text())"/>
<x:set var="dataEmail" select="string($person/email/text())"/>
<x:set var="dataPhone" select="string($person/phone/text())"/>
<x:set var="attrAddressCity" select="string($person/address/@city)"/>
<x:set var="attrAddressState" select="string($person/address/@state)"/>
<x:set var="attrAddressZip" select="string($person/address/@zip)"/>
<x:set var="attrAddressCountry" select="string($person/address/@country)"/>
<x:set var="dataAddressLine1" select="string($person/address/line1/text())"/>
<x:set var="dataAddressLine2" select="string($person/address/line2/text())"/>
...
</p:element>
</p:element>
</p:parse>
The XJTL support for DOM is useful, for example, when an action of some tag library requires DOM nodes as attributes. The XJTLTable2.jsp example gets each XML fragment of the big.xml file as a DOM tree that is processed with JSTL exactly as in JSTLTable.jsp. However, each DOM tree is made available to the garbage collector right after processing, since the person JSP variable that holds the DOM tree is overwritten at each iteration. With XJTL, you can use DOM without running into scalability problems.
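To make the "small DOM tree per fragment" idea concrete, here is a hedged plain-Java sketch that builds a separate DOM document for each <person> element from SAX events and passes it to a callback, after which the tree is dropped. It shows one possible way to get this behavior and says nothing about how XJTL does it internally; FragmentDomSketch, forEachPerson and select are hypothetical names.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of XJTL.
class FragmentDomSketch {
    // Builds one DOM document per <person> fragment and passes it to action.
    static void forEachPerson(String xml, Consumer<Document> action) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            DefaultHandler handler = new DefaultHandler() {
                private Document doc;  // tree of the fragment being built, or null
                private Node current;  // node that receives the next child

                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    if (doc == null) {
                        if (!"person".equals(qName)) {
                            return; // outside a fragment, e.g. <database>
                        }
                        doc = builder.newDocument();
                        current = doc;
                    }
                    Element e = doc.createElement(qName);
                    for (int i = 0; i < a.getLength(); i++) {
                        e.setAttribute(a.getQName(i), a.getValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (doc != null) {
                        current.appendChild(doc.createTextNode(new String(ch, start, length)));
                    }
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    if (doc == null) {
                        return;
                    }
                    current = current.getParentNode();
                    if (current == doc) {   // the <person> element just closed
                        action.accept(doc);
                        doc = null;         // let the finished tree be collected
                        current = null;
                    }
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML processing failed", e);
        }
    }

    // Evaluates an XPath expression against one fragment's DOM tree.
    static String select(Document fragment, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, fragment);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<database><person id='000000'><name>John Smith</name></person>"
            + "<person id='000001'><name>Jane Doe</name></person></database>";
        forEachPerson(xml, d -> System.out.println(
            select(d, "string(/person/@id)") + " " + select(d, "string(/person/name/text())")));
    }
}
```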