Answering Key Questions
<< Choosing the JDK
XML Parsing Benchmark
Use SAX for low-level XML processing, use DOM when you need the whole document in memory, and mix the two APIs in every other case.
The SAX API is very fast but difficult to use. DOM's scalability is limited by the computer's memory, while SAX and SAXDOMIX can process arbitrarily large documents.
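The difference between the two APIs can be sketched with the standard JAXP classes. The example below is only an illustration, not the benchmark code: the class name `SaxVersusDom` and its methods are hypothetical, and it simply counts elements. With SAX the handler keeps a single counter, while with DOM the whole tree is built before it is queried.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of the benchmark.
class SaxVersusDom {

    // SAX: the parser pushes events to a handler. Nothing stays in memory
    // except what the handler decides to keep (here, one counter).
    static int countElementsWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // DOM: the whole document is built as a tree first, then queried.
    static int countElementsWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        System.out.println("SAX: " + countElementsWithSax(xml));
        System.out.println("DOM: " + countElementsWithDom(xml));
    }
}
```

Both methods return the same count. The SAX version needs a handler class even for this trivial task, which is the usability cost mentioned above.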
Mixing SAX and DOM is the right compromise in many cases. See the introduction to the SAXDOMIX framework.
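The general idea of a mixed parse can be sketched as follows. This is not the SAXDOMIX API: `MixedSaxDom`, `RecordHandler` and the element names `order` and `customer` are hypothetical and chosen only for illustration. A SAX handler builds a small DOM tree for each record element, passes it to a callback, and then drops it, so only one record is held in memory at a time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a mixed SAX-DOM parse; not the SAXDOMIX API.
class MixedSaxDom {

    // Builds one DOM subtree per record element and hands it to a callback.
    static class RecordHandler extends DefaultHandler {
        private final Document owner;
        private final String recordName;
        private final Consumer<Element> callback;
        private Node current; // null while the parser is outside a record

        RecordHandler(Document owner, String recordName, Consumer<Element> callback) {
            this.owner = owner;
            this.recordName = recordName;
            this.callback = callback;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (current == null && !qName.equals(recordName)) {
                return; // ignore everything outside records
            }
            Element element = owner.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                element.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            if (current != null) {
                current.appendChild(element);
            }
            current = element;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.appendChild(owner.createTextNode(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            Node parent = current.getParentNode();
            if (parent == null) {
                callback.accept((Element) current); // the record is complete
            }
            current = parent;
        }
    }

    // Collects the text of the first <customer> child of every <order> record.
    static List<String> collectCustomerNames(String xml) throws Exception {
        Document owner = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        List<String> names = new ArrayList<>();
        RecordHandler handler = new RecordHandler(owner, "order",
                record -> names.add(
                        record.getElementsByTagName("customer").item(0).getTextContent()));
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }
}
```

The callback gets the convenience of DOM navigation inside a record, while the document as a whole is still streamed through SAX.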
Use Crimson for SAX parsing and Xerces for DOM parsing. For mixed SAX-DOM parsing, run application-specific tests to choose the right parser.
SAX parsing, mixed SAX-DOM parsing and DTD-based validation are very fast with Crimson.
DOM parsing is surprisingly slow with Crimson (the JAXP 1.1 reference implementation).
Xerces builds DOM trees twice as fast as Crimson. Xerces also needs less memory, despite its rich feature set.
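With JAXP, the parser implementation can be chosen per API through the system properties `javax.xml.parsers.SAXParserFactory` and `javax.xml.parsers.DocumentBuilderFactory`. The sketch below reports which factories are actually loaded and times a DOM parse. The class `ParserSelection` and its methods are hypothetical, and the factory class names in the comment assume that the Crimson and Xerces jars are on the classpath. Without those properties, the platform default is used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; run it with, for example:
//   java -Djavax.xml.parsers.SAXParserFactory=org.apache.crimson.jaxp.SAXParserFactoryImpl \
//        -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
//        ParserSelection
// (assumes the Crimson and Xerces jars are on the classpath).
class ParserSelection {

    // Name of the SAX factory class that JAXP resolved.
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Name of the DOM factory class that JAXP resolved.
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Very rough timing of repeated DOM parses, for application-specific tests.
    static long timeDomParseNanos(String xml, int runs) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            builder.parse(new InputSource(new StringReader(xml)));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX factory: " + saxFactoryName());
        System.out.println("DOM factory: " + domFactoryName());
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        System.out.println("100 DOM parses took "
                + timeDomParseNanos(xml, 100) + " ns");
    }
}
```

A timing loop this simple ignores JIT warm-up and garbage collection, so it is only a starting point for comparing parsers on your own documents.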
Disable validation and namespace support when you don't need them.
Namespaces and validation have little effect on SAX performance with Crimson, while validation with Xerces has become slower with each new version.
Xerces performs DOM parsing much better than Crimson, whether validation and namespace support are enabled or not.
Strangely, mixed SAX-DOM parsing is faster with Crimson when validation is enabled. This isn't true for SAX-only or DOM-only parsing with Crimson.
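In JAXP, both features are switched on the factory before the parser is created. The sketch below uses a hypothetical class, `LeanParserConfig`, and sets both flags explicitly. Factories obtained from `newInstance()` already default to non-validating and non-namespace-aware, so the explicit calls mainly document the intent and protect against a factory that was configured elsewhere.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper that creates parsers with optional features turned off.
class LeanParserConfig {

    // SAX parser without validation and without namespace processing.
    static SAXParser newLeanSaxParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(false);
        return factory.newSAXParser();
    }

    // DOM builder without validation and without namespace processing.
    static DocumentBuilder newLeanDocumentBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(false);
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        SAXParser sax = newLeanSaxParser();
        DocumentBuilder dom = newLeanDocumentBuilder();
        System.out.println("SAX validating: " + sax.isValidating()
                + ", namespace aware: " + sax.isNamespaceAware());
        System.out.println("DOM validating: " + dom.isValidating()
                + ", namespace aware: " + dom.isNamespaceAware());
    }
}
```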