XML with Ruby
|
|
|
|
XML is a widely used, well-supported format for encoding data. It is used heavily on the web and with languages such as Java, and the format is stable and well documented. The downside is that XML is rather verbose, but that verbosity is also what makes it readable by humans as well as machines. Support for XML in almost every language makes it extremely portable.

Until recently, using XML with Ruby was not a high-performance option. The two most common and highest-performing Ruby gems were Nokogiri and LibXML. Both can parse an XML document into a tree of Ruby objects, and both provide a SAX-like stream parser. The Ox gem was created to provide a more optimized XML parser, so that the advantages of XML could be used in Ruby without suffering a performance impact.

Performance tests comparing Nokogiri, LibXML, and Ox were run under various conditions. The results, along with the differences in how each API is used, follow.

In Memory

It is often easier to work with XML as a single in-memory object. All three Ruby XML gems can parse either a String or a file into an in-memory representation of the document. The first performance tests compared these APIs, both for parsing and for dumping the document back to XML.

XML Document Parsing to an Object

```ruby
require 'nokogiri'
require 'libxml'
require 'ox'

# xml holds the XML document as a String.
doc = Nokogiri::XML::Document.parse(xml)  # Nokogiri
doc = LibXML::XML::Document.string(xml)   # LibXML
doc = Ox.parse(xml)                       # Ox
```
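The three parse calls can be tried side by side with a small harness. This is a sketch under stated assumptions, not code from perf_gen.rb: the helper `parse_with_available_gems` and the `PARSERS` table are hypothetical names. It loads whichever of the `ox`, `nokogiri`, and `libxml-ruby` gems happen to be installed, skips the rest, and reports the class of the in-memory document each gem builds, which makes it visible that each library returns its own object model.

```ruby
# Hypothetical helper (not part of Ox or perf_gen.rb): parse one XML string with
# every installed gem and report the class of the resulting in-memory document.
PARSERS = {
  'Ox'       => ['ox',       ->(xml) { Ox.parse(xml) }],
  'Nokogiri' => ['nokogiri', ->(xml) { Nokogiri::XML::Document.parse(xml) }],
  'LibXML'   => ['libxml',   ->(xml) { LibXML::XML::Document.string(xml) }]
}.freeze

def parse_with_available_gems(xml)
  results = {}
  PARSERS.each do |name, (lib, parse)|
    begin
      require lib
    rescue LoadError
      next # gem not installed; skip it
    end
    results[name] = parse.call(xml).class.name
  end
  results
end

xml = '<?xml version="1.0"?><catalog><book id="1">Ruby</book></catalog>'
parse_with_available_gems(xml).each { |name, klass| puts "#{name}: #{klass}" }
```

If none of the gems is installed the harness simply prints nothing, so it can be dropped into any environment before deciding which gem to benchmark.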
|
Test results from perf_gen.rb in the Ox project on GitHub:

Parsing 1000 times with Ox took 1.895335 seconds.
Parsing 1000 times with Nokogiri took 3.555163 seconds.
Parsing 1000 times with LibXML took 3.668447 seconds.
Ox is 1.9 times faster than Nokogiri at parsing.
Ox is 1.9 times faster than LibXML at parsing.
Figure: Parsing Results
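The speedup figures reported above are ratios of total elapsed time: the slower library's time divided by Ox's time for the same number of iterations. A minimal sketch of that arithmetic and of a timing loop follows. The helper names `time_iterations` and `speedup` are hypothetical, and using a monotonic clock is an assumption; perf_gen.rb may measure time differently.

```ruby
# Hypothetical helpers, not taken from perf_gen.rb.

# Run the block `iterations` times and return elapsed seconds (monotonic clock).
def time_iterations(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# How many times faster the quicker run was, rounded to one decimal place.
def speedup(slower_seconds, faster_seconds)
  (slower_seconds / faster_seconds).round(1)
end

# Reported parse timings for 1000 iterations, in seconds.
ox, nokogiri, libxml = 1.895335, 3.555163, 3.668447
puts "Ox is #{speedup(nokogiri, ox)} times faster than Nokogiri parsing." # prints 1.9
puts "Ox is #{speedup(libxml, ox)} times faster than LibXML parsing."     # prints 1.9

# Timing a trivial workload the same way a benchmark loop would.
elapsed = time_iterations(1000) { 'x' * 10 }
puts format('1000 iterations took %.6f seconds', elapsed)
```

The same `speedup` call reproduces the other comparisons in this article, for example `speedup(7.036567, 0.333532)` gives 21.1 for the Nokogiri `to_xml` result.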
|
XML Document to XML String

```ruby
# Each line assumes doc was parsed by the same gem.
xml = doc.to_xml(:indent => 2)   # Nokogiri
xml = doc.to_s                   # LibXML
xml = Ox.dump(doc, :indent => 2) # Ox
```
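What an `:indent => 2` option controls is easiest to see on a toy tree. The sketch below uses a hypothetical `Element` struct and `dump_xml` method; neither belongs to Ox, Nokogiri, or LibXML, and real writers differ in details such as how mixed content is laid out. It writes nested elements with two spaces per nesting level, keeps text-only elements on one line, and escapes text and attribute values as any XML writer must.

```ruby
# Hypothetical tree type and writer; illustrative only, not any gem's API.
Element = Struct.new(:name, :attributes, :children)

ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def escape(text)
  text.to_s.gsub(/[&<>"]/, ESCAPES)
end

# Serialize `node` (an Element or a String) with `indent` spaces per nesting level.
def dump_xml(node, indent: 2, depth: 0)
  pad = ' ' * (indent * depth)
  return "#{pad}#{escape(node)}\n" if node.is_a?(String)

  attrs = node.attributes.map { |k, v| " #{k}=\"#{escape(v)}\"" }.join
  return "#{pad}<#{node.name}#{attrs}/>\n" if node.children.empty?

  # Text-only elements stay on one line so no whitespace is added to their text.
  if node.children.all? { |c| c.is_a?(String) }
    text = node.children.map { |c| escape(c) }.join
    return "#{pad}<#{node.name}#{attrs}>#{text}</#{node.name}>\n"
  end

  inner = node.children.map { |c| dump_xml(c, indent: indent, depth: depth + 1) }.join
  "#{pad}<#{node.name}#{attrs}>\n#{inner}#{pad}</#{node.name}>\n"
end

doc = Element.new('catalog', {}, [
  Element.new('book', { 'id' => '1' }, ['Ruby & XML']),
  Element.new('shelf', {}, [])
])
print dump_xml(doc)
# <catalog>
#   <book id="1">Ruby &amp; XML</book>
#   <shelf/>
# </catalog>
```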
|
Test results from perf_gen.rb in the Ox project on GitHub:

Dumping 1000 times with Ox took 0.333532 seconds.
Nokogiri to_xml 1000 times took 7.036567 seconds.
LibXML to_s 1000 times took 0.668848 seconds.
Ox is 21.1 times faster than Nokogiri to_xml.
Ox is 2.0 times faster than LibXML to_s.
Figure: To XML Results
|
Summary
Ox was clearly faster than Nokogiri and LibXML at both parsing and writing XML. Nokogiri was exceptionally slow at writing the XML document: its to_xml took over 20 times as long as Ox.dump. Nokogiri and LibXML provide more functionality beyond parsing and writing, such as XPath support and other ancillary features, but for raw performance Ox was far better.

SAX Parsing

For large XML documents or for streaming IO, a SAX-like callback parser is more appropriate, and again all three Ruby XML gems support SAX-like parsing. Two test modes were used. Since the whole idea behind a callback parser is to process only the parts of the document the application is interested in, a minimal validation parse was tested first, followed by a more comprehensive test with stubs for all callbacks.

Validation SAX Parsing

In the snippets below, $iter (the iteration count) and $xml_str (the XML string being parsed) are globals defined elsewhere in the test script.

Nokogiri

```ruby
require 'nokogiri'
require 'stringio'

# Only error and warning are overridden; the other callbacks of
# Nokogiri::XML::SAX::Document keep their empty default implementations.
class NoSax < Nokogiri::XML::SAX::Document
  def error(message); puts message; end
  def warning(message); puts message; end
end

parser = Nokogiri::XML::SAX::Parser.new(NoSax.new)
start = Time.now
$iter.times do
  input = StringIO.new($xml_str)
  parser.parse(input)
  input.close
end
$no_time = Time.now - start
```

LibXML

```ruby
require 'libxml'
require 'stringio'

# Minimal handler: only mixes in LibXML's Callbacks module, overriding nothing.
class LxSax
  include LibXML::XML::SaxParser::Callbacks
end

start = Time.now
$iter.times do
  input = StringIO.new($xml_str)
  parser = LibXML::XML::SaxParser.io(input)
  # $all_cbs selects LxAllSax, the all-callbacks handler (not shown here),
  # for the comprehensive test; the validation test uses LxSax.
  parser.callbacks = $all_cbs ? LxAllSax.new : LxSax.new
  parser.parse
  input.close
end
$lx_time = Time.now - start
```

Ox

```ruby
require 'ox'
require 'stringio'

# Only error is defined, so only parse errors are reported back to Ruby.
class OxSax < ::Ox::Sax
  def error(message, line, column); puts message; end
end

start = Time.now
handler = OxSax.new
$iter.times do
  input = StringIO.new($xml_str)
  Ox.sax_parse(handler, input)
  input.close
end
$ox_time = Time.now - start
```
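To make the callback model concrete, here is a deliberately tiny, pure-Ruby event parser. It is not how Ox, Nokogiri, or LibXML work internally, and it handles only plain elements and text (no attributes, comments, CDATA, or entities). The class names `TinySax` and `CountingHandler` and the callback names `start_element`, `end_element`, and `text` are assumptions chosen to resemble common SAX handler names. The handler keeps only the state it needs, here a count of each element name, and never builds a document tree.

```ruby
require 'strscan'
require 'stringio'

# Toy SAX-style parser: scans tags and text and fires callbacks on a handler.
# Supports only plain elements and text; anything else raises ArgumentError.
class TinySax
  def initialize(handler)
    @handler = handler
  end

  def parse(io)
    scanner = StringScanner.new(io.read)
    until scanner.eos?
      if scanner.scan(%r{<(/?)([A-Za-z_][\w.-]*)\s*(/?)>})
        closing, name, self_closing = scanner[1], scanner[2].to_sym, scanner[3]
        if closing == '/'
          @handler.end_element(name)
        else
          @handler.start_element(name)
          @handler.end_element(name) if self_closing == '/'
        end
      elsif (chunk = scanner.scan(/[^<]+/))
        @handler.text(chunk) unless chunk.strip.empty?
      else
        raise ArgumentError, "unsupported input at offset #{scanner.pos}"
      end
    end
  end
end

# Handler that only counts element names; it never builds a tree.
class CountingHandler
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def start_element(name)
    @counts[name] += 1
  end

  def end_element(_name); end

  def text(_value); end
end

handler = CountingHandler.new
TinySax.new(handler).parse(StringIO.new('<shelf><book>Ruby</book><book>XML</book><note/></shelf>'))
puts handler.counts.map { |k, v| "#{k}=#{v}" }.join(' ') # prints "shelf=1 book=2 note=1"
```

Because the handler sees one event at a time, memory use stays flat regardless of document size, which is why the SAX style suits large documents and streaming IO.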
|
Test results from perf_sax.rb in the Ox project on GitHub. A 1000 KByte XML file was parsed 100 times for this test:

File IO SAX parsing 100 times with Ox took 0.369009 seconds.
File IO SAX parsing 100 times with Nokogiri took 14.637185 seconds.
File IO SAX parsing 100 times with LibXML took 4.913712 seconds.
Ox is 39.7 times faster than Nokogiri SAX parsing using file IO.
Ox is 13.3 times faster than LibXML SAX parsing using file IO.
Figure: SAX Validate Results
|
In the comprehensive callback tests, with every callback method defined, the results were:
|
A 1000 KByte XML file was parsed 100 times for this test:

File IO SAX parsing 100 times with Ox took 1.456263 seconds.
File IO SAX parsing 100 times with Nokogiri took 14.670855 seconds.
File IO SAX parsing 100 times with LibXML took 5.272377 seconds.
Ox is 10.1 times faster than Nokogiri SAX parsing using file IO.
Ox is 3.6 times faster than LibXML SAX parsing using file IO.
Figure: SAX All Callbacks Results
|
Summary
Ox was significantly faster than Nokogiri and LibXML. The tests also highlight another factor that comes into play with Ox: neither LibXML nor Nokogiri gains any performance advantage from ignoring uninteresting parts of an XML document. Comments, for example, are always processed. With Ox, only the parts of the XML document that are of interest have to be processed in Ruby. This made a big difference in Ox's times between the validation test (0.369 seconds) and the comprehensive test (1.456 seconds), while Nokogiri's times were nearly identical in both and LibXML's rose only slightly. Even with a full set of callbacks, Ox was still 10.1 times faster than Nokogiri and 3.6 times faster than LibXML.

Note: Tests were run on a 2.8 GHz iMac with a Core i7 CPU and Mac OS X 10.6.8.
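The point about skipping uninteresting parts can be illustrated with a dispatcher that checks once, up front, which callbacks a handler defines and then never calls into Ruby for the others. This is a sketch of the idea only; Ox itself is a C extension and its internals are not reproduced here. The `SelectiveDispatcher` class, the `[callback, argument]` event pairs, and the callback names are hypothetical.

```ruby
# Hypothetical dispatcher: decides once which callbacks the handler implements,
# then drops events that have no callback instead of invoking a no-op method.
class SelectiveDispatcher
  CALLBACKS = %i[start_element end_element text comment].freeze

  attr_reader :calls

  def initialize(handler)
    @handler = handler
    @wanted = CALLBACKS.select { |cb| handler.respond_to?(cb) }
    @calls = 0
  end

  # events: array of [callback_name, argument] pairs, e.g. [:start_element, :book]
  def dispatch(events)
    events.each do |callback, arg|
      next unless @wanted.include?(callback)

      @calls += 1
      @handler.public_send(callback, arg)
    end
    self
  end
end

# Interested only in element starts, like a minimal validation handler.
class StartOnly
  def start_element(_name); end
end

# Implements every callback, even as no-ops, like the comprehensive test.
class Everything
  def start_element(_name); end
  def end_element(_name); end
  def text(_value); end
  def comment(_value); end
end

events = [[:start_element, :book], [:comment, 'draft'], [:text, 'Ruby'], [:end_element, :book]]
puts SelectiveDispatcher.new(StartOnly.new).dispatch(events).calls  # prints 1
puts SelectiveDispatcher.new(Everything.new).dispatch(events).calls # prints 4
```

A parser that instead calls every callback unconditionally pays for four method calls in both cases, which is consistent with the validation and comprehensive timings for Nokogiri and LibXML being so close.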