Sax Mapper: SQL batch inserts from XML with SAX parsing in Ruby

As part of my work with IMS Global on educational data specifications, I’ve been working on a REST binding of the Learner Information Services (LIS) protocol, which I call Simple LIS. I wrote the reference implementation using Hpricot DOM parsing and ActiveRecord, and it ran into speed and memory problems in production.

I’ve now got a branch that runs on a SAX parser, using a gem I wrote that builds on Paul Dix’s SAXMachine and adds a DataMapper layer. The gem is called SAXualReplication, and you can install it with: gem install MikeSofaer-sax-mapper

Here’s an example class from Simple LIS:

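```ruby
class Person
  include SAXMapper

  # SAXMachine-style element mappings; :as stores the value under a different attribute name
  element :sourced_id, :required => true
  element :given, :as => :given_name, :required => true
  element :family, :as => :family_name, :required => true
  element :email, :required => true

  table "people"          # database table the records are saved to
  tag :person             # XML element that wraps each record
  key_column :sourced_id  # remote primary key
end
```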

You can then call people = Person.parse_multiple(xml) on an XML document to get an array of Person objects.
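To show why SAX parsing saves memory here, this is a minimal sketch of the same idea using REXML's SAX2 parser, which ships with Ruby (as a bundled gem since Ruby 3.0). It is not SAXMapper's or SAXMachine's code: the Person struct, PersonListener and parse_people are hypothetical names chosen for illustration. The listener keeps only the record currently being built plus the finished objects, never a full DOM tree.

```ruby
require "rexml/parsers/sax2parser"
require "rexml/sax2listener"

# Hypothetical plain-Ruby stand-in for a mapped class (fields mirror the Person example).
Person = Struct.new(:sourced_id, :given_name, :family_name, :email)

# Event listener that turns each <person> element into a Person as the parser streams.
class PersonListener
  include REXML::SAX2Listener  # supplies no-op defaults for the events we ignore

  # XML child tag => Person attribute (e.g. <given> is stored as given_name).
  FIELDS = {
    "sourced_id" => :sourced_id,
    "given"      => :given_name,
    "family"     => :family_name,
    "email"      => :email
  }.freeze

  attr_reader :people

  def initialize
    @people  = []
    @current = nil  # Person being built, or nil when outside a <person>
    @buffer  = +""  # text collected for the element currently open
  end

  def start_element(_uri, _localname, qname, _attributes)
    @current = Person.new if qname == "person"
    @buffer = +""
  end

  def characters(text)
    @buffer << text if @current
  end

  def end_element(_uri, _localname, qname)
    return unless @current

    if qname == "person"
      @people << @current  # record finished; nothing else from it is retained
      @current = nil
    elsif (field = FIELDS[qname])
      @current[field] = @buffer.strip
    end
  end
end

# Hypothetical helper: parse an XML string and return an array of Person structs.
def parse_people(xml)
  listener = PersonListener.new
  parser = REXML::Parsers::SAX2Parser.new(xml)
  parser.listen(listener)
  parser.parse
  listener.people
end
```

For example, parse_people on a document containing a <person> with <given>Ada</given> returns an array whose first element has given_name "Ada".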

Calling Person.save(people) saves all the objects to the database in a single query, using DataObjects bind variables with a splatted array to minimize wasted objects (thanks to Dan Kubb for the suggestion).
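As a rough illustration of the single-query approach, this sketch builds one multi-row INSERT with "?" placeholders and a flat array of bind values. The helper name batch_insert_sql and the hash-per-row input are assumptions for illustration, not SAXMapper's internals. Passing the flat array to DataObjects would look roughly like connection.create_command(sql).execute_non_query(*binds); that call is shown only in a comment and is not exercised here.

```ruby
# Hypothetical helper: one INSERT statement for many rows, plus a flat bind array.
# rows is an array of hashes keyed by column name.
def batch_insert_sql(table, columns, rows)
  raise ArgumentError, "no rows to insert" if rows.empty?

  tuple = "(" + Array.new(columns.size, "?").join(", ") + ")"
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES " +
        Array.new(rows.size, tuple).join(", ")
  # Values flattened row by row, in column order, to line up with the placeholders.
  binds = rows.flat_map { |row| columns.map { |col| row.fetch(col) } }
  [sql, binds]
end

# Assumed DataObjects usage (not run here):
#   command = connection.create_command(sql)
#   command.execute_non_query(*binds)
```

Because every row shares one statement, the database sees a single round trip, and the only per-row allocations are the values themselves.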

The key_column method tells SAXMapper to treat the value in that column as a remote primary key. You need to add a unique index on that column in your database; a record whose key value already exists is then applied as an UPDATE rather than an INSERT.
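One way the unique index turns repeated keys into updates is MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The sketch below assumes that MySQL syntax (PostgreSQL would use ON CONFLICT (sourced_id) DO UPDATE instead), and the helper names are hypothetical; the SQL SAXMapper actually emits may differ.

```ruby
# Hypothetical: the unique index that makes the key column act as a remote primary key.
def unique_index_sql(table, key_column)
  "CREATE UNIQUE INDEX index_#{table}_on_#{key_column} ON #{table} (#{key_column})"
end

# Hypothetical MySQL-style upsert: rows whose key already exists update the other columns.
def upsert_sql(table, columns, key_column, row_count)
  updates = (columns - [key_column]).map { |c| "#{c} = VALUES(#{c})" }
  raise ArgumentError, "nothing to update besides the key" if updates.empty?

  tuple = "(" + Array.new(columns.size, "?").join(", ") + ")"
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES " +
    Array.new(row_count, tuple).join(", ") +
    " ON DUPLICATE KEY UPDATE " + updates.join(", ")
end
```

The key column is left out of the UPDATE list because it is what identified the existing row in the first place.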

The Hpricot version of Simple LIS uses over 400 MB of RAM when loading 100K Person objects. The SAXMapper version uses less than 80 MB, so that’s a nice little win.
