Working With XML, jQuery, and JavaScript
Sunday, August 29, 2010
A recent requirement of mine was to convert large CSV files to XML, and then build an accompanying interface to update & manage the data. jQuery was most useful for traversing & manipulating the large DOMs, but I did run across a few gotchas worthy of discussion. There doesn't seem to be much info on the interwebs about XML and jQuery outside of an XHR request, so here's what I picked up along the way.
You cannot pass an XML string to jQuery's constructor
$($.parseXML('<xml></xml>'))
Well, _technically_ you can, but it's not supported and jQuery will not treat the string as XML. You'll still be working with an HTML document object. jQuery has no idea (and doesn't care) that the elements are not valid HTML elements, so this might be just fine as long as your XML remains detached from the DOM. To feed XML to jQuery correctly, you must parse the XML string first and pass the resulting XML document object to jQuery's constructor.
A new feature ticket exists to add cross-browser XML compatibility to jQuery, whose summary yields another good nugget of info:
As documented, the $("<html string>") method only supports parsing HTML strings, but users often try to parse XML strings with it. This doesn't work since jQuery uses the .innerHTML property for parsing and implicitly enforces HTML rules on the string. IE in particular will throw errors or misparse XML strings passed to $() in this way.
Furthermore, the second you try to pass any sizable XML or HTML string to jQuery's constructor, Firefox 3 is going to fail with a "script stack space quota is exhausted" error (see the bug report filed with Mozilla).
As Dave Methvin (core team member) explains:
Firefox seems to be dying on the constructor check for HTML, which uses a RegExp:
match = quickExpr.exec( selector );
The string you're passing in is long and html-ish, and I suspect Firefox is backtracking heavily trying to match the huge string. We can either assume a long (>1K chars) string must be HTML or come up with a RegExp that doesn't need to backtrack so much.
So how do you convert a string to an XML object? It depends on your browser.
Parse an XML string into a DOM
The following only applies if you're manually parsing an XML string. If you're using XMLHttpRequest, the browser does all this automatically; just use the responseXML property.
Update: as of jQuery 1.5 you can use $.parseXML() for this. The rest of this section describes the manual approach.
To reiterate, jQuery alone is not capable of parsing XML strings. You will need to leverage the DOM parsing method baked into the browser, which all of them support in one form or another. Firefox, Chrome, and Safari have a DOMParser object, and IE uses its proprietary ActiveX object. To create a cross-browser solution:
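Here's a minimal sketch of that normalization. The function name `ensureDOMParser` and its `scope` parameter are assumptions of this sketch (in a page you would pass `window`); the ProgID is the one named in this post, and older IE setups may need a different one.

```javascript
// Define DOMParser on `scope` only if the environment lacks a native one.
// `scope` stands in for `window`; taking it as a parameter is an assumption
// of this sketch so the function can also be exercised outside a browser.
function ensureDOMParser(scope) {
  if (typeof scope.DOMParser !== "undefined") {
    return scope.DOMParser; // native implementation exists, leave it alone
  }

  function DOMParserShim() {}

  DOMParserShim.prototype.parseFromString = function (str, contentType) {
    if (typeof scope.ActiveXObject !== "undefined") {
      // IE: hand the string to MSXML (ProgID as named in this post).
      var doc = new scope.ActiveXObject("MSXML.DomDocument");
      doc.loadXML(str);
      return doc;
    }
    if (typeof scope.XMLHttpRequest !== "undefined") {
      // Fail-safe: synchronous request against a data: URI, then read
      // the parsed document back from responseXML.
      var type = contentType || "application/xml";
      var req = new scope.XMLHttpRequest();
      req.open("GET", "data:" + type + ";charset=utf-8," +
        encodeURIComponent(str), false);
      if (req.overrideMimeType) {
        req.overrideMimeType(type);
      }
      req.send(null);
      return req.responseXML;
    }
    return null; // no known way to parse XML here
  };

  scope.DOMParser = DOMParserShim;
  return DOMParserShim;
}

// In a page: ensureDOMParser(window);
```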
This snippet checks for the absence of DOMParser in the window scope and, if it is missing, defines it. If the browser has an ActiveXObject (IE), the MSXML.DomDocument object is used to convert the string to a DOM. If ActiveXObject isn't present but XMLHttpRequest is (fail-safe), a pseudo AJAX request is created and the parsed string is returned in the responseXML property of XMLHttpRequest.
Now that we have a normalized DOMParser, parsing XML is easy:
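A short sketch of what that looks like. `parseXml` is a hypothetical helper name, and accepting the parser constructor as an optional argument is an assumption made so the function still works where no global DOMParser exists.

```javascript
// Parse an XML string with whatever DOMParser the environment provides.
// `Parser` is optional and defaults to the global DOMParser.
function parseXml(xmlString, Parser) {
  Parser = Parser || (typeof DOMParser !== "undefined" ? DOMParser : null);
  if (!Parser) {
    throw new Error("No DOMParser available in this environment");
  }
  return new Parser().parseFromString(xmlString, "text/xml");
}

// In a browser with jQuery loaded, usage might look like:
//   var xml = parseXml("<items><item id='a'>First</item></items>");
//   var $items = $(xml).find("item");
```

The result is an XML document object, which is what jQuery's constructor needs in order to treat the content as XML rather than HTML.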
Converting an XML DOM back to a string
Again, IE has a different implementation than everyone else, but it's still trivial to normalize. Firefox/Chrome/Safari use the XMLSerializer.serializeToString method, whereas IE has an "xml" property on the XML object.
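A minimal sketch of that normalization follows. `xmlToString` is a hypothetical name, and the optional `Serializer` argument is an assumption that keeps the function usable where no global XMLSerializer exists. Checking the "xml" property first is a design choice of this sketch, on the assumption that MSXML documents are best serialized through their own property.

```javascript
// Serialize an XML node back to a string.
function xmlToString(xmlNode, Serializer) {
  // IE (MSXML) exposes the markup directly on the node.
  if (typeof xmlNode.xml === "string") {
    return xmlNode.xml;
  }
  Serializer = Serializer ||
    (typeof XMLSerializer !== "undefined" ? XMLSerializer : null);
  if (Serializer) {
    // Firefox/Chrome/Safari
    return new Serializer().serializeToString(xmlNode);
  }
  throw new Error("No way to serialize XML in this environment");
}
```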
$.data() is unreliable
First a note on how $.data actually works. When jQuery is loaded, $.expando is set to the current timestamp prefixed with "jQuery". Expando becomes the name of a property set on elements when data is added via $.data/$.fn.data. The expando's value is a UUID and acts as jQuery's link back to the proper store in $.cache - where the data is actually stored. Each key in the $.cache object is the UUID.
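To make the mechanism concrete, here is a simplified sketch of the expando/cache idea. It is not jQuery's source; `createDataStore` and its internals are hypothetical names used only for illustration.

```javascript
// Simplified model of an expando-keyed data store.
function createDataStore() {
  var expando = "jQuery" + Date.now(); // property name, fixed at load time
  var cache = {};                      // id -> data object
  var uuid = 0;                        // counter used to mint ids

  function data(elem, key, value) {
    var id = elem[expando];
    if (arguments.length === 3) {
      // Setter: tag the element with an id on first use, then store
      // the value in the cache under that id.
      if (!id) {
        id = elem[expando] = ++uuid;
      }
      cache[id] = cache[id] || {};
      cache[id][key] = value;
      return value;
    }
    // Getter: follow the element's id back into the cache.
    return id && cache[id] ? cache[id][key] : undefined;
  }

  return { expando: expando, cache: cache, data: data };
}
```

The only thing ever written to the element itself is the id under the expando property, so any object that refuses new properties breaks the link back to the cache.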
That said, the ActiveX object that IE uses for XML parsing has issues with jQuery trying to set expando properties on elements. And instead of working around this issue, it appears that the solution will be to document the fact that $.data doesn't work cross-browser on XML documents.
Therefore, avoid setting data on XML documents.
getElementById will always return null
If part of your performance plan involves setting id attributes to leverage the speed of getElementById, think again. getElementById does not work on XML documents.
As documented in Mozilla's MDC:
DOM implementation must have information that says which attributes are of type ID. Attributes with the name "id" are not of type ID unless so defined in the document's DTD. The id attribute is defined to be of ID type in the common cases of XHTML, XUL, and other. Implementations that do not know whether attributes are of type ID or not are expected to return null.
"ID" is a data type, and in an XML document, the browser has no clue that an attribute named "id" should be interpreted as such. The same applies for the class attribute and getElementsByClassName, but have no fear; getElementsByTagName (and querySelector/querySelectorAll in modern browsers) will continue to work as expected.
jQuery's Sizzle (and probably every other selector engine) accounts for this, so $("#id") will continue to work as expected. Just don't expect the same performance as getElementById.
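As an illustration of why the fallback is slower, here is a rough sketch of looking up an element by its "id" attribute when getElementById comes back empty. This is not Sizzle's implementation; `findByIdAttribute` is a hypothetical helper.

```javascript
// Find the first element whose "id" attribute equals `id`.
function findByIdAttribute(xmlDoc, id) {
  // Fast path: works only when the document knows which attribute is an ID.
  var hit = xmlDoc.getElementById ? xmlDoc.getElementById(id) : null;
  if (hit) {
    return hit;
  }
  // Slow path: walk every element and compare the attribute by hand.
  var all = xmlDoc.getElementsByTagName("*");
  for (var i = 0; i < all.length; i++) {
    if (all[i].getAttribute("id") === id) {
      return all[i];
    }
  }
  return null;
}
```

The slow path touches every element in the document, which is the cost you pay on large XML DOMs.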
The only benefit I see to passing raw XML to jQuery(), without parsing it into an XML DOM first, is the fact that getElementById will continue to work. I still wouldn't recommend it though.
$.isXMLDoc()
This little utility method exists for testing whether or not a DOM node is within an XML document:
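With jQuery loaded, the call itself is just $.isXMLDoc(node). The sketch below is a standalone approximation of that kind of check so it can run without jQuery; `looksLikeXmlDoc` is a hypothetical name and the logic is an assumption about the general approach, not a copy of jQuery's source.

```javascript
// Return true when `node` belongs to (or is) a document whose root
// element is something other than <html>.
function looksLikeXmlDoc(node) {
  var doc = node ? node.ownerDocument || node : null;
  var root = doc ? doc.documentElement : null;
  return root ? root.nodeName !== "HTML" : false;
}

// In a browser with jQuery, the equivalent checks might be:
//   $.isXMLDoc(document);                      // an HTML page
//   $.isXMLDoc($.parseXML("<items></items>")); // a parsed XML document
```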
Useful when you want to be certain you're correctly passing an XML DOM to jQuery.
No XML love in WebWorkers
Initially, part of my performance strategy in this project was to leverage WebWorkers to avoid locking up the main UI thread. When I consulted the brilliant folks at Bocoup, Rick Waldron noted that XHR objects used within workers will only return responseText, and never responseXML. This is by design, as responseXML is not considered thread safe. The DOMParser object is not available inside workers either, so unless you want to roll your own XML parser (hint: you don't), do not try to request or parse XML inside a worker.
References
Most of these links are scattered throughout this post, but if you want them all in one place:
- jQuery ticket for "Error 'script stack space quota is exhausted'" error http://dev.jquery.com/ticket/6796
- Mozilla ticket for "Error 'script stack space quota is exhausted'" https://bugzilla.mozilla.org/show_bug.cgi?id=579656
- jQuery ticket for $.data fails on XML documents in IE http://dev.jquery.com/ticket/6890
- MDC DOMParser Object https://developer.mozilla.org/en/DOMParser
- MDC XMLSerializer Object https://developer.mozilla.org/en/XMLSerializer
- MDC getElementById https://developer.mozilla.org/en/document.getElementById
- jQuery's constructor API documentation http://api.jquery.com/jQuery/
- $.isXMLDoc() API documentation http://api.jquery.com/jQuery.isXMLDoc/
- Add cross-browser XML support to jQuery - feature enhancement ticket http://dev.jquery.com/ticket/6693
If you know of any other XML gotchas, please leave them in the comments below!