Tidbits

My posts have been infrequent, in part because I've been working on lots of things to talk about. I'm in the last stages of putting together info on creating NSStatusItems (tools which show up in the menu bar across all applications in OS X) in PyObjC. I've also got some cool QuickTime and iSight tools coming soon. And I've renamed ZenPaint to DrawingBoard; it's working and just waiting for a little GUI cleanup before I post the first binary and source.

Two of my back-burner projects, better blue-screening and easy lightsabre effects, have been done by others recently. Naked Software was inspired by the same BoingBoing piece on rotoscoping your own lightsabres as I was, but they actually sat down and wrote the code. It's pretty slick, too. For blue-screen effects (and many more), check out Sam Kass' Quartz Composer Compositions. Very neat stuff, though Tiger-only [Update: now Leopard]. Some of the compositions require a newer system with a higher-end video card than my three-year-old PowerBook has.

But to be honest, the real point of this post is not to tease with coming attractions, but to point out my first paid publication. My friend David Mertz asked me to collaborate with him on his XML Matters column for IBM developerWorks, and my first column went live last Friday: Beyond the DOM.

I've wanted to be a writer for as long as I can remember, with poetry notebooks and 200 pages of a novel gathering dust on my bookshelves, so finally getting around to finishing something and having it published leaves me pleased as punch. And more will be forthcoming.

DOM back to XML in Python

OK, time to crank up to speed; it's been a lot longer than I intended between posts.

In the last episode we learned how to initialize many of the DOMs and DOM-like tools for Python from an XML document. Today we're going to see how to serialize each of those DOMs back to XML. So fasten your seatbelts and let's go.

import domParse  # the parsing helpers from the previous post
from xml.dom.ext.c14n import Canonicalize  # c14n serializer from PyXML

def stringMinidom(filename):
    return Canonicalize(domParse.parseMinidom(filename))

def string4Dom(filename):
    return Canonicalize(domParse.parse4Dom(filename))

def stringDomlette(filename):
    return Canonicalize(domParse.parseDomlette(filename))

def stringLibXml(filename):
    # pretty-printed, which may not be what you want,
    # depending on the XML in question
    return domParse.parseLibXml(filename).serialize(encoding='utf-8', format=True)
    # 4DOM c14n breaks because libXML doesn't give you a DOM
    # return Canonicalize(domParse.parseLibXml(filename))

def stringPxDom(filename):
    import pxdom
    serializer = pxdom.LSSerializer()
    return serializer.writeToString(domParse.parsePxDom(filename))
    #return Canonicalize(domParse.parsePxDom(filename))

def main(filename):
    print '4DOM:', string4Dom(filename)
    print 'Domlette:', stringDomlette(filename)
    print 'MiniDom:', stringMinidom(filename)
    print 'LibXml:', stringLibXml(filename)
    print 'PxDom:', stringPxDom(filename)
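
if __name__ == '__main__':
    # domParse as listed defines no small_filename, so take the
    # file to process from the command line, as domParse.main() does
    import sys
    main(sys.argv[1])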

As you can see, there's not much to it. This code does require that you've installed the PyXML package, but if you're serious about XML in Python, that will already be the case. In our next outing we can explore some of the less DOM-like, but more Pythonic ways to play with XML.
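If PyXML isn't available, minidom can do a rough round trip on its own using only the standard library. Here's a minimal sketch, assuming you're happy with minidom's plain serialization rather than canonical (c14n) XML; the function name string_minidom is made up for this example and is not part of the code above.

```python
# A sketch using only the standard library; not a replacement for
# Canonicalize, since toxml() does not produce canonical (c14n) XML.
from xml.dom.minidom import parseString


def string_minidom(xml_text):
    """Parse XML text into a minidom Document and serialize it back."""
    doc = parseString(xml_text)
    try:
        # toxml() gives compact output; toprettyxml() would add
        # indentation, which may change whitespace-sensitive content
        return doc.toxml()
    finally:
        # minidom documents contain reference cycles; unlink() breaks them
        doc.unlink()


if __name__ == '__main__':
    print(string_minidom('<root><item id="1">text</item></root>'))
```

The output includes an XML declaration ahead of the root element, so don't expect it to match the Canonicalize results byte for byte.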

PyXML: http://pyxml.sourceforge.net/

4Suite: http://4suite.org/index.xhtml

libxml: http://www.xmlsoft.org/ (instructions for the python bindings are linked from this page)

pxdom: http://www.doxdesk.com/software/py/pxdom.html

You may now return your trays to their upright positions.

Initializing a DOM in Python

There are many DOM options in Python, and I have trouble remembering how to load a document into the various DOMs. Here are a few common ones, although there are many variations on them (loading from URL or string, different configurations, etc.). This should provide a starting point.

# Examples for reading in various DOMs from an XML file
# MiniDOM
def parseMinidom(filename):
    try:
        from xml.dom.minidom import parse
        doc = parse(filename)
        return doc
    except Exception, e:
        return 'parseMinidom() failed with exception %s' % e
# 4DOM
def parse4Dom(filename):
    try:
        from xml.dom.ext.reader.Sax2 import Reader
        f = file(filename)
        reader = Reader(validate=0, keepAllWs=0, catName=None)
        doc = reader.fromStream(f) # slow!
        f.close()
        return doc
    except Exception, e:
        return 'parse4Dom() failed with exception %s' % e
# Domlette
def parseDomlette(filename):
    try:
        from Ft.Xml.Domlette import NonvalidatingReader as reader
        f = file(filename)
        uri = 'file:///%s' % filename # suppress warning
        doc = reader.parseStream(f, uri)
        f.close()
        return doc
    except Exception, e:
        return 'parseDomlette() failed with exception %s' % e
# libXml
def parseLibXml(filename):
    try:
        import libxml2
        f = file(filename)
        data = f.read()
        f.close()
        doc = libxml2.parseDoc(data)
        return doc
    except Exception, e:
        return 'parseLibXml() failed with exception %s' % e
# pxDom
def parsePxDom(filename):
    try:
        import pxdom
        doc = pxdom.parse(filename)
        return doc
    except Exception, e:
        return 'parsePxDom() failed with exception %s' % e
def main():
    import sys
    filename = sys.argv[1]
    print '4DOM:', parse4Dom(filename)
    print 'Domlette:', parseDomlette(filename)
    print 'MiniDom:', parseMinidom(filename)
    print 'LibXml:', parseLibXml(filename)
    print 'PxDom:', parsePxDom(filename)
if __name__ == '__main__': main()
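
As for the variations mentioned at the top (loading from a string rather than a file), here's a minimal sketch for minidom only. The helper names parse_minidom_string and parse_minidom_stream are hypothetical, and the other DOMs have their own equivalents that aren't shown here.

```python
# Sketch: two other ways to feed minidom, using only the standard library.
import io
from xml.dom.minidom import parse, parseString


def parse_minidom_string(xml_text):
    """Build a minidom Document from XML held in a string."""
    return parseString(xml_text)


def parse_minidom_stream(stream):
    """Build a minidom Document from an open file-like object.

    minidom's parse() accepts either a filename or a file object.
    """
    return parse(stream)


if __name__ == '__main__':
    doc = parse_minidom_string('<a><b/></a>')
    print(doc.documentElement.tagName)

    # BytesIO stands in for an already-open file or network response
    doc = parse_minidom_stream(io.BytesIO(b'<a><b/></a>'))
    print(doc.documentElement.firstChild.tagName)
```

Unlike the functions above, these let any parse error propagate as an exception instead of returning a message string, so a caller can't mistake a failure for a document.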