Basic XML Parsing With Python and LXML
Recently I’ve been developing an API using python and Django for work, which uses XML responses to speak to clients. One of my goals for the client was to be able to easily parse the XML responses that the server sends, so that I could appropriately handle errors.
Fortunately, python has many tools for building and parsing XML. During my research, I tested several options, but found that the well supported library LXML was a perfect match for what I needed. Unfortunately, I had a hard time figuring this out, as examples to just parse XML content was lacking in the official tutorial, and there were no good resources online with code samples.
So let’s take a quick peek at a sample XML document, then we’ll analyze some simple LXML code to see how it works. Of course, before you can run any of these code samples, you’ll need to download and install LXML (there are packages available on most Linux systems already).
Here’s are two sample XML responses that our server may send to the clients:
<?xml version="1.0" encoding="utf-8"?>
<response version="1.0">
<code>200</code>
<id>50</id>
</response>
<?xml version="1.0" encoding="utf-8"?>
<response version="1.0">
<code>400</code>
<errors>
<error>This API call requires a HTTP POST.</error>
</errors>
</response>
The first XML document shows a successful response. The code
tag contains an
HTTP 200 OK
code, which means the operation succeeded, and the id
tag
contains an identification number which the client needs to store for later
processing.
Now, let’s quickly analyze the situation. We have an XML tag which will be
available whether the operation succeeds or fails, which is the code
tag.
So, for the client to be able to distinguish between a success or failure, it
needs to first figure out what code
was returned.
Let’s write some python-lxml
code to quickly output the value of the code
tag:
from lxml import etree
response = 'some xml response here'
try:
doc = etree.XML(response.strip())
code = doc.findtext('code')
print code
except etree.XMLSyntaxError:
print 'XML parsing error.'
In our code, we first build a XML document from our XML response string, then
use the findtext()
method (provided by LXML) to retrieve the value of the
code
tag. Run this example, play around with it, and you’ll see that you can
pull up any value you want from the XML document.
In my client code, the program flow looks something like:
from sys import exit
from lxml import etree
response = 'some xml response here'
try:
doc = etree.XML(response.strip())
code = doc.findtext('code')
print code
except etree.XMLSyntaxError:
print 'XML parsing error.'
exit(1)
if code == '200':
# store the id somewhere
id = doc.findtext('id')
else:
# handle errors, do more stuff
print doc.findtext('error')
exit(1)
exit(0)
This lets me perform error checking on the XML response to make sure that everything went smoothly. If something goes wrong during the API call, then I will know about it, and can perform other actions to fix any problems or report errors.
If you’d like to do more advanced XML parsing with LXML, read the official tutorial. It is very verbose, but contains excellent examples which clearly demonstrate how to both parse and generate XML.
So the next time you’re looking for a quick way to process XML, check out LXML.
PS: If you read this far, you might want to follow me on Bluesky or GitHub and subscribe via RSS or email below (I'll email you new articles when I publish them).