Programming Adventures: KMZ/KML file parsing with Python

Python version: 2.7.5
Source: kmz_parser.py

Brief:
This tutorial describes how to write a Python script that extracts coordinate and label information from KMZ/KML files and exports it to a CSV file.

1. Unzip the KMZ and extract doc.kml

Keyhole Markup Language (KML) files are Google Earth files that can contain points, lines, and shapes. A KMZ file is simply a zipped archive that contains a plain-text XML file, usually named doc.kml. To look at this file, rename your .kmz to .zip, extract it, and open doc.kml.

Python provides many nice built-in libraries; the first one we are going to use is zipfile.

from zipfile import ZipFile

filename = 'test.kmz'

kmz = ZipFile(filename, 'r')
kml = kmz.open('doc.kml', 'r')

This opens the doc.kml file as a standard file for reading. You can now parse the file.
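
Not every KMZ names its main file doc.kml, and a file that is not really a zip archive makes ZipFile raise an error. Here is a minimal sketch of a more defensive version, assuming you want the first .kml entry in the archive. The helper name open_kml is hypothetical and is not part of kmz_parser.py:

```python
import zipfile


def open_kml(kmz_file):
    """Return the raw bytes of the first .kml entry in a KMZ archive.

    kmz_file may be a path or a file-like object. open_kml is a
    hypothetical helper for illustration, not part of the original script.
    """
    if not zipfile.is_zipfile(kmz_file):
        raise ValueError("not a zip archive; is this really a KMZ file?")
    kmz = zipfile.ZipFile(kmz_file, 'r')
    try:
        # Accept any entry ending in .kml; doc.kml is only a convention.
        kml_names = [n for n in kmz.namelist() if n.lower().endswith('.kml')]
        if not kml_names:
            raise ValueError("no .kml entry found in archive")
        return kmz.read(kml_names[0])
    finally:
        kmz.close()
```

The bytes it returns can be handed to xml.sax.parseString together with a handler, instead of passing an open file to a parser object.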

2. Examine the KML file to determine the type of information you want and how it's stored.

IDLE (the editor included with Python) is a good editor for viewing KML files. To extract the names of items and their coordinates we need to look at three tags: <Placemark>, <name>, and <coordinates>. <Placemark> tags surround each item, and inside each one are a <name> and a <coordinates> tag. Note that the parser we are going to use calls tags "Elements".
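
If you would rather let Python list the tags than read the file by eye, here is a small sketch. It uses xml.etree.ElementTree rather than the SAX parser used in the rest of this tutorial, purely as a quick inspection tool. SAMPLE_KML is a made-up document and both function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal KML document used only for illustration.
SAMPLE_KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Camp</name>
      <Point>
        <coordinates>-122.08,37.42,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def placemark_tags(kml_bytes):
    """Return the sorted tag names found in and under Placemark elements."""
    root = ET.fromstring(kml_bytes)
    found = set()
    for elem in root.iter():
        if local_name(elem.tag) == 'Placemark':
            # iter() yields the Placemark itself and all of its descendants.
            for child in elem.iter():
                found.add(local_name(child.tag))
    return sorted(found)


print(placemark_tags(SAMPLE_KML))
```

Running it on your own doc.kml contents shows which Element names you can expect to see as keys later on.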

3. Write a SAX handler

Simple API for XML (SAX) allows us to parse XML files. Python has a built-in library for this, xml.sax. To make it work we use inheritance and create our own custom xml.sax.handler.ContentHandler class. When we feed the SAX parser our file and our ContentHandler object, it calls methods on that object at certain points in the document. The ContentHandler base class already defines do-nothing methods for all of these events, so the parser always knows which names to call. We create a child class and override the methods we care about, which makes the parser do our work when it reaches each event. The Python documentation for the ContentHandler base class lists the names of these methods and describes when they are called. The ones we are interested in are these:

__init__(self)

constructor, called when the object is created

startElement(self, name, attributes)

called at start elements (e.g. <Placemark>, <name>, etc.)

characters(self, data)

called for text between tags; the parser may deliver one run of text in several calls

endElement(self, name)

called at end elements (e.g. </Placemark>, </name>, etc.)
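
To see the order in which the parser calls these methods, here is a small sketch that only records the events. EventLogger is a hypothetical class for illustration, and the one-line XML string is made up:

```python
import xml.sax
import xml.sax.handler


class EventLogger(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records each callback the parser makes."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attributes):
        self.events.append(('start', name))

    def characters(self, data):
        # Text may arrive in several chunks; ignore whitespace-only ones.
        if data.strip():
            self.events.append(('text', data))

    def endElement(self, name):
        self.events.append(('end', name))


logger = EventLogger()
xml.sax.parseString(b"<Placemark><name>Camp</name></Placemark>", logger)
for event in logger.events:
    print(event)
```

Each start event is matched by an end event in reverse order of nesting, with the text events in between. That pairing is what lets a handler keep track of where it is in the document.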

The data we capture will be stored in a nested dictionary. The text of each Placemark's <name> element will be a key mapped to a second dictionary. Inside that dictionary each Element name becomes a key mapped to the text contained in that Element. This lets us extract all the data contained within each placemark, including the 'coordinates' element. See code below:

import xml.sax, xml.sax.handler
class PlacemarkHandler(xml.sax.handler.ContentHandler):
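    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inName = False # True while inside a <name> tag
        self.inPlacemark = False # True while inside a <Placemark> tag
        self.mapping = {} # placemark name -> {element name: text}
        self.buffer = "" # text collected for the current element
        self.name_tag = "" # name of the current placemark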

    def startElement(self, name, attributes):
        if name == "Placemark": # on start Placemark tag
            self.inPlacemark = True
            self.buffer = ""
        if self.inPlacemark:
            if name == "name": # on start name tag
                self.inName = True # name text follows

    def characters(self, data):
        if self.inPlacemark: # on text within a Placemark
            self.buffer += data # collect text; it may arrive in chunks

    def endElement(self, name):
        self.buffer = self.buffer.strip('\n\t')

        if name == "Placemark":
            self.inPlacemark = False
            self.name_tag = "" #clear current name

        elif name == "name" and self.inPlacemark:
            self.inName = False # on end title tag
            self.name_tag = self.buffer.strip()
            self.mapping[self.name_tag] = {}
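        elif self.inPlacemark:
            # Elements that end before any <name> was seen are stored under
            # the "" key; setdefault avoids a KeyError for them.
            fields = self.mapping.setdefault(self.name_tag, {})
            if name in fields:
                fields[name] += self.buffer
            else:
                fields[name] = self.buffer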
        self.buffer = ""
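
To make the nested dictionary concrete, here is a sketch of roughly what the handler's mapping could look like after parsing a file with one point and one line. The names and values are made up and simplified; a real file produces more keys per placemark. The describe function is hypothetical:

```python
# Hypothetical, simplified contents of handler.mapping after parsing.
# Container elements such as LookAt or Point hold no useful text of their
# own, so their values are shown as empty strings here.
mapping = {
    'Camp': {
        'LookAt': '',
        'coordinates': '-122.08,37.42,0',
        'Point': '',
    },
    'Trail': {
        'coordinates': '-122.08,37.42,0 -122.09,37.43,0',
        'LineString': '',
    },
}


def describe(fields):
    """Classify one placemark's dictionary by the keys it contains."""
    if 'LookAt' in fields:
        return 'point'
    if 'LineString' in fields:
        return 'line'
    return 'shape'


for placemark_name in sorted(mapping):
    fields = mapping[placemark_name]
    print('%s (%s): %s' % (placemark_name, describe(fields),
                           fields['coordinates']))
```

The outer key is the placemark's name, so two placemarks with the same name share one entry.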

4. Create a Parser, set the Handler, and parse the file.

To parse the file we need to create a parser object, set its handler to an instance of the custom class we created, run the parse function on the file, and close the file. After this our mapping dictionary is ready to be used.

parser = xml.sax.make_parser()
handler = PlacemarkHandler()
parser.setContentHandler(handler)
parser.parse(kml)
kmz.close()
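
Each 'coordinates' value in the mapping is still one string. KML writes every position as longitude,latitude with an optional altitude, and separates positions with whitespace. If you need numbers instead of text, here is a minimal sketch; the function name parse_coordinates is hypothetical:

```python
def parse_coordinates(text):
    """Split a KML <coordinates> string into (lon, lat, alt) float tuples.

    Altitude is optional in KML, so it defaults to 0.0 when missing.
    """
    result = []
    for chunk in text.split():  # positions are separated by whitespace
        parts = [float(p) for p in chunk.split(',')]
        lon, lat = parts[0], parts[1]
        alt = parts[2] if len(parts) > 2 else 0.0
        result.append((lon, lat, alt))
    return result


print(parse_coordinates('-122.08,37.42,0 -122.09,37.43'))
```

A point gives a list with one tuple, while lines and polygons give one tuple per vertex.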

5. Build the CSV table for output

The mapping contains a great deal of data that we don't need, but one thing in it is worth noting. Points contain the tag <LookAt>, lines contain <LineString>, and shapes contain <Polygon>. By testing for these tags we can sort our output table so all the points are together, then the lines, then the polygons. Below is a function to build the table:

def build_table(mapping):
    sep = ','

    output = 'Name' + sep + 'Coordinates\n'
    points = ''
    lines = ''
    shapes = ''
    for key in mapping:
        # .get avoids a KeyError for placemarks without coordinates
        coord_str = mapping[key].get('coordinates', '') + sep

        if 'LookAt' in mapping[key]: #points
            points += key + sep + coord_str + "\n"
        elif 'LineString' in mapping[key]: #lines
            lines += key + sep + coord_str + "\n"
        else: #shapes
            shapes += key + sep + coord_str + "\n"

    output += points + lines + shapes
    return output
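
One thing to watch for: the coordinates themselves contain commas, so a comma-separated row ends up with more columns than the header. Here is a sketch of an alternative that lets the csv module quote such fields. It assumes Python 3 (it relies on io.StringIO with csv.writer), and build_rows and rows_to_csv are hypothetical names, not part of kmz_parser.py:

```python
import csv
import io


def build_rows(mapping):
    """Return (name, coordinates) rows: points, then lines, then shapes."""
    points, lines, shapes = [], [], []
    for name in mapping:
        fields = mapping[name]
        row = (name, fields.get('coordinates', '').strip())
        if 'LookAt' in fields:
            points.append(row)
        elif 'LineString' in fields:
            lines.append(row)
        else:
            shapes.append(row)
    return points + lines + shapes


def rows_to_csv(rows):
    """Serialise rows with csv.writer so embedded commas get quoted."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(('Name', 'Coordinates'))
    writer.writerows(rows)
    return out.getvalue()
```

With this version a spreadsheet program reads each placemark as exactly two columns, because the whole coordinate string is wrapped in quotes.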

6. Save the new file, output the data.

outstr = build_table(handler.mapping)
out_filename = filename[:-3] + "csv" #output filename: input name with kmz replaced by csv
f = open(out_filename, "w")
f.write(outstr)
f.close()
print(outstr) # parentheses work in both Python 2 and 3
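
If your placemark names contain non-ASCII characters, writing with an explicit encoding is safer. Below is a minimal sketch using io.open, which accepts an encoding argument. It assumes the table is a text (unicode) string and that UTF-8 is the encoding you want. Both helper names are hypothetical:

```python
import io
import os


def csv_name_for(kmz_filename):
    """Swap the extension for .csv, whatever its length (.kmz, .kml, ...)."""
    base, _ext = os.path.splitext(kmz_filename)
    return base + '.csv'


def save_table(table_text, out_filename, encoding='utf-8'):
    """Write the table as encoded text; the with-block closes the file."""
    with io.open(out_filename, 'w', encoding=encoding) as f:
        f.write(table_text)
```

For example, save_table(outstr, csv_name_for(filename)) would replace the open, write, and close calls above.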

14 comments:

Eyes around, March 26, 2014 at 1:47 AM
I found it very helpful when working with kml, and it's to the point. When executing the script an error occured " NameError: name 'outstr' is not defined, how do you go about it?, thanks
Unknown, November 6, 2014 at 7:44 AM
Good script, this is what i was looking for, thanks for post it
StoneEnv, November 18, 2014 at 5:12 PM
Seriously! This is awesome! I'd been writing my own code to parse KML-- didn't even know SAX handlers existed. Thanks!
Unknown, February 13, 2015 at 5:19 AM
A simple great job!

For who have non-ascii characters in the KMZ file, use the "codecs" Python library to open the output CSV using the desired encoding format.
Jeff and Allicia, May 2, 2015 at 9:26 PM
This is amazing. I'm a complete beginner in Python and this was a huge jump start to my project. I'm running Python 3.4 and only had to add parenthesis around outstr in the final command. After that, this thing run like a charm and organized my KMZ data perfectly. Top marks, Tyler!
James Redd, August 12, 2015 at 6:43 AM
Thank you for posting all of this, Tyler. It works when I run it but the output CSV file is empty. Any ideas why that might be?
Leandro de Oliveira, March 7, 2019 at 12:30 PM
Fantastic! Thank's.
Nope, July 2, 2019 at 12:14 AM
Hi, I have an error when trying to unzip the KMZ file, can anyone please help me with "Bad magic number for central directory" - error when using ZipFile?
agp, June 24, 2020 at 2:03 PM
good one helped me a lot
Aryan Kumar, August 31, 2020 at 3:12 AM
I'm getting this error while running the code.
Any help would be appreciated.

"C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\python.exe" "C:/Users/Aryan Kumar/Desktop/Python/KMLtoCSV/KMLtoCSV.py"
Traceback (most recent call last):
File "C:/Users/Aryan Kumar/Desktop/Python/KMLtoCSV/KMLtoCSV.py", line 54, in
parser.parse(kml)
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\expatreader.py", line 111, in parse
xmlreader.IncrementalParser.parse(self, source)
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\xmlreader.py", line 125, in parse
self.feed(buffer)
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\expatreader.py", line 217, in feed
self._parser.Parse(data, isFinal)
File "C:\A\31\s\Modules\pyexpat.c", line 461, in EndElement
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\expatreader.py", line 336, in end_element
self._cont_handler.endElement(name)
File "C:/Users/Aryan Kumar/Desktop/Python/KMLtoCSV/KMLtoCSV.py", line 44, in endElement
if name in self.mapping[self.name_tag]:
KeyError: ''

Monday, June 24, 2013