Python version: 2.7.5
Source:
kmz_parser.py
Brief:
This tutorial describes a method for writing a Python script that extracts coordinate and label information from KMZ/KML files and exports it to a CSV file.
1. Unzip the KMZ and extract doc.kml
KMZ files are Google Earth files that can contain points, lines, and shapes. They are simply zipped archives. Inside they contain a plain-text XML file, doc.kml, written in Keyhole Markup Language (KML). To look at this file, rename your .kmz to .zip, extract it, and open doc.kml.
Python provides many useful built-in libraries; the first we are going to use is zipfile.
from zipfile import ZipFile
filename = 'test.kmz'
kmz = ZipFile(filename, 'r')
kml = kmz.open('doc.kml', 'r')
This opens doc.kml as a file-like object for reading. You can now parse the file.
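The name doc.kml is a convention rather than a guarantee, so it can help to look inside the archive before opening it. The following is a minimal sketch, assuming the first .kml entry in the archive is the one you want. find_kml_name is a hypothetical helper and is not part of kmz_parser.py.

```python
from zipfile import ZipFile

def find_kml_name(kmz):
    """Return the name of the first .kml entry in an open ZipFile, or None."""
    for name in kmz.namelist():
        # compare case-insensitively so DOC.KML is found too
        if name.lower().endswith('.kml'):
            return name
    return None
```

With this helper, the open call above could become kmz.open(find_kml_name(kmz), 'r').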
2. Examine the KML file to determine the type of information you want and how it's stored.
IDLE (the editor included with Python) is a good tool for viewing KML files. To extract the names of items and their coordinates we need to look at three tags: <Placemark>, <name>, and <coordinates>. A <Placemark> tag surrounds each item and contains a <name> tag and, nested inside a geometry tag such as <Point>, a <coordinates> tag. Note that the parser we are going to use calls tags "elements".
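To see which elements a particular file actually uses, a quick inventory can help. The sketch below uses xml.etree.ElementTree rather than the SAX parser used in the rest of this tutorial, purely for brevity. count_tags and SAMPLE_KML are hypothetical names, and the sample is a hand-written fragment rather than real Google Earth output.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample: one point and one line
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Placemark>
    <name>Home</name>
    <Point><coordinates>-122.08,37.42,0</coordinates></Point>
  </Placemark>
  <Placemark>
    <name>Trail</name>
    <LineString><coordinates>-122.08,37.42,0 -122.09,37.43,0</coordinates></LineString>
  </Placemark>
</Document>
</kml>"""

def count_tags(kml_text):
    """Count how often each element name occurs, ignoring the XML namespace."""
    counts = Counter()
    for element in ET.fromstring(kml_text).iter():
        # ElementTree reports namespaced tags as '{namespace}tag'
        counts[element.tag.split('}')[-1]] += 1
    return counts

counts = count_tags(SAMPLE_KML)
```

Looking at counts['Placemark'] or counts['LineString'] gives a rough idea of how many items of each kind the file holds.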
3. Write a SAX handler
Simple API for XML (SAX) allows us to parse XML files as a stream of events. Python has a built-in library for this, xml.sax. To make it work we use inheritance and create our own custom xml.sax.handler.ContentHandler class. When we feed the SAX parser our file and our ContentHandler object, it calls methods on that object at certain points in the document. The parser expects those methods to have specific names, so we create a child class of ContentHandler, which already contains do-nothing methods for all of these events. By overriding those methods we make the parser do our work when it reaches each event. The documentation on the ContentHandler base class in the Python documentation lists the names of these methods and describes when they are called. The ones we are interested in are these:
- __init__(self)
  - constructor, called when the object is created
- startElement(self, name, attributes)
  - called at start tags (e.g. <Placemark>, <name>)
- characters(self, data)
  - called for text between tags
- endElement(self, name)
  - called at end tags (e.g. </Placemark>, </name>)
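To watch these callbacks fire in order, a throwaway handler that only records events can be useful. This is a sketch for illustration; EventLogger is a hypothetical class and the one-line XML string is made up.

```python
import xml.sax
import xml.sax.handler

class EventLogger(xml.sax.handler.ContentHandler):
    """Record each callback so the order of events is visible."""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attributes):
        self.events.append(('start', name))

    def characters(self, data):
        if data.strip():  # skip whitespace-only text
            self.events.append(('text', data.strip()))

    def endElement(self, name):
        self.events.append(('end', name))

logger = EventLogger()
xml.sax.parseString(b"<Placemark><name>Home</name></Placemark>", logger)
```

After parsing, logger.events holds the start, text, and end events in document order. The parser is allowed to deliver one run of text in several characters() calls, which is why a real handler should accumulate text in a buffer rather than assume a single call.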
The data we capture will be stored in a nested dictionary. The text of each Placemark's <name> element becomes a key mapped to a second dictionary. Inside that dictionary, each element name becomes a key mapped to the text contained in that element. This lets us extract all the data within each Placemark, including the 'coordinates' element. See code below:
import xml.sax, xml.sax.handler

class PlacemarkHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inName = False       # True while inside a Placemark's <name>
        self.inPlacemark = False  # True while inside a <Placemark>
        self.mapping = {}         # Placemark name -> {element name: text}
        self.buffer = ""          # text collected for the current element
        self.name_tag = ""        # name of the current Placemark

    def startElement(self, name, attributes):
        if name == "Placemark": # on start Placemark tag
            self.inPlacemark = True
            self.buffer = ""
        if self.inPlacemark:
            if name == "name": # on start name tag
                self.inName = True

    def characters(self, data):
        if self.inPlacemark: # on text within a Placemark
            self.buffer += data # accumulate text

    def endElement(self, name):
        self.buffer = self.buffer.strip('\n\t')
        if name == "Placemark":
            self.inPlacemark = False
            self.name_tag = "" # clear current name
        elif name == "name" and self.inPlacemark:
            self.inName = False # on end name tag
            self.name_tag = self.buffer.strip()
            self.mapping[self.name_tag] = {}
        elif self.inPlacemark and self.name_tag in self.mapping:
            # elements that appear before the Placemark's <name> are skipped
            if name in self.mapping[self.name_tag]:
                self.mapping[self.name_tag][name] += self.buffer
            else:
                self.mapping[self.name_tag][name] = self.buffer
        self.buffer = ""
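Each stored 'coordinates' value is still raw text. In KML that text is a whitespace-separated list of lon,lat[,alt] tuples, so it can be split into numbers if needed. A minimal sketch follows. parse_coordinates is a hypothetical helper, and the sample mapping is hand-written to mirror the nested shape described above.

```python
def parse_coordinates(text):
    """Split a KML coordinates string into a list of float tuples."""
    points = []
    for chunk in text.split():  # tuples are separated by whitespace
        points.append(tuple(float(v) for v in chunk.split(',')))
    return points

# Hand-written example of the nested dictionary shape
mapping = {
    'Trail': {
        'LineString': '',
        'coordinates': '-122.08,37.42,0 -122.09,37.43,0',
    },
}
trail = parse_coordinates(mapping['Trail']['coordinates'])
```

Here trail is a list with one (lon, lat, alt) tuple per vertex of the line.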
4. Create a Parser, set the Handler, and parse the file.
To parse the file we need to create a parser object, set its content handler to an instance of the custom class we created, call the parse method on the file, and close the archive. After this our mapping dictionary is ready to be used.
parser = xml.sax.make_parser()
handler = PlacemarkHandler()
parser.setContentHandler(handler)
parser.parse(kml)
kmz.close()
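If parsing raises an exception, the kmz.close() call above is never reached. One way around that is sketched below, assuming Python 2.7 or later, where ZipFile works as a context manager. PlacemarkCounter and count_placemarks are hypothetical stand-ins so the example runs on its own.

```python
import xml.sax
import xml.sax.handler
from zipfile import ZipFile

class PlacemarkCounter(xml.sax.handler.ContentHandler):
    """Stand-in handler: counts <Placemark> elements."""
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attributes):
        if name == "Placemark":
            self.count += 1

def count_placemarks(kmz_file):
    """Parse doc.kml from a KMZ path or file object and count Placemarks."""
    handler = PlacemarkCounter()
    # the with-block closes the archive even if parsing fails
    with ZipFile(kmz_file, 'r') as kmz:
        # read() returns the whole member as bytes, which parseString accepts
        xml.sax.parseString(kmz.read('doc.kml'), handler)
    return handler.count
```

The same pattern should work with PlacemarkHandler in place of PlacemarkCounter. Note that read() loads the whole of doc.kml into memory, which is fine for typical files but worth keeping in mind for very large ones.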
5. Build the CSV table for output
The mapping contains a great deal of data that we don't need, but one detail is worth noting: points contain the tag <LookAt>, lines contain <LineString>, and shapes contain <Polygon>. By testing for these keys we can sort our output table so that all the points come first, then the lines, then the polygons. Below is a function to build the table:
def build_table(mapping):
    sep = ','
    output = 'Name' + sep + 'Coordinates\n'
    points = ''
    lines = ''
    shapes = ''
    for key in mapping:
        coord_str = mapping[key]['coordinates'] + sep
        if 'LookAt' in mapping[key]: #points
            points += key + sep + coord_str + "\n"
        elif 'LineString' in mapping[key]: #lines
            lines += key + sep + coord_str + "\n"
        else: #shapes
            shapes += key + sep + coord_str + "\n"
    output += points + lines + shapes
    return output
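One thing to watch with this table: a coordinates value itself contains commas, so a spreadsheet will split it across several columns. If a single Coordinates column is wanted, the standard csv module can quote such fields. This is a sketch of an alternative, not the method used in kmz_parser.py, and rows_to_csv is a hypothetical helper.

```python
import csv
import io
import sys

def rows_to_csv(rows):
    """Return rows as CSV text, quoting any field that contains a comma."""
    # Python 2's csv module writes byte strings, Python 3's writes text
    buf = io.BytesIO() if sys.version_info[0] == 2 else io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

table = rows_to_csv([
    ('Name', 'Coordinates'),
    ('Home', '-122.08,37.42,0'),
])
```

The coordinates field comes out wrapped in double quotes, so it reads back as one value.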
6. Save the new file, output the data.
outstr = build_table(handler.mapping)
out_filename = filename[:-3] + "csv" #output filename: input name with its 3-letter extension replaced by csv
f = open(out_filename, "w")
f.write(outstr)
f.close()
print outstr
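The slice filename[:-3] assumes a three-letter extension. A sketch of a variant that does not depend on the extension length, using os.path.splitext, is shown here. csv_name_for is a hypothetical helper.

```python
import os.path

def csv_name_for(path):
    """Swap the extension of path for .csv, whatever its length."""
    base, _ext = os.path.splitext(path)
    return base + '.csv'
```

For the file used in this tutorial, csv_name_for('test.kmz') gives the same name as the slice does.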