Python version: 2.7.5
Source: kmz_parser.py
Brief:
This tutorial describes a method for writing a Python script that extracts coordinate and label information from KMZ/KML files and exports it to a CSV file.
1. Unzip the KMZ and extract doc.kml
KMZ files are Google Earth files that can contain points, lines, and shapes. They are simply zipped archives. Inside they contain a plain-text XML file written in Keyhole Markup Language (KML), normally named doc.kml. To look at this file, rename your .kmz to .zip, extract it, and open doc.kml.
Python provides many nice built-in libraries; the first we are going to use is zipfile.
from zipfile import ZipFile
filename = 'test.kmz'
kmz = ZipFile(filename, 'r')
kml = kmz.open('doc.kml', 'r')
This opens the doc.kml file as a standard file for reading. You can now parse the file.
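Not every archive is guaranteed to name its main document doc.kml, so it can help to look the name up first. The following is a minimal sketch, not part of the original script: the helper name find_kml_name is hypothetical, and the sample archive is built in memory only so the example runs without a file on disk.

```python
import io
from zipfile import ZipFile


def find_kml_name(kmz):
    """Return 'doc.kml' if present, else the first member ending in .kml."""
    names = kmz.namelist()
    if 'doc.kml' in names:
        return 'doc.kml'
    for name in names:
        if name.lower().endswith('.kml'):
            return name
    raise ValueError('no .kml file found in archive')


# Build a tiny KMZ in memory (hypothetical content) to try the helper on.
sample = io.BytesIO()
with ZipFile(sample, 'w') as archive:
    archive.writestr('places.kml', '<kml></kml>')

with ZipFile(sample, 'r') as kmz:
    kml_name = find_kml_name(kmz)
    kml_text = kmz.open(kml_name, 'r').read()
```

With a real file you would pass the ZipFile opened on 'test.kmz' instead of the in-memory sample.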
2. Examine the KML file to determine the type of information you want and how it's stored.
IDLE (the editor included with Python) is a good editor for viewing KML files. To extract the names of items and their coordinates we need to look at three tags: <Placemark>, <name>, and <coordinates>. <Placemark> tags surround each item; inside, each has a <name> and a <coordinates> tag. Note that the parser we are going to use calls tags "elements".
3. Write a SAX handler
Simple API for XML (SAX) allows us to parse XML files. Python has a built-in library for this, xml.sax. To make it work we use inheritance and create our own custom xml.sax.handler.ContentHandler class. When we feed the SAX parser our file and our ContentHandler object, it calls the methods of that object at certain times, such as when an element starts or ends. The ContentHandler base class already defines do-nothing methods for all of these events, so the parser always knows what to call. By creating a child class and overriding those methods, we can make the parser do our work at each event. The ContentHandler entry in the Python documentation lists the names of these methods and describes when each is called. The ones we are interested in are these:
- __init__(self)
  - constructor, called when the object is created
- startElement(self, name, attributes)
  - called at start elements (e.g. <Placemark>, <name>, etc.)
- characters(self, data)
  - called at text between elements
- endElement(self, name)
  - called at end elements (e.g. </Placemark>, </name>, etc.)
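To see when these callbacks fire, a small handler can simply record each event in order. This is an illustrative sketch only: the EventLogger class and the one-placemark sample string are made up for the demonstration, and xml.sax.parseString is used as a shortcut for parsing from memory.

```python
import xml.sax
import xml.sax.handler


class EventLogger(xml.sax.handler.ContentHandler):
    """Record every parser callback in the order it happens."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attributes):
        self.events.append(('start', name))

    def characters(self, data):
        # Whitespace-only chunks are skipped to keep the log short.
        if data.strip():
            self.events.append(('text', data.strip()))

    def endElement(self, name):
        self.events.append(('end', name))


sample = b'<Placemark><name>Camp</name></Placemark>'
logger = EventLogger()
xml.sax.parseString(sample, logger)
for event in logger.events:
    print(event)
```

The log shows the start of Placemark, then the start, text, and end of name, and finally the end of Placemark. Note that the parser is allowed to deliver long text in several characters() calls, which is why the handler in this tutorial accumulates text in a buffer.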
The data we are going to capture will be stored in a nested dictionary. The text of each Placemark's <name> element becomes a key mapped to a second dictionary. Inside that second dictionary, each element name becomes a key mapped to the text contained in that element. This lets us extract all the data contained within each Placemark, including the <coordinates> element. See code below:
import xml.sax, xml.sax.handler

class PlacemarkHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.inName = False       # True while inside a <name> element
        self.inPlacemark = False  # True while inside a <Placemark> element
        self.mapping = {}         # placemark name -> {element name: text}
        self.buffer = ""          # text collected for the current element
        self.name_tag = ""        # name of the current placemark

    def startElement(self, name, attributes):
        if name == "Placemark":  # on start Placemark tag
            self.inPlacemark = True
            self.buffer = ""
        if self.inPlacemark:
            if name == "name":  # on start name tag
                self.inName = True  # save name text to follow

    def characters(self, data):
        if self.inPlacemark:  # on text within tag
            self.buffer += data  # save text while inside a Placemark

    def endElement(self, name):
        self.buffer = self.buffer.strip('\n\t')
        if name == "Placemark":
            self.inPlacemark = False
            self.name_tag = ""  # clear current name
        elif name == "name" and self.inPlacemark:
            self.inName = False  # on end name tag
            self.name_tag = self.buffer.strip()
            self.mapping[self.name_tag] = {}
        elif self.inPlacemark and self.name_tag in self.mapping:
            # Elements that close before <name> has been read are skipped;
            # without this check they raise KeyError: ''.
            if name in self.mapping[self.name_tag]:
                self.mapping[self.name_tag][name] += self.buffer
            else:
                self.mapping[self.name_tag][name] = self.buffer
        self.buffer = ""
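The handler stores each <coordinates> value as raw text. KML writes coordinates as longitude,latitude[,altitude] tuples separated by whitespace, so turning that text into numbers is a short step. The sketch below is an optional addition, not part of the original script; parse_coordinates and the sample mapping entry are hypothetical.

```python
def parse_coordinates(text):
    """Turn a KML coordinates string into a list of float tuples."""
    points = []
    for chunk in text.split():  # tuples are separated by whitespace
        parts = chunk.split(',')  # lon, lat and optional altitude
        points.append(tuple(float(p) for p in parts))
    return points


# Hypothetical mapping entry, shaped like what PlacemarkHandler collects.
mapping = {'Camp': {'coordinates': '-122.08,37.42,0 -122.09,37.43,0'}}
print(parse_coordinates(mapping['Camp']['coordinates']))
```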
4. Create a Parser, set the Handler, and parse the file.
To parse the file we need to create a parser object, set its handler to an instance of the custom class we created, run the parse function on the file, and close the archive. After this our mapping dictionary is ready to be used.
parser = xml.sax.make_parser()
handler = PlacemarkHandler()
parser.setContentHandler(handler)
parser.parse(kml)
kmz.close()
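If the KML is malformed, parse() raises xml.sax.SAXParseException. A hedged sketch of catching it follows; PlacemarkCounter is a hypothetical stand-in handler used only to keep the example self-contained, and the two in-memory documents are invented test inputs.

```python
import io
import xml.sax
import xml.sax.handler


class PlacemarkCounter(xml.sax.handler.ContentHandler):
    """Hypothetical stand-in handler that only counts <Placemark> elements."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attributes):
        if name == "Placemark":
            self.count += 1


def count_placemarks(stream):
    """Parse a file-like object; return the count, or None if the XML is broken."""
    parser = xml.sax.make_parser()
    handler = PlacemarkCounter()
    parser.setContentHandler(handler)
    try:
        parser.parse(stream)
    except xml.sax.SAXParseException as error:
        # getLineNumber() reports where the parser gave up.
        print("malformed KML near line %d" % error.getLineNumber())
        return None
    return handler.count


good = io.BytesIO(b"<kml><Placemark/><Placemark/></kml>")
bad = io.BytesIO(b"<kml><Placemark></kml>")  # Placemark is never closed
good_count = count_placemarks(good)
bad_count = count_placemarks(bad)
```

The same try/except could wrap the parser.parse(kml) call in the script above.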
5. Build the CSV table for output
The mapping we created contains a great deal of data that we don't need, but one thing in it is worth noting: points contain the tag <LookAt>, lines contain <LineString>, and shapes contain <Polygon>. By testing for these keys we can sort our output table so that all the points come first, then the lines, then the polygons. Below is a function to build the table:
def build_table(mapping):
    sep = ','
    output = 'Name' + sep + 'Coordinates\n'
    points = ''
    lines = ''
    shapes = ''
    for key in mapping:
        # .get avoids a KeyError for placemarks without <coordinates>
        coord_str = mapping[key].get('coordinates', '') + sep
        if 'LookAt' in mapping[key]:  # points
            points += key + sep + coord_str + "\n"
        elif 'LineString' in mapping[key]:  # lines
            lines += key + sep + coord_str + "\n"
        else:  # shapes
            shapes += key + sep + coord_str + "\n"
    output += points + lines + shapes
    return output
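Because each coordinate tuple itself contains commas, a spreadsheet will split the coordinates across several columns. One alternative, sketched here under the assumption that you are running Python 3, is to let the standard csv module quote the fields. The name build_table_csv and the sample mapping are hypothetical; the point/line/shape ordering follows the function above.

```python
import csv
import io


def build_table_csv(mapping):
    """Return CSV text with points first, then lines, then shapes."""
    groups = {'points': [], 'lines': [], 'shapes': []}
    for name in sorted(mapping):
        fields = mapping[name]
        if 'LookAt' in fields:
            kind = 'points'
        elif 'LineString' in fields:
            kind = 'lines'
        else:
            kind = 'shapes'
        groups[kind].append([name, fields.get('coordinates', '').strip()])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Name', 'Coordinates'])
    for kind in ('points', 'lines', 'shapes'):
        writer.writerows(groups[kind])  # fields with commas get quoted
    return out.getvalue()


# Hypothetical mapping with one line and one point.
sample_mapping = {
    'Trail': {'LineString': '', 'coordinates': '1,2,0 3,4,0'},
    'Camp': {'LookAt': '', 'coordinates': '5,6,0'},
}
table = build_table_csv(sample_mapping)
```

Each placemark then occupies exactly two columns, with the whole coordinate string kept in one quoted cell.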
6. Save the new file, output the data.
outstr = build_table(handler.mapping)
out_filename = filename[:-3] + "csv"  # same name as the input, with the extension replaced by csv
f = open(out_filename, "w")
f.write(outstr)
f.close()
print(outstr)  # the parentheses make this line work on Python 3 as well
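Two details of this step are easy to trip over: the [:-3] slice assumes a three-letter extension, and placemark names with non-ASCII characters need an explicit encoding when written. A minimal sketch of one way to handle both follows; csv_name_for and save_text are hypothetical helper names, and the demo writes to a temporary folder so it does not overwrite anything.

```python
import io
import os
import tempfile


def csv_name_for(filename):
    """Swap the extension for .csv, whatever its length."""
    base, _ext = os.path.splitext(filename)
    return base + '.csv'


def save_text(path, text):
    # io.open accepts an encoding on both Python 2.7 and Python 3.
    # On Python 2 the text must be a unicode object.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)


# Demo with an invented table containing a non-ASCII name.
out_path = os.path.join(tempfile.mkdtemp(), csv_name_for('test.kmz'))
save_text(out_path, u'Name,Coordinates\nCaf\u00e9,"1,2,0"\n')
```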
I found it very helpful when working with KML, and it's to the point. When executing the script an error occurred: "NameError: name 'outstr' is not defined". How do you go about it? Thanks
You need to put brackets around outstr.
print (outstr)
Good script, this is what i was looking for, thanks for post it
Seriously! This is awesome! I'd been writing my own code to parse KML-- didn't even know SAX handlers existed. Thanks!
A simple great job!
For who have non-ascii characters in the KMZ file, use the "codecs" Python library to open the output CSV using the desired encoding format.
When output to file, I do below change and work.
with open(out_filename, "w") as f:
    f.write(outstr.encode('utf8'))
This is amazing. I'm a complete beginner in Python and this was a huge jump start to my project. I'm running Python 3.4 and only had to add parenthesis around outstr in the final command. After that, this thing run like a charm and organized my KMZ data perfectly. Top marks, Tyler!
Thank you for posting all of this, Tyler. It works when I run it but the output CSV file is empty. Any ideas why that might be?
Fantastic! Thank's.
Hi, I have an error when trying to unzip the KMZ file, can anyone please help me with "Bad magic number for central directory" - error when using ZipFile?
good one helped me a lot
I'm getting this error while running the code.
Any help would be appreciated.
"C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\python.exe" "C:/Users/Aryan Kumar/Desktop/Python/KMLtoCSV/KMLtoCSV.py"
Traceback (most recent call last):
File "C:/Users/Aryan Kumar/Desktop/Python/KMLtoCSV/KMLtoCSV.py", line 54, in <module>
parser.parse(kml)
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\expatreader.py", line 111, in parse
xmlreader.IncrementalParser.parse(self, source)
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\xmlreader.py", line 125, in parse
self.feed(buffer)
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\expatreader.py", line 217, in feed
self._parser.Parse(data, isFinal)
File "C:\A\31\s\Modules\pyexpat.c", line 461, in EndElement
File "C:\Users\Aryan Kumar\AppData\Local\Programs\Python\Python38\lib\xml\sax\expatreader.py", line 336, in end_element
self._cont_handler.endElement(name)
File "C:/Users/Aryan Kumar/Desktop/Python/KMLtoCSV/KMLtoCSV.py", line 44, in endElement
if name in self.mapping[self.name_tag]:
KeyError: ''