Working with OpenOffice/LibreOffice Spreadsheets with Python
Working with OpenOffice/LibreOffice Spreadsheets with Python One improvement of OpenOffice was to introduce Python scripting beside VBA one. You can do internal or external scripting. External scripting is done via Python UNO interface, it's like CORBA objects (...). But resources on web are poor and sparse. Only two websites have a clear and complete information :
https://www.wzdftpd.net/downloads/oowall/pyUnoServerV2.py
http://stuvel.eu/ooo-python
This is a mini HOWTO you can use in your external scripts First you have to start server side OOo/LO :
libreoffice "--accept=socket,host=localhost,port=2002;urp;" --invisible
If you don't want to see OOo/LO interface, add --headless. WARNING: You need to close ALL OOo/LO instances before starting server !
Next, load a document :
def connect(port, filename): # get the uno component context from the PyUNO runtime localContext = uno.getComponentContext() # create the UnoUrlResolver resolver = localContext.ServiceManager.createInstanceWithContext( "com.sun.star.bridge.UnoUrlResolver", localContext) # connect to the running office ctx = resolver.resolve("uno:socket,host=localhost,port=" + str(port) + ";urp;StarOffice.ComponentContext") smgr = ctx.ServiceManager # get the central desktop object DESKTOP =smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx) url = unohelper.systemPathToFileUrl( os.path.abspath(filename)) doc = DESKTOP.loadComponentFromURL(url, '_blank', 0, ()) return doc
You can get sheets inside document by creating an enumeration :
doc = connect(port, filename) sheets = doc.getSheets() sheet_enum = sheets.createEnumeration() while sheet_enum.hasMoreElements(): sheet = sheet_enum.nextElement() print sheet.getName()
Retrieve cells :
cell = sheet.getCellByPosition(col, row)
You can use following methods on cell objects : XCell
To retrieve cell type (CellContentType) :
cell.getType()
For me object (or enumeration) comparison fails, so I use string comparison :
if cell.getType().value != 'EMPTY':
cell.getValue() will return cell float value (0.0 if cell is empty or text). Most of the case you need to cast it into int value : int(cell.getValue()) or do all your code with float values !!
Be careful, sometimes cells values are formated with text but contains float/integer !! value = value_cell.getString() will return "0x45"
Now you have all basics to do a spreadsheet parser ! If you don't know how to handle an object, juste print it and look at its supportedInterfaces dictionary, OOo API doc will tells how to handle them.