Programmation

GcaptchaZ

Monday, 19 November 2012
|
Écrit par
Grégory Soutadé

GcaptchaZ est un petit soft qui permet de générer des CAPTCHA en ligne de commande. C'est la transcription en C du générateur PHP de Logz avec quelques modifications. Je trouve que les CAPTCHA générés sont beaucoup plus sympa et compréhensibles que les reCAPTCHA de Google et c'est 'à priori' le seul générateur en ligne de commande, les CAPTCHA étant normalement générés de manière dynamique et utilisés une seule fois.

Au début je voulais l'intégrer dans Dynastie, l'idée étant de générer un nombre fixe de CAPTCHA pour les commentaires, en gardant la correspondance fichier <--> résultat. Pour cela j'utilise une petite astuce : le nom du fichier devait être le résultat du CAPTCHA chiffré en AES 128 avec le MD5 d'une passphrase (ce qui évite aussi de lister les captchas si on les nomme avec un id incrémental). La seconde étape consiste à les répartir aléatoirement dans les commentaires et de vérifier qu'une adresse IP ne résout pas plus de trois fois le même CAPTCHA (en modifiant l'id dans le formulaire des commentaires).

Finalement il ne sera pas intégré (du moins pour le moment). En effet, lors de mes recherches je suis tombé sur une astuce plus intelligente. Le but des CAPTCHA est de n'être déchiffré que par un humain, ce qui a le désavantage d'énerver les utilisateurs quand ils sont incompréhensibles. Donc un CAPTCHA n'arrêtera pas un humain qui veut spammer un forum/blog... Par contre, les robots spammeurs remplissent automatiquement le formulaire de commentaire, l'astuce consiste donc à mettre un champ email invisible et de vérifier côté serveur s'il est rempli ou non.

Logo GcaptchaZ

Fast software DES implementation

Wednesday, 31 October 2012
|
Écrit par
Grégory Soutadé

This is a fast software DES implementation for 64 bits machines thats supports intel SSE instructions (SSE, SSE2, SSE3 and SSSE3). The x86_64 architecture lacks support for bit manipulation and the ones introduced by AMD are not used here. The main SSE instruction used is "pmovmskb" that will extract the most significant bit of each byte inside an XMM register. That allows to do linear permutations with few instructions, but for random permutations a simple mask-shift is used. The next generation of processors with AVX2 should includes some bit manipulation instructions that will even more speed up DES computation.

Currently, my implementation is about 10 times faster than the one in OpenSSL 1.0.1c, assembly optimized but 32 bits centered. The package includes a basic implementation in full C (des.c) for algorithm comprehension and the fast implementation in fast_des.c. fast_des.c is linked with the libcrypto.a of OpenSSL (configured with linux-x86_64) to do benchmarks. The benchmark is to do 1 000 000 full DES computation. I obtained these values :

fast_des.c Init value 123456789abcdef Fast des Time 0 29333426 85e813540f0ab405 Des Time 0 362639005 85e813540f0ab405 des.c : Init value 123456789abcdef Time 0 236431885 85e813540f0ab405

My configuration is compounded by an intel core i5 m450 @ 2.4Ghz, 4Gb of RAM, Linux 64, gcc 4.7.1. One funny thing is that if I compile fast_des.c with -O3 flag, performances dramatically falls just under OpenSSL implementation. On an core 2 duo E7500 @ 2.93GHz :

Init value 123456789abcdef Fast des Time 0 110037322 85e813540f0ab405 Des Time 0 352010444 85e813540f0ab405

Source file can be found here.

Working with OpenOffice/LibreOffice Spreadsheets with Python

Monday, 22 October 2012
|
Écrit par
Grégory Soutadé

Working with OpenOffice/LibreOffice Spreadsheets with Python One improvement of OpenOffice was to introduce Python scripting beside VBA one. You can do internal or external scripting. External scripting is done via Python UNO interface, it's like CORBA objects (...). But resources on web are poor and sparse. Only two websites have a clear and complete information :
https://www.wzdftpd.net/downloads/oowall/pyUnoServerV2.py
http://stuvel.eu/ooo-python

This is a mini HOWTO you can use in your external scripts First you have to start server side OOo/LO :

libreoffice "--accept=socket,host=localhost,port=2002;urp;" --invisible

If you don't want to see OOo/LO interface, add --headless. WARNING: You need to close ALL OOo/LO instances before starting server !

Next, load a document :

def connect(port, filename): # get the uno component context from the PyUNO runtime localContext = uno.getComponentContext() # create the UnoUrlResolver resolver = localContext.ServiceManager.createInstanceWithContext( "com.sun.star.bridge.UnoUrlResolver", localContext) # connect to the running office ctx = resolver.resolve("uno:socket,host=localhost,port=" + str(port) + ";urp;StarOffice.ComponentContext") smgr = ctx.ServiceManager # get the central desktop object DESKTOP =smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx) url = unohelper.systemPathToFileUrl( os.path.abspath(filename)) doc = DESKTOP.loadComponentFromURL(url, '_blank', 0, ()) return doc

You can get sheets inside document by creating an enumeration :

doc = connect(port, filename) sheets = doc.getSheets() sheet_enum = sheets.createEnumeration() while sheet_enum.hasMoreElements(): sheet = sheet_enum.nextElement() print sheet.getName()

Retrieve cells :

cell = sheet.getCellByPosition(col, row)

You can use following methods on cell objects : XCell

To retrieve cell type (CellContentType) :

cell.getType()

For me object (or enumeration) comparison fails, so I use string comparison :

if cell.getType().value != 'EMPTY':

cell.getValue() will return cell float value (0.0 if cell is empty or text). Most of the case you need to cast it into int value : int(cell.getValue()) or do all your code with float values !!

Be careful, sometimes cells values are formated with text but contains float/integer !! value = value_cell.getString() will return "0x45"

Now you have all basics to do a spreadsheet parser ! If you don't know how to handle an object, juste print it and look at its supportedInterfaces dictionary, OOo API doc will tells how to handle them.

How to load UTF8 data with python minidom ?

Wednesday, 22 August 2012
|
Écrit par
Grégory Soutadé

For the dynastie project, I need to load data encoded in UTF-8 with Python minidom XML parser. But when I wrote node.toxml('utf-8') to display the XML tree, I get this error :

UnicodeDecodeError at /generate/1

'ascii' codec can't decode byte 0xc2 in position 187: ordinal not in range(128)

In facts Python thinks that all data in XML tree are in ASCII and try to encode it into UTF-8 (or anything else you supplied). The solution is to use your own writer that will convert all non utf-8 strings in unicode string which can be then re-encoded in every format (like utf-8). This doesn't appears in Python 3 because, in Python 3, all strings are already in unicode. Add the following class to your code :

class UnicodeWriter(codecs.StreamWriter): encode = codecs.utf_8_encode def __init__(self): self.value = u'' def write(self, object): if not type(object) == unicode: self.value = self.value + unicode(object, 'utf-8') else: self.value = self.value + object return self.value def reset(self): self.value = u'' def getvalue(self): return self.value

And our node.toxml('utf-8') becomes :

writer = UnicodeWriter() node.writexml(writer) writer.getvalue().encode('utf-8')