Fast software DES implementation
This is a fast software DES implementation for 64-bit machines that support Intel SSE instructions (SSE, SSE2, SSE3 and SSSE3). The x86_64 instruction set lacks instructions for general bit manipulation, such as bit permutation, and the extensions introduced by AMD are not used here. The main SSE instruction used is "pmovmskb", which extracts the most significant bit of each byte in an XMM register and packs these bits into a general-purpose register. This makes it possible to perform linear permutations with few instructions; for random permutations, a simple mask-and-shift approach is used instead. The next generation of processors, with AVX2, should include some bit-manipulation instructions that will speed up DES computation even further.
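The two techniques described above can be sketched in C. This is a minimal illustration under my own assumptions, not code taken from fast_des.c: the names transpose8x8_movemask, mask_shift_build and mask_shift_apply are hypothetical, bits are numbered from 0 at the least significant bit (the DES standard numbers bits from 1, starting at the most significant bit, so its tables would need converting), and only SSE2 is required. The first function transposes an 8x8 bit matrix with pmovmskb (the _mm_movemask_epi8 intrinsic): each pmovmskb grabs the top bit of every byte, and adding the register to itself moves the next bit up into the top position. The second handles an arbitrary bit mapping by grouping the bits that move by the same distance, so each group costs one AND and one shift.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_movemask_epi8 (pmovmskb), _mm_add_epi8 */
#include <stdint.h>

/* Transpose-like permutation of the 8x8 bit matrix held in x (byte j = row j).
   Output bit (8*i + j) receives input bit (8*j + 7 - i), bits numbered from
   the least significant bit. x86-64 only (_mm_cvtsi64_si128). */
uint64_t transpose8x8_movemask(uint64_t x)
{
    __m128i v = _mm_cvtsi64_si128((long long)x); /* x in low 8 bytes, upper 8 bytes zero */
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        /* pmovmskb: bit j of m = most significant bit of byte j */
        uint64_t m = (uint64_t)(_mm_movemask_epi8(v) & 0xFF);
        out |= m << (8 * i);
        v = _mm_add_epi8(v, v); /* v + v shifts every byte left by one bit */
    }
    return out;
}

/* One mask-shift step: keep the bits selected by mask, then shift them. */
typedef struct {
    uint64_t mask;
    int shift; /* > 0: shift left, < 0: shift right, 0: no shift */
} mask_shift_step;

typedef struct {
    mask_shift_step step[127]; /* one slot per possible distance -63..63 */
    int count;
} mask_shift_plan;

/* Builds a plan for: output bit i = input bit src[i], for i < n_out (n_out <= 64).
   Input bits may be repeated, so expansions work as well as permutations. */
void mask_shift_build(mask_shift_plan *plan, const uint8_t *src, int n_out)
{
    uint64_t masks[127] = {0};
    assert(n_out >= 0 && n_out <= 64);
    for (int i = 0; i < n_out; i++) {
        assert(src[i] < 64);
        int delta = i - (int)src[i]; /* distance this bit has to travel */
        masks[delta + 63] |= UINT64_C(1) << src[i];
    }
    plan->count = 0;
    for (int d = 0; d < 127; d++) {
        if (masks[d] != 0) {
            plan->step[plan->count].mask = masks[d];
            plan->step[plan->count].shift = d - 63;
            plan->count++;
        }
    }
}

/* Applies the plan: one AND plus one shift per distinct travel distance. */
uint64_t mask_shift_apply(const mask_shift_plan *plan, uint64_t x)
{
    uint64_t out = 0;
    for (int k = 0; k < plan->count; k++) {
        uint64_t bits = x & plan->step[k].mask;
        int s = plan->step[k].shift;
        out |= (s >= 0) ? (bits << s) : (bits >> -s);
    }
    return out;
}
```

A plan for a fixed table would be built once and reused for every block; its cost then depends only on how many distinct shift distances the table contains. With a full 16-byte register, the transpose loop could process two 64-bit blocks at once, since pmovmskb returns 16 bits.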
Currently, my implementation is about 10 times faster than the one in OpenSSL 1.0.1c, which is assembly-optimized but targets 32-bit code. The package includes a basic implementation in plain C (des.c), meant to make the algorithm easy to follow, and the fast implementation in fast_des.c. For the benchmarks, fast_des.c is linked against OpenSSL's libcrypto.a (configured with linux-x86_64). The benchmark performs 1,000,000 full DES computations. I obtained these values:
fast_des.c:
Init value 123456789abcdef
Fast des Time 0 29333426 85e813540f0ab405
Des Time 0 362639005 85e813540f0ab405

des.c:
Init value 123456789abcdef
Time 0 236431885 85e813540f0ab405
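A benchmark of this kind could be structured as below. This is a sketch under assumptions, not the harness that produced the figures above: I am assuming the two numbers after "Time" are the seconds and nanoseconds of a struct timespec, and that each output is fed back in as the next input. The names bench_chain, print_timing and dummy_block are hypothetical, and dummy_block is a placeholder workload, not DES; to compare real implementations, it would be replaced by the block functions under test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumed signature of the block function being measured (hypothetical). */
typedef uint64_t (*block_fn)(uint64_t);

/* Runs f iters times, feeding each output back in as the next input so that
   every result is used and the loop cannot simply be discarded. Stores the
   elapsed monotonic time in *elapsed and returns the final value. */
uint64_t bench_chain(block_fn f, uint64_t init, long iters, struct timespec *elapsed)
{
    struct timespec t0, t1;
    uint64_t v = init;
    assert(iters >= 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        v = f(v);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    elapsed->tv_sec = t1.tv_sec - t0.tv_sec;
    elapsed->tv_nsec = t1.tv_nsec - t0.tv_nsec;
    if (elapsed->tv_nsec < 0) { /* borrow one second */
        elapsed->tv_sec -= 1;
        elapsed->tv_nsec += 1000000000L;
    }
    return v;
}

/* Prints "label Time <sec> <nsec> <hex value>" (assumed reading of the output above). */
void print_timing(const char *label, const struct timespec *t, uint64_t v)
{
    printf("%s Time %lld %ld %" PRIx64 "\n", label, (long long)t->tv_sec, t->tv_nsec, v);
}

/* Placeholder workload, NOT DES: a 64-bit LCG step standing in for a block function. */
uint64_t dummy_block(uint64_t x)
{
    return x * UINT64_C(6364136223846793005) + UINT64_C(1442695040888963407);
}
```

For example, calling bench_chain(dummy_block, 0x123456789abcdefULL, 1000000, &t) and passing its result to print_timing("Fast des", &t, result) prints a line in the same shape as the ones above. CLOCK_MONOTONIC is used so that adjustments to the system clock during the run do not distort the measurement.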
My configuration: an Intel Core i5 M450 @ 2.4 GHz, 4 GB of RAM, 64-bit Linux and gcc 4.7.1. Oddly, if I compile fast_des.c with the -O3 flag, performance drops dramatically, to just below that of the OpenSSL implementation. On a Core 2 Duo E7500 @ 2.93 GHz:
Init value 123456789abcdef
Fast des Time 0 110037322 85e813540f0ab405
Des Time 0 352010444 85e813540f0ab405
The source files can be found here.