This directory contains various ciphers and digests pulled out of
SSLeay.

There is x86 assembler for rc4, des, blowfish, cast5, md5 and sha1.

On a pentium, md5 takes 337 cycles per block, which is faster than the
speed listed in 'Even Faster Hashing on the Pentium' (345 cycles).
The sha1 inner loop is the same speed as the paper (837 cycles).

Blowfish has an inner loop that is 9 cycles per round. There is a
faster version for the pentium pro, but the default version is probably
the best option.

There are 2 variants of the CAST5 implementation. One takes 13 cycles
per round, and the other takes 14. Unfortunately, the 13 cycle version
runs 30% slower on a pentium pro than the 14 cycle version.

RC4 processes 8 bytes per 70 cycles.

CBC mode of these ciphers is implemented in assembler, but the block
cipher is called as a function rather than inlined. If you want
another 2-3% speedup, you could easily remove the function call
overhead, at the expense of increased code size for the library.

Anyway, what does this mean in the real world?

Using the SSLeay 'speed' test program, under linux on a pentium 100,

built on Tue Nov 4 02:52:29 EST 1997
options:bn(64,32) md2(int) rc4(ptr,int) des(ptr,risc1,16,long) idea(int) blowfish(ptr2)
C flags:gcc -DL_ENDIAN -DTERMIO -DBN_ASM -O3 -fomit-frame-pointer -m486 -Wall -Wuninitialized -DMD5_ASM -DSHA1_ASM
The 'numbers' are in 1000s of bytes per second processed.
type                   8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                    993.15k     5748.27k    11944.70k    16477.53k    18287.27k
sha1                   563.24k     2851.67k     5363.71k     6879.23k     7441.07k
rc4                   7876.70k    10400.85k    10825.90k    10943.49k    10745.17k
des cbc               2047.39k     2188.25k     2188.29k     2239.49k     2233.69k
des ede3               660.55k      764.01k      773.55k      779.21k      780.97k
idea cbc               653.93k      708.48k      715.43k      719.87k      720.90k
rc2 cbc                648.08k      702.23k      708.78k      711.00k      709.97k
blowfish cbc          3764.39k     4288.66k     4375.04k     4497.07k     4423.68k
cast cbc              2757.14k     2993.75k     3035.31k     3078.90k     3055.62k
blowfish cbc [*]      3258.81k     3673.47k     3767.30k     3774.12k     3719.17k
cast cbc [**]         2677.05k     3164.78k     3273.05k     3287.38k     3244.03k

[*]  pentium pro specific version
[**] pentium specific version

For a pentium pro 200, Windows 95, SSLeay with DLLs,

built on Tue Nov 4 08:57:30 EST 1997
options:bn(64,32) md2(int) rc4(idx,int) des(idx,cisc,4,long) idea(int) blowfish(ptr2)
C flags:cl /W3 /WX /G5 /Ox /O2 /Ob2 /Gs0 /GF /Gy /nologo -DWIN32 -DL_ENDIAN -DBN_ASM -DMD5_ASM -DSHA1_ASM /MD
The 'numbers' are in 1000s of bytes per second processed.

type                   8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                   2251.85k    11966.63k    22944.77k    29916.58k    32729.64k
sha1                  1398.85k     6621.89k    11831.61k    14722.02k    15863.48k
rc4                  15744.38k    21239.13k    22093.89k    22419.84k    22564.58k
des cbc               4147.03k     4571.95k     4654.13k     4673.84k     4654.13k
des ede3              1564.92k     1631.00k     1642.47k     1646.21k     1641.86k
idea cbc              2582.06k     2888.64k     2936.78k     2953.74k     2949.58k
rc2 cbc               1646.37k     1782.23k     1800.59k     1805.24k     1806.96k
blowfish cbc          6052.39k     7025.63k     7123.48k     7172.20k     7147.76k
cast cbc              6000.43k     6978.88k     7123.48k     7147.76k     7123.48k
blowfish cbc [*]      6404.43k     7304.13k     7508.36k     7627.94k     7477.61k
cast cbc [**]         4404.82k     4909.89k     5000.44k     4993.28k     5013.06k

[*]  pentium pro specific version
[**] pentium specific version

Work still to be done.
- Test vectors for various modes of CAST5 need to be put into casttest.c
- More work on C code variants of the CAST inner loop.
- More testing of the various assembler implementations.
- General code cleanups

eric 15-Nov-1997

08-Jan-1998
- fixes to md5 and sha1 for big-endian machines.
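
The cycle counts quoted at the top of this file can be sanity-checked
against the linux pentium 100 table. The sketch below is only an
illustration and is not part of SSLeay. It assumes the pentium 100 runs
at exactly 100 MHz and that md5 and sha1 process 64-byte blocks. The
function names and the FIGURES table are made up for this example. It
converts each cycle figure into an upper bound in 1000s of bytes per
second and compares that with the measured 8192-byte column.

```python
# Rough check: throughput implied by the quoted cycle counts versus the
# measured 8192-byte column of the linux pentium 100 table.

CLOCK_HZ = 100_000_000  # assumption: a "pentium 100" runs at 100 MHz

# name: (cycles, bytes processed in those cycles, measured k/s at 8192 bytes)
# Cycle counts and measurements are copied from this file. The 64-byte
# block size for md5 and sha1 comes from the algorithm definitions.
FIGURES = {
    "md5": (337, 64, 18287.27),
    "sha1": (837, 64, 7441.07),
    "rc4": (70, 8, 10745.17),
}


def peak_kbytes_per_sec(cycles, nbytes, clock_hz=CLOCK_HZ):
    """Upper bound in 1000s of bytes/sec if nothing but the inner loop ran."""
    return clock_hz / cycles * nbytes / 1000.0


def efficiency(name):
    """Measured throughput as a fraction of the cycle-count upper bound."""
    cycles, nbytes, measured = FIGURES[name]
    return measured / peak_kbytes_per_sec(cycles, nbytes)


def report():
    """Return one formatted line per algorithm."""
    lines = []
    for name, (cycles, nbytes, measured) in FIGURES.items():
        peak = peak_kbytes_per_sec(cycles, nbytes)
        lines.append(
            f"{name:<5}{peak:>11.2f}k peak {measured:>11.2f}k measured "
            f"{100.0 * measured / peak:5.1f}%"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Under these assumptions the measured 8192-byte figures land within a
few percent of the bound, which suggests that call overhead and loop
setup cost little on long inputs. The same check does not apply to the
8-byte column, where per-call overhead dominates.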