[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: tosfs, faster hash function...
Julian Reschke writes:
> Re inodes: everybofy with large TOS filesystems should verify that
> the (computed) inodes indeed are unique (as I said: it's highly
> probale, but it can't be guaranteed). If you have gnu-find or my
> find.ttp (mupftl02.tos, ftp.uni-muenster.de, pub/atari/Gemini),
> you can try the following script:
>
> find \ -printf "%i\n" > inodes.all
> sort inodes.all > inodes.tot
> uniq inodes.tot > inodes.uni
> wc -l inodes.tot inodes.uni
or diff -u ...
> rm inodes.tot inodes.uni inodes.all
>
..but thats not why i followup :)
>+
>+ #define UPDC32(octet, crc) (((crc) << 8) ^ crctab[((crc) >> 24) ^ (octet)])
hmm shifts are slow on 68000... here is one that should be a bit faster,
from dbz (cnews):
/*
* This is a simplified version of the pathalias hashing function.
* Thanks to Steve Belovin and Peter Honeyman
*
* hash a string into a long int. 31 bit crc (from andrew appel).
* the crc table is computed at run time by crcinit() -- we could
* precompute, but it takes 1 clock tick on a 750.
*
* This fast table calculation works only if POLY is a prime polynomial
* in the field of integers modulo 2. Since the coefficients of a
* 32-bit polynomial won't fit in a 32-bit word, the high-order bit is
* implicit. IT MUST ALSO BE THE CASE that the coefficients of orders
* 31 down to 25 are zero. Happily, we have candidates, from
* E. J. Watson, "Primitive Polynomials (Mod 2)", Math. Comp. 16 (1962):
* x^32 + x^7 + x^5 + x^3 + x^2 + x^1 + x^0
* x^31 + x^3 + x^0
*
* We reverse the bits to get:
* 111101010000000000000000000000001 but drop the last 1
* f 5 0 0 0 0 0 0
* 010010000000000000000000000000001 ditto, for 31-bit crc
* 4 8 0 0 0 0 0 0
*/
#define POLY 0x48000000L /* 31-bit polynomial (avoids sign problems) */
static long CrcTable[128];
/*
- crcinit - initialize tables for hash function
*/
static void
crcinit()
{
register int i, j;
register long sum;
for (i = 0; i < 128; ++i) {
sum = 0L;
for (j = 7 - 1; j >= 0; --j)
if (i & (1 << j))
sum ^= POLY >> j;
CrcTable[i] = sum;
}
DEBUG(("crcinit: done\n"));
}
/*
- hash - Honeyman's nice hashing function
*/
static long
hash(name, size)
register char *name;
register int size;
{
register long sum = 0L;
while (size--) {
sum = (sum >> 7) ^ CrcTable[(sum ^ (*name++)) & 0x7f];
}
DEBUG(("hash: returns (%ld)\n", sum));
return(sum);
}
i have not done tests which one is better for filenames but there should
be no difference.
>+
>+ static unsigned long
>+ filename_crc (const char *filename)
>+ {
>+ unsigned long s = 0;
>+ unsigned int n = 0;
>+
>+ /* skip x: */
>+ filename += 2;
>+
>+ while (*filename) {
>+ s = UPDC32 (*filename++, s);
>+ n++;
>+ }
>+
>+ while (n != 0) {
>+ s = UPDC32 (n & 0377, s);
>+ n >>= 8;
>+ }
>+
>+ return s;
>+ }
>+
>+ #endif
does the second loop improve results?
just a thought...
Juergen
PS: there is a problem too with using start clusters: 0 byte files...
--
J"urgen Lock / nox@jelal.north.de / UUCP: ..!uunet!unido!uniol!jelal!nox
...ohne Gewehr
PGP public key fingerprint = 8A 18 58 54 03 7B FC 12 1F 8B 63 C7 19 27 CF DA
- References:
- tosfs
- From: Julian Reschke <reschke@GOEDEL.UNI-MUENSTER.DE>