
Re: tosfs, faster hash function...



Julian Reschke writes:

> Re inodes: everybody with large TOS filesystems should verify that
> the (computed) inodes indeed are unique (as I said: it's highly
> probable, but it can't be guaranteed). If you have gnu-find or my
> find.ttp (mupftl02.tos, ftp.uni-muenster.de, pub/atari/Gemini),
> you can try the following script:
> 
> find \ -printf "%i\n" > inodes.all
> sort inodes.all > inodes.tot
> uniq inodes.tot > inodes.uni
> wc -l inodes.tot inodes.uni

 or diff -u ...

> rm inodes.tot inodes.uni inodes.all
> 
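
 (side note: the same check works without the temp files; `uniq -d`
prints only the duplicated lines, so with the find above it would be
`find \ -printf "%i\n" | sort | uniq -d` -- empty output means all
inodes are unique.  the pipeline below just demonstrates the idea on
made-up sample data:)

```shell
# uniq -d prints one copy of each repeated line; with real inode
# numbers from find, any output at all means a collision.
printf '1\n2\n2\n3\n' | sort | uniq -d
```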
 ..but that's not why I'm following up :)

>+ 
>+ #define UPDC32(octet, crc) (((crc) << 8) ^ crctab[((crc) >> 24) ^ (octet)])

 Hmm, shifts are slow on the 68000...  here is one that should be a bit
faster, from dbz (cnews):

/*
 * This is a simplified version of the pathalias hashing function.
 * Thanks to Steve Belovin and Peter Honeyman
 *
 * hash a string into a long int.  31 bit crc (from andrew appel).
 * the crc table is computed at run time by crcinit() -- we could
 * precompute, but it takes 1 clock tick on a 750.
 *
 * This fast table calculation works only if POLY is a prime polynomial
 * in the field of integers modulo 2.  Since the coefficients of a
 * 32-bit polynomial won't fit in a 32-bit word, the high-order bit is
 * implicit.  IT MUST ALSO BE THE CASE that the coefficients of orders
 * 31 down to 25 are zero.  Happily, we have candidates, from
 * E. J.  Watson, "Primitive Polynomials (Mod 2)", Math. Comp. 16 (1962):
 *	x^32 + x^7 + x^5 + x^3 + x^2 + x^1 + x^0
 *	x^31 + x^3 + x^0
 *
 * We reverse the bits to get:
 *	111101010000000000000000000000001 but drop the last 1
 *         f   5   0   0   0   0   0   0
 *	010010000000000000000000000000001 ditto, for 31-bit crc
 *	   4   8   0   0   0   0   0   0
 */

#define POLY 0x48000000L	/* 31-bit polynomial (avoids sign problems) */

static long CrcTable[128];

/*
 - crcinit - initialize tables for hash function
 */
static void
crcinit()
{
	register int i, j;
	register long sum;

	for (i = 0; i < 128; ++i) {
		sum = 0L;
		for (j = 7 - 1; j >= 0; --j)
			if (i & (1 << j))
				sum ^= POLY >> j;
		CrcTable[i] = sum;
	}
	DEBUG(("crcinit: done\n"));
}

/*
 - hash - Honeyman's nice hashing function
 */
static long
hash(name, size)
register char *name;
register int size;
{
	register long sum = 0L;

	while (size--) {
		sum = (sum >> 7) ^ CrcTable[(sum ^ (*name++)) & 0x7f];
	}
	DEBUG(("hash: returns (%ld)\n", sum));
	return(sum);
}

 I have not tested which one is better for filenames, but there should
be no difference.

>+ 
>+ static unsigned long
>+ filename_crc (const char *filename)
>+ {
>+ 	unsigned long s = 0;
>+ 	unsigned int n = 0;
>+ 	
>+ 	/* skip x: */
>+ 	filename += 2;
>+ 	
>+ 	while (*filename) {
>+ 		s = UPDC32 (*filename++, s);
>+ 		n++;
>+ 	}
>+ 
>+ 	while (n != 0) {
>+ 		s = UPDC32 (n & 0377, s);
>+ 		n >>= 8;
>+ 	}
>+ 	
>+ 	return s;
>+ }
>+ 
>+ #endif

 Does the second loop improve results?

 just a thought...
	Juergen

PS: there is also a problem with using start clusters: 0-byte files...
-- 
J"urgen Lock / nox@jelal.north.de / UUCP: ..!uunet!unido!uniol!jelal!nox
								...ohne Gewehr
PGP public key fingerprint =  8A 18 58 54 03 7B FC 12  1F 8B 63 C7 19 27 CF DA