This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: key64_t? ino64_t?


On 14 May 2003 19:09:24 +1000, "Robert Collins" <rbcollins@cygwin.com>
said:

> Actually, both C and C++ implement synthetic operator == for structs. No
> class magic needed. They simply compare the bit pattern of the two
> structs.

Okay, but now that I think about it, key1 < key2 may actually be needed
(inside cygserver; certainly no user needs to compare two keys). I'm
thinking of sorted lists, key lookup, that kind of thing.  (Naturally,
there are different mechanisms for key lookup if the keys are stored in a
tree structure, which don't involve '<', but that's another topic)

> > But that kills packing (e.g. a key_t object is no longer '72 bits of id
> > data') because key_t now has vtables, ctor, dtor, etc etc. 
> 
> No, none of the above apply.

Correct.  Unless I'm right about '<'

> Aliasing isn't bad. However: we *must* prevent clashes. Probability has
> nothing to do with it. I can't comment on the Linux implementation: I
> haven't reviewed it.

Why "must" we?  I agree that clashes are bad -- but "guaranteed unique
keys"?  Does SuS2 really require that?  Because, if you give me a finite
key bitlength, I can always create a scenario that breaks the
"guarantee".  I do not think it is possible to always eliminate the
possibility of clashing -- ALL you can ever do is make it as unlikely an
event as possible.

> I don't like the patch : Hashing is bad. We hash today ... if we are
> going to change *at all*, lets Do It Right.

Well, discussion died down, and it appears that the consensus is to go
ahead with 64bit key_t prior to 1.5.0, and I want to see 1.5.0, so...I
kickstarted the discussion again.  Looks like that worked.

> /* pseduo code
> assuming we have a typedef long key_t;
> */
> key_t
> ftok (const char *path, int id)
> {
>     return cygserver_ftok(canonical_path(path), id);
> }
> 
> /*
> in cygserver
> */
> 
> key_t
> cygserver_ftok(const char *path, int id)
> {
>     /* 24 bits for the trie node, 8 bits for the userspace id. */
>     ftokMutex.lock();
>     FTOKNode * aNode = FTOKNode::Keys.find(path);
>     if (!aNode) {
>         aNode = new FTOKNode();
>         FTOKNode::Keys.insert (path, aNode);
>     }
>     ftokMutex.unLock();
>     return aNode->key || id;
> }
> 
> where 
> FTOKNode::FTOKNode() 
> {
>    key = NextKey++;
> }
> 
> Keys can be a tree or a trie, for most common uses (say less than a 1000
> unique ftok's) either will perform *just fine*. A trie would be needed
> for larger implementations...

I don't like this.  Without the full implementation, I'm just whistling
in the dark here -- but there are a lot of gotchas in this code.  All of
them can be coded around, within the Tree/Trie implementation.  But more
code == more complexity.

What's the benefit of the extra complexity?

* guaranteed no-clash until you run out of bits.

What's the *value* of this benefit?

* very low: you make an extremely unlikely event even more unlikely --
but still not impossible.  I *can* construct a scenario in which this
tree-based thing fails.

Is the value worth the cost (complexity)?

I don't think so, but that's just my opinion.

> Should give a reasonably performing implementation with space for
> 16.7Million ftok() calls on unique paths. It will use space linear to
> the path length of each unique path presented to ftok().
> 
> This is what I meant by a lookaside table.

See, to me this is a tree-based name hash.  You say 'lookaside' and I
think L2 cache... <g>

--Chuck
--
  Charles Wilson
  cygwin at removespam cwilson dot fastmail dot fm


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]