This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] glob: Simplify the interface for the GLOB_ALTDIRFUNC callback gl_readdir


> I'm trying to get clarification if getdents on Linux can ever return 
> zero d_ino values.  It's going to be difficult to work this out due to 
> the VFS layer.  My hunch is that inode 0 is more likely to mean  I can't 
> tell you the inode number right now here , and not  please skip this entry .

I don't know where your hunch comes from, but I don't believe it for a
second.  The d_ino==0 convention has been universal in Unix since before
Linux existed.  Any filesystem that returns d_ino==0 entries for getdents
will have those entries completely ignored by userland, so I doubt anyone
has implemented a filesystem that way.

> In a sense, with glob64, supplying readdir64 as the callback does not 
> actually change behavior if the d_ino == 0 is left out because that 
> readdir64 implementation would not be POSIX-compliant (d_ino has to be 
> the file serial number if the member exists).
>
> I think we have to filter in readdir in order to comply with POSIX if 
> the alleged historic race (d_ino is 0 during removal) can happen on 
> Linux or Hurd.

It was never a race.  In filesystems where it happened, it was the case
that removing an entry often consisted of nothing but clearing its d_ino
field in the on-disk directory.  Only some subsequent operation on the same
directory, like creating a new entry, might overwrite deleted entries or
cause a compaction.  And originally, the on-disk directory format was just
read directly using 'read' (there was no separate getdirentries/getdents
system call), so it was a userland responsibility to ignore the deleted
entries.

Hmm.  Perhaps no readdir function has actually returned d_ino==0 entries
since before 1003.1-1988.  The GNU implementations have indeed always
filtered out such entries.  But code using readdir has always had the
checks.

I don't have the ancient 1003.1 editions in front of me right now (can
probably check next week when I'm back home).  I'm not sure whether there
was ever a standard for readdir that said anything about d_ino different
from what the current POSIX.1 says (which indeed disallows d_ino==0
entries).  My recollection is that early versions of 1003.1 did not specify
d_ino/d_fileno at all.

> +An implementation of @code{gl_readdir} should initialize the
> +@code{struct dirent} object to zero, up to the @code{d_name} member.
> +The @code{d_ino} member must be set to a non-zero value, otherwise
> +@code{glob} may ignore the returned directory entry.  

This wording is awkward.  It seems to contradict itself, as it first
implies that d_ino should be zero and then says it must be nonzero.  I'm
not convinced this "zero the object" advice is the way to go anyway.  If
there is struct padding in struct dirent, it doesn't matter that it be
zero.  If it has d_namlen, it (now) doesn't matter what it contains at all.
Perhaps it's better to say just which fields glob examines and that the
protocol requires only that those fields be set.

Regardless of what we settle on as best practice, I think we want a little
code example here that demonstrates it.

> +The @code{struct dirent} object can be overwritten by a subsequent call
> +to the @code{gl_readdir} callback function on the same directory stream
> +created by the @code{gl_opendir} callback function.  It should be
> +deallocated by the @code{gl_closedir} callback function.

I think it would be better to word this in terms of what glob does, rather
than what the callback should do.  e.g.

@code{glob} examines the @code{struct dirent} object only immediately after
each call to your @code{gl_readdir} callback function and does not store
the pointer returned.  That pointer is not expected to remain valid after a
subsequent call to the @code{gl_readdir} or @code{gl_closedir} callback.  A
common way to implement these callbacks it to reuse the same buffer each
time the @code{gl_readdir} function is called and free the buffer in the
@code{gl_closedir} function.


The code changes look fine to me.


Thanks,
Roland


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]