This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Hang with 20051018 (3rd version) snapshot while building OOo


Volker Quetschke wrote:
Christopher Faylor wrote:
On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
(snip)
Given the number of changes that have been made to cygwin, particularly
in /proc handling, it's very difficult for me to believe that you are
not seeing *any* differences in behavior and
Well, there are differences in the frequency of occurrence of the hangs.

I'm wondering if you're
actually seeing what you think you're seeing, i.e., I'm wondering if the
process is just timing out and you are attributing it coming "unstuck"
to the fact that you're doing "ls /proc/*/fd".  I can't see any reason
why inspecting /proc should cause any kind of special behavior in the
latest snapshots since /proc handling now occurs in its own thread.

I can completely understand your worries. My problem is that I cannot reproduce the problem myself, and all I can do is ask the people who have this problem to try to get some debug information.

I just asked for confirmation that it really is the "ls /proc/*/fd"
that "unsticks" the process. I don't believe that "/usr/bin/tcsh -fc pwd"
takes so long to finish that we're just seeing a coincidence there.
I got some information back:
It is done like this: the build is running/hanging in one shell (1).

When it hangs, start a new tcsh shell (2) and get the ps and cygcheck
information. Then open a new bash (3) and start "strace -p <pidhang>".
Now in (2) start
		while 1
			ls /proc/<pidhang>/fd
		end
until the strace is ready.
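
For anyone who wants to repeat this from a Bourne-style shell instead of tcsh, here is a rough sketch of the same loop. It is not what the reporter ran. The function name poll_fd, the pass limit, and the optional third argument (an alternative root directory, only there so the loop can be tried against a fake tree) are all my own assumptions. It prints a timestamp on every pass, which might help to tell whether the process really comes unstuck because of the "ls" or just after some timeout.

```shell
#!/bin/sh
# Hypothetical helper, not part of Cygwin: repeatedly list <root>/<pid>/fd,
# like the tcsh "while 1 ... end" loop, but bounded and with timestamps.
# Usage: poll_fd PID [MAX_PASSES] [PROC_ROOT]
poll_fd() {
    pid=$1
    max=${2:-10}        # assumed default; the original loop is unbounded
    root=${3:-/proc}    # override only for testing against a fake tree
    i=0
    while [ "$i" -lt "$max" ]; do
        # Stop as soon as the target process directory has disappeared.
        if [ ! -d "$root/$pid" ]; then
            echo "pid $pid gone after $i passes"
            return 0
        fi
        # Log when each pass happens so it can be matched against the strace.
        printf '%s pass %d\n' "$(date '+%H:%M:%S')" "$i"
        ls "$root/$pid/fd" >/dev/null 2>&1
        i=$((i + 1))
    done
    echo "pid $pid still present after $max passes"
    return 1
}
```

It would be called as "poll_fd <pidhang> 1000" in shell (2) while the strace runs in (3).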

Some details: The build is running on a local NTFS drive. It's a dedicated
machine; not much is running besides the build.

He wrote that 20051019 also produced a hang and that I'll get the next
strace. ;)

Clueless

Volker


Having said that (and I never realized this before), maybe the problem really
lies in this particular command. Due to some historical quirks, every
makefile in the OOo tree has a line that sets a macro to the current path
using that command, but there are still lots of other commands (also executed
in a tcsh shell) in these makefiles that I have never heard of hanging.
(I'll also verify that what I just said is really true; it's just an idea.)


I could almost convince myself that there was a race in /proc handling
before but I could never convince myself that doing something like "ls /proc/*/fd"
would have any effect on it.  Nevertheless, I did make some changes to
eliminate the potential source of hangs in this code.  So, I can't
understand why you wouldn't see something different.


I have no clue either, especially as I also cannot reproduce it and therefore
cannot pinpoint the problem. :(

Anyway, thanks for all your efforts!

Volker



--
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D


