This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Reliable old script loses data on new Cygwin installation


René Berber wrote:
> [snip]
> > I have searched FAQs and mailing lists for problems with
> > "timeout" and the like but find nothing obviously relevant.
> [snip]
> 
> I have seen that problem and it has nothing to do with Cygwin.  The
> problem is with SATA drives and Windows' asynchronous unbuffered disk
> I/O, and my Adaptec 1210SA SATA card and driver (actually, I know the
> driver is the culprit, but newer drivers are so bad that they can't even
> be installed).
> 
> With applications that use asynchronous unbuffered disk I/O the disk
> stops responding after a while, and Windows pops up an error panel that
> shows the "timeout" message.
> 
> How does it happen in your setting?  I don't know.  Notice that I said
> "applications", Windows doesn't use the problematic mode, I've only seen
> a couple of applications using it when you configure them to use the
> fastest disk I/O possible, one used it all the time (Azureus?) so it was
> unusable with that disk.  Could the problem be caused by something else
> that runs at the same time as your script?

Thank you, René.  

At first we did think it was a problem with the SATA disk --
and it does seem like a plausible explanation for the error
message:

     "Windows - Device TimeOut"
     The specified I/O operation on \Device\Harddisk7\DR10 was not completed before the 
     time-out period expired.

However, the other problem (see below) has occurred --
sporadically -- on three different machines, all running
German or English-language versions of XP, two with SATA
disks and one with an ATA disk, all with freshly downloaded
installations of Cygwin.  The line that causes the problem is:

> 	gawk '$1 !~ /^LINX/ $3' >|/tmp/sht2080.tmp; mv /tmp/sht2080.tmp huh2
...
> What I get is error messages like the following:
> 
>     mv: cannot create regular file `huh2': Permission denied
>     gawk: cmd. line:1: fatal: cannot open file `huh2' for reading (No such file or directory)
>     gawk: cmd. line:1: fatal: cannot open file `huh2' for reading (No such file or directory)
> 
> What I then find is that data has been lost.  If I interrupt
> the script right after the error message I find files 
> (such as "huh2") that have a length of zero -- OR I find a file
> listed with a correct-looking length but garbage contents.
> For example, the text file (before running the script):
> 
>     - 2007-10-28  20:25       4010  german        
> 
> comes out the other end looking like
> 
>     - 2007-10-28  20:31       4010  german        
> 
> but "od german" shows the _entire_ contents of the file to be:
> 
>     0000000 000000 000000 000000 000000 000000 000000 000000 000000
>     *
>     0007640 000000 000000 000000 000000 000000
>     0007652
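
A quick way to spot files damaged like "german" above (correct
length, but nothing except zero bytes) might look like the sketch
below.  The function name is_all_nul is hypothetical and not part of
the original script; it assumes only the standard tr and wc utilities.

```shell
# Hypothetical helper (not from the original script): succeed if FILE is
# non-empty and contains nothing but NUL (zero) bytes.
# Usage: is_all_nul FILE
is_all_nul() {
    # A missing or zero-length file is a different failure; report "no".
    [ -s "$1" ] || return 1

    # Delete every NUL byte and count what is left.  The $(( )) wrapper
    # strips any padding that wc may put around the number.
    ian_rest=$(( $(tr -d '\000' < "$1" | wc -c) ))

    [ "$ian_rest" -eq 0 ]
}
```

For example, "is_all_nul german && echo 'german is zero-filled'" could
be run right after the step that rewrites the file.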

It seems plausible (to me as a non-expert) that asynchronous
unbuffered disk I/O could be responsible for this problem too.
However, I am _also_ getting this error on an older machine
with an ATA disk.

The three test machines on which the problem is occurring have
two things in common:

-- They all have some version of XP with the most recent Cygwin
   installation plus Firefox, OpenOffice, and Java and nothing else.

-- They are all faster than the machines I have been using over the
   years.

A colleague of mine suspects that the Korn shell script on
Cygwin is running so fast that occasionally the next command
is being executed before the buffer is written to disk.  Is it
possible that the shell is somehow creating the file "german"
(above), with its file name and length, a split second before
the contents are written to disk, then the next command is
being run too quickly, the script gets tripped up but keeps
running, and data is lost?

As this is happening both on a SATA disk and an ATA disk, I
can't help wondering whether Cygwin is perhaps too efficient
for the faster hardware.  My colleague suggests I modify the
script to add 500 milliseconds of wait time between

    gawk '$1 !~ /^LINX/ $3' >|/tmp/sht2080.tmp
    
and

    mv /tmp/sht2080.tmp huh2

However, he says that this could conceivably solve the problem
for this script, but if the problem is that Cygwin is too fast
for the hardware I could still get this problem while using,
say, "mv".  Can this explanation be ruled out?
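
As a variation on the fixed 500 ms wait, here is a minimal sketch of
a wrapper that checks the temp file and retries "mv" with a pause
between attempts.  The function name retry_mv, the default of 5
attempts, and the rule that an empty temp file counts as a failure are
all assumptions made for illustration, not part of the original script.
It would only work around the symptom; it does not settle whether the
timing explanation is correct.

```shell
# Hypothetical helper (not from the original script): move SRC to DST,
# pausing and retrying if mv fails.
# Usage: retry_mv SRC DST [TRIES] [PAUSE_SECONDS]
retry_mv() {
    rmv_src=$1
    rmv_dst=$2
    rmv_tries=${3:-5}     # assumed default: 5 attempts
    rmv_pause=${4:-0.5}   # 0.5 s, the wait suggested by my colleague;
                          # fractional values need GNU sleep

    # Assumption: a missing or empty temp file means the previous command
    # failed, so refuse to replace DST with it.
    if [ ! -s "$rmv_src" ]; then
        echo "retry_mv: $rmv_src is missing or empty" >&2
        return 2
    fi

    rmv_n=1
    while [ "$rmv_n" -le "$rmv_tries" ]; do
        if mv -f -- "$rmv_src" "$rmv_dst"; then
            return 0
        fi
        echo "retry_mv: attempt $rmv_n failed" >&2
        sleep "$rmv_pause"
        rmv_n=$((rmv_n + 1))
    done
    return 1
}
```

In the script, "mv /tmp/sht2080.tmp huh2" would then become
"retry_mv /tmp/sht2080.tmp huh2 || exit 1", so that the script stops
instead of running on after a failed move.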

Tom

-- 
Tom Baker - tbaker@tbaker.de - baker@sub.uni-goettingen.de

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

