On a UNIX system, when an application reads from a file, it gets exactly what's in the file on disk, and the converse is true for writing. The situation is different in the DOS/Windows world, where a file can be opened in one of two modes, binary or text. In binary mode the system behaves exactly as in UNIX. However, on writing in text mode, a NL (\n, ^J) is transformed into the sequence CR (\r, ^M) NL.
This can wreak havoc with the seek/fseek calls since the number of bytes actually in the file may differ from that seen by the application.
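The size mismatch is easy to see from the shell. The following is a minimal sketch, not part of the original text: it produces the same two lines once with NL endings and once with CR NL endings and counts the bytes. The variable names are chosen for illustration only.

```sh
#!/bin/sh
# Two lines, "ab" and "cd", as the application sees them (NL endings).
unix_bytes=$(printf 'ab\ncd\n' | wc -c)
# The same two lines as a text-mode write would store them (CR NL endings).
dos_bytes=$(printf 'ab\r\ncd\r\n' | wc -c)
# $(( )) strips any padding that wc may put around the number.
echo "NL: $((unix_bytes)) bytes, CR NL: $((dos_bytes)) bytes"
```

There is one extra byte per line, so an offset computed from what the application wrote no longer points at the same place in the file on disk.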
The mode can be specified explicitly as explained in the Programming section below. In an ideal DOS/Windows world, all programs using lines as records (such as bash, make, sed ...) would open files (and change the mode of their standard input and output) as text. All other programs (such as cat, cmp, tr ...) would use binary mode. In practice with Cygwin, programs that deal explicitly with object files specify binary mode (this is the case of od, which is helpful to diagnose CR problems). Most other programs (such as cat, cmp, tr) use the default mode.
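Since od reads in binary mode, it can be used to check whether a file really contains CRs. The sketch below is an illustration under that assumption; the helper name count_cr and the temporary file name are hypothetical and do not come from Cygwin.

```sh
#!/bin/sh
# count_cr FILE: print the number of CR (0x0d) bytes in FILE.
# od dumps every byte as two hex digits, tr puts one byte per line,
# and grep -c counts the lines that are exactly "0d".
count_cr() {
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | grep -c '^0d$'
}

# Example: build a small file with CR NL endings and inspect it.
tmp=${TMPDIR:-/tmp}/crdemo.$$
printf 'one\r\ntwo\r\n' > "$tmp"
od -c "$tmp"        # CRs appear as \r in the dump
count_cr "$tmp"
rm -f "$tmp"
```

The count comes from the od dump rather than from tr reading the file directly, because tr uses the default mode and might not see the CRs on a text-mounted file system.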
The Cygwin system gives us some flexibility in deciding how files are to be opened when the mode is not specified explicitly. The rules are evolving; this section gives the design goals.
If the filename is specified as a POSIX path and it appears to reside on a file system that is mounted (i.e. if its pathname starts with a directory displayed by mount), then the default is specified by the mount flag. If the file is a symbolic link, the mode of the target file system applies.
If the file is specified via a MS-DOS pathname (i.e., it contains a backslash or a colon), the default is binary.
Pipes and non-file devices are opened in binary mode,
except if the
CYGWIN environment variable contains
In b20.1 of 12/98, a file will be opened in binary mode if any of the following conditions hold:
binary mode is specified in the open call
the filename is a MS-DOS filename
the file resides on a binary mounted partition
the file is not a disk file
When redirecting, the Cygwin shells use rules (a-e). For these shells the relevant value of CYGWIN is the one in effect when the shell was launched, not the one in effect when the program is executed.
Non-Cygwin shells always pipe and redirect with binary mode. With non-Cygwin shells the commands cat filename | program and program < filename are not equivalent when filename is on a text-mounted partition.
To illustrate the various rules, we provide scripts to delete CRs from files by using the tr program, which can only write to standard output. The script
    #!/bin/sh
    # Remove \r from the file given as argument
    tr -d '\r' < "$1" > "$1".nocr
will not work on text-mounted systems because the \r will be reintroduced on writing. However, scripts such as
    #!/bin/sh
    # Remove \r from the file given as argument
    tr -d '\r' < "$1" | gzip | gunzip > "$1".nocr
and the .bat file
    REM Remove \r from the file given as argument
    @echo off
    tr -d \r < %1 > %1.nocr
work fine. In the first case (assuming the pipes are binary) we rely on gunzip to set its output to binary mode, possibly overriding the mode used by the shell. In the second case we rely on the DOS shell to redirect in binary mode.
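Because the outcome depends on the mode used for redirection, it can help to check the result instead of trusting it. The following is a sketch only, assuming od reads in binary mode as described earlier; the function name strip_cr is hypothetical.

```sh
#!/bin/sh
# strip_cr FILE: write FILE.nocr without CRs, then verify the result.
strip_cr() {
    tr -d '\r' < "$1" > "$1.nocr" || return 1
    # Look for a remaining 0x0d byte in the output. A hit means the CRs
    # were reintroduced on writing, for example by a text-mode redirect.
    if od -An -v -tx1 "$1.nocr" | tr -s ' ' '\n' | grep '^0d$' > /dev/null
    then
        echo "warning: $1.nocr still contains CRs" >&2
        return 1
    fi
}
```

Calling strip_cr on a file returns success only when the .nocr file is free of CRs, so a calling script can fall back to one of the other approaches when the check fails.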
UNIX programs that have been written for maximum portability will know the difference between text and binary files and act appropriately under Cygwin. For those programs, the text mode default is a good choice. Programs included in official Cygwin distributions should work well in the default mode.
Text mode makes it much easier to mix files between Cygwin and Windows programs, since Windows programs will usually use the CRLF format. Unfortunately you may still have some problems with text mode. First, some of the utilities included with Cygwin do not yet specify binary mode when they should. Second, you will introduce CRs in text files you write, which can cause problems when moving them back to a UNIX system.
If you are mounting a remote file system from a UNIX machine, or moving files back and forth to a UNIX machine, you may want to access the files in binary mode. The text files found there will normally be in UNIX NL format, and you would want any files put there by Cygwin programs to be stored in a format understood by UNIX. Be sure to remove CRs from all Makefiles and shell scripts and make sure that you only edit the files with DOS/Windows editors that can cope with and preserve NL terminated lines.
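To find the Makefiles and scripts that still contain CRs, something along these lines may help. It is a sketch, assuming grep sees the raw bytes of each file (which may not hold on a text-mounted file system); the function name list_cr_files is hypothetical.

```sh
#!/bin/sh
# list_cr_files DIR: print the names of regular files under DIR
# that contain at least one CR character.
list_cr_files() {
    cr=$(printf '\r')          # a literal CR to use as the search pattern
    find "$1" -type f -exec grep -l "$cr" {} +
}
```

For example, list_cr_files . prints every file below the current directory that needs its CRs removed before being moved back to a UNIX system.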
Note that you can decide this on a disk by disk basis (for
example, mounting local disks in text mode and network disks in binary
mode). You can also partition a disk, for example by mounting
c: in text mode, and
in binary mode.
In the open() function call, binary mode can be
specified with the flag
O_BINARY and text mode with
O_TEXT. These symbols are defined in
In the fopen() function call, binary mode can be
specified by adding a
b to the mode string. Text mode is specified
by adding a
t to the mode string.
The mode of a file can be changed by the call
fd is a file
descriptor (an integer) and
O_TEXT. The function
on the mode before the call, and
EOF on error.