This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: cant access to files more than 128 utf-8 symbol long names
- From: Andrey Repin <anrdaemon at yandex dot ru>
- To: Corinna Vinschen <cygwin at cygwin dot com>
- Date: Wed, 11 Dec 2013 11:04:39 +0400
- Subject: Re: cant access to files more than 128 utf-8 symbol long names
- References: <52A6BFA4 dot 9010101 at spektr-rfs dot ru> <20131210102755 dot GQ2527 at calimero dot vinschen dot de>
- Reply-to: Andrey Repin <cygwin at cygwin dot com>
Greetings, Corinna Vinschen!
> The problem here is about NAME_MAX. NAME_MAX is per POSIX[1] the
> "maximum number of bytes in a filename (not including the terminating
> null)."
Does this mean that the POSIX standard is not compatible with real life?
No surprise I was having a hard time copying a rather simple directory
structure to a UNIX server: just 2 levels deep, with 4-5 words in each
element name.
> Note the word *bytes*. Not characters, bytes. UTF-8 chars are 1 to 4
> bytes in length. Thus, the maximum number of UTF-8 chars in a filename
> is potentially less than NAME_MAX:
> A filename of chars only from the basic latin charset (1 byte in UTF-8)
> may consist of NAME_MAX characters, a filename solely constructed from
> chars of the latin-1 supplement (2 byte chars) may consist of NAME_MAX /
> 2 characters, a filename constructed from emoticons (4 byte chars) only
> of NAME_MAX / 4 chars.
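To make the arithmetic above concrete, here is a minimal sketch in Python. The helper names `utf8_len` and `max_chars` are made up for illustration, and NAME_MAX is hard-coded to the 255 discussed in this thread rather than queried from the system.

```python
NAME_MAX = 255  # assumption: the Cygwin value discussed in this thread

def utf8_len(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))

def max_chars(sample_char: str, limit: int = NAME_MAX) -> int:
    """How many copies of sample_char fit into `limit` UTF-8 bytes."""
    return limit // utf8_len(sample_char)

samples = [
    ("basic latin", "a"),            # 1 byte in UTF-8
    ("latin-1 supplement", "\u00e9"),  # 2 bytes
    ("cyrillic", "\u044f"),          # 2 bytes
    ("emoticon", "\U0001F600"),      # 4 bytes
]

for label, ch in samples:
    print(f"{label}: {utf8_len(ch)} byte(s) per char, "
          f"at most {max_chars(ch)} chars in a filename")
```

For Cyrillic names this gives 127 characters, which matches the "more than 128 symbols" limit in the subject of this thread.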
> Ok, so we all know that Windows is not using a byte representation of
> filenames, rather the OS uses UTF-16 to store and handle filenames
> internally. A filename on a Windows filesystem may consist of up to 255
> UTF-16 chars[2].
> How do you represent this in a byte-oriented POSIX system? What do you
> set NAME_MAX to? You can't get it right due to the unfortunate multibyte
> vs. UTF-16 encoding issue.
> To cover all UTF-8 chars, NAME_MAX would have to be 1020. But then,
> applications relying on NAME_MAX will be surprised by ENAMETOOLONG
> errors for perfectly valid POSIX filenames.
> If you make it 255, applications will be surprised by ENAMETOOLONG
> errors for perfectly valid Windows filenames.
> If you make it 255 on the application level but then return filenames
> longer than 255 multibyte chars to the application, they will crash
> due to buffer overflow issues. After all, NAME_MAX is a contractual
> obligation.
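The mismatch described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the limits (255 UTF-16 code units on the Windows side, 255 bytes for NAME_MAX) are taken from the text, and the names `utf16_units`, `utf8_bytes` and `classify` are hypothetical helpers, not anything from Cygwin itself.

```python
NAME_MAX = 255        # byte limit seen by POSIX applications, per the text
WIN_NAME_UNITS = 255  # UTF-16 code unit limit on Windows filesystems, per the text

def utf16_units(name: str) -> int:
    """Number of UTF-16 code units needed for the name."""
    return len(name.encode("utf-16-le")) // 2

def utf8_bytes(name: str) -> int:
    """Number of bytes needed for the name in UTF-8."""
    return len(name.encode("utf-8"))

def classify(name: str) -> str:
    """Say on which side of the Windows/POSIX boundary the name is acceptable."""
    ok_windows = utf16_units(name) <= WIN_NAME_UNITS
    ok_posix = utf8_bytes(name) <= NAME_MAX
    if ok_windows and ok_posix:
        return "fine on both sides"
    if ok_windows:
        return "valid on Windows, ENAMETOOLONG for a NAME_MAX=255 application"
    return "too long on both sides"

for name in ("a" * 255, "\u044f" * 200, "\u20ac" * 255, "\U0001F600" * 127):
    print(utf16_units(name), "UTF-16 units,", utf8_bytes(name), "UTF-8 bytes:",
          classify(name))
```

In this sketch the largest byte count is 765, because characters that need 4 bytes in UTF-8 also occupy two UTF-16 units. The 1020 figure (255 x 4) is still a safe upper bound.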
> There was also the backward compatibility issue. Back in the pre-Cygwin
> 1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
> Changing that to a bigger value might have resulted in the
> aforementioned application crashes due to buffer overflows as well.
> So we decided to keep NAME_MAX at the same value as it always was, 255.
> This restricts the actual filename length when using multibyte
> characters just as on any other POSIX system with the downside that,
> occasionally, a Windows filename will be too long to handle.
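For anyone who wants to find the affected names before copying a tree, a rough sketch follows. It assumes a Python with `os.pathconf` available (POSIX systems, including Cygwin) and assumes names are stored as UTF-8. `name_max` and `too_long` are hypothetical helper names, and the fallback of 255 is simply the value from this thread.

```python
import os

def name_max(path: str = ".") -> int:
    """Ask the system for NAME_MAX on the filesystem holding `path`.

    Falls back to 255 where os.pathconf is unavailable or fails.
    """
    try:
        return os.pathconf(path, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        return 255

def too_long(names, limit):
    """Return the names whose UTF-8 encoding exceeds `limit` bytes."""
    return [n for n in names if len(n.encode("utf-8")) > limit]

if __name__ == "__main__":
    limit = name_max(".")
    # Names are checked one path component at a time, as NAME_MAX applies
    # to a single filename, not to the whole path.
    for root, dirs, files in os.walk("."):
        for bad in too_long(dirs + files, limit):
            print(os.path.join(root, bad))
```

Note that such a scan only sees names the calling environment can list in the first place, so it is best run on the side where the long names already exist.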
> Sorry if that is frustrating in your current situation, but this
> isn't something we can just change on a whim. It would
> break compatibility with all existing Cygwin executables.
> Corinna
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
> [2] However, this does *not* cover NFS or other filesystems using a
> byte representation for storing filenames.
--
WBR,
Andrey Repin (anrdaemon@yandex.ru) 11.12.2013, <10:55>
Sorry for my terrible english...