This is the mail archive of the cygwin mailing list for the Cygwin project.


RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")

From: Igor Pechtchanski <pechtcha at cs dot nyu dot edu>
To: Stephan Mueller <smueller at exchange dot microsoft dot com>
Cc: cygwin at cygwin dot com, Krzysztof Duleba <krzysan at skrzynka dot pl>
Date: Wed, 27 Jul 2005 23:04:53 -0400 (EDT)
Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
References: <23AA05B1B7171647BC38C5D761900EA40223C84E@DF-SEADOG-MSG.exchange.corp.microsoft.com>
Reply-to: cygwin at cygwin dot com

On Wed, 27 Jul 2005, Stephan Mueller wrote:

> Igor Pechtchanski wrote:
>
> " (I wrote:)
> " > End result is that the perl internal representation in the example
> " > above probably only needs about 200MB of space, and not double that,
> " > as suggested.
> "
> " Umm, that was unclear from the description on the perlunicode manpage.
> " That, combined with Perl actually taking up 500M of memory with one
> " string of 200,000,000 characters, led me to believe that Perl uses
> " UCS-2 internally.
> "
> " Do you have another explanation for the doubled memory consumption?
> " 	Igor
>
> The admittedly old perl pages (perl 5.6) I have handy right now include
> the following near the top of the perlunicode page.  I strongly doubt
> this has changed in 5.8.
>
>   Byte and Character semantics
>
>     Beginning with version 5.6, Perl uses logically wide characters to
>     represent strings internally. This internal representation of strings
>     uses the UTF-8 encoding.

Actually, it has changed.  Here's all the perlunicode manpage shipped with
Cygwin's Perl 5.8.7 has to say on the matter:

   Byte and Character Semantics

   Beginning with version 5.6, Perl uses logically-wide characters to
   represent strings internally.

   In future, Perl-level operations will be expected to work with
   characters rather than bytes.

There is also some text on encodings, but all it says is that an explicit
"use utf8" pragma is needed to recognize byte strings as UTF-8.  It says
nothing about the internal representation of strings.

> I've also found text suggesting the same in Chapter 15 of the Camel
> book.
>
> Unfortunately, I don't have another explanation for the doubled memory
> consumption.

It could be that the default encoding has changed, and could be forced
back to utf8 by the "use utf8" pragma...  The Perl maintainer might be in
a better position to comment on this.

FWIW, neither "use utf8" nor "use bytes" seems to change the memory
consumption of that sample script.
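
For reference, the arithmetic behind the two figures discussed above can be sketched as follows. This is only an illustration in Python, not a measurement of Perl itself. It assumes the 200,000,000-character string is pure ASCII and that the two candidate encodings are UTF-8 and UCS-2 (modelled here as BOM-less UTF-16, which is identical for ASCII). The names N_CHARS and encoded_size are made up for the example.

```python
# Assumption: the string is pure ASCII, as in the 200,000,000-character example.
N_CHARS = 200_000_000


def encoded_size(sample: str, encoding: str) -> int:
    """Return the number of bytes needed to store `sample` in `encoding`."""
    return len(sample.encode(encoding))


# Small stand-in string; for ASCII the size scales linearly with length.
sample = "a" * 1000

# "utf-16-le" emits no byte-order mark; for ASCII it matches UCS-2.
utf8_per_char = encoded_size(sample, "utf-8") / len(sample)
ucs2_per_char = encoded_size(sample, "utf-16-le") / len(sample)

utf8_mb = N_CHARS * utf8_per_char / 1e6
ucs2_mb = N_CHARS * ucs2_per_char / 1e6

print(f"UTF-8: {utf8_per_char:.0f} byte(s)/char, about {utf8_mb:.0f} MB")
print(f"UCS-2: {ucs2_per_char:.0f} byte(s)/char, about {ucs2_mb:.0f} MB")
```

Under these assumptions the string body alone would take about 200 MB as UTF-8 and about 400 MB as UCS-2.  Neither figure by itself accounts for the 500M observed, so the sketch only bounds the two hypotheses and does not settle which one Perl uses.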
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA


