This is the mail archive of the cygwin mailing list for the Cygwin project.



RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")


On Wed, 27 Jul 2005, Stephan Mueller wrote:

> "Igor Pechtchanski wrote:
> "
> " On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
> " > > > I've simplified the test case. It seems that Cygwin perl can't
> " > > > handle too much memory. For instance:
> " > > >
> " > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
> " > > >
> " > > > OK, this could have failed because $a might require 200 MB of
> " > > > contiguous space.
> " > >
> " > > Actually, $a requires *more* than 200MB of contiguous space.  Perl
> " > > characters are 2 bytes, so you're allocating at least 400MB of space!
> " >
> " > Right, UTF. I completely forgot about that.
> "
> " Unicode, actually.
>
> Unicode is a standard that defines 'code points' (numeric values) for a
> whole lot of different characters.  UTF-8 is a specific encoding of
> Unicode.  It has the nifty property that ASCII characters are encoded
> just as in ASCII -- one byte, with the high bit clear, and the low seven
> bits representing a character in the range 0..127.  Characters above the
> ASCII range require multiple bytes -- sometimes two, sometimes more. The
> algorithm is quite clever; find it in The Unicode Standard or with a
> quick Google search.
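
For readers who want the algorithm spelled out, here is a minimal sketch of
UTF-8 encoding for a single code point. It is written in Python rather than
Perl purely for illustration, and the function name utf8_encode is
hypothetical; it is not how perl implements the encoding internally.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    # Surrogates and values beyond U+10FFFF are not valid scalar values.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:
        # ASCII range: one byte, high bit clear, low seven bits hold the value.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    # Compare the sketch against Python's built-in encoder.
    for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
        assert utf8_encode(ord(ch)) == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {len(utf8_encode(ord(ch)))} byte(s)")
```

The ASCII branch is the one that matters for the test case in this thread:
every "a" encodes to exactly one byte.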

I'm very familiar with the algorithm and the UTF-8 encoding, thanks.

> Another popular encoding is UCS-2, which is roughly "16-bit words each
> holding one Unicode character".
>
> The latter is frequently what people think of as "Unicode".

Yes, that's the one I meant.  Sorry for being imprecise.

> The former is what perl uses internally to encode characters.
>
> End result is that the perl internal representation in the example above
> probably only needs about 200MB of space, and not double that, as
> suggested.
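
To put numbers on the two estimates, here is a rough sketch (in Python, for
illustration only) of the payload size of the string from the one-liner under
each encoding. It counts only the encoded character data; any per-string
overhead or temporary copies perl might make are not modeled, and the helper
name encoded_size is hypothetical.

```python
N_CHARS = 200 * 1024 * 1024  # string length used in the perl one-liner
MIB = 1024 * 1024


def encoded_size(sample: str, encoding: str, count: int) -> int:
    """Bytes needed to store `count` repetitions of `sample` in `encoding`."""
    return len(sample.encode(encoding)) * count


# "a" is in the ASCII range, so UTF-8 needs one byte per character.
utf8_bytes = encoded_size("a", "utf-8", N_CHARS)
# UCS-2 and UTF-16 agree for characters like "a": one 16-bit word each.
ucs2_bytes = encoded_size("a", "utf-16-le", N_CHARS)

print(f"UTF-8: {utf8_bytes // MIB} MiB")   # prints "UTF-8: 200 MiB"
print(f"UCS-2: {ucs2_bytes // MIB} MiB")   # prints "UCS-2: 400 MiB"
```

Neither figure by itself says how much memory the perl process ends up using;
it only shows what the character data would occupy under each assumption.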

Umm, that was unclear from the description on the perlunicode manpage.
That, combined with Perl actually taking up 500M of memory with one string
of 200 * 1024 * 1024 (about 210 million) characters, led me to believe that
Perl uses UCS-2 internally.

Do you have another explanation for the doubled memory consumption?
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA


