This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")


On Fri, 29 Jul 2005, Yitzchak Scott-Thoennes wrote:

> On Wed, Jul 27, 2005 at 05:07:23PM -0700, Stephan Mueller wrote:
> > "Igor Pechtchanski wrote:
> > "
> > " On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
> > " > > > I've simplified the test case. It seems that Cygwin perl can't
> > " > > > handle too much memory. For instance:
> > " > > >
> > " > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
> > " > > >
> > " > > > OK, this could have failed because $a might require 200 MB of
> > " > > > contiguous space.
> > " > >
> > " > > Actually, $a requires *more* than 200MB of contiguous space.  Perl
> > " > > characters are 2 bytes, so you're allocating at least 400MB of
> > " > > space!
> > " >
> > " > Right, UTF. I completely forgot about that.
> > "
> > " Unicode, actually.
> >
> > Unicode is a standard that defines 'code points' (numeric values) for a
> > whole lot of different characters.  UTF-8 is a specific encoding of
> > Unicode.  It has the nifty property that ASCII characters are encoded
> > just as in ASCII -- one byte, with the high bit clear, and the low seven
> > bits representing a character in the range 0..127.  Characters above the
> > ASCII range require multiple bytes -- sometimes two, sometimes more.
> > The algorithm is quite clever; find it in The Unicode Standard or with a
> > quick Google search.
> >
> > Another popular encoding is UCS-2, which is roughly "16-bit words each
> > holding one Unicode character".
> >
> > The latter is frequently what people think of as "Unicode".  The former
> > is what perl uses internally to encode characters.
> >
> > End result is that the perl internal representation in the example above
> > probably only needs about 200MB of space, and not double that, as
> > suggested.
>
> Correct; perl uses UTF-8 (actually, an extension of UTF-8 which allows
> codepoints up to 2**72-1).

As I said before, it might be nice if this were clearer from the
perlunicode man page.

> However code like the above does end up using twice the space; it's
> allocated once to store the result of the x operation and again when
> it's copied to $a.

D'oh!  I forgot that this was an assignment, not an initialization.  I
feel properly chastised. :-)
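
For the record, the arithmetic behind that doubling, assuming one byte per
character and one extra copy of the string as described above (string
overhead and allocator rounding are ignored, and the variable names are
just for illustration):

```shell
# Size of the string built by "a" x (200 * 1024 * 1024), at one byte per character.
string_bytes=$((200 * 1024 * 1024))
# One buffer for the result of the x operation plus one for the copy stored in the variable.
peak_bytes=$((2 * string_bytes))
echo "$string_bytes $peak_bytes"
```

That comes to roughly 400MB at the peak, the same figure as before, if for
a different reason.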
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha@cs.nyu.edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor@watson.ibm.com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA
