This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: parallelized 'ld'?


(Sorry for delayed reply -- I've been swamped by other stuff.)

I've chatted with another of our engineers (Alexander Smundak,
asmundak at cisco dot com), and it appears that my presumption
that 'ld' and its kin would already be well-tuned at the monoprocessor
level is untrue, at least for programs in the 1-10 megaline size
range: use of algorithms and data structures which do not scale
well into this range appears to be a pervasive problem.

In this particular case, expanding one fixed-size hash table
(bfd_hash_table) from 4K to 64K slots chopped our link time down from
3.5 to 1.5 minutes: we were getting hashtable chains literally
hundreds of items long, resulting in exactly the sort of linear-search
behavior that hash tables are supposed to avoid.  (Yes, I'm mailing a
more specific report to bug-binutils@gnu.org.)

Alexander made some further observations suggesting a general lack
of attention to good scalability:

* "gcc generates an enormous amount of debugging information.
   Our .sun file is about 200Mb, whereas its useful payload is about 20Mb;
   the rest is just debugging information. And the debugging format we use
   is STABS, which is way more compact than DWARF2, which is the default
   for the newer versions of gcc. When we tried to generate DWARF2, our
   .sun file size grew to almost 1Gb."

* "gdb used to take an inordinate amount of time to load on our platform. The
   fix was trivial, which likely means that few people ever use GDB on
   such large executables."

* "make generates either too much output or no output at all. This leads to
   people developing half-baked solutions such as
      if QUIET=1, echo "Compiling foo.c..."
   Jam (a make alternative) uses, IMHO, a way better approach: by default, it
   prints just the name of each target as it is built; if the build fails,
   it prints the offending command."

It is clearly unrealistic to expect the GNU maintainers to
put major effort into supporting the relatively small number
of users who work with megaline scale programs.

But shouldn't it be simply standard good programming practice to
use algorithms and data structures which scale reasonably gracefully?

Automatically expanding hashtables when they reach ludicrous overload
levels isn't that hard -- I've been doing it without a second thought
in everything I've written for the last twenty years.  No?

Or if one is ABSOLUTELY in love with hashtable size and location being
constant in order to --maybe-- shave off a clock cycle or two per
access, an overflow binary tree is cheap insurance.

I believe usability studies show that users prefer consistently good
performance to mostly-great-but-occasionally-horrible performance --
robustness counts!  (One might file that under the Principle of Least
Surprise?)

Anyhow: looks like parallelizing the 'ld' core would be
premature -- there is indeed low-hanging fruit elsewhere. :)

 -- Cynbe


Paul Hilfinger <hilfingr@EECS.Berkeley.EDU> writes:

> The more interesting question you should ask is "what makes our ld's
> so slow?", to be asked just after "how slow is it?".  
> 
> You're doing a bunch of additions to relocate stuff, but in a
> 10,000,000-line C program, how much time are we talking about?
> Suppose it's 100,000,000 additions (which I think is a very high
> estimate).  Just how much processor time does it take to extract a bit
> field, increment its contents, and store it back 100,000,000 times?
> I'm glad you asked: on my several-year old UltraSparc, about 4
> seconds.  How about the time to perform, let's say, 10,000,000
> external-symbol lookups and definitions using a hashtable?  Using G++
> strings and hash_maps, perhaps 25 seconds on the same equipment.  How
> about to read and write 20MB of .o and executable files?  About a
> second.  What kinds of numbers are you seeing for object file sizes
> and ld times?
> 
> Paul Hilfinger

