This is the mail archive of the
binutils@sources.redhat.com
mailing list for the binutils project.
Re: [PATCH] Use hashtab.[ch] hashtable for IA-64 loc_hash_table
Hi Jakub,
>>>>> On Thu, 11 Sep 2003 19:59:02 +0200, Jakub Jelinek <jakub@redhat.com> said:
Jakub> On Wed, Sep 10, 2003 at 10:29:41AM -0700, David Mosberger wrote:
>> Each histogram sample counts as 1.00056m seconds
>>  % time   self   cumul    calls  self/call  tot/call  name
>>   17.90   0.88    0.88     170k      5.21u     5.60u  get_dyn_sym_info
>>    9.69   0.48    1.36    1.05M       457n      494n  vfprintf
>>    7.24   0.36    1.72    1.79M       200n      201n  sec_merge_hash_lookup
>>    5.98   0.30    2.01    5.93M      49.7n     49.7n  _IO_str_overflow
>> get_dyn_sym_info() is ia64-specific and looks like it's doing a linear
>> list-traversal. Seems like a fairly obvious candidate for
>> optimization.
>> The time spent in vfprintf and _IO_str_overflow is also interesting:
>> those come from get_local_sym_hash(), which does:
>> sprintf (addr_name, "%x:%lx", ..)
>> Surely we can do better than that... ;-)
Jakub> The following patch gets rid of that sprintf (and generally,
Jakub> using string keys when the keys are really pair of two
Jakub> (typically small) integers).
Cool!
Jakub> It compiled, no make check regressions and built glibc.
Jakub> Unfortunately, I don't have access to sufficiently idle
Jakub> system to be able to do any useful benchmarking.
Happy to help with that: it shaved another 2 seconds out of the
"minimum kernel rebuild"!
$ time make CROSS_COMPILE=/opt/gcc-pre3.4/bin/ vmlinux
make[1]: `arch/ia64/kernel/asm-offsets.s' is up to date.
CHK include/linux/compile.h
CPP arch/ia64/kernel/gate.lds.s
AS arch/ia64/kernel/gate.o
GATE arch/ia64/kernel/gate.so
AS arch/ia64/kernel/gate-data.o
GATE arch/ia64/kernel/gate-syms.o
LD arch/ia64/kernel/built-in.o
CC kernel/configs.o
LD kernel/built-in.o
GEN .version
CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
KSYM .tmp_kallsyms1.S
AS .tmp_kallsyms1.o
LD .tmp_vmlinux2
KSYM .tmp_kallsyms2.S
AS .tmp_kallsyms2.o
LD vmlinux
real 0m23.148s
user 0m16.874s
sys 0m1.879s
The link time still dominates, but it's getting to the point where I
need to figure out why gate.so and gate-syms.o get rebuilt each time...
In terms of the linker's execution profile, here is where things stand
now (this is still with hash.c:DEFAULT_SIZE set to 65521):
 % time   self   cumul    calls  self/call  tot/call  name
  25.34   0.87    0.87     124k      7.05u     7.15u  get_dyn_sym_info
  10.29   0.36    1.23    1.75M       203n      204n  sec_merge_hash_lookup
   5.22   0.18    1.41    3.90M      46.2n     46.2n  ld:__umoddi3
   4.32   0.15    1.56        -          -         -  elf64_ia64_relocate_section
   4.00   0.14    1.70     524k       263n      271n  elf64_ia64_global_dyn_sym_thunk
The call-graph info for get_dyn_sym_info() isn't terribly interesting,
because most of the time is spent in get_dyn_sym_info() itself (again,
I suspect this is due to the linear list traversal).
The call-graph info for sec_merge_hash_lookup() is more useful:
 % time   self    children   called        name
          0.149   840u        736k         sec_merge_add
          0.206   1.16m      1.02M         _bfd_merged_section_offset
  11.2    0.355   2.00m      1.75M         sec_merge_hash_lookup
          0.117   0.00       1.91M/1.98M       memcmp
          2.00m   0.00       80.9k/80.9k       sec_merge_hash_newfunc
          82.6m   0.00       1.79M/3.90M       ld:__umoddi3
So sec_merge_hash_lookup() burns 0.355 seconds on its own, and
another 0.117 seconds through calls to memcmp(). The other calls are
relatively cheap (2 msec for sec_merge_hash_newfunc and 82.6 msec for
ld:__umoddi3).
--david