This is the mail archive of the mailing list for the Archer project.
Re: FYI GDB on-disk .debug cache (mmapcache) [Re: Tasks]
On Sat, 09 Aug 2008 00:19:18 +0200, Tom Tromey wrote:
> >>>>> "Jan" == Jan Kratochvil <email@example.com> writes:
> Jan> I would try dwarf2_build_psymtabs_easy() myself now, so far just
> Jan> with the public symbols (regressing GDB). If it would be fast
> Jan> GCC can provide even indexes for the static symbols.
> I re-did my profiling on the pubnames case. The do-nothing
> dwarf2_build_psymtabs_easy does cut down the CPU time a lot.
> It does still read the contents of debug_pubnames, so the mystery time
> does not disappear:
> /usr/bin/time using _hard: 45.73user 2.74system 1:18.35elapsed 61%CPU
> /usr/bin/time using _easy: 8.84user 3.01system 0:56.95elapsed 20%CPU
> Of course it is hard to say what the improvement would really look
> like when the _easy stuff is actually in place. 20 seconds maximum
> improvement here ... that is nice but hardly in "awesome" territory.
> It seems weird that the elapsed time does not vary as much as the user
> time. I wonder what that means.
I still think you were seeking on the disk, right? I borrowed one big iron
box in RH with 32GB of RAM. F9.x86_64, ooffice with all the document types:
_easy(): (211 .so libraries symbols read)
9.52user 2.22system 0:11.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
43.2%: d_demangle (& the associated d_print* inefficiencies)
_hard(): (210 .so libraries symbols read)
29.54user 2.30system 0:31.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
non-cached (sync; echo 3 > /proc/sys/vm/drop_caches) - "small iron":
39.34user 4.47system 1:06.66elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k
This means for big iron (machines with many GB of RAM):
* We can save almost 63% of the time by implementing _easy() for gdb+gcc.
* From the remaining 37% we can still save a lot by optimizing the symtab
  reading CPU overhead.
(both cases without introducing cache files)
With _easy() we will do a full read of only a negligible count of CUs.
For small iron (less than 1GB disk-cache + 1.4GB GDB):
* Cache files may improve it (like I attempted with mmapcache myself).
* SSDs (flash drives) - if we can assume them - have no seek time, making the
  small<->big iron difference less of a pain.
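For reference, the core of the mmapcache approach is just mapping a pre-built
cache file read-only so the kernel pages it in lazily and can share it across
processes. A minimal sketch - the function name and cache-file layout here are
made up for illustration, not actual GDB code:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a previously generated cache file read-only.
   Returns the mapping (and its size via *SIZE), or NULL on failure.  */
static void *
map_cache_file (const char *path, size_t *size)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return NULL;

  struct stat st;
  if (fstat (fd, &st) < 0)
    {
      close (fd);
      return NULL;
    }

  void *addr = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close (fd);  /* The mapping stays valid after close().  */
  if (addr == MAP_FAILED)
    return NULL;

  *size = st.st_size;
  return addr;
}
```

A read-only MAP_PRIVATE mapping costs one open+seek up front; the cache data
itself is only faulted in as it is actually touched.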
As for the on-disk cache files possibility (like my mmapcache) on non-SSD
drives: for OOo with its ~200 shared libraries, a single seek per premapped
cache file makes 22ms*200==4.4s (22ms for lseek() with all its ext3 overhead).
Therefore it may make sense for the cache to be a single file per execfile
rather than one cache file per each objfile.
> So -- at least to me it is not obvious what to do. Hiding the time
> (not reading anything until needed) is nice for attach, but, I think,
> won't hugely improve the user experience (unless we can somehow also
> avoid reading a lot of the data in all cases).
_easy() should let us avoid reading almost all of the data.
> Maybe we could do something like your patch, but rather than mmap the
> data structures, store compressed data structures, on the theory that
> we would trade size on disk for some cpu.
Sure, even the _easy() and symtab reading could be optimized by some separate
cache files, but that should only be the next step afterwards.
> Another thing I am curious about is seeing how elfutils fares on this
Aside from elf_symtab_read() in general, I find BFD under 2% with _easy().