
[RFC] GDB performance testing infrastructure


Hi,
Here is a proposal for a GDB performance testing infrastructure.
We'd like to know what people think about this, especially:

  1) What performance issues can this infrastructure test or
handle?
  2) What does this infrastructure look like?  What can it do
and what can it not do?

I've written some micro-benchmarks, and run them in this
infrastructure prototype.  The results look reasonable and
interesting.

Table of Contents
_________________

1 Motivation and Goals
.. 1.1 Goals
2 Related work
3 Design
.. 3.1 Requirements
.. 3.2 Design
4 Example
.. 4.1 single step
.. 4.2 shared library





1 Motivation and Goals
======================

  The GDB development process has no standard mechanism to show whether
  the performance of a GDB snapshot or release has improved or worsened.
  We run regression tests, which address only questions of
  functionality.  Performance regressions do show up periodically.

  We really need performance testing in GDB development, especially in
  the following areas, to make sure no performance regression is
  introduced during development.

  * Remote debugging.  Reading from a remote target is slow, and worse,
    GDB reads the same memory regions multiple times, or reads
    consecutive memory with multiple packets.
  * Symbols.  Some of the performance problems in GDB are related to
    symbols.  When GDB is used to debug large real-life programs, such
    as LibreOffice, which have a huge number of symbols, it is a
    challenge for GDB to organize them efficiently.  Some related bugs
    are reported in bugzilla, such as [PR15412] and [PR14125], and the
    issues are documented on the [wiki].
  * Shared library.  When a program uses a large number of shared
    libraries, GDB is slow.  Gary improved the performance in this
    area, but there is still an open bug on scalability ([PR15590]).
  * Tracepoint.  Tracepoints are designed to collect data in the
    inferior efficiently, so we need performance tests to guarantee
    that tracepoints stay efficient enough.  Note that we have a test,
    `gdb.trace/tspeed.exp', but there is still room for improvement.


  [PR15412] http://sourceware.org/bugzilla/show_bug.cgi?id=15412

  [PR14125] http://sourceware.org/bugzilla/show_bug.cgi?id=14125

  [wiki] http://sourceware.org/gdb/wiki/SymbolHandling

  [PR15590] http://sourceware.org/bugzilla/show_bug.cgi?id=15590


1.1 Goals
~~~~~~~~~

  The goals in this project are:

  1. Collect performance data of GDB in various areas under different
     supported configurations.  These areas include performing single
     steps, thread-specific breakpoints, stack backtraces, symbol
     lookups, shared library loads/unloads, etc.  Configurations
     include native debugging and remote debugging with GDBserver.
     The framework includes some micro-benchmarks and utilities to
     record the performance data, such as the execution time and
     memory usage of each micro-benchmark (a sketch of a memory-usage
     helper follows this list).
  2. Detect performance regressions.  We collect the performance data
     of each micro-benchmark, and we need to detect or identify
     performance regressions by comparing with a previous run.  This is
     more powerful when associated with continuous testing.
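
  As a rough illustration of the kind of utility meant here, a
  memory-usage helper could be as simple as reading `/proc/self/status'
  from the python script running inside GDB.  The sketch below is made
  up for illustration (Linux-only), not existing code:

  ,----
  | # Hypothetical helper, Linux-only.  When run inside GDB's python
  | # interpreter, /proc/self/status describes the GDB process itself.
  | def report_memory_usage(log):
  |   status = open('/proc/self/status')
  |   for line in status:
  |     # VmSize is virtual memory, VmRSS is resident set size.
  |     if line.startswith('VmSize:') or line.startswith('VmRSS:'):
  |       log.write(line)
  |   status.close()
  `----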


2 Related work
==============

  * [LNT] LNT was written for LLVM, but is *designed* to be usable for
    performance testing of any software.  It is written in python,
    well-documented and easy to set up.  LNT spawns the compiler first
    and then the target program, and records the time usage of both in
    JSON format.  No interaction is involved.  The performance data
    collection in LNT is relatively simple, because it targets
    compilers.  Once the performance testing part is done, the next
    step is to show the data and detect performance regressions, and
    LNT does a lot of work here.  The performance data in JSON format
    can be imported into a database and shown through the [web]
    interface, where performance regressions are highlighted in red.

  * [lldb] LLDB has a [performance.py] script to measure the speed and
    memory usage of LLDB.  It captures internal events, feeds in some
    events, and records the time usage.  It handles interaction by
    consuming debugging events and taking actions accordingly.  It only
    collects performance data; it does not detect performance
    regressions.

  * libstdc++-v3 There is a `performance' directory in
    libstdc++-v3/testsuite/ and a header `testsuite_performance.h' in
    testsuite/util/.  Test cases are compiled with the header and run
    with a large data set to calculate the time usage.  This approach
    is suitable for performance testing of a library.


  [LNT] http://llvm.org/docs/lnt/index.html

  [web] http://llvm.org/perf/db_default/v4/nts/recent_activity

  [lldb] http://lldb.llvm.org/

  [performance.py]
  http://llvm.org/viewvc/llvm-project/lldb/trunk/examples/python/performance.py


3 Design
========

3.1 Requirements
~~~~~~~~~~~~~~~~

  + Drive GDB to do some operations and record the performance data,
    especially for these cases:
    * Libraries are loaded or unloaded by a program which has a large
      number of shared libraries (4096 libraries, for example),
    * Look up a symbol in a program which has a large number of symbols
      (1 million, for example),
    * Do single step, disassembly or other operations in remote
      debugging.
  + Both native debugging and remote debugging are supported.
  + Display the performance data in some format, plain text or html.
  + Detect performance regressions.  In functional regression testing,
    we can simply diff two `gdb.sum' files to find the regressions or
    progressions.  In performance testing, we need to analyze the
    performance data of two runs to find the regressions, instead of
    simply comparing them with diff.
  + Highlight regressions.  It makes sense to show regressions or
    progressions greater than a certain threshold (5%, for example).

  The first three requirements are the minimum set, and can be met in
  the short term.  Our ultimate goal is to keep track of the
  performance of GDB and improve its performance in some areas, rather
  than to develop a full-featured performance testing framework.  In
  the long term, we can improve the framework gradually and meet the
  last two requirements; a rough sketch of such a comparison follows.
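
  For the last two requirements, the comparison could be as simple as
  reading the `perftest.log' files of two runs and flagging entries
  whose time usage changes by more than the threshold.  A minimal
  sketch (the script name and file names below are made up, and it
  assumes the log format shown in the examples in section 4):

  ,----
  | # compare.py -- naive comparison of two perftest.log files.
  | # Assumes each line looks like "<label> in <seconds>".
  | threshold = 0.05   # flag changes greater than 5%
  | 
  | def read_log(path):
  |   results = {}
  |   for line in open(path):
  |     label, sep, seconds = line.rpartition(' in ')
  |     results[label] = float(seconds)
  |   return results
  | 
  | old = read_log('perftest.log.old')
  | new = read_log('perftest.log.new')
  | for label in sorted(set(old) & set(new)):
  |   change = (new[label] - old[label]) / old[label]
  |   if abs(change) > threshold:
  |     print '%s: %+.1f%%' % (label, change * 100)
  `----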


3.2 Design
~~~~~~~~~~

  + Use `dejagnu' to invoke the compiler to compile the test case and
    start GDB (and/or GDBserver).  This is the same as the functional
    regression testing we do nowadays.  We choose `dejagnu' here
    because it handles GDB testing, especially when GDBserver is used,
    very well.  We don't have to re-invent the wheel in python.

  + GDB loads a python script, in which some operations are performed
    and performance data (time and memory usage) is collected into a
    file.  The performance test is driven by python, because GDB has a
    good python binding now.  We can also use python to collect the
    performance data, process it and draw graphs, which is very
    convenient.

  + Emulate the effect of a large program, instead of using a real
    large program.  Performance problems show up when the program is
    *large* enough, in terms of a large number of symbols or shared
    libraries.  Using a real large program can trigger the problem, but
    it is hard for other people to reproduce.  Tests like these can be
    run regularly.

    1. When we test the performance of GDB handling shared libraries,
       we can use an .exp script to generate a large number of C files,
       compile them into shared libraries, and let the main executable
       load these libraries in order to measure the performance.

    2. When we test the performance of GDB reading in symbols and
       looking up symbols, we can either fake a lot of debug
       information in the executable or fake a lot of `objfile',
       `symtab' and `symbol' objects in GDB.  We may extend `jit.c' to
       add symbols on the fly.  `jit.c' is able to add `objfile' and
       `symtab' to GDB from an external reader.  We can factor out this
       part to add `objfile', `symtab', and `symbol' to GDB for
       performance testing purposes (see the sketch after this list).
       However, I may be wrong.
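
  As a concrete illustration of point 2, a symbol-lookup micro-benchmark
  could be driven from a python script in the same way as
  `single-step.py' in the next section, timing repeated lookups through
  GDB's python API.  A rough sketch, where the symbol names are
  placeholders for whatever the generated executable would contain:

  ,----
  | import gdb
  | import time
  | 
  | # Hypothetical sketch: time repeated global symbol lookups.
  | # 'func_0' .. 'func_999' stand for symbols generated by the .exp file.
  | def time_symbol_lookups(count):
  |   start_time = time.clock()
  |   for i in range(count):
  |     gdb.lookup_global_symbol('func_%d' % (i % 1000))
  |   return time.clock() - start_time
  `----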


4 Example
=========

4.1 single step
~~~~~~~~~~~~~~~

  For the micro-benchmark `single-step', there are three source files,
  `single-step.c', `single-step.py' and `single-step.exp'.

  `single-step.exp' is similar to our regression tests in the
  `gdb.python' directory,

  ,----
  | if ![runto_main] {
  |     return -1
  | }
  | 
  | set remote_python_file [remote_download host ${srcdir}/${subdir}/${testfile}.py]
  | 
  | gdb_test_no_output "python exec (open ('${remote_python_file}').read ())"
  | 
  | send_gdb "call \$perftest()\n"
  | set timeout 300
  | gdb_expect {
  |     -re "\"Done\".*${gdb_prompt} $" {
  |     }
  |     timeout {}
  | }
  | 
  | remote_file host delete ${remote_python_file}
  `----

  `single-step.py' drives GDB to run the command `stepi' repeatedly and
  records the time usage.  Note that class `SingleStep' could be
  abstracted in a better way, for example by moving the common code
  into a class `TestCase' and extending it in class `SingleStep' (a
  sketch of this follows the listing).

  ,----
  | import gdb
  | import time
  | 
  | class SingleStep (gdb.Function):
  |   def __init__(self):
  |     # Each test has to register a convenience function 'perftest'.
  |     super (SingleStep, self).__init__ ("perftest")
  | 
  |   def execute_test(self):
  |     test_log = open ("perftest.log", 'a+')
  | 
  |     # Execute the command 'stepi' a number of times, and record the
  |     # time usage.
  |     for i in range(1, 5):
  |       start_time = time.clock()
  |       for j in range(0, i * 300):
  |         gdb.execute ("stepi")
  |       elapsed_time = time.clock() - start_time
  |       print >>test_log, 'single step %d in %s' % (i * 300, elapsed_time)
  | 
  |     test_log.close ()
  | 
  |   def invoke(self):
  |     self.execute_test()
  |     return "Done"
  | 
  | SingleStep ()
  `----
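
  As noted above, the common parts could be factored into a base class.
  A rough sketch of such a hypothetical `TestCase' class (not existing
  code) might look like this:

  ,----
  | import gdb
  | import time
  | 
  | class TestCase (gdb.Function):
  |   """Hypothetical base class holding the parts common to each
  |   micro-benchmark: registering the 'perftest' convenience function,
  |   opening the log file and reporting completion."""
  | 
  |   def __init__(self):
  |     super (TestCase, self).__init__ ("perftest")
  | 
  |   def execute_test(self, log):
  |     raise NotImplementedError
  | 
  |   def invoke(self):
  |     log = open ("perftest.log", 'a+')
  |     self.execute_test(log)
  |     log.close ()
  |     return "Done"
  | 
  | class SingleStep (TestCase):
  |   def execute_test(self, log):
  |     for i in range(1, 5):
  |       start_time = time.clock()
  |       for j in range(0, i * 300):
  |         gdb.execute ("stepi")
  |       elapsed = time.clock() - start_time
  |       print >>log, 'single step %d in %s' % (i * 300, elapsed)
  | 
  | SingleStep ()
  `----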

  * Run `single-step' with GDBserver
  ,----
  | $ make check RUNTESTFLAGS='--target_board=native-gdbserver single-step.exp'
  `----

  and the resulting `perftest.log' looks like this; each row shows the
  time usage for doing a certain number of `stepi' commands:
  ,----
  | single step 300 in 0.19
  | single step 600 in 0.35
  | single step 900 in 0.57
  | single step 1200 in 0.75
  `----

  * Run `single-step' without GDBserver

  ,----
  | $ make check RUNTESTFLAGS='--target_board=unix single-step.exp'
  `----

  and the resulting `perftest.log' looks like this:

  ,----
  | single step 300 in 0.06
  | single step 600 in 0.08
  | single step 900 in 0.14
  | single step 1200 in 0.18
  `----


4.2 shared library
~~~~~~~~~~~~~~~~~~

  The micro-benchmark `solib' tests the performance of GDB handling
  shared library loads and unloads.  There are three source files,
  `solib.c', `solib.py' and `solib.exp'.

  `solib.exp' generates many C files and compiles them into shared
  libraries.  `solib.c' is the main program, which loads these
  libraries dynamically.  `solib.py' is a python script which calls
  some inferior functions to load the libraries and measures the time
  usage.
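
  Although `solib.py' itself is not shown here, its core could look
  like the following sketch: it calls an inferior function (named
  `do_test_load' here purely as a placeholder for whatever `solib.c'
  actually provides) and times how long GDB takes to handle the
  resulting library load events:

  ,----
  | import gdb
  | import time
  | 
  | # Hypothetical core of solib.py.  'do_test_load (n)' stands for an
  | # inferior function in solib.c that loads and unloads n libraries.
  | def time_solib_load(count, log):
  |   start_time = time.clock()
  |   gdb.execute ("call do_test_load (%d)" % count)
  |   elapsed_time = time.clock() - start_time
  |   print >>log, 'solib %d in %.2f' % (count, elapsed_time)
  `----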

  Here is the performance data; each row shows the time usage of
  handling the load and unload of a certain number of shared libraries.
  We can use this data to track the performance of GDB in handling
  shared libraries.

  ,----
  | solib 128 in 0.53
  | solib 256 in 1.94
  | solib 512 in 8.31
  | solib 1024 in 47.34
  | solib 2048 in 384.75
  `----

-- 
Yao (齐尧)

