This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [python][patch] Python rbreak
- From: Simon Marchi <simon dot marchi at ericsson dot com>
- To: Phil Muldoon <pmuldoon at redhat dot com>, "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>
- Date: Mon, 16 Oct 2017 18:22:04 -0400
- Subject: Re: [python][patch] Python rbreak
- Authentication-results: sourceware.org; auth=none
- Authentication-results: spf=none (sender IP is ) smtp.mailfrom=simon dot marchi at ericsson dot com;
- References: <f39f6365-fc45-ea7d-10dc-5e0053db5cbb@redhat.com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:99
On 2017-10-11 07:30 AM, Phil Muldoon wrote:
> This introduces a Python rbreak function to the Python API. Its
> functionality was designed to closely match that of the console rbreak
> but with a number of caveats.
>
> That being said, there's a bit of preamble first.
>
> My original design was to have the gdb.Breakpoint constructor take a
> regex pattern. While breakpoints can, and do, have multiple locations
> they are always tied in with one linespec. I looked at the
> linespec/location code and the relationship is pretty firmly embedded
> as a one to many relationship. One linespec can have multiple
> locations but only one linespec can be represented in the location
> object. (Or address or probe or whatever the location is
> representing). A linespec has to be definitive; it has to point to one
> symbol. I thought about modifying this to have multiple symbols and
> multiple locations but quickly decided this was really going to
> violate the design contract of linespecs/location. It would also be
> considerable and disruptive work. I looked then, again, at a
> breakpoint having multiple linespecs but again decided the amount of
> disruption this would generate was not worth a Python rbreak returning
> one breakpoint representing multiple symbols and multiple locations.
>
> So, instead, I've made the Python rbreak functionality a little more
> tuneable than the console command equivalent. The first tuneable is to
> allow the user to exclude mini symbols from the pattern matching. A
> loosely written pattern passed to the console rbreak command can
> generate 5000+ breakpoints from a simple C "hello world"
> program. rbreak "" will place a breakpoint on every single mini symbol
> representing a function. I can't possibly see why this would be
> desirable. In order to facilitate broadly-written patterns that seek to
> place a breakpoint on every user-defined function, the equivalent
> would be:
>
> gdb.rbreak ("", minisyms=False)
Hi Phil,
As Kevin noted (IIUC), this should be "minsyms" if we want to follow standard
GDB terminology.
>
> That would place a breakpoint on all functions that are actually
> defined in the inferior (and not those that are inserted by the
> compiler, linker, etc). The default for this keyword is False.
>
> The second tuneable is a throttle. Beyond the name (which I am unsure
> about but could not think of a better one), this allows the user to
> enter a fail-safe limit for breakpoint creation. So, for the following
> example, an inferior with ten user provided functions:
>
> gdb.rbreak ("", minisyms=False, throttle=5)
max_results? max_breakpoints?
> Would set no breakpoints and raise a runtime error. This functionality
> is more aimed at those who want to utilise Python rbreak from a
> toolset perspective (say for a GDB UI). The default for this keyword
> is unlimited.
>
> The last tuneable is the ability to limit the pattern matching to a
> set of gdb.Symtab objects. So, given the example below:
>
> gdb.rbreak ("", minisyms=False, throttle=5, symtabs=symTuple)
>
> where "symTuple" is a tuple of gdb.Symtab objects, the search/pattern
> matching would be restricted to those gdb.Symtab objects only. The
> default for this keyword (i.e, no keyword provided) is to search all
> symbol tables.
>
> All of the examples above have "" as the search pattern but you can,
> of course, specify any valid GDB regex there.
>
> On a review note I seek a little advice on the interface to
> search_symbols. If the user does provide a symtabs keyword, the code
> collects the name of the file attached to each symtab specified. It
> does that through the usual Python method
> (python_string_to_target_string). That returns a
> gdb::unique_xmalloc_ptr<char>. I planned to store these in a
> std::vector, however, I was unable to coerce a
> std::vector<gdb::unique_xmalloc_ptr<char>> into a char **p[]
> representation the search_symbols function requires. I ended up
> releasing the object (that results in a char *) and wrapping the
> std::vector in a simple type that provides a destructor to properly
> free the released strings when the object falls out of scope. I
> thought about just changing the search_symbols interface to accept a
> vector of (gdb) unique pointers but noted a C++ run had already been
> performed over the function. Any thoughts here would be appreciated.
>
> Long, windy, preamble over the patch follows at the end of the email.
>
> Cheers
>
> Phil
>
> --
>
> 2017-10-11 Phil Muldoon <pmuldoon@redhat.com>
>
> * python/python.c (gdbpy_rbreak): New function.
>
> 2017-10-11 Phil Muldoon <pmuldoon@redhat.com>
>
> * gdb.python/py-rbreak.exp: New file.
> * gdb.python/py-rbreak.c: New file.
> * gdb.python/py-rbreak-func2.c: New file.
>
> 2017-10-11 Phil Muldoon <pmuldoon@redhat.com>
>
> * python.texi (Basic Python): Add rbreak documentation.
>
> --
>
> diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
> index f661e489bb..b2d4516886 100644
> --- a/gdb/doc/python.texi
> +++ b/gdb/doc/python.texi
> @@ -243,6 +243,24 @@ were no breakpoints. This peculiarity was subsequently fixed, and now
> @code{gdb.breakpoints} returns an empty sequence in this case.
> @end defun
>
> +@findex gdb.rbreak
> +@defun gdb.rbreak (regex @r{[}, minisyms @r{[}, throttle, @r{[}, symtabs @r{]]]})
> +Return a Python tuple holding a collection of newly set
> +@code{gdb.Breakpoint} objects matching function names defined by the
> +@var{regex} pattern. If the @var{minisyms} keyword is @code{True},
> +all system functions (those not explicitly defined in the inferior)
> +will also be included in the match. The @var{throttle} keyword takes
> +an integer that defines the maximum number of pattern matches for
> +functions matched by the @var{regex} pattern. If the number of
> +matches exceeds the integer value of @var{throttle}, a
> +@code{RuntimeError} will be raised and no breakpoints will be created.
> +If @var{throttle} is not defined then there is no imposed limit on the
> +maximum number of matches and breakpoints to be created. The
> +@var{symtabs} keyword takes a Python iterable that yields a collection
> +of @code{gdb.Symtab} objects and will restrict the search to those
> +functions only contained within the @code{gdb.Symtab} objects.
> +@end defun
> +
> @findex gdb.parameter
> @defun gdb.parameter (parameter)
> Return the value of a @value{GDBN} @var{parameter} given by its name,
> diff --git a/gdb/python/python.c b/gdb/python/python.c
> index b04057ec4a..4d591df30c 100644
> --- a/gdb/python/python.c
> +++ b/gdb/python/python.c
> @@ -642,6 +642,191 @@ gdbpy_solib_name (PyObject *self, PyObject *args)
> return str_obj;
> }
>
> +/* Implementation of Python rbreak command. Take a REGEX and
> + optionally a MINISYMS, THROTTLE and SYMTABS keyword and return a
> + Python tuple that contains newly set breakpoints that match that
> + criteria. REGEX refers to a GDB format standard regex pattern of
> + symbols names to search; MINISYMS is an optional boolean (default
> + False) that indicates if the function should search GDB's minimal
> + symbols; THROTTLE is an optional integer (default unlimited) that
> + indicates the maximum amount of breakpoints allowable before the
> + function exits (note, if the throttle bound is passed, no
> + breakpoints will be set and a runtime error returned); SYMTABS is
> + an optional iterator that contains a set of gdb.Symtabs to
iterator or iterable? It would make sense to be able to pass a list here,
for example.
> + constrain the search within. */
> +
> +static PyObject *
> +gdbpy_rbreak (PyObject *self, PyObject *args, PyObject *kw)
> +{
> + /* A simple type to ensure clean up of a vector of allocated strings
> + when a C interface demands a const char *array[] type
> + interface. */
> + struct symtab_list_type
> + {
> + ~symtab_list_type ()
> + {
> + for (const char *elem: vec)
> + xfree ((void *) elem);
> + }
> + std::vector<const char *> vec;
> + };
> +
> + char *regex = NULL;
> + std::vector<symbol_search> symbols;
> + unsigned long count = 0;
> + PyObject *symtab_list = NULL;
> + PyObject *minisyms_p_obj = NULL;
> + int minisyms_p = 0;
> + unsigned int throttle = 0;
> + static const char *keywords[] = {"regex","minisyms", "throttle", "symtabs", NULL};
Nit: line too long and missing space.
> + symtab_list_type symtab_paths;
> +
> + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "s|O!IO", keywords,
> + ®ex, &PyBool_Type,
> + &minisyms_p_obj, &throttle,
> + &symtab_list))
> + return NULL;
> +
> + /* Parse minisyms keyword. */
> + if (minisyms_p_obj != NULL)
> + {
> + int cmp = PyObject_IsTrue (minisyms_p_obj);
> + if (cmp < 0)
> + return NULL;
> + minisyms_p = cmp;
> + }
> +
> + /* The "symtabs" keyword is any Python iterable object that returns
> + a gdb.Symtab on each iteration. If specified, iterate through
> + the provided gdb.Symtabs and extract their full path. As
> + python_string_to_target_string returns a
> + gdb::unique_xmalloc_ptr<char> and a vector containing these types
> + cannot be coerced to a const char **p[] via the vector.data call,
> + release the value from the unique_xmalloc_ptr and place it in a
> + simple type symtab_list_type (which holds the vector and a
> + destructor that frees the contents of the allocated strings. */
> + if (symtab_list != NULL)
> + {
> + gdbpy_ref<> iter (PyObject_GetIter (symtab_list));
> +
> + if (iter == NULL)
> + return NULL;
> +
> + while (true)
> + {
> + gdbpy_ref<> next (PyIter_Next (iter.get ()));
> +
> + if (next == NULL)
> + {
> + if (PyErr_Occurred ())
> + return NULL;
> + break;
> + }
> +
> + gdbpy_ref<> obj_name (PyObject_GetAttrString (next.get (),
> + "filename"));
> +
> + if (obj_name == NULL)
> + return NULL;
> +
> + /* Is the object file still valid? */
> + if (obj_name == Py_None)
> + continue;
> +
> + gdb::unique_xmalloc_ptr<char> filename =
> + python_string_to_target_string (obj_name.get ());
> +
> + if (filename == NULL)
> + return NULL;
> +
> + /* Make sure there is a definite place to store the value of
> + s before it is released. */
> + symtab_paths.vec.push_back (nullptr);
> + symtab_paths.vec.back () = filename.release ();
> + }
> + }
> +
> + if (symtab_list)
> + {
> + const char **files = symtab_paths.vec.data ();
> +
> + symbols = search_symbols (regex, FUNCTIONS_DOMAIN,
> + symtab_paths.vec.size (), files);
> + }
> + else
> + symbols = search_symbols (regex, FUNCTIONS_DOMAIN, 0, NULL);
> +
> + /* Count the number of symbols (both symbols and optionally mini
> + symbols) so we can correctly size the tuple. */
> + for (const symbol_search &p : symbols)
> + {
> + /* Mini symbols included? */
> + if (minisyms_p)
> + {
> + if (p.msymbol.minsym != NULL)
> + count++;
> + }
Would it be easy to pass the minisyms_p flag to search_symbols, so that
we don't need to search in the minsym tables if we don't even care about
them?
> +
> + if (p.symbol != NULL)
> + count++;
> + }
> +
> + /* Check throttle bounds and exit if in excess. */
> + if (throttle != 0 && count > throttle)
> + {
> + PyErr_SetString (PyExc_RuntimeError,
> + _("Number of breakpoints exceeds throttled maximum."));
> + return NULL;
> + }
> +
> + gdbpy_ref<> return_tuple (PyTuple_New (count));
> +
> + if (return_tuple == NULL)
> + return NULL;
How do you decide if this function should return a tuple or a list?
Instinctively I would have returned a list, but I can't really explain
why.
> +
> + count = 0;
> +
> + /* Construct full path names for symbols and call the Python
> + breakpoint constructor on the resulting names. Be tolerant of
> + individual breakpoint failures. */
> + for (const symbol_search &p : symbols)
> + {
> + std::string symbol_name;
> +
> + /* Skipping mini-symbols? */
> + if (minisyms_p == 0)
> + if (p.msymbol.minsym != NULL)
> + continue;
> +
> + if (p.msymbol.minsym == NULL)
> + {
> + struct symtab *symtab = symbol_symtab (p.symbol);
> + const char *fullname = symtab_to_fullname (symtab);
> +
> + symbol_name = fullname;
> + symbol_name += ":";
> + symbol_name += SYMBOL_LINKAGE_NAME (p.symbol);
> + }
> + else
> + symbol_name = MSYMBOL_LINKAGE_NAME (p.msymbol.minsym);
> +
> + gdbpy_ref<> argList (Py_BuildValue("(s)", symbol_name.c_str ()));
> + gdbpy_ref<> obj (PyObject_CallObject ((PyObject *)
> + &breakpoint_object_type,
> + argList.get ()));
> +
> + /* Tolerate individual breakpoint failures. */
> + if (obj == NULL)
> + gdbpy_print_stack ();
> + else
> + {
> + PyTuple_SET_ITEM (return_tuple.get (), count, obj.get ());
The Python doc says that SET_ITEM steals a reference to obj. Isn't it
a problem, because gdbpy_ref also keeps the reference?
> + count++;
> + }
> + }
Hmm maybe this is a reason to use a list? If a breakpoint fails to
be created, the tuple will not be filled completely. What happens
to tuple elements that were not set?
With the list, you can simply PyList_Append.
> + return return_tuple.release ();
> +}
> +
> /* A Python function which is a wrapper for decode_line_1. */
>
> static PyObject *
> @@ -1912,7 +2097,9 @@ Return the name of the current target charset." },
> { "target_wide_charset", gdbpy_target_wide_charset, METH_NOARGS,
> "target_wide_charset () -> string.\n\
> Return the name of the current target wide charset." },
> -
> + { "rbreak", (PyCFunction) gdbpy_rbreak, METH_VARARGS | METH_KEYWORDS,
> + "rbreak (Regex) -> Tuple.\n\
> +Return a Tuple containing gdb.Breakpoint objects that match the given Regex." },
> { "string_to_argv", gdbpy_string_to_argv, METH_VARARGS,
> "string_to_argv (String) -> Array.\n\
> Parse String and return an argv-like array.\n\
> diff --git a/gdb/testsuite/gdb.python/py-rbreak-func2.c b/gdb/testsuite/gdb.python/py-rbreak-func2.c
> new file mode 100644
> index 0000000000..1de8bfd5b2
> --- /dev/null
> +++ b/gdb/testsuite/gdb.python/py-rbreak-func2.c
> @@ -0,0 +1,31 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> + Copyright 2017 Free Software Foundation, Inc.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + This program is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +int efunc1 ()
> +{
> + return 1;
> +}
> +
> +int efunc2 ()
> +{
> + return 2;
> +}
> +
> +int efunc3 ()
> +{
> + return 3;
> +}
> diff --git a/gdb/testsuite/gdb.python/py-rbreak.c b/gdb/testsuite/gdb.python/py-rbreak.c
> new file mode 100644
> index 0000000000..287aba6c03
> --- /dev/null
> +++ b/gdb/testsuite/gdb.python/py-rbreak.c
> @@ -0,0 +1,62 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> + Copyright 2013-2017 Free Software Foundation, Inc.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + This program is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +int func1 ()
As for GDB code, put the return type on its own line.
> +{
> + return 1;
> +}
> +
> +int func2 ()
> +{
> + return 2;
> +}
> +
> +int func3 ()
> +{
> + return 3;
> +}
> +
> +int func4 ()
> +{
> + return 4;
> +}
> +
> +int func5 ()
> +{
> + return 5;
> +}
> +
> +void func6 ()
> +{
> + return;
> +}
> +
> +void outside_scope ()
> +{
> + return;
> +}
> +
> +int main()
> +{
> + func1 (); /* Break func1. */
> + func2 ();
> + func3 ();
> + func4 ();
> + func5 ();
> + func6 ();
> + outside_scope ();
> +}
> diff --git a/gdb/testsuite/gdb.python/py-rbreak.exp b/gdb/testsuite/gdb.python/py-rbreak.exp
> new file mode 100644
> index 0000000000..9385aeae08
> --- /dev/null
> +++ b/gdb/testsuite/gdb.python/py-rbreak.exp
> @@ -0,0 +1,61 @@
> +# Copyright (C) 2017 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +
> +# This file is part of the GDB testsuite. It tests the mechanism
> +# exposing values to Python.
> +
> +load_lib gdb-python.exp
> +
> +standard_testfile py-rbreak.c py-rbreak-func2.c
> +
> +if {[prepare_for_testing "failed to prepare" ${testfile} [list $srcfile $srcfile2]] } {
> + return 1
> +}
> +
> +# Skip all tests if Python scripting is not enabled.
> +if { [skip_python_tests] } { continue }
> +
> +if ![runto_main] then {
> + fail "can't run to main"
> + return 0
> +}
> +
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"\",minisyms=False)" \
> + "Get all function breakpoints" 0
> +gdb_test "py print(len(sl))" "11" \
> + "Check number of returned breakpoints is 11"
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"main\.\*\",minisyms=False)" \
> + "Get main function breakpoint" 0
> +gdb_test "py print(len(sl))" "1" \
> + "Check number of returned breakpoints is 1"
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"func\.\*\",minisyms=False,throttle=10)" \
> + "Get functions matching func.*" 0
> +gdb_test "py print(len(sl))" "9" \
> + "Check number of returned breakpoints is 9"
> +gdb_test "py gdb.rbreak(\"func\.\*\",minisyms=False,throttle=1)" \
> + "Number of breakpoints exceeds throttled maximum.*" \
> + "Check throttle errors on too many breakpoints."
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"func1\",minisyms=True)" \
> + "Including mini symbols, get functions matching func.*" 0
> +gdb_test "py print(len(sl))" "2" \
> + "Check number of returned breakpoints is 2"
> +gdb_py_test_silent_cmd "python sym = gdb.lookup_symbol(\"efunc1\")" \
> + "Find a symbol in objfile" 1
> +gdb_py_test_silent_cmd "python symtab = sym\[0\].symtab" \
> + "Get backing symbol table" 1
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"func\.\*\",minisyms=False,throttle=10,symtabs=\[symtab\])" \
> + "Get functions matching func.* in one symtab only." 0
> +gdb_test "py print(len(sl))" "3" \
> + "Check number of returned breakpoints is 3"
>
I can't find a reference, but I think we want test names to start
with a lower case letter and not end with a dot. I'll see if we
can add this to the testcase cookbook wiki page.
Simon