This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [python][patch] Python rbreak

From: Simon Marchi <simon dot marchi at ericsson dot com>
To: Phil Muldoon <pmuldoon at redhat dot com>, "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>
Date: Mon, 16 Oct 2017 18:22:04 -0400
Subject: Re: [python][patch] Python rbreak
Authentication-results: sourceware.org; auth=none
Authentication-results: spf=none (sender IP is ) smtp.mailfrom=simon dot marchi at ericsson dot com;
References: <f39f6365-fc45-ea7d-10dc-5e0053db5cbb@redhat.com>
Spamdiagnosticmetadata: NSPM
Spamdiagnosticoutput: 1:99
On 2017-10-11 07:30 AM, Phil Muldoon wrote:
> This introduces a Python rbreak function to the Python API. Its
> functionality was designed to closely match that of the console rbreak
> but with a number of caveats.
> 
> That being said, there's a bit of preamble first.
> 
> My original design was to have the gdb.Breakpoint constructor take a
> regex pattern. While breakpoints can, and do, have multiple locations
> they are always tied in with one linespec. I looked at the
> linespec/location code and the relationship is pretty firmly embedded
> as a one to many relationship. One linespec can have multiple
> locations but only one linespec can be represented in the location
> object. (Or address or probe or whatever the location is
> representing). A linespec has to be definitive; it has to point to one
> symbol. I thought about modifying this to have multiple symbols and
> multiple locations but quickly decided this was really going to
> violate the design contract of linespecs/location. It would also be
> considerable and disruptive work. I looked then, again, at a
> breakpoint having multiple linespecs but again decided the amount of
> disruption this would generate was not worth a Python rbreak returning
> one breakpoint representing multiple symbols and multiple locations.
> 
> So, instead, I've made the Python rbreak functionality a little more
> tuneable than the console command equivalent. The first tuneable is to
> allow the user to exclude mini symbols from the pattern matching. A
> loosely written pattern passed to the console rbreak command can
> generate 5000+ breakpoints from a simple C "hello world"
> program. rbreak "" will place a breakpoint on every single mini symbol
> representing a function. I can't possibly see why this would be
> desirable. In order to facilitate broadly-written patterns that seek to
> place a breakpoint on every user-defined function, the equivalent
> would be:
> 
> gdb.rbreak ("", minisyms=False)

Hi Phil,

As Kevin noted (IIUC), this should be "minsyms" if we want to follow standard
GDB terminology.

> 
> That would place a breakpoint on all functions that are actually
> defined in the inferior (and not those that are inserted by the
> compiler, linker, etc). The default for this keyword is False.
> 
> The second tuneable is a throttle. Beyond the name (which I am unsure
> about but could not think of a better one), this allows the user to
> enter a fail-safe limit for breakpoint creation. So, for the following
> example, an inferior with ten user provided functions:
> 
> gdb.rbreak ("", minisyms=False, throttle=5)

max_results? max_breakpoints?

> Would set no breakpoints and raise a runtime error. This functionality
> is more aimed at those who want to utilise Python rbreak from a
> toolset perspective (say for a GDB UI). The default for this keyword
> is unlimited.
> 
> The last tuneable is the ability to limit the pattern matching to a
> set of gdb.Symtab objects. So, given the example below:
> 
> gdb.rbreak ("", minisyms=False, throttle=5, symtabs=symTuple)
> 
> where "symTuple" is a tuple of gdb.Symtab objects, the search/pattern
> matching would be restricted to those gdb.Symtab objects only. The
> default for this keyword (i.e, no keyword provided) is to search all
> symbol tables.
> 
> All of the examples above have "" as the search pattern but you can,
> of course, specify any valid GDB regex there.
> 
> On a review note I seek a little advice on the interface to
> search_symbols. If the user does provide a symtabs keyword, the code
> collects the name of the file attached to each symtab specified. It
> does that through the usual Python method
> (python_string_to_target_string). That returns a
> gdb::unique_xmalloc_ptr<char>. I planned to store these in a
> std::vector, however, I was unable to coerce a
> std::vector<gdb::unique_xmalloc_ptr<char>> into a char **p[]
> representation the search_symbols function requires. I ended up
> releasing the object (that results in a char *) and wrapping the
> std::vector in a simple type that provides a destructor to properly
> free the released strings when the object falls out of scope. I
> thought about just changing the search_symbols interface to accept a
> vector of (gdb) unique pointers but noted a C++ run had already been
> performed over the function. Any thoughts here would be appreciated.
> 
> Long, windy, preamble over the patch follows at the end of the email.
> 
> Cheers
> 
> Phil
> 
> --
> 
> 2017-10-11  Phil Muldoon  <pmuldoon@redhat.com>
> 
> 	* python/python.c (gdbpy_rbreak): New function.
> 
> 2017-10-11  Phil Muldoon  <pmuldoon@redhat.com>
> 
> 	* gdb.python/py-rbreak.exp: New file.
> 	* gdb.python/py-rbreak.c: New file.
> 	* gdb.python/py-rbreak-func2.c: New file.
> 
> 2017-10-11  Phil Muldoon  <pmuldoon@redhat.com>
> 
> 	* python.texi (Basic Python): Add rbreak documentation.
> 
> --
> 
> diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
> index f661e489bb..b2d4516886 100644
> --- a/gdb/doc/python.texi
> +++ b/gdb/doc/python.texi
> @@ -243,6 +243,24 @@ were no breakpoints.  This peculiarity was subsequently fixed, and now
>  @code{gdb.breakpoints} returns an empty sequence in this case.
>  @end defun
>  
> +@findex gdb.rbreak
> +@defun gdb.rbreak (regex @r{[}, minisyms @r{[}, throttle, @r{[}, symtabs @r{]]]})
> +Return a Python tuple holding a collection of newly set
> +@code{gdb.Breakpoint} objects matching function names defined by the
> +@var{regex} pattern.  If the @var{minisyms} keyword is @code{True},
> +all system functions (those not explicitly defined in the inferior)
> +will also be included in the match.  The @var{throttle} keyword takes
> +an integer that defines the maximum number of pattern matches for
> +functions matched by the @var{regex} pattern.  If the number of
> +matches exceeds the integer value of @var{throttle}, a
> +@code{RuntimeError} will be raised and no breakpoints will be created.
> +If @var{throttle} is not defined then there is no imposed limit on the
> +maximum number of matches and breakpoints to be created.  The
> +@var{symtabs} keyword takes a Python iterable that yields a collection
> +of @code{gdb.Symtab} objects and will restrict the search to those
> +functions only contained within the @code{gdb.Symtab} objects.
> +@end defun
> +
>  @findex gdb.parameter
>  @defun gdb.parameter (parameter)
>  Return the value of a @value{GDBN} @var{parameter} given by its name,
> diff --git a/gdb/python/python.c b/gdb/python/python.c
> index b04057ec4a..4d591df30c 100644
> --- a/gdb/python/python.c
> +++ b/gdb/python/python.c
> @@ -642,6 +642,191 @@ gdbpy_solib_name (PyObject *self, PyObject *args)
>    return str_obj;
>  }
>  
> +/* Implementation of Python rbreak command.  Take a REGEX and
> +   optionally a MINISYMS, THROTTLE and SYMTABS keyword and return a
> +   Python tuple that contains newly set breakpoints that match that
> +   criteria.  REGEX refers to a GDB format standard regex pattern of
> +   symbols names to search; MINISYMS is an optional boolean (default
> +   False) that indicates if the function should search GDB's minimal
> +   symbols; THROTTLE is an optional integer (default unlimited) that
> +   indicates the maximum amount of breakpoints allowable before the
> +   function exits (note, if the throttle bound is passed, no
> +   breakpoints will be set and a runtime error returned); SYMTABS is
> +   an optional iterator that contains a set of gdb.Symtabs to

iterator or iterable?  It would make sense to be able to pass a list here,
for example.

> +   constrain the search within.  */
> +
> +static PyObject *
> +gdbpy_rbreak (PyObject *self, PyObject *args, PyObject *kw)
> +{
> +  /* A simple type to ensure clean up of a vector of allocated strings
> +     when a C interface demands a const char *array[] type
> +     interface.  */
> +  struct symtab_list_type
> +  {
> +    ~symtab_list_type ()
> +    {
> +      for (const char *elem: vec)
> +	xfree ((void *) elem);
> +    }
> +    std::vector<const char *> vec;
> +  };
> +
> +  char *regex = NULL;
> +  std::vector<symbol_search> symbols;
> +  unsigned long count = 0;
> +  PyObject *symtab_list = NULL;
> +  PyObject *minisyms_p_obj = NULL;
> +  int minisyms_p = 0;
> +  unsigned int throttle = 0;
> +  static const char *keywords[] = {"regex","minisyms", "throttle", "symtabs", NULL};

Nit: line too long and missing space.	

> +  symtab_list_type symtab_paths;
> +
> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "s|O!IO", keywords,
> +					&regex, &PyBool_Type,
> +					&minisyms_p_obj, &throttle,
> +					&symtab_list))
> +    return NULL;
> +
> +  /* Parse minisyms keyword.  */
> +  if (minisyms_p_obj != NULL)
> +    {
> +      int cmp = PyObject_IsTrue (minisyms_p_obj);
> +      if (cmp < 0)
> +	return NULL;
> +      minisyms_p = cmp;
> +    }
> +
> +  /* The "symtabs" keyword is any Python iterable object that returns
> +     a gdb.Symtab on each iteration.  If specified, iterate through
> +     the provided gdb.Symtabs and extract their full path.  As
> +     python_string_to_target_string returns a
> +     gdb::unique_xmalloc_ptr<char> and a vector containing these types
> +     cannot be coerced to a const char **p[] via the vector.data call,
> +     release the value from the unique_xmalloc_ptr and place it in a
> +     simple type symtab_list_type (which holds the vector and a
> +     destructor that frees the contents of the allocated strings.  */
> +  if (symtab_list != NULL)
> +    {
> +      gdbpy_ref<> iter (PyObject_GetIter (symtab_list));
> +
> +      if (iter == NULL)
> +	return NULL;
> +
> +      while (true)
> +	{
> +	  gdbpy_ref<> next (PyIter_Next (iter.get ()));
> +
> +	  if (next == NULL)
> +	    {
> +	      if (PyErr_Occurred ())
> +		return NULL;
> +	      break;
> +	    }
> +
> +	  gdbpy_ref<> obj_name (PyObject_GetAttrString (next.get (),
> +							"filename"));
> +
> +	  if (obj_name == NULL)
> +	    return NULL;
> +
> +	  /* Is the object file still valid?  */
> +	  if (obj_name == Py_None)
> +	    continue;
> +
> +	  gdb::unique_xmalloc_ptr<char> filename =
> +	    python_string_to_target_string (obj_name.get ());
> +
> +	  if (filename == NULL)
> +	    return NULL;
> +
> +	  /* Make sure there is a definite place to store the value of
> +	     s before it is released.  */
> +	  symtab_paths.vec.push_back (nullptr);
> +	  symtab_paths.vec.back () = filename.release ();
> +	}
> +    }
> +
> +  if (symtab_list)
> +    {
> +      const char **files = symtab_paths.vec.data ();
> +
> +      symbols = search_symbols (regex, FUNCTIONS_DOMAIN,
> +				symtab_paths.vec.size (), files);
> +    }
> +  else
> +    symbols = search_symbols (regex, FUNCTIONS_DOMAIN, 0, NULL);
> +
> +  /* Count the number of symbols (both symbols and optionally mini
> +     symbols) so we can correctly size the tuple.  */
> +  for (const symbol_search &p : symbols)
> +    {
> +      /* Mini symbols included?  */
> +      if (minisyms_p)
> +	{
> +	  if (p.msymbol.minsym != NULL)
> +	    count++;
> +	}

Would it be easy to pass the minisyms_p flag to search_symbols, so that
we don't need to search in the minsym tables if we don't even care about
them?

> +
> +      if (p.symbol != NULL)
> +	count++;
> +    }
> +
> +  /* Check throttle bounds and exit if in excess.  */
> +  if (throttle != 0 && count > throttle)
> +    {
> +      PyErr_SetString (PyExc_RuntimeError,
> +		       _("Number of breakpoints exceeds throttled maximum."));
> +      return NULL;
> +    }
> +
> +  gdbpy_ref<> return_tuple (PyTuple_New (count));
> +
> +  if (return_tuple == NULL)
> +    return NULL;

How do you decide if this function should return a tuple or a list?
Instinctively I would have returned a list, but I can't really explain
why.

> +
> +  count = 0;
> +
> +  /* Construct full path names for symbols and call the Python
> +     breakpoint constructor on the resulting names.  Be tolerant of
> +     individual breakpoint failures.  */
> +  for (const symbol_search &p : symbols)
> +    {
> +      std::string symbol_name;
> +
> +      /* Skipping mini-symbols?  */
> +      if (minisyms_p == 0)
> +	if (p.msymbol.minsym != NULL)
> +	  continue;
> +
> +      if (p.msymbol.minsym == NULL)
> +	{
> +	  struct symtab *symtab = symbol_symtab (p.symbol);
> +	  const char *fullname = symtab_to_fullname (symtab);
> +
> +	  symbol_name = fullname;
> +	  symbol_name  += ":";
> +	  symbol_name  += SYMBOL_LINKAGE_NAME (p.symbol);
> +	}
> +      else
> +	symbol_name = MSYMBOL_LINKAGE_NAME (p.msymbol.minsym);
> +
> +      gdbpy_ref<> argList (Py_BuildValue("(s)", symbol_name.c_str ()));
> +      gdbpy_ref<> obj (PyObject_CallObject ((PyObject *)
> +					    &breakpoint_object_type,
> +					    argList.get ()));
> +
> +      /* Tolerate individual breakpoint failures.  */
> +      if (obj == NULL)
> +	gdbpy_print_stack ();
> +      else
> +	{
> +	  PyTuple_SET_ITEM (return_tuple.get (), count, obj.get ());

The Python doc says that SET_ITEM steals a reference to obj.  Isn't it
a problem, because gdbpy_ref also keeps the reference?

> +	  count++;
> +	}
> +    }

Hmm maybe this is a reason to use a list?  If a breakpoint fails to
be created, the tuple will not be filled completely.  What happens
to tuple elements that were not set?

With the list, you can simply PyList_Append.

> +  return return_tuple.release ();
> +}
> +
>  /* A Python function which is a wrapper for decode_line_1.  */
>  
>  static PyObject *
> @@ -1912,7 +2097,9 @@ Return the name of the current target charset." },
>    { "target_wide_charset", gdbpy_target_wide_charset, METH_NOARGS,
>      "target_wide_charset () -> string.\n\
>  Return the name of the current target wide charset." },
> -
> +  { "rbreak", (PyCFunction) gdbpy_rbreak, METH_VARARGS | METH_KEYWORDS,
> +    "rbreak (Regex) -> Tuple.\n\
> +Return a Tuple containing gdb.Breakpoint objects that match the given Regex." },
>    { "string_to_argv", gdbpy_string_to_argv, METH_VARARGS,
>      "string_to_argv (String) -> Array.\n\
>  Parse String and return an argv-like array.\n\
> diff --git a/gdb/testsuite/gdb.python/py-rbreak-func2.c b/gdb/testsuite/gdb.python/py-rbreak-func2.c
> new file mode 100644
> index 0000000000..1de8bfd5b2
> --- /dev/null
> +++ b/gdb/testsuite/gdb.python/py-rbreak-func2.c
> @@ -0,0 +1,31 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2017 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +int efunc1 ()
> +{
> +  return 1;
> +}
> +
> +int efunc2 ()
> +{
> +  return 2;
> +}
> +
> +int efunc3 ()
> +{
> +  return 3;
> +}
> diff --git a/gdb/testsuite/gdb.python/py-rbreak.c b/gdb/testsuite/gdb.python/py-rbreak.c
> new file mode 100644
> index 0000000000..287aba6c03
> --- /dev/null
> +++ b/gdb/testsuite/gdb.python/py-rbreak.c
> @@ -0,0 +1,62 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2013-2017 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +int func1 ()

As for GDB code, put the return type on its own line.

> +{
> +  return 1;
> +}
> +
> +int func2 ()
> +{
> +  return 2;
> +}
> +
> +int func3 ()
> +{
> +  return 3;
> +}
> +
> +int func4 ()
> +{
> +  return 4;
> +}
> +
> +int func5 ()
> +{
> +  return 5;
> +}
> +
> +void func6 ()
> +{
> +  return;
> +}
> +
> +void outside_scope ()
> +{
> +  return;
> +}
> +
> +int main()
> +{
> +  func1 (); /* Break func1.  */
> +  func2 ();
> +  func3 ();
> +  func4 ();
> +  func5 ();
> +  func6 ();
> +  outside_scope ();
> +}
> diff --git a/gdb/testsuite/gdb.python/py-rbreak.exp b/gdb/testsuite/gdb.python/py-rbreak.exp
> new file mode 100644
> index 0000000000..9385aeae08
> --- /dev/null
> +++ b/gdb/testsuite/gdb.python/py-rbreak.exp
> @@ -0,0 +1,61 @@
> +# Copyright (C) 2017 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# This file is part of the GDB testsuite.  It tests the mechanism
> +# exposing values to Python.
> +
> +load_lib gdb-python.exp
> +
> +standard_testfile py-rbreak.c py-rbreak-func2.c
> +
> +if {[prepare_for_testing "failed to prepare" ${testfile} [list $srcfile $srcfile2]] } {
> +    return 1
> +}
> +
> +# Skip all tests if Python scripting is not enabled.
> +if { [skip_python_tests] } { continue }
> +
> +if ![runto_main] then {
> +    fail "can't run to main"
> +    return 0
> +}
> +
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"\",minisyms=False)" \
> +    "Get all function breakpoints" 0
> +gdb_test "py print(len(sl))" "11" \
> +    "Check number of returned breakpoints is 11"
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"main\.\*\",minisyms=False)" \
> +    "Get main function breakpoint" 0
> +gdb_test "py print(len(sl))" "1" \
> +    "Check number of returned breakpoints is 1"
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"func\.\*\",minisyms=False,throttle=10)" \
> +    "Get functions matching func.*" 0
> +gdb_test "py print(len(sl))" "9" \
> +    "Check number of returned breakpoints is 9"
> +gdb_test "py gdb.rbreak(\"func\.\*\",minisyms=False,throttle=1)" \
> +    "Number of breakpoints exceeds throttled maximum.*" \
> +    "Check throttle errors on too many breakpoints."
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"func1\",minisyms=True)" \
> +    "Including mini symbols, get functions matching func.*" 0
> +gdb_test "py print(len(sl))" "2" \
> +    "Check number of returned breakpoints is 2"
> +gdb_py_test_silent_cmd "python sym = gdb.lookup_symbol(\"efunc1\")" \
> +    "Find a symbol in objfile" 1
> +gdb_py_test_silent_cmd "python symtab = sym\[0\].symtab" \
> +    "Get backing symbol table" 1
> +gdb_py_test_silent_cmd "py sl = gdb.rbreak(\"func\.\*\",minisyms=False,throttle=10,symtabs=\[symtab\])" \
> +    "Get functions matching func.* in one symtab only." 0
> +gdb_test "py print(len(sl))" "3" \
> +    "Check number of returned breakpoints is 3"
> 

I can't find a reference, but I think we want test names to start
with a lower case letter and not end with a dot.  I'll see if we
can add this to the testcase cookbook wiki page.

Simon
Follow-Ups:
- Re: [python][patch] Python rbreak
  - From: Phil Muldoon
References:
- [python][patch] Python rbreak
  - From: Phil Muldoon
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]