This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] C++ify gdb/common/environ.c


Hey,

Thanks for the review, and really sorry about the delay.  I have a few
comments and questions below.

On Wednesday, April 19 2017, Pedro Alves wrote:

> On 04/18/2017 04:03 AM, Sergio Durigan Junior wrote:
>
>> -  if (e->allocated < i)
>> +const char *
>> +gdb_environ::get (const std::string &var) const
>> +{
>> +  try
>>      {
>> -      e->allocated = std::max (i, e->allocated + 10);
>> -      e->vector = (char **) xrealloc ((char *) e->vector,
>> -				      (e->allocated + 1) * sizeof (char *));
>> +      return (char *) this->m_environ_map.at (var).c_str ();
>>      }
>> -
>> -  memcpy (e->vector, environ, (i + 1) * sizeof (char *));
>> -
>> -  while (--i >= 0)
>> +  catch (const std::out_of_range &ex)
>
> Please use unordered_map::find instead.  In general,
> use "at" with exceptions when you're "sure" the element
> exists.  It'll be simpler, more efficient, and less
> roundabout, since at() essentially does a find() and then
> throws if not found.

I decided to go the easier way and not use unordered_map.  I confess I
was having second thoughts myself when I posted the patch, and your
message only made me more confident that I don't want to follow this
route.

>>      {
>> -      int len = strlen (e->vector[i]);
>> -      char *newobj = (char *) xmalloc (len + 1);
>> -
>> -      memcpy (newobj, e->vector[i], len + 1);
>> -      e->vector[i] = newobj;
>> +      return NULL;
>>      }
>>  }
>>  
>
>>  void
>> -set_in_environ (struct gdb_environ *e, const char *var, const char *value)
>> +gdb_environ::set (const std::string &var, const std::string &value)
>>  {
>> -  int i;
>> -  int len = strlen (var);
>> -  char **vector = e->vector;
>> -  char *s;
>> +  std::unordered_map<std::string, std::string>::iterator el;
>>  
>> -  for (i = 0; (s = vector[i]) != NULL; i++)
>> -    if (strncmp (s, var, len) == 0 && s[len] == '=')
>> -      break;
>> +  el = this->m_environ_map.find (var);
>
> In C++, it's a good practice to avoid first initializing,
> and then assigning separately, because the default constructor
> will then do useless work.  (And in some cases, there may not even
> be a default constructor.)
>
> I'm not a fan or "auto" everywhere, but with iterators,
> I find it OK:
>
>    auto el = this->m_environ_map.find (var);

Without the unordered_map, this is no longer a problem.  But of course,
thanks for the review.

>>  
>> -  if (s == 0)
>> +  if (el != this->m_environ_map.end ())
>>      {
>> -      if (i == e->allocated)
>> +      if (el->second == value)
>> +	return;
>> +      else
>>  	{
>> -	  e->allocated += 10;
>> -	  vector = (char **) xrealloc ((char *) vector,
>> -				       (e->allocated + 1) * sizeof (char *));
>> -	  e->vector = vector;
>> +	  /* If we found this item, it means that we have to update
>> +	     its value on the map.  */
>> +	  el->second = value;
>> +	  /* And we also have to update its value on the
>> +	     environ_vector.  For that, we just delete the item here
>> +	     and recreate it (with the new value) later.  */
>> +	  unset_in_vector (this->m_environ_vector, var);
>>  	}
>> -      vector[i + 1] = 0;
>>      }
>>    else
>> -    xfree (s);
>> -
>> -  s = (char *) xmalloc (len + strlen (value) + 2);
>> -  strcpy (s, var);
>> -  strcat (s, "=");
>> -  strcat (s, value);
>> -  vector[i] = s;
>> -
>> -  /* This used to handle setting the PATH and GNUTARGET variables
>> -     specially.  The latter has been replaced by "set gnutarget"
>> -     (which has worked since GDB 4.11).  The former affects searching
>> -     the PATH to find SHELL, and searching the PATH to find the
>> -     argument of "symbol-file" or "exec-file".  Maybe we should have
>> -     some kind of "set exec-path" for that.  But in any event, having
>> -     "set env" affect anything besides the inferior is a bad idea.
>> -     What if we want to change the environment we pass to the program
>> -     without afecting GDB's behavior?  */
>> -
>> -  return;
>> +    this->m_environ_map.insert (std::make_pair (var, value));
>> +
>> +  this->m_environ_vector.insert (this->m_environ_vector.end () - 1,
>> +				 xstrdup (std::string (var
>> +						       + '='
>> +						       + value).c_str ()));
>
> I don't really understand the "m_environ_vector.end () - 1" here.

This is needed because the last element of m_environ_vector needs to be
always NULL.  Therefore, this code is basically inserting the new
variable in the second-to-last position of the vector.  I made a comment
on top of it to clarify this part.

Also, there are other places where I need to iterate through the
elements of the vector, and I'm also using the "- 1" in these places.
I'll put comments where applicable.

> Also, that's a lot of unnecessary copying/duping in that last
> statement.  Using concat instead of xstrdup + operator+ would
> easily avoid it.

You're right, I've replaced it by a concat and it's simpler now.

> And you could easily avoid having to dup the string into both the
> map and the vector by making the map's value be a const char *
> instead of a std::string:
>
>   std::unordered_map<std::string, const char *>
>
> and then make a map's value point directly into the value
> part of the string that is owned by the vector (as in,
> the substring that starts at VALUE in "VAR=VALUE").

That was a good idea.

>>  }
>>  
>> -/* Remove the setting for variable VAR from environment E.  */
>> +/* See common/environ.h.  */
>>  
>>  void
>> -unset_in_environ (struct gdb_environ *e, const char *var)
>> +gdb_environ::unset (const std::string &var)
>>  {
>> -  int len = strlen (var);
>> -  char **vector = e->vector;
>> -  char *s;
>> +  this->m_environ_map.erase (var);
>> +  unset_in_vector (this->m_environ_vector, var);
>> +}
>
>
>> -extern struct gdb_environ *make_environ (void);
>> +class gdb_environ
>> +{
>> +public:
>> +  /* Regular constructor and destructor.  */
>> +  gdb_environ ();
>> +  ~gdb_environ ();
>>  
>> -extern void free_environ (struct gdb_environ *);
>> +  /* Reinitialize the environment stored.  This is used when the user
>> +     wants to delete all environment variables added by him/her.  */
>> +  void reinit ();
>
> This should mention that the initial state is copied from the host's
> environ.   I think the default ctor could use a similar comment.
> I was going to suggest to call this clear() instead, to go
> with the standard containers' terminology, until I realized what
> "init" really does.  Or maybe even find a more explicit name,
> reset_from_host_environ or some such.

Sorry, I guess I wasn't aware of the importance of specifying that the
variables come from the host.  Somehow I thought this was implied.

I guess reset_from_host_environ is a good name for the method; my other
option would be "reinit_using_host_environ", but that's longer.

> Perhaps an even clearer approach would be to make the default ctor
> not do a deep copy of the host's "environ", but instead add a
> static factor method that returned such a new gdb_environ, like:
>
> static gdb_environ
> gdb_environ::from_host_environ ()
> {
>    // build/return a gdb_environ that wraps the host's environ global.
> }
>
> Not sure.  Only experimenting would tell.

Sorry, I'm not sure I understand your suggestion.  Please correct me if
I'm wrong.

You're basically saying that the default ctor shouldn't actually do
anything; instead, the class should have this new from_host_environ
method which would be the "official" ctor.  The users would then call
from_host_environ directly when they wanted an instance of the class,
and the method would be responsible for initializing the object.  Is
that correct?

If yes, I think I fail to see the advantage of this method over having a
normal ctor (aside from explicitly naming the new ctor after the fact
that we're using the host environ to build the object).

... after some reading ...

Oh, I think I see what you're suggesting.  Instead of building one
gdb_environ every time someone requests it, we build just one and pass
it along.  OK, now it makes sense.

> Speaking of copying the host environ:
>
>>  _initialize_mi_cmd_env (void)
>>  {
>> -  struct gdb_environ *environment;
>> +  gdb_environ environment;
>>    const char *env;
>>  
>>    /* We want original execution path to reset to, if desired later.
>> @@ -278,13 +278,10 @@ _initialize_mi_cmd_env (void)
>>       current_inferior ()->environment.  Also, there's no obvious
>>       place where this code can be moved such that it surely run
>>       before any code possibly mangles original PATH.  */
>> -  environment = make_environ ();
>> -  init_environ (environment);
>> -  env = get_in_environ (environment, path_var_name);
>> +  env = environment.get (path_var_name);
>>  
>>    /* Can be null if path is not set.  */
>>    if (!env)
>>      env = "";
>>    orig_path = xstrdup (env);
>> -  free_environ (environment);
>>  }
>
> This usage of gdb_environ looks like pointless
> wrapping / dupping / freeing to me -- I don't see why we
> need to dup the whole host environment just to get at some
> env variable.  Using good old getenv(3) directly should do,
> and end up trimming off a bit of work from gdb's startup.

Absolutely.  I didn't pay close attention to this bit; I was just
mechanically converting the code.

> I could see perhaps wanting to avoid / optimize the linear
> walks that getenv must do, if we have many getenv calls in
> gdb, but then that suggests keeping a gdb_environ global that
> is initialized early on and is reused.  But that would miss
> any setenv call that changes the environment since that
> gdb_environ is created, so I'd prefer not do that unless
> we find a real need.

I'll make the code use getenv instead.

>> +
>> +private:
>> +  /* Helper function that initializes our data structures with the
>> +     environment variables.  */
>> +  void init ();
>> +
>> +  /* Helper function that clears our data structures.  */
>> +  void clear ();
>
> s/function/method/g throughout (ChangeLog too).

Ops, thanks.  Fixed.

>> +
>> +  /* A map representing the environment variables and their values.
>> +     It is easier to store them using a map because of the faster
>> +     lookups.  */
>
> It's not that it's easier to store them.  Not having a map
> would be even easier.  Just do a linear walk on the vector,
> just like getenv does.  You want to instead say that we're
> optimizing for get operations.  (I'm not sure this is the
> right trade off for this class, but I'll go along with it.)

Well, no need to worry about this anymore.  The new code doesn't use
unordered_map, as mentioned above.

>> +  std::unordered_map<std::string, std::string> m_environ_map;
>> +
>> +  /* A vector containing the environment variables.  This is useful
>> +     for when we need to obtain a 'char **' with all the existing
>> +     variables.  */
>
> This should say that the entries are in VAR=VALUE form, and what
> is the typical user that needs that format.

I'll expand the comment.

>> +  std::vector<char *> m_environ_vector;
>> +};
>>  
>>  #endif /* defined (ENVIRON_H) */
>
>>    else
>>      {
>> -      char **vector = environ_vector (current_inferior ()->environment);
>> +      const std::vector<char *> &vector =
>> +	current_inferior ()->environment.get_vector ();
>
> = goes on the next line.

Fixed.

>>  
>> -      while (*vector)
>> +      for (char *elem : vector)
>>  	{
>> -	  puts_filtered (*vector++);
>> +	  puts_filtered (elem);
>>  	  puts_filtered ("\n");
>>  	}
>>      }
>
> I'm not sure it's worth it to expose the std::vector implementation
> detail just for this loop.  I don't really understand how this
> can work though, given that the vector is NULL terminated.

I'm not exposing the std::vector anymore.  Instead, I'm just using the
regular 'char **' and looping until I find a NULL.

> I agree with Simon -- this is begging for some unit tests.  :-)

I'm writing them.  It's taking some time because it's the first time I'm
using the unittest framework.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]