This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFA] C++-ify parse_format_string


On 2017-11-24 07:54, Pedro Alves wrote:
On 11/24/2017 03:17 AM, Simon Marchi wrote:
On 2017-11-23 05:40 PM, Pedro Alves wrote:
On 11/23/2017 09:13 PM, Simon Marchi wrote:

I have a patch in some branch that does essentially the same thing, so
I was able to compare our approaches.  In my version, I removed the
big allocation that is shared among pieces, and made each piece have
its own std::string.  Unless we want to keep the current allocation
scheme for performance/memory usage reasons, I think that using
std::strings simplifies things in the parse_format_string function.
The format_pieces structure is replaced with an std::vector of
format_piece.

Sounds like a step backwards to me.  If it simplifies things, then
it sounds like it might be because we're missing some utility,
like string_view or something like that.

I am not sure I understand, can you expand a little bit about what you
have in mind?

I call it a step backwards because simplification is done at the
cost of efficiency (individual heap allocations, changing the
ownership model), when the original code avoided that. It's well known that
standard C++ is missing basic string manipulation utilities, but that's
no reason to force ourselves to consider std::string the ultimate tool, IMO. There's C++ the library, and then there's the C++ the language. We can't do anything about the latter, but we can extend the former. Later revisions of the standard are adding more facilities, like std::string_view in C++17, and now things like std::array_view / std::span (from gsl::span) have a very
good chance of making their way into C++20.  Many C++ projects end up
reinventing something like those types, hence the attempts at
standardization, making those types lingua franca for non-owning use cases.

Thanks for the detailed reply!

In my original reply, I said "Unless we want to keep the current allocation scheme for performance/memory usage reasons", removing the common allocation was done consciously. My thought was that the single allocation scheme had not been chosen for performance reason, but for simplicity. Computing how much to malloc for every piece would have been very annoying. Making each piece own its string made the code easier to reason about, but if the performance hit is not acceptable, I understand.

What I meant is that is changes things like this:

	    strncpy (current_substring, percent_loc, length_before_ll);
	    strcpy (current_substring + length_before_ll, "I64");
	    current_substring[length_before_ll + 3] =
	      percent_loc[length_before_ll + lcount];
	    current_substring += length_before_ll + 4;

into this

	    piece_string.assign (percent_loc, length_before_ll);
	    piece_string += "I64";
	    piece_string += percent_loc[length_before_ll + lcount];

Less magical number, less playing with offsets, etc.

I meant that you don't need for each piece_string to own its
own deep copy of the string to get rid of the magical numbers.
You can get that even with a couple C-like free-functions, like:

 // memcpy, null-terminate, and return advanced pointer.
 char *append (char *p, const char *str);
 char *append (char *p, const char *str, size_t len);

and then:

 char *p = current_substring;

 p = append (p, percent_loc, length_before_ll);
 p = append (p, "I64");
 p = append (p, percent_loc[length_before_ll + lcount]);

 current_substring = p;

You could wrap that in a class, like:

 /* Helper to build an null-terminated C string.  */
 struct zstring_builder
 {
    /* POS is a pointer to the position in buffer to work on.  */
    explicit zstring_builder (char *pos)
      : m_pos (pos)
    {}

    // memcpy, advance m_pos, null-terminate.
    zstring_builder &append (const char *s, size_type count);
    zstring_builder &append (const char* s);

    char *pos () { return m_pos; }

 private:
    char *m_pos;
 };

Or:

 /* Helper to build a null-terminated C string on top
    of a vector.  */
 struct zstring_builder
 {
    /* POS is a pointer to the position in buffer to work on.  */
    explicit zstring_builder (std::vector<char> &buf, size_t pos);

    // memcpy, advance m_pos, null-terminate.
    zstring_builder &append (const char *s, size_type count);
    zstring_builder &append (const char* s);

    char *pos () { return &m_vector[m_pos]; }

 private:
    std::vector &m_buf;
    size_t m_pos;
 };

I agree that something like this is good if we want to keep the current allocation scheme but make the code clearer. Since "append" continues the same string piece, I think we would need some kind of "finish" method (a bit like obstack) to be able to start a new string:

  zstring_builder builder (buf);
  builder.append ("Hello ");
  builder.append ("world!");
  const char *english = builder.finish ();
  builder.append ("Bonjour ");
  builder.append ("monde!");
  const char *french = builder.finish ();

giving something like this:

  Hello world!\0Bonjour monde!\0

or something else along those lines, if you now make piece_string
a zstring_builder, you can get similar, and just as readable code:

            zstring_builder piece_string (current_substring);

            piece_string.append (percent_loc, length_before_ll);
 	    piece_string.append ("I64");
 	    piece_string.append (percent_loc[length_before_ll + lcount]);

            current_substring = piece_string.pos ();

Alternatively, we could have something like a zstring_ref
class [1], as a non-owning mutable view of a C string (that never
reallocates), guaranteed to be null-terminated, and then the code
you shown would be just exactly the same. Open choices could be whether
to save the string size inside zstring_ref (like string_view), or
not saving a size, making it a really-thin wrapper around a
pointer instead.

std::string is great as an _owning_ SSO-enabled (which is sometimes a
curse when embedded in structures...) container of characters.  But
when we don't want an owning container, the right tool / refactor can
still make the code just as readable.

[1] - I've thought several times because that something like a zstring_view
(like string_view, but guarantees null-termination) would be handy in
a lot of places.  string_view isn't mutable, though, hence the
alternative name instead.  could be mut_string_view too, for example.

It's all performance/complexity tradeoffs, as usual :)

Simon


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]