Bug 27151

Summary: Step will skip subsequent statements for malloc functions
Product: gdb Reporter: Anonymous <iamanonymous.cs>
Component: breakpoints Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: iamanonymous.cs, jiangyy, vries
Priority: P2    
Version: HEAD   
Target Milestone: 11.1   
Host: Target:
Build: Last reconfirmed: 2021-01-06 00:00:00

Description Anonymous 2021-01-05 15:31:42 UTC
Consider the following test case:
---
$ cat small.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
  int *p = (int *)malloc(sizeof(int)*4); // L7
  memset(p, 0, sizeof(p));  // L8
  printf("p[0] = %d; p[3] = %d\n", p[0], p[3]); // L9
  return 0;  // L10
}
---

$ gcc -O0 -g small.c; gdb -q a.out
Reading symbols from a.out...
(gdb) start
Temporary breakpoint 1 at 0x1195: file small.c, line 7.
Starting program: /home/yibiao/DeVIL/a.out 

Temporary breakpoint 1, main () at small.c:7
7	  int *p = (int *)malloc(sizeof(int)*4); // L7
(gdb) step
p[0] = 0; p[3] = 0
[Inferior 1 (process 66988) exited normally]
(gdb)

###########################################
We can see that L8, L9, and L10 are skipped when stepping with "step".
However, p[0] and p[3] are printed, which means that both L8 and L9 were executed.


Inspecting the line table shows that L8, L9, and L10 are all present, each with "IS-STMT" set:

(gdb) maint info line-table
objfile: /home/yibiao/DeVIL/a.out ((struct objfile *) 0x561aabac7410)
compunit_symtab: ((struct compunit_symtab *) 0x561aababd3f0)
symtab: /home/yibiao/DeVIL/small.c ((struct symtab *) 0x561aababd470)
linetable: ((struct linetable *) 0x561aabb08790):
INDEX  LINE   ADDRESS            IS-STMT 
0      6      0x0000555555555189 Y 
1      7      0x0000555555555195 Y 
2      8      0x00005555555551a3 Y 
3      9      0x00005555555551b9 Y 
4      9      0x00005555555551c1 Y 
5      10     0x00005555555551dc Y 
6      11     0x00005555555551e1 Y 
7      END    0x00005555555551e3 Y
 

Thus, we believe this is a bug in GDB.

-------------gdb and gcc version-------------
$ gdb --version
GNU gdb (GDB) 11.0.50.20201224-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ gcc --version
gcc (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


========================================================================
Note that we previously reported this issue as PR27120. At that time, we were not sure whether this was a bug or a feature. Thanks to the detailed steps given by Tom de Vries in PR27126, we can confirm that this is a bug in GDB.
Comment 1 Tom de Vries 2021-01-06 12:22:03 UTC
I managed to reproduce this on ubuntu 20.

Configurations:
- gcc-10, system gdb,
- gcc-10, gdb built from current trunk.

The problem goes away when small.c is built with -fcf-protection=none.

I tried to reproduce this on my usual setup, openSUSE Leap 15.2, by forcing -fcf-protection=full.  Didn't reproduce.

Copied Leap executable to ubuntu, and tried using gdb there.  Didn't reproduce.

Then copied ubuntu executable to Leap.  Reproduced.

So, so far this seems specific to the ubuntu executable.

The two executables have similar line info and insns for main.

There is a difference in the plt.

For leap, we have:
...
00000000000005f0 <malloc@plt>:
 5f0:   ff 25 32 0a 20 00       jmpq   *0x200a32(%rip) \
          # 201028 <malloc@GLIBC_2.2.5>
 5f6:   68 02 00 00 00          pushq  $0x2
 5fb:   e9 c0 ff ff ff          jmpq   5c0 <.plt>
...

For ubuntu, we have:
...
0000000000001090 <malloc@plt>:
    1090:       f3 0f 1e fa             endbr64
    1094:       f2 ff 25 35 2f 00 00    bnd jmpq *0x2f35(%rip) \
                  # 3fd0 <malloc@GLIBC_2.2.5>
    109b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
...

Using "set debug infrun 1", with leap we have:
...
[infrun] handle_signal_stop: stop_pc=0x5555555545f0
[infrun] process_event_stop_test: stepped into dynsym resolve code
...
where:
...
(gdb) info sym 0x5555555545f0        
malloc@plt in section .plt of /home/vries/gdb_versions/devel/a.leap.out
...

But with ubuntu we have:
...
[infrun] handle_signal_stop: stop_pc=0x555555555090
[infrun] process_event_stop_test: stepped into subroutine
[infrun] insert_step_resume_breakpoint_at_sal_1: inserting step-resume breakpoint at 0x7ffff7df0710
...
where:
...
(gdb) info sym 0x555555555090
malloc@plt in section .plt.sec of /home/vries/gdb_versions/devel/a.out
...
and:
...
(gdb) info sym 0x7ffff7df0710
malloc in section .text of /lib64/ld-linux-x86-64.so.2
...
Looking for the "stepped into dynsym resolve code" in the gdb sources, we find in_solib_dynsym_resolve_code, which returns false with the ubuntu exec, and true with the leap exec.
Comment 2 Tom de Vries 2021-01-06 12:22:11 UTC
This fixes it:
...
diff --git a/gdb/objfiles.h b/gdb/objfiles.h
index b9bb80b7a62..2afd2f80154 100644
--- a/gdb/objfiles.h
+++ b/gdb/objfiles.h
@@ -786,7 +786,8 @@ extern int pc_in_section (CORE_ADDR, const char *);
 static inline int
 in_plt_section (CORE_ADDR pc)
 {
-  return pc_in_section (pc, ".plt");
+  return (pc_in_section (pc, ".plt")
+         || pc_in_section (pc, ".plt.sec"));
 }
 
 /* Keep a registry of per-objfile data-pointers required by other GDB
...
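As a rough illustration of what the patch changes, here is a minimal standalone sketch. It is not GDB code: section_is, in_plt_section_before and in_plt_section_after are hypothetical names, and instead of a pc they take the name of the section that contains the pc (the string that "info sym" reports).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for GDB's pc_in_section: the caller passes the
   name of the section containing the pc, rather than the pc itself.  */
bool
section_is (const char *pc_section, const char *name)
{
  assert (pc_section != NULL && name != NULL);
  return strcmp (pc_section, name) == 0;
}

/* Models in_plt_section before the patch: only ".plt" counts.  */
bool
in_plt_section_before (const char *pc_section)
{
  return section_is (pc_section, ".plt");
}

/* Models in_plt_section after the patch: ".plt.sec" counts as well.  */
bool
in_plt_section_after (const char *pc_section)
{
  return (section_is (pc_section, ".plt")
          || section_is (pc_section, ".plt.sec"));
}
```

For the ubuntu executable, where malloc@plt lives in .plt.sec, in_plt_section_before (".plt.sec") is false and in_plt_section_after (".plt.sec") is true. For the leap executable, where it lives in .plt, both return true.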
Comment 3 Tom de Vries 2021-01-06 12:25:57 UTC
*** Bug 27120 has been marked as a duplicate of this bug. ***
Comment 4 Tom de Vries 2021-01-06 12:28:17 UTC
*** Bug 25565 has been marked as a duplicate of this bug. ***
Comment 5 Anonymous 2021-01-06 13:44:47 UTC
(In reply to Tom de Vries from comment #2)
> This fixes it:
> ...
> diff --git a/gdb/objfiles.h b/gdb/objfiles.h
> index b9bb80b7a62..2afd2f80154 100644
> --- a/gdb/objfiles.h
> +++ b/gdb/objfiles.h
> @@ -786,7 +786,8 @@ extern int pc_in_section (CORE_ADDR, const char *);
>  static inline int
>  in_plt_section (CORE_ADDR pc)
>  {
> -  return pc_in_section (pc, ".plt");
> +  return (pc_in_section (pc, ".plt")
> +         || pc_in_section (pc, ".plt.sec"));
>  }
>  
>  /* Keep a registry of per-objfile data-pointers required by other GDB
> ...

Thanks a lot. I have tried this patch and it indeed fixes all the similar problems.
Comment 6 Tom de Vries 2021-01-06 13:53:38 UTC
This is slightly more convoluted.

I tried to reproduce the problem on openSUSE Factory.  Using -fcf-protection=full, there I managed to get a .plt.sec section.  But gdb handled it ok.

It did not take the "stepped into dynsym resolve code" path, but handled things fine along another path.

So I debugged the ubuntu exec on leap once more.  I found that at some point we do:
...
      /* If we are in a function call trampoline (a stub between the
         calling routine and the real function), locate the real
         function.  That's what tells us (a) whether we want to step
         into it at all, and (b) what prologue we want to run to the
         end of, if we do step into it.  */
      real_stop_pc = skip_language_trampoline (frame, stop_pc);
...
and end up in objc_language::skip_trampoline, and then in gdbarch_skip_trampoline_code, and then in find_solib_trampoline_target:
...
/* If PC is in a shared library trampoline code stub, return the
   address of the `real' function belonging to the stub.
   Return 0 if PC is not in a trampoline code stub or if the real
   function is not found in the minimal symbol table.

   We may fail to find the right function if a function with the
   same name is defined in more than one shared library, but this
   is considered bad programming style.  We could return 0 if we find
   a duplicate function in case this matters someday.  */

CORE_ADDR
find_solib_trampoline_target (struct frame_info *frame, CORE_ADDR pc)
{
  struct minimal_symbol *tsymbol = lookup_solib_trampoline_symbol_by_pc (pc);
  if (tsymbol != NULL)
    {
      for (objfile *objfile : current_program_space->objfiles ())
        {
          for (minimal_symbol *msymbol : objfile->msymbols ())
            {

...

So, we find that the pc is a trampoline for malloc, and start iterating over the minsyms in the objfiles.

With openSUSE Leap (glibc 2.26), we find this as first match:
...
$ nm /lib64/ld-linux-x86-64.so.2  | grep malloc
0000000000019710 W malloc
...

With openSUSE Factory (glibc 2.32), we instead have rtld_malloc, so skip_language_trampoline returns 0.
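
The "first match wins" lookup described above can be sketched roughly as follows. This is a simplified model, not GDB code: the struct and function names are hypothetical, the order of the objfiles is an assumption, and every address except the 0x19710 value taken from the nm output is made up for illustration. Load addresses and minimal symbol types are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for minimal symbols
   and objfiles.  */
struct sym_sketch { const char *name; uint64_t addr; };
struct objfile_sketch
{
  const char *file;
  const struct sym_sketch *syms;
  size_t nsyms;
};

/* Return the address of the first symbol called NAME, walking the
   objfiles in order, or 0 if there is none.  */
uint64_t
first_match_sketch (const struct objfile_sketch *objs, size_t nobjs,
                    const char *name)
{
  assert (objs != NULL && name != NULL);
  for (size_t i = 0; i < nobjs; i++)
    for (size_t j = 0; j < objs[i].nsyms; j++)
      if (strcmp (objs[i].syms[j].name, name) == 0)
        return objs[i].syms[j].addr;
  return 0;
}

/* Symbol value from the nm output above.  */
const struct sym_sketch ldso_syms[] = { { "malloc", 0x19710 } };
/* Made-up address for libc's malloc, for illustration only.  */
const struct sym_sketch libc_syms[] = { { "malloc", 0x97000 } };
/* Made-up address; only the name comes from the report.  */
const struct sym_sketch other_syms[] = { { "rtld_malloc", 0x1a000 } };

/* Assumed order: the dynamic linker is visited before libc.  */
const struct objfile_sketch leap_like_objs[] = {
  { "ld-linux-x86-64.so.2", ldso_syms, 1 },
  { "libc.so.6", libc_syms, 1 },
};
const struct objfile_sketch no_match_objs[] = {
  { "ld-linux-x86-64.so.2", other_syms, 1 },
};
```

With leap_like_objs, first_match_sketch (leap_like_objs, 2, "malloc") returns the dynamic linker's entry rather than libc's, which mirrors how the step-resume breakpoint ends up at an address that is never executed. With no_match_objs, the lookup returns 0.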
Comment 7 Tom de Vries 2021-01-06 15:37:44 UTC
Patch submitted: https://sourceware.org/pipermail/gdb-patches/2021-January/174738.html
Comment 8 Sourceware Commits 2021-01-14 09:35:38 UTC
The master branch has been updated by Tom de Vries <vries@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=5fae2a2c66ca865f54505adb37be6bd51fecb6cd

commit 5fae2a2c66ca865f54505adb37be6bd51fecb6cd
Author: Tom de Vries <tdevries@suse.de>
Date:   Thu Jan 14 10:35:34 2021 +0100

    [gdb/breakpoint] Handle .plt.sec in in_plt_section
    
    Consider the following test-case small.c:
    ...
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
    
     int main (void) {
       int *p = (int *)malloc (sizeof(int) * 4);
       memset (p, 0, sizeof(p));
       printf ("p[0] = %d; p[3] = %d\n", p[0], p[3]);
       return 0;
     }
    ...
    
    On Ubuntu 20.04, we get:
    ...
    $ gcc -O0 -g small.c
    $ gdb -batch a.out -ex start -ex step
    Temporary breakpoint 1, main () at small.c:6
    6         int *p = (int *) malloc(sizeof(int) * 4);
    p[0] = 0; p[3] = 0
    [Inferior 1 (process $dec) exited normally]
    ...
    but after switching off the on-by-default fcf-protection, we get the desired
    behaviour:
    ...
    $ gcc -O0 -g small.c -fcf-protection=none
    $ gdb -batch a.out -ex start -ex step
    Temporary breakpoint 1, main () at small.c:6
    6         int *p = (int *) malloc(sizeof(int) * 4);
    7         memset (p, 0, sizeof(p));
    ...
    
    Using "set debug infrun 1", the first observable difference between the two
    debug sessions is that with -fcf-protection=none we get:
    ...
    [infrun] process_event_stop_test: stepped into dynsym resolve code
    ...
    In this case, "in_solib_dynsym_resolve_code (malloc@plt)" returns true because
    "in_plt_section (malloc@plt)" returns true.
    
    With -fcf-protection=full, "in_solib_dynsym_resolve_code (malloc@plt)" returns
    false because "in_plt_section (malloc@plt)" returns false, because the section
    name for malloc@plt is .plt.sec instead of .plt, which is not handled in
    in_plt_section:
    ...
    static inline int
    in_plt_section (CORE_ADDR pc)
    {
      return pc_in_section (pc, ".plt");
    }
    ...
    
    Fix this by handling .plt.sec in in_plt_section.
    
    Tested on x86_64-linux.
    
    [ Another requirement to be able to reproduce this is to have a dynamic linker
    with a "malloc" minimal symbol, which causes find_solib_trampoline_target to
    find it, such that skip_language_trampoline returns the address for the
    dynamic linker's malloc.  This causes the step machinery to set a breakpoint
    there, and to continue, expecting to hit it.  Obviously, we execute glibc's
    malloc instead, so the breakpoint is not hit and we continue to program
    completion. ]
    
    gdb/ChangeLog:
    
    2021-01-14  Tom de Vries  <tdevries@suse.de>
    
            PR breakpoints/27151
            * objfiles.h (in_plt_section): Handle .plt.sec.
Comment 9 Tom de Vries 2021-01-14 09:41:03 UTC
Patch committed, marking resolved-fixed.

No test-case.  Triggering the error condition depends on external factors, so I'm not sure I'll be able to make one.

BTW, my guess is that there are already test-cases that fail because of this on Ubuntu 20.04.