This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.
Harvard proposal

To: gdb at sources dot redhat dot com
Subject: Harvard proposal
From: Nick Duffek <nsd at redhat dot com>
Date: Sat, 10 Feb 2001 15:25:55 -0500
CC: cagney at redhat dot com, dje at transmeta dot com, taylor at cygnus dot com, kevinb at cygnus dot com, msnyder at cygnus dot com, jimb at cygnus dot com, per at bothner dot com, eliz at delorie dot com
Recently, I took a stab at converting CORE_ADDR to a struct.  It turned
out to be quite difficult, because there's a deeply-embedded assumption
that CORE_ADDR is an offset in a unified byte address space.

So instead, I wrote a patch that takes GDB partway toward the struct
core_addr goal.  It inserts a bunch of CORE_ADDR conversion macros that
create abstraction barriers between various CORE_ADDR generators and
consumers (user, hardware, target, object file).

My original motivation for the patch was to allow users to see and specify
real addresses without needing to know about the usual 0x1000000/0x2000000
offset and bit-shift conversions.  Compensating for those conversions
during assembly debugging is tedious and potentially confusing.

In the following text, I describe the patch largely from that perspective.
However, the patch is general enough to allow architectures to make their
own choices about how to translate user-visible addresses.

The Problem
===========

GDB handles Harvard architectures by mapping instruction and data spaces
onto a single byte address space.

For example, d10v-tdep.c performs the following mapping:
  data: 0x2000000 + addr
  insn: 0x1000000 + (addr << 2)

The mapping is user-visible, which I think is problematic because:

1. The user is inconvenienced by needing to know bit shifts and arbitrary
data/instruction offsets when specifying and viewing addresses.

2. The user won't necessarily know how GDB will modify registers.  $pc and
$sp are obvious candidates for modification, but other registers may have
multiple roles.  For example, a link register may hold an instruction
address immediately after a subroutine call, but it may hold data
addresses or even integer values after it's been saved on the stack.

3. Generally speaking, the purpose of GDB is to provide accurate
information about software and hardware, and GDB's address translation
diminishes that accuracy.  GDB should reveal, not obscure.

4. Expression evaluation breaks in any number of cases.  For example, if
GDB is stopped just after the mvfc instruction in a call to the following
D10V function with n=3:

        ;; prime(n): return the nth prime number for 1 <= n <= 3.
        .text
        .global prime
prime:
        ;; save link register
        st r13,@-sp

        ;; scale n by jump target size, add offset
        add r0,r0
        addi r0,1

        ;; calculate and jump to target pc
        mvfc r1,pc
        add r1,r0
        jmp r1

        ;; 1st prime number
        ldi r0,2
        bra .L1

        ;; 2nd prime number
        ldi r0,3
        bra .L1

        ;; 3rd prime number
        ldi r0,5
        nop
.L1:
        ;; restore link register and return
        ld r13,@sp+
        jmp r13

the following commands work incorrectly:

  (gdb) x/i $r13
  0x501d:       sub     r0, r0  ||      sub     r0, r0
  (gdb) x/i $r1 + $r0
  0x502c:       sub     r0, r0  ||      sub     r0, r0

GDB doesn't (and can't) know that $r13 and $r1 hold instruction addresses
and $r0 holds an instruction offset, so it doesn't apply the necessary
internal conversions before querying memory.  To compensate, the user
needs to enter the following:

  (gdb) x/i ($r13 << 2) + 0x1000000
  0x1014074 <main+24>:  mv      r1, r0  ->      mv      r0, r1
  (gdb) x/i ($r1 << 2) + 0x1000000 + ($r0 << 2)
  0x10140b0 <prime+40>: ldi.s   r0, 0x5 ||      nop     

Similar problems occur when dereferencing data addresses in registers.
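
The by-hand compensation above can be written out as a minimal C sketch.
The helper names and the core_addr typedef are hypothetical stand-ins, not
GDB code; the offsets and the shift are the ones d10v-tdep.c is described
as using.

```c
#include <stdint.h>

/* Stand-in for GDB's CORE_ADDR (assumption: 32 bits is wide enough here).  */
typedef uint32_t core_addr;

#define INSN_OFFSET 0x1000000u  /* instruction-space offset quoted above */
#define DATA_OFFSET 0x2000000u  /* data-space offset quoted above */
#define INSN_SHIFT  2           /* instruction addresses are shifted left by 2 */

/* Hypothetical helper: the conversion the user performs by hand before
   "x/i" on a register that holds an instruction address.  */
core_addr
insn_reg_to_unified (core_addr reg)
{
  return (reg << INSN_SHIFT) + INSN_OFFSET;
}

/* Hypothetical helper: the corresponding conversion for a register that
   holds a data address.  */
core_addr
data_reg_to_unified (core_addr reg)
{
  return reg + DATA_OFFSET;
}
```

With $r13 = 0x501d this yields 0x1014074, and with $r1 + $r0 = 0x502c it
yields 0x10140b0, matching the session above.  Because the shift
distributes over addition, shifting $r1 and $r0 separately, as in the
second command, gives the same result as shifting their sum.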

A Solution
==========

Change GDB to treat user-visible addresses as real hardware addresses.

As has been discussed in other threads, this approach reveals the
ambiguity inherent in Harvard architectures.  For example, should "x/i 0"
disassemble the first word of the instruction space or the data space?

An obvious disambiguator is an address syntax extension that indicates the
address space.  In separate threads, Doug Evans proposed a "<space>:"
prefix and Per Bothner proposed a "@<space>" suffix.  E.g.:

  x/i insn:0

would disassemble instruction address 0 and

  x/i data:0

would disassemble data address 0.

I think that in the absence of the disambiguator, GDB should pick a
reasonable default, e.g. "x/i 0" would disassemble instruction address 0.
That worked well in the two (not-yet-public) ports that use this patch.
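
To make the "<space>:" prefix and the default concrete, here is a minimal
parsing sketch.  It is not taken from the patch: the type and function
names are hypothetical, only the "insn" and "data" space names come from
the examples above, and the caller supplies the default space.

```c
#include <stdlib.h>
#include <string.h>

enum addr_space { SPACE_INSN, SPACE_DATA };

struct real_addr
{
  enum addr_space space;
  unsigned long addr;
};

/* Hypothetical parser for "[<space>:]<addr>".  Without a prefix the
   address is placed in DFLT.  Returns 0 on success, -1 on an unknown
   space name or a malformed number.  */
int
parse_real_addr (const char *text, enum addr_space dflt,
                 struct real_addr *out)
{
  const char *colon = strchr (text, ':');
  const char *num = text;
  char *end;

  out->space = dflt;
  if (colon != NULL)
    {
      size_t len = (size_t) (colon - text);

      if (len == 4 && strncmp (text, "insn", 4) == 0)
        out->space = SPACE_INSN;
      else if (len == 4 && strncmp (text, "data", 4) == 0)
        out->space = SPACE_DATA;
      else
        return -1;
      num = colon + 1;
    }

  /* Base 0 accepts decimal, octal and 0x-prefixed hex.  */
  out->addr = strtoul (num, &end, 0);
  if (end == num || *end != '\0')
    return -1;
  return 0;
}
```

Under these assumptions "insn:0" and "data:0" select their spaces
explicitly, while a bare "0" falls back to whatever default the caller
passes, e.g. the instruction space for x/i.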

An Implementation
=================

Conceptually partition GDB into components that might have a unique
interpretation of CORE_ADDR, e.g.:

  user             addresses displayed to and received from the GDB user
  remote           addresses specified to remote target for memory I/O
  hardware         addresses written to/read from memory or registers
  object files     symbol addresses
  internal GDB     all other occurrences of CORE_ADDR

and apply appropriate conversions when crossing boundaries between those
components.  The patch does that using gdbarch macros with the following
nomenclature:

  ADDR_<direction>_<component>[_<space>]

  <direction>
     IN   moving to internal GDB from another component
     OUT  moving from internal GDB to another component

  <component>   
     REAL    user-visible and hardware addresses
     OBJ     symbol and entry-point addresses in object files
     GDB     internal GDB addresses
     SEC     offset in an object file section
     REMOTE  addresses specified to remote target for memory I/O

  <space>
     INSN  instruction space
     DATA  data space
     SEC   infer space from the struct sec argument
     TYPE  infer space from the struct type argument

For example, ADDR_IN_REAL_TYPE (CORE_ADDR addr, struct type *type) returns
the real address ADDR of a TYPE object converted to an internal gdb
address.  I've appended the current list of ADDR_* macros to this message.

[ADDR_IN_REAL_TYPE is identical to the existing POINTER_TO_ADDRESS; I
chose the alternative ADDR_* nomenclature because it results in shorter
names and reflects the hierarchical relationships between the macros.]
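
As an illustration of the _TYPE variants, the sketch below infers the
address space from a type.  Everything in it is an assumption made for the
example: struct type is reduced to a single hypothetical field, function
types are assumed to live in instruction space and everything else in data
space, and the lower-case functions stand in for the gdbarch macros with
the d10v mapping hard-coded.

```c
#include <stdint.h>

/* Stand-in for GDB's CORE_ADDR.  */
typedef uint32_t core_addr;

/* Hypothetical, heavily simplified stand-in for GDB's struct type.  */
enum type_kind { KIND_FUNC, KIND_OTHER };
struct type { enum type_kind kind; };

/* Stand-in for ADDR_IN_REAL_INSN, using the d10v mapping.  */
core_addr
addr_in_real_insn (core_addr addr)
{
  return 0x1000000u + (addr << 2);
}

/* Stand-in for ADDR_IN_REAL_DATA, using the d10v mapping.  */
core_addr
addr_in_real_data (core_addr addr)
{
  return 0x2000000u + addr;
}

/* Stand-in for ADDR_IN_REAL_TYPE: choose the space from TYPE, then
   delegate to the per-space conversion.  */
core_addr
addr_in_real_type (core_addr addr, const struct type *type)
{
  if (type->kind == KIND_FUNC)
    return addr_in_real_insn (addr);
  return addr_in_real_data (addr);
}
```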

Architectures can use the ADDR_* macros to map multiple address spaces
into internal GDB CORE_ADDRs.  For convenience, I wrote a harvard.c module
that handles simple d10v-ish bit-shift and bit-offset mappings to and from
the current internal GDB unified byte address space.  The interface is:

extern void harvard_init (struct gdbarch *gdbarch,
                          CORE_ADDR gdb_data_off, int gdb_data_shift,
                          CORE_ADDR gdb_insn_off, int gdb_insn_shift,
                          CORE_ADDR obj_data_off, int obj_data_shift,
                          CORE_ADDR obj_insn_off, int obj_insn_shift,
                          CORE_ADDR remote_data_off, int remote_data_shift,
                          CORE_ADDR remote_insn_off, int remote_insn_shift);

I might split that into multiple calls to allow for future components.
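
A minimal sketch of what one offset/shift pair in that interface could
mean, assuming each pair describes a d10v-style mapping of the form
off + (addr << shift).  The struct and function names are hypothetical and
this is not the harvard.c implementation.

```c
#include <stdint.h>

/* Stand-in for GDB's CORE_ADDR.  */
typedef uint32_t core_addr;

/* One offset/shift pair, e.g. gdb_insn_off and gdb_insn_shift.  */
struct space_map
{
  core_addr off;
  int shift;
};

/* Hypothetical IN direction: real address -> mapped address.  */
core_addr
map_in (const struct space_map *m, core_addr real)
{
  return m->off + (real << m->shift);
}

/* Hypothetical OUT direction: mapped address -> real address.  Assumes
   MAPPED was produced by map_in with the same pair.  */
core_addr
map_out (const struct space_map *m, core_addr mapped)
{
  return (mapped - m->off) >> m->shift;
}
```

With the pair { 0x1000000, 2 } for instructions and { 0x2000000, 0 } for
data, this reproduces the d10v mapping, and map_out undoes map_in for any
address that survives the shift without overflow.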

I'll post the actual patch soon, and if people like the idea, I'll try
converting d10v-tdep.c to use it.

What do you think?

Nick

[gdbarch macros follow]

  ADDR_IN_REAL_DATA (CORE_ADDR addr)
    Return real data address ADDR converted to an internal gdb address.
  ADDR_IN_REAL_INSN (CORE_ADDR addr)
    Return real instruction address ADDR converted to an internal gdb
    address.
  ADDR_IN_REAL_SEC (CORE_ADDR addr, struct sec *sec)
    Return real address ADDR in SEC converted to an internal gdb address.
  ADDR_IN_REAL_TYPE (CORE_ADDR addr, struct type *type)
    Return real address ADDR of a TYPE object converted to an internal gdb
    address.
  ADDR_IN_OBJ_DATA (CORE_ADDR addr)
    Return object file data address ADDR converted to an internal gdb
    address.
  ADDR_IN_OBJ_INSN (CORE_ADDR addr)
    Return object file instruction address ADDR converted to an internal
    gdb address.
  ADDR_IN_OBJ_SEC (CORE_ADDR addr, struct sec *sec)
    Return object file address ADDR in SEC converted to an internal gdb
    address.
  ADDR_IN_OBJ (CORE_ADDR addr)
    Return object file address ADDR converted to an internal gdb address.
  ADDR_IN_OBJ_P ()
    Whether to apply ADDR_IN_OBJ* conversions.
  ADDR_IN_GDB_INSN (CORE_ADDR addr)
    Return internal gdb address ADDR converted to an internal gdb
    instruction address if it isn't one already.
  ADDR_OUT_REAL (CORE_ADDR addr)
    Return internal gdb address ADDR converted to a real address.
  ADDR_OUT_OBJ (CORE_ADDR addr)
    Return internal gdb address ADDR converted to an object file address.
  ADDR_OUT_SEC (CORE_ADDR addr)
    Return internal gdb address ADDR converted to an offset from the start
    of its section.
  ADDR_OUT_REMOTE (CORE_ADDR addr)
    Return internal gdb address ADDR converted to a remote address.
  ADDROFF_OUT_REAL (CORE_ADDR addr, CORE_ADDR offset)
    Return OFFSET, an offset from internal gdb address ADDR, converted to
    a real address offset.