This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



Re: Some elfutils/libdw/cfi comments


> I might be missing something, but in case there is an "R" augmentation in
> the CIE, and I only have udata4/8, it seems I am unable to figure out
> whether the addresses are pc/text/data/func encoded.

No, no, no.  Only the low bits get canonicalized from absptr to udata[48].
The low nibble is "FDE data encoding", the high nibble is "FDE flags".
Actually the &0x70 bits are not bit flags as that moniker would imply, they
are a single modifier ("relativeness flavor"), whereas 0x80 is the only
independent flag bit (DW_EH_PE_indirect).  Since DW_EH_PE_absptr is zero,
it's ambiguous whether it's the encoding or the modifier or both.  But I
was talking about it as the encoding.  So the only canonicalization is from
"implicit address size" to "address size this CIE indicates".  

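To make the nibble split concrete, here is a minimal sketch of how an encoding
byte could be taken apart and canonicalized.  The DW_EH_PE_* values are the
standard ones from the LSB pointer-encoding table; the names pe_parts, pe_split
and pe_canonicalize are made up for illustration and are not libdw API.  It
assumes the CIE address size is either 4 or 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard DW_EH_PE_* values (only the ones this sketch needs).  */
enum
{
  PE_absptr = 0x00,	/* Implicit size: "whatever an address is".  */
  PE_udata4 = 0x03,
  PE_udata8 = 0x04,
  PE_indirect = 0x80,	/* The only independent flag bit.  */
  PE_omit = 0xff	/* No value present at all.  */
};

/* Hypothetical helper type: the three logical parts of an encoding byte.  */
struct pe_parts
{
  uint8_t format;	/* Low nibble: data format.  */
  uint8_t modifier;	/* 0x70 bits: one "relativeness flavor", not flags.  */
  bool indirect;	/* 0x80 bit.  */
};

static struct pe_parts
pe_split (uint8_t enc)
{
  struct pe_parts parts =
    {
      .format = enc & 0x0f,
      .modifier = enc & 0x70,
      .indirect = (enc & PE_indirect) != 0
    };
  return parts;
}

/* Replace an implicit-size (absptr) format with the explicit udata4 or
   udata8 format matching ADDR_SIZE.  The modifier and indirect bits are
   passed through untouched; only the low nibble can change.  */
static uint8_t
pe_canonicalize (uint8_t enc, unsigned int addr_size)
{
  assert (addr_size == 4 || addr_size == 8);
  if (enc == PE_omit)
    return enc;
  if ((enc & 0x0f) == PE_absptr)
    enc |= addr_size == 8 ? PE_udata8 : PE_udata4;
  return enc;
}
```

So pe_canonicalize (0x10, 4) gives 0x13 (pcrel stays, absptr becomes udata4),
while 0x1b (pcrel|sdata4) comes back unchanged.
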
Since nobody actually emits the v4 format with R augmentation as yet, it
was just me who decided that the implicit address size should be the CIE
address_size for v4.  I guess any future v4 generator could just as well
never use 0 as the encoding, so that it's never implicit, which would be
better.  AFAIK the encoding byte is really only ever 0 when that is the default
from there having been no R augmentation.  Looking at the current gcc and
gas code (both have emitters), gcc omits R when it is just absolute and gas
always emits R but always with sdata[248] (maybe |pcrel).

So we might even say that an R augmentation with a 0 encoding byte is invalid.
(But we probably shouldn't do that.)  Then the only "canonicalization" is
that if there was no R augmentation (i.e. normal for .debug_frame format,
and for .eh_frame format in non-PIC code from gcc AFAICT), then what I'm
telling you is the CIE address size (explicit in v4 when anything ever
generates that, implicit otherwise).

I imagine that one day we might have e.g. a compressor feature that eats
.eh_frame and .debug_frame and reformats them to v4 .debug_frame format
where it can use udata4 encoding on a 64-bit machine for the common case
that all the address constants are < 4GB, to make the data smaller.  I
envision this for separate debuginfo purposes--copying .eh_frame in some
form is useful to be able to unwind in core dumps with only the debuginfo
and not having the binary's text from the main package on hand.  But in
fact it makes perfect sense for your use as well.  The stap unwinder can be
compile-time macroified for 4/8 size decoding, and pick at translation time
the smaller size if in transliteration you find no need to store high bits.
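
The size choice at translation time could look roughly like the sketch below.
The function name pick_addr_width and the idea of having all the address
constants collected in one array are assumptions made for the example, not
anything libdw or stap provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 4 if every address constant fits in 32 bits (i.e. is < 4GB),
   otherwise 8.  ADDRS holds the N address constants found while
   transliterating the CFI.  */
static unsigned int
pick_addr_width (const uint64_t *addrs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (addrs[i] > UINT32_MAX)
      return 8;			/* Some constant needs the high bits.  */
  return 4;
}
```

A result of 4 would then correspond to emitting udata4, and 8 to udata8.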

> > I don't follow.  This is exactly why you need to know the actual encoding
> > in use in an unambiguous way, rather than the actual encoding byte in the
> > header, whose meaning can depend on the address_size header field.
> 
> Don't I need both then?

Huh?  What dwarf_cfi_validate_fde has told you is exactly how to decode the
data.  What do you need to know the R augmentation byte for?

> Yes, I have the CIE_pointer, but I might not yet have seen that CIE,
> since (theoretically, I have never seen it in practice) it might be after the
> FDE we are currently inspecting.

I did see that in practice somewhere.  Right, but it's a direct pointer, so
you can decode its header with a dwarf_next_cfi call right then.  Of
course, that's what libdw has just done internally and cached the info.
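
For illustration, here is a sketch of how the CIE's section offset falls out of
the CIE_pointer field, which is the offset one would then hand to
dwarf_next_cfi.  The helper name fde_cie_offset is hypothetical, and the sketch
assumes the 32-bit format, where the field is 4 bytes wide.  In .eh_frame the
field is a backward distance from the field itself and a CIE is marked by 0; in
.debug_frame it is a plain section offset and a CIE is marked by 0xffffffff.

```c
#include <stdbool.h>
#include <stdint.h>

/* FIELD_OFFSET is the section offset of the entry's CIE_pointer/CIE_id
   field, FIELD_VALUE is what was read from it.  Returns false if the
   entry is itself a CIE; otherwise stores the section offset of the
   referenced CIE in *CIE_OFFSET and returns true.  */
static bool
fde_cie_offset (bool eh_frame_p, uint64_t field_offset,
		uint32_t field_value, uint64_t *cie_offset)
{
  if (eh_frame_p)
    {
      if (field_value == 0)
	return false;		/* CIE marker in .eh_frame.  */
      *cie_offset = field_offset - field_value;
    }
  else
    {
      if (field_value == UINT32_MAX)
	return false;		/* CIE marker in .debug_frame.  */
      *cie_offset = field_value;
    }
  return true;
}
```

Nothing here cares whether the CIE sits before or after the FDE in
.debug_frame, since the pointer is direct.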

> Looking at the (eh_frame) FDE definition from:
> http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
> an FDE can have its own augmentation data, which can describe the LSDA.
> Looking at libdw intern_fde it seems this data is always just discarded?

Yes, we just skip it.  Whatever that stuff is, it doesn't affect unwinding.
It's only used by the runtime EH code (whatever it is in the language runtime
that
calls _Unwind_GetLanguageSpecificData).  You'd only need to preserve it if
you want to figure out something about what EH would do, not for anything
that DWARF CFI per se can tell you.  (You have now caused me to start
reading libstdc++-v3/libsupc++/eh_personality.cc, and I don't think I
wanted to know.)
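
Skipping that data amounts to reading one ULEB128 length and stepping over that
many bytes.  The sketch below is not the libdw code, just an illustration of
the idea with a made-up function name; it assumes the CIE's augmentation string
contains 'z', which is the only case where the length field is present at all.

```c
#include <stddef.h>
#include <stdint.h>

/* P points at the FDE's augmentation data length (a ULEB128), END is one
   past the last byte of the FDE.  Returns a pointer just past the
   augmentation data, i.e. where the FDE's instructions start, or NULL if
   the data is truncated.  */
static const uint8_t *
skip_fde_augmentation (const uint8_t *p, const uint8_t *end)
{
  uint64_t len = 0;
  unsigned int shift = 0;

  /* Decode the ULEB128 length: 7 value bits per byte, low bits first,
     high bit set on every byte except the last.  */
  for (;;)
    {
      if (p >= end)
	return NULL;
      uint8_t byte = *p++;
      if (shift < 64)
	len |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
	break;
    }

  if (len > (uint64_t) (end - p))
    return NULL;		/* Claimed length runs past the entry.  */
  return p + len;
}
```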

> That would also be fine, but it seems that using the offset from
> dwarf_next_cfi() is that iterator (when ! dwarf_cfi_cie_p).

Well, yes, but it's repetitive since libdw repeats that call for its
interning.  And you probably don't care about iterating on CIEs per se.  If
there are unreferenced CIEs, so what?  You just want to transcode whatever
the FDEs point to.

Sigh.  I was trying to give you a reasonably clean quick hack that we could
tie up this week to slightly simplify your whole crazy task.  But now we
are discussing things on the slippery slope to all the compression and
transcoding work for CFI that I'd vaguely envisioned for the long term.
But I'd thought of doing that in a much different way than the
half-measures you are doing here, where you are just validating and copying
the data.  I'd thought of a from-scratch writer at the high level, where
you are feeding it the fully-decoded output (more or less the whole stream
of Dwarf_Frame's) and it is choosing the optimal program and encoding,
deciding what all CIEs are useful to have and where to draw the FDE
boundaries for optimal overall size, etc.  Then it would be like the DIE
tree plan, where we read in to high level and then encode optimally in a
from-scratch writer.  Then all that could also be used by something like a
smart assembler, that just feeds register loads/stores (as DWARF locations)
and SP adjustments to the CFI writer.  But as compression per se there is
probably damn little payoff to doing all that, so it is hard to justify all
the work any year soon.


Thanks,
Roland
