This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



Re: dwarflint


> Speaking of which: I never know what should be strict and what should be 
> warning.  Warning/error is easy: warnings are for suspicious constructs, 
> errors for outright standard violations.  Now enter strict.  Why are 
> strict checks strict?  Are these even more anal checks?  In that case, 
> do strict errors make sense at all?

The way I think about it is that there are several kinds of checks, each
with different characteristics.  I think we should write the code so that
at each check we express which kind it is.  Later on, we can tune the
definitions of which checks are on by default, which are "warning" vs
"error", which are in an "expected from version N of producer FOO" set, etc.

First, format parsing checks.  These are the hard errors (bad LEBs, missing
terminators, etc.) that mean libdw cannot even parse the section for the
most basic purposes.  These are always errors and cannot be disabled.  If
these fail, we have to suppress all the other checks that would use the
section.  So, if .debug_info or .debug_abbrev have these errors, we cannot
do anything on the other sections except for their basic format parsing
checks.  Whether DW_AT_sibling values are correct (but not whether siblings
are missing or extra) is in this category, since higher-level checks will
use libdw and would go wrong.  So is checking that refs point to the
beginning of entries, since libdw will try to jump directly to the offset.

If the "left hand side" (aranges, pubnames) sections have format errors,
that doesn't affect anything but the high-level checks on those sections.
If the "right hand side" sections (ranges, loc, etc.) have format errors,
that should just short-circuit individual checks that require those values.
(e.g. anything that examines DW_AT_ranges when .debug_ranges is bad.)  This
includes positive connectivity checks (dangling pointers are invalid), but
not negative connectivity checks (unreferenced entries are "bloat" or
"suspicious bloat", below).  (This applies only to pointers into other
sections, because bad refs (and bad abbrev table pointers in CU headers)
already fall under the fatal format parsing checks.)
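
To make the suppression rules concrete, here is a rough sketch (Python just
for brevity).  It assumes each check declares whether it is a basic format
check or a high-level one, plus the sections it needs; the names CORE and
runnable_checks and the tuple layout are made up for illustration and are
not anything in dwarflint.

```python
# Sections whose format errors block all higher-level work.
CORE = {".debug_info", ".debug_abbrev"}

def runnable_checks(checks, bad_sections):
    """Return the names of checks that may still run.

    checks: list of (name, level, needed_sections), level is "format" or "high".
    bad_sections: set of section names that failed format parsing.
    """
    core_broken = bool(CORE & bad_sections)
    runnable = []
    for name, level, needed in checks:
        if level == "format":
            runnable.append(name)      # basic format parsing always runs
        elif core_broken:
            continue                   # nothing high-level is trustworthy
        elif needed & bad_sections:
            continue                   # short-circuit just this check
        else:
            runnable.append(name)
    return runnable
```

With this shape a bad .debug_ranges only knocks out the checks that list it,
while a bad .debug_abbrev leaves nothing but the per-section format checks.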

Next, there is bloat.  That is, unused holes in sections and the like, at
the low level.  The characteristic of "bloat" is that libdw's control flow
will never be affected by its presence--it just takes up space in files,
memory, etc.  (You can have an extra 20MB of .debug_abbrev in between the
regions used by a CU, and no consumer ever looks at a single extra byte.)

Next, suspicious bloat.  This is stuff that's harmless like bloat in
general, but is especially suspect, such as unused bytes that are nonzero.
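
The bloat vs. suspicious bloat split could be sketched like this, assuming
we already know which byte ranges of a section are actually reached by a
consumer.  The function name classify_holes and the (start, end) range
representation are hypothetical.

```python
def classify_holes(data, used):
    """Find unused holes in a section and say how suspect each one is.

    data: the raw section contents as bytes.
    used: list of half-open (start, end) ranges that consumers reach.
    Returns a list of (start, end, kind) tuples for the holes.
    """
    complaints = []
    pos = 0
    # A zero-length sentinel range at the end catches a trailing hole.
    for start, end in sorted(used) + [(len(data), len(data))]:
        if start > pos:
            hole = data[pos:start]
            # Nonzero bytes in an unused region are the suspect case.
            kind = "suspicious bloat" if any(hole) else "bloat"
            complaints.append((pos, start, kind))
        pos = max(pos, end)
    return complaints
```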

Next, suboptimal bloat.  This is also extra stuff, but it can slow down the
consumer navigating the data.  This includes superfluous encoding, like
has_children with no children or a useless DW_AT_sibling.  Also two
same-named attrs in one DIE (which is actually clearly wrong, but also
harmless bloat).
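
Two of these are easy to picture on a toy model of a DIE.  This is only a
sketch: the dict layout ("has_children", "children", "attrs") and the name
suboptimal_complaints are invented here and have nothing to do with how
libdw represents DIEs.

```python
from collections import Counter

def suboptimal_complaints(die):
    """Return 'suboptimal bloat' complaints for one toy DIE.

    die: dict with "has_children" (bool), "children" (list) and
    "attrs" (list of attribute names in encoding order).
    """
    complaints = []
    if die["has_children"] and not die["children"]:
        complaints.append("has_children set but no children")
    # The same attribute name showing up twice is wrong but harmless.
    for name, count in Counter(die["attrs"]).items():
        if count > 1:
            complaints.append(f"{name} appears {count} times")
    return complaints
```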

Next, "very suboptimal".  This includes missing DW_AT_sibling.

We might need to split these out into different known kinds of
bloat/suboptimality when we get to collating the "known lousy producers"
option sets.  Be sure to write it in a way where it stays simple to switch
the keyword associated with a particular check/complaint to a different one
that goes with a new option.
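
Something along these lines is what I have in mind for keeping the keyword
separate from the policy.  It is a sketch only: Kind, POLICY and report are
hypothetical names, and the default severities in the table are just a
starting assumption to be tuned later.

```python
from enum import Enum

class Kind(Enum):
    FORMAT = "format"                      # hard parsing errors
    BLOAT = "bloat"
    SUSPICIOUS_BLOAT = "suspicious-bloat"
    SUBOPTIMAL = "suboptimal"
    VERY_SUBOPTIMAL = "very-suboptimal"

# Policy lives in one table, apart from the checks themselves, so that
# retuning it (or adding a per-producer option set) touches no check code.
POLICY = {
    Kind.FORMAT: "error",
    Kind.BLOAT: "warning",
    Kind.SUSPICIOUS_BLOAT: "warning",
    Kind.SUBOPTIMAL: "warning",
    Kind.VERY_SUBOPTIMAL: "warning",
}

def report(kind, message, policy=POLICY, disabled=frozenset()):
    """Format one complaint, or return None if its kind is switched off."""
    # Format parsing checks cannot be disabled.
    if kind is not Kind.FORMAT and kind in disabled:
        return None
    return f"{policy[kind]}: [{kind.value}] {message}"
```

A check site then only says report(Kind.VERY_SUBOPTIMAL, "missing
DW_AT_sibling"), and moving that check to a different keyword is a
one-word change.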

All bloat/suboptimal are "warnings" in the sense that they don't mean a
consumer will be confused.  (The messages should be clear about what is
"invalid" and what is "linty", and that's what I've meant by error/warning
I guess.)  The exit status will be failure for warnings too (unless the
particular one is suppressed by an option).
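
For the exit status, roughly the following.  The (severity, kind) pair
representation and the name exit_status are made up for the example, and
treating every error as unsuppressible is my simplifying assumption here.

```python
def exit_status(complaints, suppressed=frozenset()):
    """Return 0 if the run is clean, 1 otherwise.

    complaints: iterable of (severity, kind) pairs, severity being
    "error" or "warning".
    suppressed: set of warning kinds turned off by options.
    """
    for severity, kind in complaints:
        # Assumption: errors always count; warnings count unless suppressed.
        if severity == "error" or kind not in suppressed:
            return 1
    return 0
```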

All of those are in the "format errors"/"format warnings" category, and probably
the messages should clearly distinguish this class of check from the later
ones.  When we get to the high-level checks those will include "wrong by
the spec" errors (could well confuse a consumer) as well as "bloat",
"missing", etc., subcategories to consider.  But it's an important
distinction that all that is in the "data errors"/"data warnings" camp.
(Format errors prevent you from reading the file and representing it in the
conceptual terms of the standard, like an illegible piece of paper.  Data
errors mean you have a well-formed tree, as if you can read what I've
written, pronounce each word, and understand the punctuation, but you think
I might not know what the hell I'm talking about.)


Thanks,
Roland
