This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



User Memory Read Failure Question


Hello,

I've built a userland tracing mechanism on top of SystemTap, which my
colleagues and I have used fruitfully for quite a while.  Recently, I
got a report that some of my users on Linux 3.18.16/x64 with SystemTap
2.7 were seeing missing data in some of my probe logging.  On the
systems in question I can reproduce the problem reliably, but on other,
similarly configured systems I cannot reproduce it at all.

I have found that reads of certain user addresses fail consistently,
even though /proc/${pid}/maps shows that the regions containing them
are mapped with read access.  Reads of the same addresses through
/proc/${pid}/mem, as well as examination of core files, both find the
expected values at those locations.  SystemTap continues to fail to
read them after that manual examination, so I do not believe my manual
reads changed whatever state is causing the problem.

Since I am tracing an interpreter (Node.js 0.10) whose behavior I don't
fully understand, it's possible that the process itself is changing
permissions on the pages dynamically, causing the reads to fail; I
haven't been able to rule this out.  While investigating, I've begun to
wonder:

(a) Whether Linux may be unmapping the pages (while leaving them
resident) for access detection and, if that happened, whether
SystemTap would fail user reads to avoid potentially recursive
faults.

(b) Whether there are reasons for SystemTap read failures, other than
invalid mappings, that I haven't anticipated but could check for.

(c) Whether there is a good recipe for getting at the page-level
permissions in the VM, from SystemTap context or otherwise (this would
of course be platform-specific; I can dig in and embed C if need be,
but I'm not experienced with the Linux VM).

Has anyone else debugged a problem like this?  Do you have any
insights or tooling you might recommend?  Thanks for your help!

Dan

