This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Re: more transport questions


Frank Ch. Eigler writes:
 > Hi -
 > 
 > I'm just beginning to look deeper into this portion of the runtime
 > code, as I'm planning to pull in the core parts of stpd into the
 > translator/driver.
 > 
 > I wonder: is there some reason (ideally: data) to believe that the
 > intricate multi-threaded relayfs copying and subsequent cpu_merging is
 > necessary?  For instance, how bad would a simpler, single-threaded
 > multi-fd poll() be in the relayfs reading loop?  Is merging

It probably doesn't need multiple threads - the code is based on
relay-apps, which does it that way, and I didn't see any reason to
change it.  BTW, the per-cpu threads are only used to write out the
per-cpu data as it comes in - the merging is done later in the main
thread.
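For comparison with the threaded approach, a single-threaded reader along the lines of Frank's poll() question might look roughly like this. It is a minimal sketch, not stpd code: `drain_percpu()` and `MAX_CPUS` are hypothetical names, it assumes the per-cpu input files support poll(), and it treats a zero-length read as end of stream. That holds for pipes, but a real relayfs reader would need some other stop condition, since an empty buffer may also read as 0.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CPUS 64 /* hypothetical upper bound on per-cpu channels */

/* Hypothetical helper: copy whatever is readable on each per-cpu input fd
 * to the matching per-cpu output fd, using one thread and poll().
 * Returns 0 once every input has reached end of stream, -1 on error. */
int drain_percpu(const int *in_fds, const int *out_fds, int ncpus)
{
    struct pollfd pfds[MAX_CPUS];
    char buf[8192];
    int open_count = ncpus;

    if (ncpus <= 0 || ncpus > MAX_CPUS)
        return -1;
    for (int i = 0; i < ncpus; i++) {
        pfds[i].fd = in_fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    while (open_count > 0) {
        /* Sleep until at least one per-cpu channel has data or hangs up. */
        if (poll(pfds, (nfds_t)ncpus, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (int i = 0; i < ncpus; i++) {
            if (!(pfds[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;
            ssize_t n = read(pfds[i].fd, buf, sizeof buf);
            if (n < 0) {
                if (errno == EINTR || errno == EAGAIN)
                    continue;
                return -1;
            }
            if (n == 0) {
                /* End of stream: a negative fd makes poll() skip this entry. */
                pfds[i].fd = -1;
                open_count--;
                continue;
            }
            /* Write everything we read, handling short writes. */
            ssize_t off = 0;
            while (off < n) {
                ssize_t w = write(out_fds[i], buf + off, (size_t)(n - off));
                if (w < 0) {
                    if (errno == EINTR)
                        continue;
                    return -1;
                }
                off += w;
            }
        }
    }
    return 0;
}
```

The per-cpu outputs stay separate here, so merging would still happen afterwards, just as in the threaded version.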

 > on-the-fly impractical?  On the transmission kernel-probe side, would
 > a relay_flush() after every probe handler be impractical?

Merging on-the-fly would be practical if you flushed every per-cpu
channel at the same time, i.e. forced each per-cpu buffer to finish
its current sub-buffer so it could be read, but that would defeat the
purpose of using relayfs for high-speed logging.  relay_flush() also
isn't safe to call unless you know that nothing is actively logging
into the buffer.
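To illustrate why on-the-fly merging depends on every channel having readable data, here is a rough sketch of an incremental k-way merge. It assumes, hypothetically, that each record carries a global sequence number and that each cpu's records arrive in order; `struct rec`, `struct chan` and `merge_available()` are made-up names, not the actual stpd merge. The merge can only emit the next record while every unfinished per-cpu channel has at least one record buffered, because an empty channel might still hold an earlier event in a sub-buffer that hasn't been completed yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a global sequence number assigned at logging time. */
struct rec {
    uint64_t seq;
    int cpu;
};

/* What the reader currently holds for one per-cpu channel. */
struct chan {
    const struct rec *recs; /* records read so far, sorted by seq */
    size_t len;
    size_t pos;             /* next unconsumed record */
    bool finished;          /* no more records will ever arrive */
};

/* Emit records in global seq order into out[], stopping as soon as some
 * unfinished channel has nothing buffered: that cpu might still produce a
 * record with a smaller seq.  Returns the number of records written. */
size_t merge_available(struct chan *ch, int nch, struct rec *out, size_t cap)
{
    size_t n = 0;
    while (n < cap) {
        int best = -1;
        for (int i = 0; i < nch; i++) {
            if (ch[i].pos == ch[i].len) {
                if (!ch[i].finished)
                    return n;   /* blocked: must wait for this cpu */
                continue;       /* drained and done: ignore it */
            }
            if (best < 0 ||
                ch[i].recs[ch[i].pos].seq < ch[best].recs[ch[best].pos].seq)
                best = i;
        }
        if (best < 0)
            return n;           /* every channel drained and finished */
        out[n++] = ch[best].recs[ch[best].pos++];
    }
    return n;
}
```

With one cpu idle and its current sub-buffer unfinished, the merge stalls even though other cpus have plenty of data, which is why merging on-the-fly would require flushing every channel at once.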

Martin did create a more efficient sort/merge of the data, but one
question that came up then was whether the per-cpu data should be
merged or sorted at all.  Merging makes sense for relatively small
amounts, but for huge quantities of data, say many gigabytes, it seems
impractical to merge it all into a single file, and it's not clear why
that would be necessary.  Wouldn't any postprocessing tool that wanted
to make sense of that much data also need a more efficient way to deal
with it than reading it linearly from a single file?  That would argue
for some sort of indexing scheme where, for instance, data is written
in constant-sized blocks and each block records the start/end
identifiers of the events it contains.  In that case it wouldn't
really matter whether the data is in a single file or in multiple
files (it's probably better to keep it in multiple files, actually).
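A minimal sketch of the kind of block index described above, assuming hypothetically that events carry monotonically increasing ids, that each file is written in fixed 4096-byte blocks, and that an index entry holds the first and last event id in each block. `struct block_index`, `BLOCK_SIZE`, `find_block()` and `block_offset()` are illustrative names only. A lookup becomes a binary search over the index rather than a linear read of the data; with per-cpu files, the same search would simply be run against each file's index.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096 /* hypothetical constant block size in bytes */

/* Hypothetical index entry: one per constant-sized block in a data file. */
struct block_index {
    uint64_t first_event; /* id of the first event in the block */
    uint64_t last_event;  /* id of the last event in the block */
};

/* Binary search idx[0..n), sorted by first_event with non-overlapping
 * ranges, for the block whose [first_event, last_event] range covers id,
 * i.e. the only block in this file that could hold that event.
 * Returns the block number, or -1 if no block's range covers id. */
long find_block(const struct block_index *idx, size_t n, uint64_t id)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (id < idx[mid].first_event)
            hi = mid;
        else if (id > idx[mid].last_event)
            lo = mid + 1;
        else
            return (long)mid;
    }
    return -1;
}

/* Byte offset of block b in the data file, given fixed-size blocks. */
uint64_t block_offset(long b)
{
    return (uint64_t)b * BLOCK_SIZE;
}
```

Because the blocks are fixed-size, the index alone is enough to seek straight to the right place in the file, regardless of how many files the data is split across.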

Tom



