This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.


Yao Qi writes:
 > Hi, this is an optimization to how dcache reads contents from target
 > memory.  Currently, when GDB requests a read of target memory and the
 > request goes through dcache, dcache reads one cache line at a time,
 > regardless of the size of the requested data.  If GDB reads a large
 > amount of data from the target, dcache performs multiple reads of
 > target memory (one cache line per read).  In remote debugging, this
 > means multiple RSP 'm' packets to transfer memory from GDBserver,
 > which is slow.
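 > 
 > As a rough illustration (a compact hypothetical sketch, not the
 > actual dcache.c code; lookup_line and fetch_line are made-up
 > stand-ins for the real internals), the per-byte read path looks
 > like this:
 > 
 >   #include <stdint.h>
 > 
 >   typedef uint64_t CORE_ADDR;
 >   enum { LINE_SIZE = 64 };
 > 
 >   /* Hypothetical stand-ins for the dcache internals.  */
 >   extern uint8_t *lookup_line (CORE_ADDR tag);  /* NULL on a miss */
 >   extern uint8_t *fetch_line (CORE_ADDR tag);   /* one RSP 'm' packet */
 > 
 >   /* Reading LEN bytes that all miss the cache costs about
 >      LEN / LINE_SIZE round trips to the target.  */
 >   static void
 >   read_through_dcache (CORE_ADDR memaddr, uint8_t *myaddr, int len)
 >   {
 >     int i;
 > 
 >     for (i = 0; i < len; i++)
 >       {
 >         CORE_ADDR addr = memaddr + i;
 >         CORE_ADDR tag = addr & ~(CORE_ADDR) (LINE_SIZE - 1);
 >         uint8_t *line = lookup_line (tag);
 > 
 >         if (line == NULL)
 >           line = fetch_line (tag);  /* one target read per line */
 >         myaddr[i] = line[addr & (LINE_SIZE - 1)];
 >       }
 >   }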
 > 
 > This patch teaches dcache to read as much contiguous target memory
 > as possible in a single request, and to update multiple cache lines
 > when the contents are read in.  This is done in several steps (a
 > rough sketch follows the list):
 > 
 >  1.  When GDB requests the data [memaddr, memaddr + len), a
 > collection of ranges is created to record the readable ranges,
 > because some memory may be marked as write-only.
 >  2.  Then, we check the cache state of these readable ranges.  Some
 > of them are cached, and some are not.  We record the uncached ranges.
 >  3.  Iterate over the collection of uncached ranges, issuing one
 > target_read per range to fetch it from target memory and update the
 > corresponding cache lines.  Cached ranges are read from the cache
 > lines directly.
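 > 
 > As a rough sketch of that flow (hypothetical helper names, not the
 > patch itself; struct mem_range is the one declared in memrange.h):
 > 
 >   #include <stdint.h>
 > 
 >   typedef uint64_t CORE_ADDR;
 > 
 >   struct mem_range { CORE_ADDR start; int length; };
 > 
 >   /* Hypothetical helpers: split a request into readable ranges,
 >      then split those into maximal contiguous uncached runs.  */
 >   extern int split_readable (CORE_ADDR addr, int len,
 >                              struct mem_range *out);
 >   extern int split_uncached (const struct mem_range *in,
 >                              struct mem_range *out);
 >   extern void target_read_and_fill_lines (const struct mem_range *r);
 >   extern void copy_from_cache (const struct mem_range *r, uint8_t *dst);
 > 
 >   static void
 >   read_through_dcache_multi (CORE_ADDR memaddr, uint8_t *myaddr, int len)
 >   {
 >     struct mem_range readable[32], uncached[32];
 >     int n_readable, n_uncached, i, j;
 > 
 >     /* Step 1: drop write-only memory from the request.  */
 >     n_readable = split_readable (memaddr, len, readable);
 > 
 >     for (i = 0; i < n_readable; i++)
 >       {
 >         /* Step 2: find the uncached sub-ranges of this range.  */
 >         n_uncached = split_uncached (&readable[i], uncached);
 > 
 >         /* Step 3: one target_read per contiguous uncached run,
 >            rather than one per cache line, filling every covered
 >            cache line at once.  */
 >         for (j = 0; j < n_uncached; j++)
 >           target_read_and_fill_lines (&uncached[j]);
 > 
 >         /* Everything in this range is now cached; copy it out.  */
 >         copy_from_cache (&readable[i],
 >                          myaddr + (readable[i].start - memaddr));
 >       }
 >   }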
 > 
 > I am using the perf test case 'backtrace' to measure the speed-up
 > from this patch.  For each run, I 'set dcache line-size N' and
 > 'set dcache size' to 4096 * 64 / N, so that the total size of the
 > dcache is unchanged.
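 > 
 > For example, with a 256-byte line size that works out to
 > 4096 * 64 / 256 = 1024 lines:
 > 
 >   (gdb) set dcache line-size 256
 >   (gdb) set dcache size 1024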
 > 
 > With this patch, the number of 'm' RSP packets is reduced
 > dramatically:
 > 
 > cache line size    Original    Patched
 >               2     4657894      31224
 >               4     2317896      28616
 >               8      158948      21462
 >              16      579474      14308
 >              32      293314      14308
 >              64      150234      14308
 >             128       78694      10738
 >             256       42938       8960
 >             512       25046       8064
 >            1024       16100       7616
 >            2048        9184       7392
 > 
 > Performance comparison:
 > 
 > test       metric     line size   Patched         Original
 > backtrace  cpu_time   2           4.44            33.83
 > backtrace  cpu_time   4           3.88            14.27
 > backtrace  cpu_time   8           3.1             7.92
 > backtrace  cpu_time   16          2.48            4.79
 > backtrace  cpu_time   32          2.25            2.51
 > backtrace  cpu_time   64          1.16            1.93
 > backtrace  cpu_time   128         1.02            1.69
 > backtrace  cpu_time   256         1.06            1.37
 > backtrace  cpu_time   512         1.11            1.17
 > backtrace  cpu_time   1024        1.1             1.22
 > backtrace  cpu_time   2048        1.13            1.17
 > backtrace  wall_time  2           5.49653506279   74.0839848518
 > backtrace  wall_time  4           4.70916986465   29.94830513
 > backtrace  wall_time  8           4.11279582977   15.6743021011
 > backtrace  wall_time  16          3.68633985519   8.83114910126
 > backtrace  wall_time  32          3.63511800766   5.79059791565
 > backtrace  wall_time  64          1.61371517181   3.67003703117
 > backtrace  wall_time  128         1.50599694252   2.60381913185
 > backtrace  wall_time  256         1.47533297539   2.05611109734
 > backtrace  wall_time  512         1.48193001747   1.80505800247
 > backtrace  wall_time  1024        1.50955080986   1.69646501541
 > backtrace  wall_time  2048        1.54235315323   1.61461496353
 > backtrace  vmsize     2           104568          104576
 > backtrace  vmsize     4           100556          102388
 > backtrace  vmsize     8           95384           97540
 > backtrace  vmsize     16          94092           94092
 > backtrace  vmsize     32          93348           93276
 > backtrace  vmsize     64          93148           92928
 > backtrace  vmsize     128         93148           93100
 > backtrace  vmsize     256         93148           93100
 > backtrace  vmsize     512         93148           93100
 > backtrace  vmsize     1024        93148           93100
 > backtrace  vmsize     2048        93148           93100
 > 
 > gdb:
 > 
 > 2013-10-18  Yao Qi  <yao@codesourcery.com>
 > 
 > 	* dcache.c: Include "memrange.h".
 > 	Update comments.
 > 	(dcache_read_line): Remove.
 > 	(dcache_peek_byte): Remove.
 > 	(dcache_ranges_readable): New function.
 > 	(dcache_ranges_uncached): New function.
 > 	(dcache_xfer_memory): Read multiple cache lines from target
 > 	memory in one transfer.

Hi.
While the improvement is nice, it does add complexity to the code.
[I'm not suggesting this patch isn't justified.
But I think we do need to weigh complexity vs actual speed-ups, and
I'm in the "Need more data." state at the moment.]

The data for cache line sizes below the default (64 bytes) is nice
but not compelling.
E.g., what if we just increased the cache line size to, say, 256?
Or 1024?  And/or maybe auto-adjust it based on
"remote memory-read-packet-size" or some such?
[I'm not arguing for anything in particular.
Again, I'm just collecting data.]
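
For concreteness, the auto-adjust idea could be as simple as something
like the following (purely a hypothetical sketch of one option, not a
proposal):

  /* Pick the largest power-of-two line size, up to some cap, that
     still fits in one remote memory-read packet.  */
  static int
  auto_dcache_line_size (int read_packet_size)
  {
    int size = 64;  /* The current default.  */

    while (size * 2 <= read_packet_size && size * 2 <= 4096)
      size *= 2;
    return size;
  }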

