This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
- From: Doug Evans <dje at google dot com>
- To: Yao Qi <yao at codesourcery dot com>
- Cc: <gdb-patches at sourceware dot org>
- Date: Mon, 21 Oct 2013 13:38:50 -0700
- Subject: Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
- Authentication-results: sourceware.org; auth=none
- References: <1382066466-2551-1-git-send-email-yao at codesourcery dot com> <1382066466-2551-3-git-send-email-yao at codesourcery dot com>
Yao Qi writes:
> Hi, this is an optimization to how the dcache reads contents from
> target memory. Currently, when GDB requests to read target memory and
> the request goes through the dcache, the dcache reads one cache line
> at a time, regardless of the size of the requested data. If GDB reads
> a large amount of data from the target, the dcache performs multiple
> reads from target memory (one cache line per read). In remote
> debugging, this means multiple RSP packets to transfer memory from
> GDBserver, which is slow.
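To put a number on the cost described above, here is an illustrative sketch (not code from the patch) of how the round-trip count grows when every cache miss fetches exactly one line:

```c
#include <assert.h>

/* Number of cache-line fills (and hence remote 'm' packet round trips)
   needed for a read of LEN bytes at ADDR when every miss fetches
   exactly one line.  Illustrative arithmetic, not code from the patch.  */
static unsigned long
line_fills (unsigned long addr, unsigned long len, unsigned long line_size)
{
  unsigned long first = addr / line_size;             /* first line touched */
  unsigned long last = (addr + len - 1) / line_size;  /* last line touched */
  return last - first + 1;
}
```

E.g., a 256 KiB read with 64-byte lines needs 4096 separate line fills; batching contiguous uncached lines into one transfer is what lets the patch collapse those into far fewer packets.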
>
> This patch teaches the dcache to read as much contiguous target
> memory as possible in one transfer, and to update multiple cache
> lines when the contents are read in. This is done in several steps:
>
> 1. When GDB requests to read data [memaddr, memaddr + len), a
> collection of ranges is created to record the readable ranges,
> because some memory may be marked as write-only.
> 2. Then, we check the cache state of these readable ranges. Some
> of them are cached, and some are not. We record the uncached ranges.
> 3. Iterate over the collection of uncached ranges, issuing
> target_read to fetch each uncached range from target memory and
> update the cache lines. Cached ranges are read directly from the
> cache lines.
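The coalescing in steps 2 and 3 can be sketched as follows. This is a hypothetical illustration, not the actual dcache.c code; `line_is_cached` stands in for a real dcache lookup:

```c
#include <stddef.h>

/* Hypothetical sketch: walk the request one line at a time and
   coalesce consecutive uncached lines into ranges, so that each range
   can later be fetched with a single target_read instead of one read
   per line.  */

struct range
{
  unsigned long start;
  unsigned long length;
};

/* Stand-in for a real dcache lookup: ADDR is cached iff it appears in
   the CACHED array of line addresses.  */
static int
line_is_cached (unsigned long addr, const unsigned long *cached,
                size_t n_cached)
{
  for (size_t i = 0; i < n_cached; i++)
    if (cached[i] == addr)
      return 1;
  return 0;
}

/* Record the uncached, line-aligned ranges covering [MEMADDR,
   MEMADDR + LEN) in OUT, merging adjacent lines.  Returns the number
   of ranges written.  */
static size_t
uncached_ranges (unsigned long memaddr, unsigned long len,
                 unsigned long line_size,
                 const unsigned long *cached, size_t n_cached,
                 struct range *out, size_t max_out)
{
  size_t n = 0;
  unsigned long addr = memaddr - memaddr % line_size;  /* align down */
  unsigned long end = memaddr + len;

  for (; addr < end; addr += line_size)
    {
      if (line_is_cached (addr, cached, n_cached))
        continue;
      if (n > 0 && out[n - 1].start + out[n - 1].length == addr)
        out[n - 1].length += line_size;   /* extend the previous range */
      else if (n < max_out)
        out[n++] = (struct range) { addr, line_size };
    }
  return n;
}
```

With ranges merged this way, a read that misses on several adjacent lines costs one target_read for the whole run rather than one per line.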
>
> I am using the 'backtrace' perf test case to measure the speed-up
> from this patch. For each run, I do 'set dcache line-size N' and
> 'set dcache size 4096 * 64 / N', to keep the total size of the
> dcache unchanged.
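For reference on the arithmetic above: assuming the defaults of the time (a dcache of 4096 lines of 64 bytes, i.e. 256 KiB total), 'set dcache size 4096 * 64 / N' keeps line_size * size constant across runs. A quick sanity check:

```c
/* Number of dcache lines needed so that line_size * size stays at the
   assumed default total of 4096 lines * 64 bytes = 256 KiB.  */
static unsigned long
lines_for (unsigned long line_size)
{
  return 4096UL * 64UL / line_size;
}
```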
>
> With this patch, the number of 'm' RSP packets is reduced
> dramatically:
>
> cache line size   Original   Patched
>               2    4657894     31224
>               4    2317896     28616
>               8     158948     21462
>              16     579474     14308
>              32     293314     14308
>              64     150234     14308
>             128      78694     10738
>             256      42938      8960
>             512      25046      8064
>            1024      16100      7616
>            2048       9184      7392
>
> Performance comparison:
>
> cache line size Patched Original
> backtrace cpu_time 2 4.44 33.83
> backtrace cpu_time 4 3.88 14.27
> backtrace cpu_time 8 3.1 7.92
> backtrace cpu_time 16 2.48 4.79
> backtrace cpu_time 32 2.25 2.51
> backtrace cpu_time 64 1.16 1.93
> backtrace cpu_time 128 1.02 1.69
> backtrace cpu_time 256 1.06 1.37
> backtrace cpu_time 512 1.11 1.17
> backtrace cpu_time 1024 1.1 1.22
> backtrace cpu_time 2048 1.13 1.17
> backtrace wall_time 2 5.49653506279 74.0839848518
> backtrace wall_time 4 4.70916986465 29.94830513
> backtrace wall_time 8 4.11279582977 15.6743021011
> backtrace wall_time 16 3.68633985519 8.83114910126
> backtrace wall_time 32 3.63511800766 5.79059791565
> backtrace wall_time 64 1.61371517181 3.67003703117
> backtrace wall_time 128 1.50599694252 2.60381913185
> backtrace wall_time 256 1.47533297539 2.05611109734
> backtrace wall_time 512 1.48193001747 1.80505800247
> backtrace wall_time 1024 1.50955080986 1.69646501541
> backtrace wall_time 2048 1.54235315323 1.61461496353
> backtrace vmsize 2 104568 104576
> backtrace vmsize 4 100556 102388
> backtrace vmsize 8 95384 97540
> backtrace vmsize 16 94092 94092
> backtrace vmsize 32 93348 93276
> backtrace vmsize 64 93148 92928
> backtrace vmsize 128 93148 93100
> backtrace vmsize 256 93148 93100
> backtrace vmsize 512 93148 93100
> backtrace vmsize 1024 93148 93100
> backtrace vmsize 2048 93148 93100
>
> gdb:
>
> 2013-10-18 Yao Qi <yao@codesourcery.com>
>
> * dcache.c: Include "memrange.h".
> Update comments.
> (dcache_read_line): Remove.
> (dcache_peek_byte): Remove.
> (dcache_ranges_readable): New function.
> (dcache_ranges_uncached): New function.
> 	(dcache_xfer_memory): Read multiple cache lines from target
> 	memory in one transfer.
Hi.
While I understand the improvement is nice, it does add complexity to the code.
[I'm not suggesting this patch isn't justified.
But I think we do need to weigh complexity vs actual speed-ups, and
I'm in the "Need more data." state at the moment.]
The data for cache line sizes less than the default (64 bytes)
is nice but not compelling.
E.g., what if we just increase the cache line size to, say, 256?
Or 1024? And/or maybe auto-adjust it based on
"remote memory-read-packet-size" or some such?
[I'm not arguing for anything in particular.
Again, I'm just collecting data.]