This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.

Re: glibc aio performance

From: "Don Capps" <don dot capps2 at verizon dot net>
To: "Amos P Waterland" <waterland at us dot ibm dot com>
Cc: <libc-alpha at sources dot redhat dot com>,"Thomas Gall" <tom_gall at vnet dot ibm dot com>
Date: Thu, 30 May 2002 19:00:16 -0500
Subject: Re: glibc aio performance
Organization: Self
References: <OF71498DF7.F1B104B8-ON87256BC9.006CB74A-86256BC9.007784B4@boulder.ibm.com>
Reply-to: "Don Capps" <don dot capps2 at verizon dot net>

Amos,

    I tried a few more tests. This time I used a file
size that is two times the size of the memory
in the system.

Normal write/read
./iozone -r 64 -s 600M -i 0 -i 1
    KB      reclen   write   rewrite    read    reread
614400      64     27287   30701    27743    27737

POSIX async I/O (no bcopy)
./iozone -r 64 -s 600M -i 0 -i 1 -k 32
    KB      reclen   write   rewrite    read    reread
614400      64     25169   28362    26998    27111

Async I/O is slower here: throughput drops by
around 8 percent on write/rewrite and by 2 to
3 percent on read/re-read.
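
As a rough check of those percentages, the arithmetic can be sketched
in Python from the KB/s figures in the two tables above. The name
slowdown_percent and the dictionary layout are illustrative only and
are not part of Iozone.

```python
# KB/s figures copied from the two Iozone runs above
# (600M file, 64 KB records, sync vs. -k 32 async).
SYNC = {"write": 27287, "rewrite": 30701, "read": 27743, "reread": 27737}
ASYNC = {"write": 25169, "rewrite": 28362, "read": 26998, "reread": 27111}


def slowdown_percent(sync_kbps, async_kbps):
    """Percent by which async throughput falls below sync throughput."""
    return 100.0 * (sync_kbps - async_kbps) / sync_kbps


if __name__ == "__main__":
    for test in SYNC:
        pct = slowdown_percent(SYNC[test], ASYNC[test])
        print(f"{test:8s} {pct:.1f}% slower with async I/O")
```

This works out to roughly 7.8 and 7.6 percent for write and rewrite,
and 2.7 and 2.3 percent for read and re-read.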

This is expected because:

1. There is only one disk, so multiple threads don't help.
2. There is more work for the system when using
    multiple threads (overhead).
3. The operating system already has a read-ahead algorithm for
    sequential readers so the extra async I/O threads
    don't help much.
4. More threads means more context switching, more
    CPU cache flushes, and more TLB activity.
5. Normal read/write has a nice fast I/O completion
    notification model that is implemented in the
    operating system. POSIX async I/O was a "group think"
    design and has a very poor I/O completion model that
    slows things down. Polling for I/O completion, or
    waiting on signals, was and is a very poor design.
    Most vendors have custom async I/O routines with a
    fast I/O completion mechanism (callback notification).
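
To illustrate the distinction in point 5 between polling for
completion and being called back on completion, here is a minimal
sketch. It uses Python's concurrent.futures thread pool purely as an
analogy. It is not the POSIX or glibc aio interface, and fake_io,
wait_by_polling and wait_by_callback are made-up names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n):
    """Hypothetical stand-in for one I/O request; sleeps instead of using a disk."""
    time.sleep(0.01)
    return n


def wait_by_polling(future, interval=0.001):
    """Polling model: the submitter keeps asking whether the request is done."""
    polls = 0
    while not future.done():
        polls += 1
        time.sleep(interval)
    return future.result(), polls


def wait_by_callback(pool, n):
    """Callback model: the worker announces completion; the submitter never asks."""
    finished = threading.Event()
    results = []

    def on_complete(future):
        results.append(future.result())
        finished.set()

    pool.submit(fake_io, n).add_done_callback(on_complete)
    finished.wait()  # sleeps until the callback has run
    return results[0]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        value, polls = wait_by_polling(pool.submit(fake_io, 1))
        print("polling:  result", value, "after", polls, "polls")
        print("callback: result", wait_by_callback(pool, 2))
```

In the polling version the submitter either burns CPU or adds latency
equal to its sleep interval. In the callback version it is woken once,
when the work is actually finished.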

When to use async I/O.

    There are times when async I/O is a better way to go.
    If you have:
        A filesystem striped across multiple disks, so that
        async I/O requests can proceed in parallel and not
        cause head seek contention.
    and
        An application that accesses random offsets in a
        file, so that multiple threads can reach different
        parts of the file on different spindles in parallel,
        and so that the application gets its own read-ahead,
        as the operating system is not going to do any for a
        random reader.
    and
        Async I/O threads that are used over and over to
        do the I/O, so that the spawn/join times are not
        a factor.

    If your application meets the above criteria then it may
    benefit from using async I/O. If it does not, it is
    likely that it will not benefit.
    Although the above criteria seem overly constraining,
    they map very nicely onto one application type:
    database applications (Oracle, Sybase, ...).
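
The access pattern described by these criteria can be sketched as
follows: a small, fixed pool of worker threads is created once and
re-used for many reads at random record offsets in one file. This is
an illustration in Python on a POSIX system, not glibc's aio
implementation; random_reads, RECORD and the seed parameter are
assumptions made for the example, and the file is assumed to hold at
least one full record.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD = 64 * 1024  # 64 KB records, matching the -r 64 runs above


def random_reads(path, workers, requests, seed=0):
    """Read `requests` records at random offsets using `workers` re-used threads.

    Returns the total number of bytes read.
    """
    rng = random.Random(seed)
    records = os.path.getsize(path) // RECORD
    offsets = [rng.randrange(records) * RECORD for _ in range(requests)]
    fd = os.open(path, os.O_RDONLY)
    try:
        # The pool is created once; each thread services many requests,
        # so thread spawn/join cost is paid per worker, not per read.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            chunks = pool.map(lambda off: os.pread(fd, RECORD, off), offsets)
            return sum(len(chunk) for chunk in chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * (16 * RECORD))
    try:
        print(random_reads(f.name, workers=4, requests=64), "bytes read")
    finally:
        os.unlink(f.name)
```

On a single disk this buys little, for the reasons listed earlier.
The pattern only pays off when the offsets land on different spindles.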

Hope this helps,
Don Capps


----- Original Message -----
From: "Amos P Waterland" <waterland@us.ibm.com>
To: "Don Capps" <don.capps2@verizon.net>
Cc: <libc-alpha@sources.redhat.com>; "Thomas Gall" <tom_gall@vnet.ibm.com>
Sent: Thursday, May 30, 2002 4:46 PM
Subject: Re: glibc aio performance


>
> Don:
>
> Thank you very much for your analysis.  I have a few questions.
>
> I ran the suggested options on my Redhat 7.3 box, and got the following
> results.  (I noticed that in your results, your second command line had -s
> 200M, but the report had 307200: maybe the 2 was just a typo in the
> email?)
>
> % iozone -r 64 -s 300M -i 0 -i 1
> KB  reclen   write rewrite    read    reread
> 307200      64   32881   32494    42239    43256
> % iozone -k 32 -r 64 -s 300M -i 0 -i 1
> KB  reclen   write rewrite    read    reread
> 307200      64   18548   19151    26866    26512
>
> As you can see, the results are much better, but the AIO is still 30-40%
> slower than the SIO.  (I re-ran the tests several times to try to iron out
> timing anomalies.)  Do you think that thread setup, teardown, and overhead
> accounts for this?  (I did try using just two threads, but did not get
> significantly better throughput.)
>
> Thanks in advance.
>
> Amos Waterland
>
>
>
>
>
> From:     "Don Capps" <don.capps2@verizon.net>
> Date:     05/30/02 02:34 PM
> To:       <libc-alpha@sources.redhat.com>, Amos P Waterland/Austin/IBM@IBMUS
> cc:       Thomas Gall/Rochester/IBM@IBMUS
> Subject:  Re: glibc aio performance
> Please respond to "Don Capps"
>
>
>
>
>
> Amos,
>
>     I believe that this may be a case of pilot error :-)
>
>     In the case where you ran
>         iozone -i0 -i1 -k128
>
>     You asked for Iozone to use POSIX async I/O and
>     to use 128 async writes/reads of 4k and wrote/read
>     a file that was 512 Kbytes in size.
>     Thus.. there are 128 async read threads doing exactly
>     one 4k operation.
>
>     Let's look under the hood.
>     If one spawns 128 async reads/writes then this is the
>     equivalent of calling fork() 128 times and having
>     the new process do a single 4k operation and then
>     terminate. The overhead of creating the threads,
>     and terminating the threads is eating your lunch.
>
>     Here are some suggestions that you might find useful.
>
>     Try using a file size that is much larger.
>         Example: -s 300M
>     Try using a transfer size that is larger.
>         Example: -r 64
>     Try using fewer async ops.
>         Example: -k 32
>
>     In the examples above the threads will do a more
>     significant amount of work and they will be
>     re-used many times before they terminate.
>
>     There would be 32 threads and each thread will
>     do 64 kbyte transfers. The total number of
>     transfers will be 4800. Each thread will now
>     do 150 ops before it terminates.
>
>     Here is output from my Redhat  7.2 box.
>
>     ./iozone -r 64 -s 300M -i 0 -i 1
>       KB      reclen   write    rewrite    read    reread
>      307200      64   37314   29174    28042    28057
>
>
>     ./iozone -k 32 -r 64 -s 200M -i 0 -i 1
>      KB       reclen   write    rewrite    read     reread
>     307200       64   25793   30625    27376    27335
>
>     Since the filesystem is on a single disk drive there
>     is no advantage to the async operations. But, the
>     result is well within reason.
>
>     I don't believe that there is anything wrong with
>     glibc or with Iozone but more of a case of getting
>     what you asked for and finding out that the question
>     was probably not a good one :-)
>
>     The moral of the story is: If you are going to spawn
>     a thread then it would be wise to have it do some
>     significant work before it terminates.
>
>     Hope this helps,
>     Don Capps
>
>
> ----- Original Message -----
> From: "Amos P Waterland" <waterland@us.ibm.com>
> To: <libc-alpha@sources.redhat.com>
> Cc: <capps@iozone.org>; <wnorcott@us.oracle.com>; <tom_gall@vnet.ibm.com>
> Sent: Thursday, May 30, 2002 1:01 PM
> Subject: glibc aio performance
>
>
> > Hello, I have been looking at the glibc asynchronous I/O implementation,
> > and have run into a bit of an issue.
> >
> > When I run IOzone (an open source filesystem benchmark tool) with AIO
> > enabled, it reports write KB/s throughput on the order of 45 times
> > slower than that reported without AIO.
> >
> > % iozone -i0 -i1 #run just the write and read tests
> > [snip]
> > KB  reclen   write rewrite    read    reread [snip]
> > 512       4  101769  178334   331595   345702
> > % iozone -i0 -i1 -k128  #do same, but use no-bcopy aio
> > [snip]
> > KB  reclen   write rewrite    read    reread [snip]
> > 512       4    2232   47210   121874   103372
> >
> > I have looked at the source code for the glibc implementation, and it is
> > not obvious to me why keeping a thread pool, each of whose constituents
> > perform a pread(2) or pwrite(2), should be so much slower than
> > synchronous I/O.  I looked at the source code for IOzone, and found that
> > it uses libasync.c for AIO, but could find no obvious performance
> > problems in its code.
> >
> > So my question is: Might there be a problem with the way IOzone is using
> > glibc's implementation of AIO, or is glibc's implementation known to
> > have performance problems?
> >
> > Amos Waterland
> >
> >
