This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


[Bug libc/15884] Big performance problem in strcoll


https://sourceware.org/bugzilla/show_bug.cgi?id=15884

--- Comment #7 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  0742aef6e52a935f9ccd69594831b56d807feef3 (commit)
      from  ee54ce44cb734f18fec4f6ccdfbe997d2574321e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0742aef6e52a935f9ccd69594831b56d807feef3

commit 0742aef6e52a935f9ccd69594831b56d807feef3
Author: Leonhard Holz <leonhard.holz@web.de>
Date:   Fri Oct 17 15:47:23 2014 +0530

    strcoll: improve performance by removing the cache (#15884)

    This is a patch that should solve bug 15884, which reports poor
    performance of strcoll(). It turned out that the runtime of strcoll() is
    actually bound by strlen(), which is needed to calculate the size of a
    cache that was installed to improve the comparison performance.

    The idea for this patch was that the cache is only useful in rare cases
    (strings of the same length with the same first-level characters) and that
    it would be better to avoid memory allocation altogether. To prove this I
    wrote a performance test, bench-strcoll.c, with test data in
    benchtests-strcoll.tar.gz. Modifications to benchtests/Makefile and
    localedata/Makefile are also necessary to make it work.
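
For readers who want to reproduce this kind of measurement, the following is a minimal sketch of timing a strcoll()-based sort. It is not the bench-strcoll.c from the patch; the helper name time_strcoll_sort and the choice of qsort() plus clock() are assumptions made for illustration only.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort() comparator: the array elements are pointers to C strings. */
static int cmp_strcoll(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Hypothetical helper: select `locale` for collation, sort `words` in
   place with strcoll(), and return the CPU time used in seconds.
   Returns -1.0 if the locale cannot be selected. */
double time_strcoll_sort(const char *locale, const char **words, size_t n)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1.0;

    clock_t start = clock();
    qsort(words, n, sizeof *words, cmp_strcoll);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A call such as time_strcoll_sort("en_US.UTF-8", words, n) only works if that locale is installed on the system. A word list large enough to give a stable timing is needed for meaningful numbers.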

    After removing the cache, strcoll() showed the predicted behavior
    (getting slightly faster) in all but the test case for Hindi word sorting.
    This was due to the Hindi text having many more equal words than the
    others. For equal strings the performance was worse, since all comparison
    levels were run through, and from the second level on the cache had
    improved the comparison performance of the original version.

    Therefore I added a bytewise test via strcmp() iff the first-level
    comparison found that both strings matched, because in this case it is
    very likely that equal strings are being compared. This solved the problem
    with the Hindi test case and improved the performance of the others.
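
The shortcut described above can be sketched on a toy two-level collation. This is an illustration under stated assumptions, not the code in string/strcoll_l.c: the names toy_level1, toy_level2 and toy_coll are hypothetical. The "levels" here are simply case-insensitive letters (level 1) and letter case (level 2), not glibc's locale weight tables.

```c
#include <ctype.h>
#include <string.h>

/* Toy level 1: compare the strings ignoring ASCII case. */
static int toy_level1(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;           /* both strings ended: equal at level 1 */
    }
}

/* Toy level 2: tie-break on case, lowercase sorting before uppercase.
   Only called when level 1 matched, so both strings have equal length. */
static int toy_level2(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        int ua = isupper((unsigned char)*a) != 0;
        int ub = isupper((unsigned char)*b) != 0;
        if (ua != ub)
            return ua < ub ? -1 : 1;
    }
    return 0;
}

/* Multi-level comparison with a bytewise shortcut after level 1. */
int toy_coll(const char *a, const char *b)
{
    int r = toy_level1(a, b);
    if (r != 0)
        return r;               /* decided at the first level */
    if (strcmp(a, b) == 0)
        return 0;               /* byte-identical: skip remaining levels */
    return toy_level2(a, b);    /* same at level 1 but not identical */
}
```

The strcmp() call is only reached when the first level matched, so pairs that differ at the first level pay nothing extra. Identical strings return without walking the remaining levels.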

    Performance comparison:

    glibc files     -33.77%
    vi_VN.UTF-8     -34.12%
    en_US.UTF-8     -42.42%
    ar_SA.UTF-8     -27.49%
    zh_CN.UTF-8     +07.90%
    cs_CZ.UTF-8     -29.67%
    en_GB.UTF-8     -28.50%
    da_DK.UTF-8     -36.57%
    pl_PL.UTF-8     -39.31%
    fr_FR.UTF-8     -28.57%
    pt_PT.UTF-8     -22.82%
    el_GR.UTF-8     -26.77%
    ru_RU.UTF-8     -35.81%
    iw_IL.UTF-8     -35.34%
    es_ES.UTF-8     -34.46%
    hi_IN.UTF-8     -00.38%
    sv_SE.UTF-8     -36.99%
    hu_HU.UTF-8     -16.35%
    tr_TR.UTF-8     -27.80%
    is_IS.UTF-8     -33.24%
    it_IT.UTF-8     -24.39%
    sr_RS.UTF-8     -37.55%
    ja_JP.UTF-8     +02.84%

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog          |   12 ++
 NEWS               |    2 +-
 string/strcoll_l.c |  344 ++++------------------------------------------------
 3 files changed, 39 insertions(+), 319 deletions(-)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

