This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [PATCH] improve regex performance


Hello,

Subject: Re: [PATCH] improve regex performance
From: Ulrich Drepper <drepper@redhat.com>
Date: 27 Jun 2001 00:13:47 -0700
Message-ID: <m37kxyfl90.fsf@otr.mynet>
> [snip]
> Only the one test commented out with DOES_NOT_WORK doesn't work.
> It also fails for the single-byte case which is worrisome.

I think it is a limitation of the regex library.

In this case, ".*" in the RE "G.*ran" can examine up to 20480 byte.
However in the input file ChangeLog.8, the fist 'G' is "GLIBC_2.1"
in line 29.  So ".*" examine all of remains of ChangeLog.8, which
are about 400KB.

When I cut off the first 'G' and following 20480byte as the new
input file, the test case works.
And when I add any one character to it, the test case doesn't work.

On the other hand the regular expression "G.\\{0,20480\\}ran" works,
but the RE "G.\\{0,20481\\}ran" doesn't work.
(the upper bound "20480" may differ from architecture to architecture.)

In addition, we can alleviate this limitation by increasing the value
of re_max_failures, which is defined at line 1502 in regex.c(rev 1.100).
And we may have to increase the value of RE_DUP_MAX too.

However I think the best solution is re-writing regex library from
scratch.

> If you don't see any problems with your patch I'd like to check it in.

After re-building all of the glibc, I can't see the problem now.
(It seems that I just broke somewhere in my source tree, sorry...)

Then, could you check it in please?
ChangeLog entries are:

2001-06-20 Isamu Hasegawa <isamu@yamato.ibm.com>

	* posix/regex.c(FREE_WCS_BUFFERS): New macro to free buffers.
	(re_search_2): invoke convert_mbs_to_wcs and FREE_WCS_BUFFERS.
	(wcs_re_match_2_internal): Check whether the wcs buffers need
	seting up or not, and skip seting up routin if not needed.

2001-06-26 Isamu Hasegawa <isamu@yamato.ibm.com>

	* posix/regex.c(count_mbs_length): Use binary search for
	optimization.

In addition,

> needed is much better (only between 2000% and 8000% slower :-).

I take back that I said it is reasonable...
Meanwhile perhaps it might be just a bug of the wcs version,
I'll investigate a little more, and if I found any problem, I'll
report ASAP.

Thanks,
-- 
Isamu Hasegawa
IBM Japan, Ltd.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]