This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCHv4a] Expected behaviour for a-z, A-Z, and 0-9 (Bug 23393).
On 07/26/2018 10:50 AM, Florian Weimer wrote:
> On 07/26/2018 04:34 AM, Carlos O'Donell wrote:
>> On 07/25/2018 04:57 PM, Carlos O'Donell wrote:
>>> v4
>>> - Fixed ar_SA, km_KH, lo_LA, or_IN, sl_SI, th_TH.
>>> - Added range checking for a-z, A-Z for all supported UTF-8 locales.
>>>
>>> All of my testers are clean.
>>
>> Attaching v4 on top of the current master.
>>
>> This fixes all the locales.
>
> I wrote another enumeration tester, this time covering all locales. It found these issues:
>
> az_AZ: U+000069 fails to match /[a-z]/
> az_AZ: U+000049 fails to match /[A-Z]/
> az_AZ.utf8: U+000069 fails to match /[a-z]/
> az_AZ.utf8: U+000049 fails to match /[A-Z]/
See it.
> crh_UA: U+000069 fails to match /[a-z]/
> crh_UA: U+000049 fails to match /[A-Z]/
> crh_UA.utf8: U+000069 fails to match /[a-z]/
> crh_UA.utf8: U+000049 fails to match /[A-Z]/
See it.
> ku_TR: U+000069 fails to match /[a-z]/
> ku_TR: U+000049 fails to match /[A-Z]/
> ku_TR.iso88599: U+000069 fails to match /[a-z]/
> ku_TR.iso88599: U+000049 fails to match /[A-Z]/
> ku_TR.utf8: U+000069 fails to match /[a-z]/
> ku_TR.utf8: U+000049 fails to match /[A-Z]/
See it.
> lv_LV: U+000079 fails to match /[a-z]/
> lv_LV: U+000059 fails to match /[A-Z]/
> lv_LV.iso885913: U+000079 fails to match /[a-z]/
> lv_LV.iso885913: U+000059 fails to match /[A-Z]/
> lv_LV.utf8: U+000079 fails to match /[a-z]/
> lv_LV.utf8: U+000059 fails to match /[A-Z]/
See it.
> shs_CA: U+0000E6 matches /[a-z]/ unexpectedly
> shs_CA: U+0000C6 matches /[A-Z]/ unexpectedly
> shs_CA.utf8: U+0000E6 matches /[a-z]/ unexpectedly
> shs_CA.utf8: U+0000C6 matches /[A-Z]/ unexpectedly
Good catch. These were the ones I was hoping your finder would catch.
> slovene: U+00006A fails to match /[a-z]/
> slovene: U+00006B fails to match /[a-z]/
> slovene: U+00006C fails to match /[a-z]/
> slovene: U+00006D fails to match /[a-z]/
> slovene: U+00006E fails to match /[a-z]/
> slovene: U+00006F fails to match /[a-z]/
This is an alias for sl_SI.ISO-8859-2 and we see it below.
> slovenian: U+00006A fails to match /[a-z]/
> slovenian: U+00006B fails to match /[a-z]/
> slovenian: U+00006C fails to match /[a-z]/
> slovenian: U+00006D fails to match /[a-z]/
> slovenian: U+00006E fails to match /[a-z]/
> slovenian: U+00006F fails to match /[a-z]/
This is an alias for sl_SI.ISO-8859-2 and we see it below.
> sl_SI: U+00006A fails to match /[a-z]/
> sl_SI: U+00006B fails to match /[a-z]/
> sl_SI: U+00006C fails to match /[a-z]/
> sl_SI: U+00006D fails to match /[a-z]/
> sl_SI: U+00006E fails to match /[a-z]/
> sl_SI: U+00006F fails to match /[a-z]/
See it.
> sl_SI.iso88592: U+00006A fails to match /[a-z]/
> sl_SI.iso88592: U+00006B fails to match /[a-z]/
> sl_SI.iso88592: U+00006C fails to match /[a-z]/
> sl_SI.iso88592: U+00006D fails to match /[a-z]/
> sl_SI.iso88592: U+00006E fails to match /[a-z]/
> sl_SI.iso88592: U+00006F fails to match /[a-z]/
See it (aliased above twice).
> sl_SI.utf8: U+00006A fails to match /[a-z]/
> sl_SI.utf8: U+00006B fails to match /[a-z]/
> sl_SI.utf8: U+00006C fails to match /[a-z]/
> sl_SI.utf8: U+00006D fails to match /[a-z]/
> sl_SI.utf8: U+00006E fails to match /[a-z]/
> sl_SI.utf8: U+00006F fails to match /[a-z]/
See it.
> sv_FI: U+000077 fails to match /[a-z]/
> sv_FI: U+000057 fails to match /[A-Z]/
See it.
> sv_FI@euro: U+000077 fails to match /[a-z]/
> sv_FI@euro: U+000057 fails to match /[A-Z]/
Same as sv_FI.
> sv_FI.iso88591: U+000077 fails to match /[a-z]/
> sv_FI.iso88591: U+000057 fails to match /[A-Z]/
Likewise.
> sv_FI.iso885915@euro: U+000077 fails to match /[a-z]/
> sv_FI.iso885915@euro: U+000057 fails to match /[A-Z]/
Likewise.
> sv_FI.utf8: U+000077 fails to match /[a-z]/
> sv_FI.utf8: U+000057 fails to match /[A-Z]/
Likewise.
> sv_SE: U+000077 fails to match /[a-z]/
> sv_SE: U+000057 fails to match /[A-Z]/
See it.
> sv_SE.iso88591: U+000077 fails to match /[a-z]/
> sv_SE.iso88591: U+000057 fails to match /[A-Z]/
Same as above.
> sv_SE.utf8: U+000077 fails to match /[a-z]/
> sv_SE.utf8: U+000057 fails to match /[A-Z]/
Likewise.
> swedish: U+000077 fails to match /[a-z]/
> swedish: U+000057 fails to match /[A-Z]/
Alias for sv_SE.
> tt_RU: U+000069 fails to match /[a-z]/
> tt_RU: U+000049 fails to match /[A-Z]/
See it.
> tt_RU@iqtelif: U+000069 fails to match /[a-z]/
> tt_RU@iqtelif: U+000049 fails to match /[A-Z]/
See it.
> tt_RU.utf8: U+000069 fails to match /[a-z]/
> tt_RU.utf8: U+000049 fails to match /[A-Z]/
See it.
> tt_RU.utf8@iqtelif: U+000069 fails to match /[a-z]/
> tt_RU.utf8@iqtelif: U+000049 fails to match /[A-Z]/
See it.
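The thread does not include Florian's enumeration tester. The C sketch below illustrates the technique under some assumptions. `check_range` is a hypothetical name. The locale under test must already be installed. Matching uses POSIX `regcomp`/`regexec` on the anchored ERE `^[a-z]$` (or another bracket expression). Each code point is encoded with `wcrtomb`, and any code point the locale's charset cannot represent is skipped. The function flags both kinds of report seen above: a character in the range that fails to match, and a character outside it that matches (such as U+0000E6 in shs_CA).

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not the tester used in this thread: switch to
   locale LOC, compile BRACKET (e.g. "[a-z]") as the anchored ERE
   "^[a-z]$", and test every code point in [FIRST, LAST].  Code points in
   [LO, HI] are expected to match; all others are expected not to.
   Returns the number of discrepancies, or -1 if LOC is not installed or
   the pattern does not compile.  */
int check_range(const char *loc, const char *bracket,
                wchar_t lo, wchar_t hi, wchar_t first, wchar_t last)
{
  /* setlocale (LC_ALL, NULL) would only query the current locale.  */
  assert(loc != NULL && lo <= hi && first <= last);
  if (setlocale(LC_ALL, loc) == NULL)
    return -1;

  char pattern[64];
  snprintf(pattern, sizeof pattern, "^%s$", bracket);
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  int failures = 0;
  for (wchar_t wc = first; wc <= last; ++wc)
    {
      /* Encode WC in the locale's charset; skip unrepresentable ones.  */
      char buf[MB_LEN_MAX + 1];
      mbstate_t state;
      memset(&state, 0, sizeof state);
      size_t len = wcrtomb(buf, wc, &state);
      if (len == (size_t) -1)
        continue;
      buf[len] = '\0';

      int matched = regexec(&re, buf, 0, NULL, 0) == 0;
      int expected = wc >= lo && wc <= hi;
      if (expected && !matched)
        {
          printf("%s: U+%06X fails to match /%s/\n",
                 loc, (unsigned int) wc, bracket);
          ++failures;
        }
      else if (!expected && matched)
        {
          printf("%s: U+%06X matches /%s/ unexpectedly\n",
                 loc, (unsigned int) wc, bracket);
          ++failures;
        }
    }
  regfree(&re);
  return failures;
}
```

A call from `main()` such as `check_range("shs_CA.UTF-8", "[a-z]", L'a', L'z', 1, 0x10FFFF)` should print lines like the shs_CA ones above on an affected build. Once the locale is fixed, it should print nothing. Scanning all code points rather than only ASCII is what exposes the "matches unexpectedly" class of bug.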
Thank you!
I increased the coverage of tst-fnmatch.input, and I get this:
Line #3699: Test #3548 (az_AZ.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #3751: Test #3600 (az_AZ.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #6819: Test #6668 (crh_UA.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #6871: Test #6720 (crh_UA.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #18675: Test #18524 (ku_TR.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #18727: Test #18576 (ku_TR.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #19835: Test #19684 (lv_LV.UTF-8): fnmatch ("[a-z]", "y", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #19887: Test #19736 (lv_LV.UTF-8): fnmatch ("[A-Z]", "Y", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26684: Test #26533 (sl_SI.UTF-8): fnmatch ("[a-z]", "j", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26685: Test #26534 (sl_SI.UTF-8): fnmatch ("[a-z]", "k", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26686: Test #26535 (sl_SI.UTF-8): fnmatch ("[a-z]", "l", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26687: Test #26536 (sl_SI.UTF-8): fnmatch ("[a-z]", "m", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26688: Test #26537 (sl_SI.UTF-8): fnmatch ("[a-z]", "n", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26689: Test #26538 (sl_SI.UTF-8): fnmatch ("[a-z]", "o", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28049: Test #27898 (sv_FI.UTF-8): fnmatch ("[a-z]", "w", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28101: Test #27950 (sv_FI.UTF-8): fnmatch ("[A-Z]", "W", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28153: Test #28002 (sv_SE.UTF-8): fnmatch ("[a-z]", "w", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28205: Test #28054 (sv_SE.UTF-8): fnmatch ("[A-Z]", "W", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30427: Test #30276 (tt_RU.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30479: Test #30328 (tt_RU.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30531: Test #30380 (tt_RU.UTF-8@iqtelif): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30583: Test #30432 (tt_RU.UTF-8@iqtelif): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
That matches every locale in which you saw failures, except for shs_CA, which is a real bug.
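The fnmatch side can be checked in a similar way. The sketch below is hypothetical. It is not glibc's tst-fnmatch driver and does not use the tst-fnmatch.input file format, and `fnmatch_ascii_ranges` is an illustrative name. For each printable ASCII character, it calls `fnmatch` with `[a-z]`, `[A-Z]` and `[0-9]`. It expects 0 exactly when the character lies in the ASCII range and FNM_NOMATCH otherwise. Mismatches are printed in a format modeled on the output above.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: in locale LOC, check the three ranges from
   Bug 23393 against every printable ASCII character.  Returns the
   number of mismatches, or -1 if LOC is not installed.  */
int fnmatch_ascii_ranges(const char *loc)
{
  static const struct { const char *pattern; char lo, hi; } ranges[] = {
    { "[a-z]", 'a', 'z' },
    { "[A-Z]", 'A', 'Z' },
    { "[0-9]", '0', '9' },
  };

  /* setlocale (LC_ALL, NULL) would only query the current locale.  */
  assert(loc != NULL);
  if (setlocale(LC_ALL, loc) == NULL)
    return -1;

  int failures = 0;
  for (size_t i = 0; i < sizeof ranges / sizeof ranges[0]; ++i)
    for (char c = ' '; c <= '~'; ++c)
      {
        char s[2] = { c, '\0' };
        int expected = (c >= ranges[i].lo && c <= ranges[i].hi)
                       ? 0 : FNM_NOMATCH;
        int r = fnmatch(ranges[i].pattern, s, 0);
        if (r != expected)
          {
            printf("(%s): fnmatch (\"%s\", \"%s\", 0) = %s "
                   "(FAIL, expected %s)\n",
                   loc, ranges[i].pattern, s,
                   r == 0 ? "0" : r == FNM_NOMATCH ? "FNM_NOMATCH" : "error",
                   expected == 0 ? "0" : "FNM_NOMATCH");
            ++failures;
          }
      }
  return failures;
}
```

Restricting the check to ASCII keeps it small. The trade-off is that a non-ASCII false positive such as the shs_CA one would not show up here. Catching that requires scanning non-ASCII characters as well.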
I'll fix these up quickly.
Cheers,
Carlos.