This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCHv4a] Expected behaviour for a-z, A-Z, and 0-9 (Bug 23393).


On 07/26/2018 10:50 AM, Florian Weimer wrote:
> On 07/26/2018 04:34 AM, Carlos O'Donell wrote:
>> On 07/25/2018 04:57 PM, Carlos O'Donell wrote:
>>> v4
>>> - Fixed ar_SA, km_KH, lo_LA, or_IN, sl_SI, th_TH.
>>> - Added range checking for a-z, A-Z for all supported UTF-8 locales.
>>>
>>> All of my testers are clean.
>>
>> Attaching v4 on top of the current master.
>>
>> This fixes all the locales.
> 
> I wrote another enumeration tester, this time covering all locales.  It found these issues:
> 
> az_AZ: U+000069 fails to match /[a-z]/
> az_AZ: U+000049 fails to match /[A-Z]/
> az_AZ.utf8: U+000069 fails to match /[a-z]/
> az_AZ.utf8: U+000049 fails to match /[A-Z]/

See it.

> crh_UA: U+000069 fails to match /[a-z]/
> crh_UA: U+000049 fails to match /[A-Z]/
> crh_UA.utf8: U+000069 fails to match /[a-z]/
> crh_UA.utf8: U+000049 fails to match /[A-Z]/

See it.

> ku_TR: U+000069 fails to match /[a-z]/
> ku_TR: U+000049 fails to match /[A-Z]/
> ku_TR.iso88599: U+000069 fails to match /[a-z]/
> ku_TR.iso88599: U+000049 fails to match /[A-Z]/
> ku_TR.utf8: U+000069 fails to match /[a-z]/
> ku_TR.utf8: U+000049 fails to match /[A-Z]/

See it.

> lv_LV: U+000079 fails to match /[a-z]/
> lv_LV: U+000059 fails to match /[A-Z]/
> lv_LV.iso885913: U+000079 fails to match /[a-z]/
> lv_LV.iso885913: U+000059 fails to match /[A-Z]/
> lv_LV.utf8: U+000079 fails to match /[a-z]/
> lv_LV.utf8: U+000059 fails to match /[A-Z]/

See it.

> shs_CA: U+0000E6 matches /[a-z]/ unexpectedly
> shs_CA: U+0000C6 matches /[A-Z]/ unexpectedly
> shs_CA.utf8: U+0000E6 matches /[a-z]/ unexpectedly
> shs_CA.utf8: U+0000C6 matches /[A-Z]/ unexpectedly

Good catch. These were the ones I was hoping your finder would catch.

> slovene: U+00006A fails to match /[a-z]/
> slovene: U+00006B fails to match /[a-z]/
> slovene: U+00006C fails to match /[a-z]/
> slovene: U+00006D fails to match /[a-z]/
> slovene: U+00006E fails to match /[a-z]/
> slovene: U+00006F fails to match /[a-z]/

This is an alias for sl_SI.ISO-8859-2 and we see it below.

> slovenian: U+00006A fails to match /[a-z]/
> slovenian: U+00006B fails to match /[a-z]/
> slovenian: U+00006C fails to match /[a-z]/
> slovenian: U+00006D fails to match /[a-z]/
> slovenian: U+00006E fails to match /[a-z]/
> slovenian: U+00006F fails to match /[a-z]/

This is an alias for sl_SI.ISO-8859-2 and we see it below.

> sl_SI: U+00006A fails to match /[a-z]/
> sl_SI: U+00006B fails to match /[a-z]/
> sl_SI: U+00006C fails to match /[a-z]/
> sl_SI: U+00006D fails to match /[a-z]/
> sl_SI: U+00006E fails to match /[a-z]/
> sl_SI: U+00006F fails to match /[a-z]/

See it.

> sl_SI.iso88592: U+00006A fails to match /[a-z]/
> sl_SI.iso88592: U+00006B fails to match /[a-z]/
> sl_SI.iso88592: U+00006C fails to match /[a-z]/
> sl_SI.iso88592: U+00006D fails to match /[a-z]/
> sl_SI.iso88592: U+00006E fails to match /[a-z]/
> sl_SI.iso88592: U+00006F fails to match /[a-z]/

See it (aliased above twice).

> sl_SI.utf8: U+00006A fails to match /[a-z]/
> sl_SI.utf8: U+00006B fails to match /[a-z]/
> sl_SI.utf8: U+00006C fails to match /[a-z]/
> sl_SI.utf8: U+00006D fails to match /[a-z]/
> sl_SI.utf8: U+00006E fails to match /[a-z]/
> sl_SI.utf8: U+00006F fails to match /[a-z]/

See it.

> sv_FI: U+000077 fails to match /[a-z]/
> sv_FI: U+000057 fails to match /[A-Z]/

See it.

> sv_FI@euro: U+000077 fails to match /[a-z]/
> sv_FI@euro: U+000057 fails to match /[A-Z]/

Same as sv_FI.

> sv_FI.iso88591: U+000077 fails to match /[a-z]/
> sv_FI.iso88591: U+000057 fails to match /[A-Z]/

Likewise.

> sv_FI.iso885915@euro: U+000077 fails to match /[a-z]/
> sv_FI.iso885915@euro: U+000057 fails to match /[A-Z]/

Likewise.

> sv_FI.utf8: U+000077 fails to match /[a-z]/
> sv_FI.utf8: U+000057 fails to match /[A-Z]/

Likewise.

> sv_SE: U+000077 fails to match /[a-z]/
> sv_SE: U+000057 fails to match /[A-Z]/

See it.

> sv_SE.iso88591: U+000077 fails to match /[a-z]/
> sv_SE.iso88591: U+000057 fails to match /[A-Z]/

Same as above.

> sv_SE.utf8: U+000077 fails to match /[a-z]/
> sv_SE.utf8: U+000057 fails to match /[A-Z]/

Likewise.

> swedish: U+000077 fails to match /[a-z]/
> swedish: U+000057 fails to match /[A-Z]/

Alias for sv_SE.

> tt_RU: U+000069 fails to match /[a-z]/
> tt_RU: U+000049 fails to match /[A-Z]/

See it.

> tt_RU@iqtelif: U+000069 fails to match /[a-z]/
> tt_RU@iqtelif: U+000049 fails to match /[A-Z]/

See it.

> tt_RU.utf8: U+000069 fails to match /[a-z]/
> tt_RU.utf8: U+000049 fails to match /[A-Z]/

See it.

> tt_RU.utf8@iqtelif: U+000069 fails to match /[a-z]/
> tt_RU.utf8@iqtelif: U+000049 fails to match /[A-Z]/

See it.

Thank you!

I increased tst-fnmatch.input coverage and I get this:

Line #3699: Test #3548 (az_AZ.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #3751: Test #3600 (az_AZ.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #6819: Test #6668 (crh_UA.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #6871: Test #6720 (crh_UA.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #18675: Test #18524 (ku_TR.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #18727: Test #18576 (ku_TR.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #19835: Test #19684 (lv_LV.UTF-8): fnmatch ("[a-z]", "y", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #19887: Test #19736 (lv_LV.UTF-8): fnmatch ("[A-Z]", "Y", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26684: Test #26533 (sl_SI.UTF-8): fnmatch ("[a-z]", "j", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26685: Test #26534 (sl_SI.UTF-8): fnmatch ("[a-z]", "k", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26686: Test #26535 (sl_SI.UTF-8): fnmatch ("[a-z]", "l", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26687: Test #26536 (sl_SI.UTF-8): fnmatch ("[a-z]", "m", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26688: Test #26537 (sl_SI.UTF-8): fnmatch ("[a-z]", "n", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #26689: Test #26538 (sl_SI.UTF-8): fnmatch ("[a-z]", "o", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28049: Test #27898 (sv_FI.UTF-8): fnmatch ("[a-z]", "w", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28101: Test #27950 (sv_FI.UTF-8): fnmatch ("[A-Z]", "W", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28153: Test #28002 (sv_SE.UTF-8): fnmatch ("[a-z]", "w", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #28205: Test #28054 (sv_SE.UTF-8): fnmatch ("[A-Z]", "W", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30427: Test #30276 (tt_RU.UTF-8): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30479: Test #30328 (tt_RU.UTF-8): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30531: Test #30380 (tt_RU.UTF-8@iqtelif): fnmatch ("[a-z]", "i", 0) = FNM_NOMATCH (FAIL, expected 0) ***
Line #30583: Test #30432 (tt_RU.UTF-8@iqtelif): fnmatch ("[A-Z]", "I", 0) = FNM_NOMATCH (FAIL, expected 0) ***

This matches every locale in which you saw failures, except for shs_CA, which is a real bug.
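
For readers following along, the kind of range check reported above can be sketched in C roughly as follows. This is only a minimal sketch: it is neither the enumeration tester quoted above nor the tst-fnmatch harness. The helper name count_range_failures is hypothetical, it covers only single-byte ASCII ranges, and it assumes the named locale is installed on the system (it returns -1 otherwise). The output format simply mimics the lines quoted earlier.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper, not part of the glibc test suite.
   Switch to LOCALE_NAME, then try every single-byte character in
   [FIRST, LAST] against PATTERN with fnmatch.  Return the number of
   characters that fail to match, or -1 if the locale is unavailable.
   Note that setlocale changes the process-wide locale.  */
static int
count_range_failures (const char *locale_name, const char *pattern,
                      char first, char last)
{
  assert (locale_name != NULL && pattern != NULL);

  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  int failures = 0;
  for (int c = first; c <= last; ++c)
    {
      /* Build a one-character, NUL-terminated string to match.  */
      char str[2] = { (char) c, '\0' };

      if (fnmatch (pattern, str, 0) != 0)
        {
          printf ("%s: U+%06X fails to match /%s/\n",
                  locale_name, (unsigned int) c, pattern);
          ++failures;
        }
    }
  return failures;
}
```

Calling count_range_failures ("sv_SE.UTF-8", "[a-z]", 'a', 'z') on a system with that locale installed would report any lowercase ASCII letters that the range does not cover. Catching unexpected matches such as the shs_CA case would need a separate pass over non-ASCII code points, which this sketch does not attempt.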

I'll fix these up quickly.

Cheers,
Carlos.

