This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] Avoid cltz in comparison functions.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: libc-alpha at sourceware dot org
- Date: Mon, 1 Jun 2015 10:55:46 +0200
- Subject: [RFC] Avoid cltz in comparison functions.
- Authentication-results: sourceware.org; auth=none
- References: <20150531190013 dot GA8530 at domone> <20150531204725 dot GA9108 at domone> <20150531220131 dot GA666 at domone>
On Mon, Jun 01, 2015 at 12:01:31AM +0200, OndÅej BÃlka wrote:
> And here is updated version of comparing functions.
>
I recalled another trick that I knew but couldn't use for sse2 code.
A trick for strcmp is that you don't need access individual bytes to get
inequality. Just mask bytes after first difference and compare them.
That could be done with following.
// find first bit where they differ.
mask = (x ^ y) ^ ((x ^ y) - 1)
// set whole bytes of mask
mask = (mask & ones * 255)
// also remove bytes after terminating 0
mask &= zero ^ (zero -1);
if (x & mask > y & mask)
return 1;
if (x & mask < y & mask)
return -1;
return 0;
Then you could replace these checks with branchless ones as I did in
following RFC.
A problem is that it would make diff skeleton more messy as sometimes
its used for finding byte location like in strcasecmp.
So how is best way to integrate tricks like that?
diff --git a/sysdeps/generic/string_vector.h b/sysdeps/generic/string_vector.h
index a3c4741..18ca118 100644
--- a/sysdeps/generic/string_vector.h
+++ b/sysdeps/generic/string_vector.h
@@ -191,6 +191,47 @@ bytes_equal_nocarry (vector_int x, vector_int y)
}
#endif
+#ifndef CUSTOM_COMPARE_BYTES
+# if __BYTE_ORDER == __LITTLE_ENDIAN
+static __always_inline
+int
+compare_bytes (vector_int x, vector_int y, vector_int zero_mask)
+{
+ /* Removes bytes beyond terminating '\0'. When zero mask its 0 then it
+ evaluates to ~0 and doesn't mask anything. */
+ zero_mask = zero_mask ^ (zero_mask - 1);
+
+ /* Find first bit that differs. Make mask with bits upto that bit set
+ to 1. */
+ vector_int and_mask = (((x ^ y) ^ ((x ^ y) - 1))
+
+ /* With difference in highest bit we can't compare them by subtraction. */
+
+ if (and_mask == ~((vector_int) 0))
+ {
+ if (x > y)
+ return 1;
+ if (x < y)
+ return -1;
+ return 0;
+ }
+
+ /* Convert mask to bytewise one that has in given byte 255 or 0 if bitwise
+ mask had some bit selected or not. */
+
+ and_mask = (and_mask & ones) * 255;
+
+ and_mask = and_mask & zero_mask;
+
+ /* As check above ensures that difference fits signed vector_int we could
+ subtract them. */
+
+ vector_int diff = (x & and_mask) - (y_greater & and_mask);
+ return (int)(diff >> (8 * (LSIZE - sizeof (int))));
+}
+# endif
+#endif
+
/*
When you have hardware ctz/clz its probably best bet. However
for softare emulation you could get better than generic one as you