author gingerBill <gingerBill@users.noreply.github.com> 2025-07-22 11:14:54 +0100
committer GitHub <noreply@github.com> 2025-07-22 11:14:54 +0100
commit 19a075211f4b14c5e2bb999b987ff4ed099f2af1 (patch)
tree 650cde2240c5c00c04fd2ab1de98e48e74c1ede6 /core
parent 513e6daacebb40425cea7fb3a181f3a6430183ab (diff)
parent 6c81df82a68a2e573ed119f6b6ebd4cd98463ae6 (diff)
Merge pull request #5442 from jon-lipstate/table_lookup
table lookup simd intrinsic
Diffstat (limited to 'core')
-rw-r--r-- core/simd/simd.odin 51
1 files changed, 51 insertions, 0 deletions
diff --git a/core/simd/simd.odin b/core/simd/simd.odin
index b4779b5ff..303eceb97 100644
--- a/core/simd/simd.odin
+++ b/core/simd/simd.odin
@@ -2441,6 +2441,57 @@ Graphically, the operation looks as follows. The `t` and `f` represent the
select :: intrinsics.simd_select
/*
+Runtime Equivalent to Shuffle.
+
+Performs element-wise table lookups using runtime indices.
+Each element in the indices vector selects an element from the table vector.
+The indices are automatically masked to prevent out-of-bounds access.
+
+This operation is hardware-accelerated on most platforms when using 8-bit
+integer vectors. For other element types or unsupported vector sizes, it
+falls back to software emulation.
+
+Inputs:
+- `table`: The lookup table vector (should be power-of-2 size for correct masking).
+- `indices`: The indices vector (automatically masked to valid range).
+
+Returns:
+- A vector where `result[i] = table[indices[i] & (table_size-1)]`.
+
+Operation:
+
+	for i in 0 ..< len(indices) {
+		masked_index := indices[i] & (len(table) - 1)
+		result[i] = table[masked_index]
+	}
+	return result
+
+Implementation:
+
+ | Platform | Lane Size | Implementation |
+ |-------------|-------------------------------------------|---------------------|
+ | x86-64 | pshufb (16B), vpshufb (32B), AVX512 (64B) | Single vector |
+ | ARM64 | tbl1 (16B), tbl2 (32B), tbl4 (64B) | Automatic splitting |
+ | ARM32 | vtbl1 (8B), vtbl2 (16B), vtbl4 (32B) | Automatic splitting |
+ | WebAssembly | i8x16.swizzle (16B), Emulation (>16B) | Mixed |
+ | Other | Emulation | Software |
+
+Example:
+
+	import "core:simd"
+	import "core:fmt"
+
+	runtime_swizzle_example :: proc() {
+		table := simd.u8x16{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
+		indices := simd.u8x16{15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
+		result := simd.runtime_swizzle(table, indices)
+		fmt.println(result) // Expected: {15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
+	}
+
+*/
+runtime_swizzle :: intrinsics.simd_runtime_swizzle
+
+/*
Compute the square root of each lane in a SIMD vector.
*/
sqrt :: intrinsics.sqrt
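
Note (not part of the commit): the masked lookup documented for `runtime_swizzle` in the hunk above can be modeled with plain fixed-size arrays. The following is a minimal scalar sketch of the documented semantics only, not the intrinsic or its lowering. The names `LANES` and `swizzle_reference` are hypothetical and do not appear in `core:simd`, and the sketch assumes a power-of-two lane count, as the doc comment recommends.

```odin
package main

import "core:fmt"

// Hypothetical lane count for this sketch; a power of two so the mask is valid.
LANES :: 16

// Scalar reference model of the documented behaviour:
//   result[i] = table[indices[i] & (LANES - 1)]
// This emulates the semantics on plain arrays; it is not the SIMD intrinsic.
swizzle_reference :: proc(table, indices: [LANES]u8) -> (result: [LANES]u8) {
	for idx, i in indices {
		result[i] = table[idx & (LANES - 1)]
	}
	return
}

main :: proc() {
	table:   [LANES]u8
	indices: [LANES]u8
	for i in 0 ..< LANES {
		table[i]   = u8(i * 10)        // 0, 10, 20, ..., 150
		indices[i] = u8(LANES - 1 - i) // select lanes in reverse order
	}
	// Out-of-range index: 31 & 15 == 15, so it still selects table[15].
	indices[0] = 31

	result := swizzle_reference(table, indices)
	fmt.println(result)
}
```

Because the index is masked rather than clamped or rejected, an out-of-range value such as 31 wraps to lane 15 instead of reading past the table. This is the same property the doc comment describes as "automatically masked".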