Merge pull request #5442 from jon-lipstate/table_lookup

table lookup simd intrinsic
author: gingerBill <gingerBill@users.noreply.github.com> 2025-07-22 11:14:54 +0100
committer: GitHub <noreply@github.com> 2025-07-22 11:14:54 +0100
commit: 19a075211f4b14c5e2bb999b987ff4ed099f2af1 (patch)
tree: 650cde2240c5c00c04fd2ab1de98e48e74c1ede6 /core
parent: 513e6daacebb40425cea7fb3a181f3a6430183ab (diff)
parent: 6c81df82a68a2e573ed119f6b6ebd4cd98463ae6 (diff)
1 files changed, 51 insertions, 0 deletions
diff --git a/core/simd/simd.odin b/core/simd/simd.odin
index b4779b5ff..303eceb97 100644
--- a/core/simd/simd.odin
+++ b/core/simd/simd.odin
@@ -2441,6 +2441,57 @@ Graphically, the operation looks as follows. The `t` and `f` represent the
 select :: intrinsics.simd_select
 
 /*
+Runtime Equivalent to Shuffle.
+
+Performs element-wise table lookups using runtime indices.
+Each element in the indices vector selects an element from the table vector.
+The indices are automatically masked to prevent out-of-bounds access.
+
+This operation is hardware-accelerated on most platforms when using 8-bit
+integer vectors. For other element types or unsupported vector sizes, it
+falls back to software emulation.
+
+Inputs:
+- `table`: The lookup table vector (should be power-of-2 size for correct masking).
+- `indices`: The indices vector (automatically masked to valid range).
+
+Returns:
+- A vector where `result[i] = table[indices[i] & (table_size-1)]`.
+
+Operation:
+
+	for i in 0 ..< len(indices) {
+		masked_index := indices[i] & (len(table) - 1)
+		result[i] = table[masked_index]
+	}
+	return result
+
+Implementation:
+
+	| Platform    | Lane Size                                 | Implementation      |
+	|-------------|-------------------------------------------|---------------------|
+	| x86-64      | pshufb (16B), vpshufb (32B), AVX512 (64B) | Single vector       |
+	| ARM64       | tbl1 (16B), tbl2 (32B), tbl4 (64B)        | Automatic splitting |
+	| ARM32       | vtbl1 (8B), vtbl2 (16B), vtbl4 (32B)      | Automatic splitting |
+	| WebAssembly | i8x16.swizzle (16B), Emulation (>16B)     | Mixed               |
+	| Other       | Emulation                                 | Software            |
+
+Example:
+
+	import "core:simd"
+	import "core:fmt"
+
+	runtime_swizzle_example :: proc() {
+		table := simd.u8x16{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
+		indices := simd.u8x16{15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
+		result := simd.runtime_swizzle(table, indices)
+		fmt.println(result) // Expected: {15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
+	}
+
+*/
+runtime_swizzle :: intrinsics.simd_runtime_swizzle
+
+/*
 Compute the square root of each lane in a SIMD vector.
 */
 sqrt    :: intrinsics.sqrt
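
The masking rule in the `Operation:` block of the new doc comment can be checked against a scalar model. The following is a minimal sketch, not the intrinsic's implementation: `swizzle_reference` is a hypothetical helper written for illustration, and it assumes a 16-element `u8` table so that `len(table) - 1` is the mask `0x0F`. It shows that, under the documented semantics, out-of-range indices wrap around rather than trap.

```odin
package main

// Scalar model of the documented operation (hypothetical helper, not part of core:simd):
// result[i] = table[indices[i] & (len(table) - 1)]
swizzle_reference :: proc(table: [16]u8, indices: [16]u8) -> (result: [16]u8) {
	for i in 0 ..< len(indices) {
		// The table length is a power of two, so the mask keeps the low 4 bits.
		masked_index := indices[i] & u8(len(table) - 1)
		result[i] = table[masked_index]
	}
	return
}

main :: proc() {
	// Fill the table with distinct values so lookups are easy to tell apart.
	table: [16]u8
	for i in 0 ..< 16 {
		table[i] = u8(i) * 10
	}

	// Indices 16, 31 and 255 are out of range and get masked to 0, 15 and 15.
	indices := [16]u8{15, 0, 16, 31, 255, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
	out := swizzle_reference(table, indices)

	assert(out[0] == table[15]) // in range
	assert(out[1] == table[0])  // in range
	assert(out[2] == table[0])  // 16  & 15 == 0
	assert(out[3] == table[15]) // 31  & 15 == 15
	assert(out[4] == table[15]) // 255 & 15 == 15
	assert(out[5] == table[1])  // in range
}
```

A model like this could serve as an oracle when testing `simd.runtime_swizzle` on a given target, by comparing each lane of the vector result with the scalar result for the same inputs.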