diff options
| author | gingerBill <gingerBill@users.noreply.github.com> | 2025-05-06 15:46:49 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-05-06 15:46:49 +0100 |
| commit | 0cf5b5984de51c18382b07f20b997f89442fab36 (patch) | |
| tree | bf1dd12e79c8ff88ba3495e4a24708eea372e398 /core/simd | |
| parent | e07451898396c84b6a80e63c3e283939d699784a (diff) | |
| parent | dd5b7852ce569027e87d77f46601210aa4180947 (diff) | |
Merge pull request #5108 from Barinzaya/core-simd-indices-redadd-redmul
Alternate `reduce_add`/`reduce_mul` intrinsics
Diffstat (limited to 'core/simd')
| -rw-r--r-- | core/simd/simd.odin | 194 |
1 files changed, 192 insertions, 2 deletions
diff --git a/core/simd/simd.odin b/core/simd/simd.odin index 0e69304c3..a97155f58 100644 --- a/core/simd/simd.odin +++ b/core/simd/simd.odin @@ -1759,7 +1759,103 @@ Returns: replace :: intrinsics.simd_replace /* -Reduce a vector to a scalar by adding up all the lanes. +Reduce a vector to a scalar by adding up all the lanes in a bisecting fashion. + +This procedure returns a scalar that is the sum of all lanes, calculated by +bisecting the vector into two parts, where the first contains lanes [0, N/2) +and the second contains lanes [N/2, N), and adding the two halves element-wise +to produce N/2 values. This is repeated until only a single element remains. +This order may be faster to compute than the ordered sum for floats, as it can +often be better parallelized. + +The order of the sum may be important for accounting for precision errors in +floating-point computation, as floating-point addition is not associative, that +is `(a+b)+c` may not be equal to `a+(b+c)`. + +Inputs: +- `v`: The vector to reduce. + +Result: +- Sum of all lanes, as a scalar. + +**Operation**: + + for n > 1 { + n = n / 2 + for i in 0 ..< n { + a[i] += a[i+n] + } + } + res := a[0] + +Graphical representation of the operation for N=4: + + +-----------------------+ + | v0 | v1 | v2 | v3 | + +-----------------------+ + | | | | + [+]<-- | ---' | + | [+]<--------' + | | + `>[+]<' + | + v + +-----+ + result: | y0 | + +-----+ +*/ +reduce_add_bisect :: intrinsics.simd_reduce_add_bisect + +/* +Reduce a vector to a scalar by multiplying up all the lanes in a bisecting fashion. + +This procedure returns a scalar that is the product of all lanes, calculated by +bisecting the vector into two parts, where the first contains indices [0, N/2) +and the second contains indices [N/2, N), and multiplying the two halves +together element-wise to produce N/2 values. This is repeated until only a +single element remains. This order may be faster to compute than the ordered +product for floats, as it can often be better parallelized. + +The order of the product may be important for accounting for precision errors +in floating-point computation, as floating-point multiplication is not +associative, that is `(a*b)*c` may not be equal to `a*(b*c)`. + +Inputs: +- `v`: The vector to reduce. + +Result: +- Product of all lanes, as a scalar. + +**Operation**: + + for n > 1 { + n = n / 2 + for i in 0 ..< n { + a[i] *= a[i+n] + } + } + res := a[0] + +Graphical representation of the operation for N=4: + + +-----------------------+ + | v0 | v1 | v2 | v3 | + +-----------------------+ + | | | | + [x]<-- | ---' | + | [x]<--------' + | | + `>[x]<' + | + v + +-----+ + result: | y0 | + +-----+ +*/ +reduce_mul_bisect :: intrinsics.simd_reduce_mul_bisect + +/* +Reduce a vector to a scalar by adding up all the lanes in an ordered fashion. This procedure returns a scalar that is the ordered sum of all lanes. The ordered sum may be important for accounting for precision errors in @@ -1782,7 +1878,7 @@ Result: reduce_add_ordered :: intrinsics.simd_reduce_add_ordered /* -Reduce a vector to a scalar by multiplying all the lanes. +Reduce a vector to a scalar by multiplying all the lanes in an ordered fashion. This procedure returns a scalar that is the ordered product of all lanes. The ordered product may be important for accounting for precision errors in @@ -1805,6 +1901,100 @@ Result: reduce_mul_ordered :: intrinsics.simd_reduce_mul_ordered /* +Reduce a vector to a scalar by adding up all the lanes in a pairwise fashion. + +This procedure returns a scalar that is the sum of all lanes, calculated by +adding each even-indexed element with the following odd-indexed element to +produce N/2 values. This is repeated until only a single element remains. This +order is supported by hardware instructions for some types/architectures (e.g. +i16/i32/f32/f64 on x86 SSE, i8/i16/i32/f32 on ARM NEON). + +The order of the sum may be important for accounting for precision errors in +floating-point computation, as floating-point addition is not associative, that +is `(a+b)+c` may not be equal to `a+(b+c)`. + +Inputs: +- `v`: The vector to reduce. + +Result: +- Sum of all lanes, as a scalar. + +**Operation**: + + for n > 1 { + n = n / 2 + for i in 0 ..< n { + a[i] = a[2*i+0] + a[2*i+1] + } + } + res := a[0] + +Graphical representation of the operation for N=4: + + +-----------------------+ + v: | v0 | v1 | v2 | v3 | + +-----------------------+ + | | | | + `>[+]<' `>[+]<' + | | + `--->[+]<--' + | + v + +-----+ + result: | y0 | + +-----+ +*/ +reduce_add_pairs :: intrinsics.simd_reduce_add_pairs + +/* +Reduce a vector to a scalar by multiplying all the lanes in a pairwise fashion. + +This procedure returns a scalar that is the product of all lanes, calculated by +bisecting the vector into two parts, where the first contains lanes [0, N/2) +and the second contains lanes [N/2, N), and multiplying the two halves together +multiplying each even-indexed element with the following odd-indexed element to +produce N/2 values. This is repeated until only a single element remains. This +order may be faster to compute than the ordered product for floats, as it can +often be better parallelized. + +The order of the product may be important for accounting for precision errors +in floating-point computation, as floating-point multiplication is not +associative, that is `(a*b)*c` may not be equal to `a*(b*c)`. + +Inputs: +- `v`: The vector to reduce. + +Result: +- Product of all lanes, as a scalar. + +**Operation**: + + for n > 1 { + n = n / 2 + for i in 0 ..< n { + a[i] = a[2*i+0] * a[2*i+1] + } + } + res := a[0] + +Graphical representation of the operation for N=4: + + +-----------------------+ + v: | v0 | v1 | v2 | v3 | + +-----------------------+ + | | | | + `>[x]<' `>[x]<' + | | + `--->[x]<--' + | + v + +-----+ + result: | y0 | + +-----+ +*/ +reduce_mul_pairs :: intrinsics.simd_reduce_mul_pairs + +/* Reduce a vector to a scalar by finding the minimum value between all of the lanes. This procedure returns a scalar that is the minimum value of all the lanes |