author: Feoramund <161657516+Feoramund@users.noreply.github.com> 2024-08-04 19:12:46 -0400
committer: Feoramund <161657516+Feoramund@users.noreply.github.com> 2024-08-04 19:12:46 -0400
commit: e17fc8272b08d1e2f59c13ff23df9a3d84a0c8a0
tree: 2c429ee4b4f6cf069e7c75828623f0a24174da6d /core
parent: dde42f0ebcef9dd7741761e6a7cc5ba738b63320
Document rationale behind RegEx shorthand classes
Diffstat (limited to 'core')
-rw-r--r-- core/text/regex/doc.odin 18
1 file changed, 18 insertions, 0 deletions
diff --git a/core/text/regex/doc.odin b/core/text/regex/doc.odin
index 7b28bbc3d..61ab8b80e 100644
--- a/core/text/regex/doc.odin
+++ b/core/text/regex/doc.odin
@@ -29,6 +29,24 @@ These specifiers can be composed together, such as an optional group:
This package also supports the non-greedy variants of the repeating and
optional specifiers by appending a `?` to them.
+Of the shorthand classes that are supported, they are all ASCII-based, even
+when compiling in Unicode mode. This is for the sake of general performance and
+simplicity, as there are thousands of Unicode codepoints which would qualify as
+either a digit, space, or word character which could be irrelevant depending on
+what is being matched.
+
+Here are the shorthand class equivalencies:
+ \d: [0-9]
+ \s: [\t\n\f\r ]
+ \w: [0-9A-Z_a-z]
+
+If you need your own shorthands, you can compose strings together like so:
+ MY_HEX :: "[0-9A-Fa-f]"
+ PATTERN :: MY_HEX + "-" + MY_HEX
+
+The compiler will handle turning multiple identical classes into references to
+the same set of matching runes, so there's no penalty for doing it like this.
+
``Some people, when confronted with a problem, think