| author | Mattias Engdegård | 2019-07-04 13:01:52 +0200 |
|---|---|---|
| committer | Mattias Engdegård | 2019-07-07 11:49:21 +0200 |
| commit | ac1ad3e49abd57a3e39b817864ea379354119d08 (patch) | |
| tree | 6b0a410e8fcc047fe666af06f5a03796431a9925 /doc | |
| parent | b39f5e6c9c50b3153c4e7bfac9219f14da73e4d1 (diff) | |
| download | emacs-ac1ad3e49abd57a3e39b817864ea379354119d08.tar.gz emacs-ac1ad3e49abd57a3e39b817864ea379354119d08.zip | |
Describe the rx notation in the elisp manual (bug#36496)
The additions are excluded from the print version to avoid making it
thicker.
* doc/lispref/elisp.texi (Top): New menu entry.
* doc/lispref/searching.texi (Regular Expressions): New menu entry.
(Regexp Example): Add rx form of the example.
(Rx Notation, Rx Constructs, Rx Functions): New nodes.
* doc/lispref/control.texi (pcase Macro): Describe the rx pattern.
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/lispref/control.texi | 25 | ||||
| -rw-r--r-- | doc/lispref/elisp.texi | 3 | ||||
| -rw-r--r-- | doc/lispref/searching.texi | 573 |
3 files changed, 601 insertions, 0 deletions
diff --git a/doc/lispref/control.texi b/doc/lispref/control.texi
index e308d68b75d..de6cd9301ff 100644
--- a/doc/lispref/control.texi
+++ b/doc/lispref/control.texi
| @@ -618,6 +618,31 @@ To present a consistent environment (@pxref{Intro Eval}) | |||
| 618 | to @var{body-forms} (thus avoiding an evaluation error on match), | 618 | to @var{body-forms} (thus avoiding an evaluation error on match), |
| 619 | if any of the sub-patterns let-binds a set of symbols, | 619 | if any of the sub-patterns let-binds a set of symbols, |
| 620 | they @emph{must} all bind the same set of symbols. | 620 | they @emph{must} all bind the same set of symbols. |
| 621 | |||
| 622 | @ifnottex | ||
| 623 | @anchor{rx in pcase} | ||
| 624 | @item (rx @var{rx-expr}@dots{}) | ||
| 625 | Matches strings against the regexp @var{rx-expr}@dots{}, using the | ||
| 626 | @code{rx} regexp notation (@pxref{Rx Notation}), as if by | ||
| 627 | @code{string-match}. | ||
| 628 | |||
| 629 | In addition to the usual @code{rx} syntax, @var{rx-expr}@dots{} can | ||
| 630 | contain the following constructs: | ||
| 631 | |||
| 632 | @table @code | ||
| 633 | @item (let @var{ref} @var{rx-expr}@dots{}) | ||
| 634 | Bind the symbol @var{ref} to a submatch that matches | ||
| 635 | @var{rx-expr}@enddots{}. @var{ref} is bound in @var{body-forms} to | ||
| 636 | the string of the submatch or nil, but can also be used in | ||
| 637 | @code{backref}. | ||
| 638 | |||
| 639 | @item (backref @var{ref}) | ||
| 640 | Like the standard @code{backref} construct, but @var{ref} can here | ||
| 641 | also be a name introduced by a previous @code{(let @var{ref} @dots{})} | ||
| 642 | construct. | ||
| 643 | @end table | ||
| 644 | @end ifnottex | ||
| 645 | |||
| 621 | @end table | 646 | @end table |
| 622 | 647 | ||
| 623 | @anchor{pcase-example-0} | 648 | @anchor{pcase-example-0} |
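
The `rx` pattern described in the hunk above can be sketched as follows. This is a minimal illustration, assuming an Emacs that includes the `let` and `backref` constructs documented in this change; the function names `my-parse-key-value` and `my-doubled-word-p` are hypothetical and not part of the commit.

```elisp
(require 'pcase)
(require 'rx)

;; Hypothetical helper: split "key=value" using `let' bindings
;; inside an `rx' pcase pattern.
(defun my-parse-key-value (line)
  "Return (KEY . VALUE) if LINE looks like \"key=value\", else nil."
  (pcase line
    ((rx string-start
         (let key (one-or-more (any "a-z")))   ; bound in the body
         "="
         (let value (zero-or-more not-newline))
         string-end)
     (cons key value))
    (_ nil)))

;; Hypothetical helper: `backref' referring to a name introduced
;; by an earlier `let' in the same pattern.
(defun my-doubled-word-p (s)
  "Return the repeated word if S is a word written twice, else nil."
  (pcase s
    ((rx string-start
         (let w (one-or-more (any "a-z")))
         " "
         (backref w)
         string-end)
     w)))
```

With these definitions, `(my-parse-key-value "color=red")` evaluates to `("color" . "red")` and `(my-doubled-word-p "the the")` to `"the"`. A string that does not match falls through to the next `pcase` clause, so both helpers return nil in that case.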
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index e18759654d9..c86f7f3dfbf 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
| @@ -1298,6 +1298,9 @@ Regular Expressions | |||
| 1298 | 1298 | ||
| 1299 | * Syntax of Regexps:: Rules for writing regular expressions. | 1299 | * Syntax of Regexps:: Rules for writing regular expressions. |
| 1300 | * Regexp Example:: Illustrates regular expression syntax. | 1300 | * Regexp Example:: Illustrates regular expression syntax. |
| 1301 | @ifnottex | ||
| 1302 | * Rx Notation:: An alternative, structured regexp notation. | ||
| 1303 | @end ifnottex | ||
| 1301 | * Regexp Functions:: Functions for operating on regular expressions. | 1304 | * Regexp Functions:: Functions for operating on regular expressions. |
| 1302 | 1305 | ||
| 1303 | Syntax of Regular Expressions | 1306 | Syntax of Regular Expressions |
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index ef1cffc446f..f95c9bf976e 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
| @@ -254,6 +254,9 @@ it easier to verify even very complex regexps. | |||
| 254 | @menu | 254 | @menu |
| 255 | * Syntax of Regexps:: Rules for writing regular expressions. | 255 | * Syntax of Regexps:: Rules for writing regular expressions. |
| 256 | * Regexp Example:: Illustrates regular expression syntax. | 256 | * Regexp Example:: Illustrates regular expression syntax. |
| 257 | @ifnottex | ||
| 258 | * Rx Notation:: An alternative, structured regexp notation. | ||
| 259 | @end ifnottex | ||
| 257 | * Regexp Functions:: Functions for operating on regular expressions. | 260 | * Regexp Functions:: Functions for operating on regular expressions. |
| 258 | @end menu | 261 | @end menu |
| 259 | 262 | ||
| @@ -359,6 +362,7 @@ is a postfix operator, similar to @samp{*} except that it must match the | |||
| 359 | preceding expression either once or not at all. For example, | 362 | preceding expression either once or not at all. For example, |
| 360 | @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. | 363 | @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. |
| 361 | 364 | ||
| 365 | @anchor{Non-greedy repetition} | ||
| 362 | @item @samp{*?}, @samp{+?}, @samp{??} | 366 | @item @samp{*?}, @samp{+?}, @samp{??} |
| 363 | @cindex non-greedy repetition characters in regexp | 367 | @cindex non-greedy repetition characters in regexp |
| 364 | These are @dfn{non-greedy} variants of the operators @samp{*}, @samp{+} | 368 | These are @dfn{non-greedy} variants of the operators @samp{*}, @samp{+} |
| @@ -951,6 +955,575 @@ Finally, the last part of the pattern matches any additional whitespace | |||
| 951 | beyond the minimum needed to end a sentence. | 955 | beyond the minimum needed to end a sentence. |
| 952 | @end table | 956 | @end table |
| 953 | 957 | ||
| 958 | @ifnottex | ||
| 959 | In the @code{rx} notation (@pxref{Rx Notation}), the regexp could be written | ||
| 960 | |||
| 961 | @example | ||
| 962 | @group | ||
| 963 | (rx (any ".?!") ; Punctuation ending sentence. | ||
| 964 | (zero-or-more (any "\"')]@}")) ; Closing quotes or brackets. | ||
| 965 | (or line-end | ||
| 966 | (seq " " line-end) | ||
| 967 | "\t" | ||
| 968 | " ") ; Two spaces. | ||
| 969 | (zero-or-more (any "\t\n "))) ; Optional extra whitespace. | ||
| 970 | @end group | ||
| 971 | @end example | ||
| 972 | |||
| 973 | Since @code{rx} regexps are just S-expressions, they can be formatted | ||
| 974 | and commented as such. | ||
| 975 | @end ifnottex | ||
| 976 | |||
| 977 | @ifnottex | ||
| 978 | @node Rx Notation | ||
| 979 | @subsection The @code{rx} Structured Regexp Notation | ||
| 980 | @cindex rx | ||
| 981 | @cindex regexp syntax | ||
| 982 | |||
| 983 | As an alternative to the string-based syntax, Emacs provides the | ||
| 984 | structured @code{rx} notation based on Lisp S-expressions. This | ||
| 985 | notation is usually easier to read, write and maintain than regexp | ||
| 986 | strings, and can be indented and commented freely. It requires a | ||
| 987 | conversion into string form since that is what regexp functions | ||
| 988 | expect, but that conversion typically takes place during | ||
| 989 | byte-compilation rather than when the Lisp code using the regexp is | ||
| 990 | run. | ||
| 991 | |||
| 992 | Here is an @code{rx} regexp@footnote{It could be written much | ||
| 993 | more simply with non-greedy operators (how?), but that would make the | ||
| 994 | example less interesting.} that matches a block comment in the C | ||
| 995 | programming language: | ||
| 996 | |||
| 997 | @example | ||
| 998 | @group | ||
| 999 | (rx "/*" ; Initial /* | ||
| 1000 | (zero-or-more | ||
| 1001 | (or (not (any "*")) ; Either non-*, | ||
| 1002 | (seq "*" ; or * followed by | ||
| 1003 | (not (any "/"))))) ; non-/ | ||
| 1004 | (one-or-more "*") ; At least one star, | ||
| 1005 | "/") ; and the final / | ||
| 1006 | @end group | ||
| 1007 | @end example | ||
| 1008 | |||
| 1009 | @noindent | ||
| 1010 | or, using shorter synonyms and written more compactly, | ||
| 1011 | |||
| 1012 | @example | ||
| 1013 | @group | ||
| 1014 | (rx "/*" | ||
| 1015 | (* (| (not (any "*")) | ||
| 1016 | (: "*" (not (any "/"))))) | ||
| 1017 | (+ "*") "/") | ||
| 1018 | @end group | ||
| 1019 | @end example | ||
| 1020 | |||
| 1021 | @noindent | ||
| 1022 | In conventional string syntax, it would be written | ||
| 1023 | |||
| 1024 | @example | ||
| 1025 | "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/" | ||
| 1026 | @end example | ||
| 1027 | |||
| 1028 | The @code{rx} notation is mainly useful in Lisp code; it cannot be | ||
| 1029 | used in most interactive situations where a regexp is requested, such | ||
| 1030 | as when running @code{query-replace-regexp} or in variable | ||
| 1031 | customisation. | ||
| 1032 | |||
| 1033 | @menu | ||
| 1034 | * Rx Constructs:: Constructs valid in rx forms. | ||
| 1035 | * Rx Functions:: Functions and macros that use rx forms. | ||
| 1036 | @end menu | ||
| 1037 | |||
| 1038 | @node Rx Constructs | ||
| 1039 | @subsubsection Constructs in @code{rx} regexps | ||
| 1040 | |||
| 1041 | The various forms in @code{rx} regexps are described below. The | ||
| 1042 | shorthand @var{rx} represents any @code{rx} form, and @var{rx}@dots{} | ||
| 1043 | means one or more @code{rx} forms. Where the corresponding string | ||
| 1044 | regexp syntax is given, @var{A}, @var{B}, @dots{} are string regexp | ||
| 1045 | subexpressions. | ||
| 1046 | @c With the new implementation of rx, this can be changed from | ||
| 1047 | @c 'one or more' to 'zero or more'. | ||
| 1048 | |||
| 1049 | @subsubheading Literals | ||
| 1050 | |||
| 1051 | @table @asis | ||
| 1052 | @item @code{"some-string"} | ||
| 1053 | Match the string @samp{some-string} literally. There are no | ||
| 1054 | characters with special meaning, unlike in string regexps. | ||
| 1055 | |||
| 1056 | @item @code{?C} | ||
| 1057 | Match the character @samp{C} literally. | ||
| 1058 | @end table | ||
| 1059 | |||
| 1060 | @subsubheading Sequence and alternative | ||
| 1061 | |||
| 1062 | @table @asis | ||
| 1063 | @item @code{(seq @var{rx}@dots{})} | ||
| 1064 | @cindex @code{seq} in rx | ||
| 1065 | @itemx @code{(sequence @var{rx}@dots{})} | ||
| 1066 | @cindex @code{sequence} in rx | ||
| 1067 | @itemx @code{(: @var{rx}@dots{})} | ||
| 1068 | @cindex @code{:} in rx | ||
| 1069 | @itemx @code{(and @var{rx}@dots{})} | ||
| 1070 | @cindex @code{and} in rx | ||
| 1071 | Match the @var{rx}s in sequence. Without arguments, the expression | ||
| 1072 | matches the empty string.@* | ||
| 1073 | Corresponding string regexp: @samp{@var{A}@var{B}@dots{}} | ||
| 1074 | (subexpressions in sequence). | ||
| 1075 | |||
| 1076 | @item @code{(or @var{rx}@dots{})} | ||
| 1077 | @cindex @code{or} in rx | ||
| 1078 | @itemx @code{(| @var{rx}@dots{})} | ||
| 1079 | @cindex @code{|} in rx | ||
| 1080 | Match exactly one of the @var{rx}s, trying from left to right. | ||
| 1081 | Without arguments, the expression will not match anything at all.@* | ||
| 1082 | Corresponding string regexp: @samp{@var{A}\|@var{B}\|@dots{}}. | ||
| 1083 | @end table | ||
| 1084 | |||
| 1085 | @subsubheading Repetition | ||
| 1086 | |||
| 1087 | Normally, repetition forms are greedy, in that they attempt to match | ||
| 1088 | as many times as possible. Some forms are non-greedy; they try to | ||
| 1089 | match as few times as possible (@pxref{Non-greedy repetition}). | ||
| 1090 | |||
| 1091 | @table @code | ||
| 1092 | @item (zero-or-more @var{rx}@dots{}) | ||
| 1093 | @cindex @code{zero-or-more} in rx | ||
| 1094 | @itemx (0+ @var{rx}@dots{}) | ||
| 1095 | @cindex @code{0+} in rx | ||
| 1096 | Match the @var{rx}s zero or more times. Greedy by default.@* | ||
| 1097 | Corresponding string regexp: @samp{@var{A}*} (greedy), | ||
| 1098 | @samp{@var{A}*?} (non-greedy) | ||
| 1099 | |||
| 1100 | @item (one-or-more @var{rx}@dots{}) | ||
| 1101 | @cindex @code{one-or-more} in rx | ||
| 1102 | @itemx (1+ @var{rx}@dots{}) | ||
| 1103 | @cindex @code{1+} in rx | ||
| 1104 | Match the @var{rx}s one or more times. Greedy by default.@* | ||
| 1105 | Corresponding string regexp: @samp{@var{A}+} (greedy), | ||
| 1106 | @samp{@var{A}+?} (non-greedy) | ||
| 1107 | |||
| 1108 | @item (zero-or-one @var{rx}@dots{}) | ||
| 1109 | @cindex @code{zero-or-one} in rx | ||
| 1110 | @itemx (optional @var{rx}@dots{}) | ||
| 1111 | @cindex @code{optional} in rx | ||
| 1112 | @itemx (opt @var{rx}@dots{}) | ||
| 1113 | @cindex @code{opt} in rx | ||
| 1114 | Match the @var{rx}s once or an empty string. Greedy by default.@* | ||
| 1115 | Corresponding string regexp: @samp{@var{A}?} (greedy), | ||
| 1116 | @samp{@var{A}??} (non-greedy). | ||
| 1117 | |||
| 1118 | @item (* @var{rx}@dots{}) | ||
| 1119 | @cindex @code{*} in rx | ||
| 1120 | Match the @var{rx}s zero or more times. Greedy.@* | ||
| 1121 | Corresponding string regexp: @samp{@var{A}*} | ||
| 1122 | |||
| 1123 | @item (+ @var{rx}@dots{}) | ||
| 1124 | @cindex @code{+} in rx | ||
| 1125 | Match the @var{rx}s one or more times. Greedy.@* | ||
| 1126 | Corresponding string regexp: @samp{@var{A}+} | ||
| 1127 | |||
| 1128 | @item (? @var{rx}@dots{}) | ||
| 1129 | @cindex @code{?} in rx | ||
| 1130 | Match the @var{rx}s once or an empty string. Greedy.@* | ||
| 1131 | Corresponding string regexp: @samp{@var{A}?} | ||
| 1132 | |||
| 1133 | @item (*? @var{rx}@dots{}) | ||
| 1134 | @cindex @code{*?} in rx | ||
| 1135 | Match the @var{rx}s zero or more times. Non-greedy.@* | ||
| 1136 | Corresponding string regexp: @samp{@var{A}*?} | ||
| 1137 | |||
| 1138 | @item (+? @var{rx}@dots{}) | ||
| 1139 | @cindex @code{+?} in rx | ||
| 1140 | Match the @var{rx}s one or more times. Non-greedy.@* | ||
| 1141 | Corresponding string regexp: @samp{@var{A}+?} | ||
| 1142 | |||
| 1143 | @item (?? @var{rx}@dots{}) | ||
| 1144 | @cindex @code{??} in rx | ||
| 1145 | Match the @var{rx}s or an empty string. Non-greedy.@* | ||
| 1146 | Corresponding string regexp: @samp{@var{A}??} | ||
| 1147 | |||
| 1148 | @item (= @var{n} @var{rx}@dots{}) | ||
| 1149 | @cindex @code{=} in rx | ||
| 1150 | @itemx (repeat @var{n} @var{rx}) | ||
| 1151 | Match the @var{rx}s exactly @var{n} times.@* | ||
| 1152 | Corresponding string regexp: @samp{@var{A}\@{@var{n}\@}} | ||
| 1153 | |||
| 1154 | @item (>= @var{n} @var{rx}@dots{}) | ||
| 1155 | @cindex @code{>=} in rx | ||
| 1156 | Match the @var{rx}s @var{n} or more times. Greedy.@* | ||
| 1157 | Corresponding string regexp: @samp{@var{A}\@{@var{n},\@}} | ||
| 1158 | |||
| 1159 | @item (** @var{n} @var{m} @var{rx}@dots{}) | ||
| 1160 | @cindex @code{**} in rx | ||
| 1161 | @itemx (repeat @var{n} @var{m} @var{rx}@dots{}) | ||
| 1162 | @cindex @code{repeat} in rx | ||
| 1163 | Match the @var{rx}s at least @var{n} but no more than @var{m} times. Greedy.@* | ||
| 1164 | Corresponding string regexp: @samp{@var{A}\@{@var{n},@var{m}\@}} | ||
| 1165 | @end table | ||
| 1166 | |||
| 1167 | The greediness of some repetition forms can be controlled using the | ||
| 1168 | following constructs. However, it is usually better to use the | ||
| 1169 | explicit non-greedy forms above when such matching is required. | ||
| 1170 | |||
| 1171 | @table @code | ||
| 1172 | @item (minimal-match @var{rx}) | ||
| 1173 | @cindex @code{minimal-match} in rx | ||
| 1174 | Match @var{rx}, with @code{zero-or-more}, @code{0+}, | ||
| 1175 | @code{one-or-more}, @code{1+}, @code{zero-or-one}, @code{opt} and | ||
| 1176 | @code{optional} using non-greedy matching. | ||
| 1177 | |||
| 1178 | @item (maximal-match @var{rx}) | ||
| 1179 | @cindex @code{maximal-match} in rx | ||
| 1180 | Match @var{rx}, with @code{zero-or-more}, @code{0+}, | ||
| 1181 | @code{one-or-more}, @code{1+}, @code{zero-or-one}, @code{opt} and | ||
| 1182 | @code{optional} using greedy matching. This is the default. | ||
| 1183 | @end table | ||
| 1184 | |||
| 1185 | @subsubheading Matching single characters | ||
| 1186 | |||
| 1187 | @table @asis | ||
| 1188 | @item @code{(any @var{set}@dots{})} | ||
| 1189 | @cindex @code{any} in rx | ||
| 1190 | @itemx @code{(char @var{set}@dots{})} | ||
| 1191 | @cindex @code{char} in rx | ||
| 1192 | @itemx @code{(in @var{set}@dots{})} | ||
| 1193 | @cindex @code{in} in rx | ||
| 1194 | @cindex character class in rx | ||
| 1195 | Match a single character from one of the @var{set}s. Each @var{set} | ||
| 1196 | is a character, a string representing the set of its characters, a | ||
| 1197 | range or a character class (see below). A range is either a | ||
| 1198 | hyphen-separated string like @code{"A-Z"}, or a cons of characters | ||
| 1199 | like @code{(?A . ?Z)}. | ||
| 1200 | |||
| 1201 | Note that hyphen (@code{-}) is special in strings in this construct, | ||
| 1202 | since it acts as a range separator. To include a hyphen, add it as a | ||
| 1203 | separate character or single-character string.@* | ||
| 1204 | Corresponding string regexp: @samp{[@dots{}]} | ||
| 1205 | |||
| 1206 | @item @code{(not @var{charspec})} | ||
| 1207 | @cindex @code{not} in rx | ||
| 1208 | Match a character not included in @var{charspec}. @var{charspec} can | ||
| 1209 | be an @code{any}, @code{syntax} or @code{category} form, or a | ||
| 1210 | character class.@* | ||
| 1211 | Corresponding string regexp: @samp{[^@dots{}]}, @samp{\S@var{code}}, | ||
| 1212 | @samp{\C@var{code}} | ||
| 1213 | |||
| 1214 | @item @code{not-newline}, @code{nonl} | ||
| 1215 | @cindex @code{not-newline} in rx | ||
| 1216 | @cindex @code{nonl} in rx | ||
| 1217 | Match any character except a newline.@* | ||
| 1218 | Corresponding string regexp: @samp{.} (dot) | ||
| 1219 | |||
| 1220 | @item @code{anything} | ||
| 1221 | @cindex @code{anything} in rx | ||
| 1222 | Match any character.@* | ||
| 1223 | Corresponding string regexp: @samp{.\|\n} (for example) | ||
| 1224 | |||
| 1225 | @item character class | ||
| 1226 | @cindex character class in rx | ||
| 1227 | Match a character from a named character class: | ||
| 1228 | |||
| 1229 | @table @asis | ||
| 1230 | @item @code{alpha}, @code{alphabetic}, @code{letter} | ||
| 1231 | Match alphabetic characters. More precisely, match characters whose | ||
| 1232 | Unicode @samp{general-category} property indicates that they are | ||
| 1233 | alphabetic. | ||
| 1234 | |||
| 1235 | @item @code{alnum}, @code{alphanumeric} | ||
| 1236 | Match alphabetic characters and digits. More precisely, match | ||
| 1237 | characters whose Unicode @samp{general-category} property indicates | ||
| 1238 | that they are alphabetic or decimal digits. | ||
| 1239 | |||
| 1240 | @item @code{digit}, @code{numeric}, @code{num} | ||
| 1241 | Match the digits @samp{0}--@samp{9}. | ||
| 1242 | |||
| 1243 | @item @code{xdigit}, @code{hex-digit}, @code{hex} | ||
| 1244 | Match the hexadecimal digits @samp{0}--@samp{9}, @samp{A}--@samp{F} | ||
| 1245 | and @samp{a}--@samp{f}. | ||
| 1246 | |||
| 1247 | @item @code{cntrl}, @code{control} | ||
| 1248 | Match any character whose code is in the range 0--31. | ||
| 1249 | |||
| 1250 | @item @code{blank} | ||
| 1251 | Match horizontal whitespace. More precisely, match characters whose | ||
| 1252 | Unicode @samp{general-category} property indicates that they are | ||
| 1253 | spacing separators. | ||
| 1254 | |||
| 1255 | @item @code{space}, @code{whitespace}, @code{white} | ||
| 1256 | Match any character that has whitespace syntax | ||
| 1257 | (@pxref{Syntax Class Table}). | ||
| 1258 | |||
| 1259 | @item @code{lower}, @code{lower-case} | ||
| 1260 | Match anything lower-case, as determined by the current case table. | ||
| 1261 | If @code{case-fold-search} is non-nil, this also matches any | ||
| 1262 | upper-case letter. | ||
| 1263 | |||
| 1264 | @item @code{upper}, @code{upper-case} | ||
| 1265 | Match anything upper-case, as determined by the current case table. | ||
| 1266 | If @code{case-fold-search} is non-nil, this also matches any | ||
| 1267 | lower-case letter. | ||
| 1268 | |||
| 1269 | @item @code{graph}, @code{graphic} | ||
| 1270 | Match any character except whitespace, @acronym{ASCII} and | ||
| 1271 | non-@acronym{ASCII} control characters, surrogates, and codepoints | ||
| 1272 | unassigned by Unicode, as indicated by the Unicode | ||
| 1273 | @samp{general-category} property. | ||
| 1274 | |||
| 1275 | @item @code{print}, @code{printing} | ||
| 1276 | Match whitespace or a character matched by @code{graph}. | ||
| 1277 | |||
| 1278 | @item @code{punct}, @code{punctuation} | ||
| 1279 | Match any punctuation character. (At present, for multibyte | ||
| 1280 | characters, anything that has non-word syntax.) | ||
| 1281 | |||
| 1282 | @item @code{word}, @code{wordchar} | ||
| 1283 | Match any character that has word syntax (@pxref{Syntax Class Table}). | ||
| 1284 | |||
| 1285 | @item @code{ascii} | ||
| 1286 | Match any @acronym{ASCII} character (codes 0--127). | ||
| 1287 | |||
| 1288 | @item @code{nonascii} | ||
| 1289 | Match any non-@acronym{ASCII} character (but not raw bytes). | ||
| 1290 | @end table | ||
| 1291 | |||
| 1292 | Corresponding string regexp: @samp{[[:@var{class}:]]} | ||
| 1293 | |||
| 1294 | @item @code{(syntax @var{syntax})} | ||
| 1295 | @cindex @code{syntax} in rx | ||
| 1296 | Match a character with syntax @var{syntax}, being one of the following | ||
| 1297 | names: | ||
| 1298 | |||
| 1299 | @multitable {@code{close-parenthesis}} {Syntax character} | ||
| 1300 | @headitem Syntax name @tab Syntax character | ||
| 1301 | @item @code{whitespace} @tab @code{-} | ||
| 1302 | @item @code{punctuation} @tab @code{.} | ||
| 1303 | @item @code{word} @tab @code{w} | ||
| 1304 | @item @code{symbol} @tab @code{_} | ||
| 1305 | @item @code{open-parenthesis} @tab @code{(} | ||
| 1306 | @item @code{close-parenthesis} @tab @code{)} | ||
| 1307 | @item @code{expression-prefix} @tab @code{'} | ||
| 1308 | @item @code{string-quote} @tab @code{"} | ||
| 1309 | @item @code{paired-delimiter} @tab @code{$} | ||
| 1310 | @item @code{escape} @tab @code{\} | ||
| 1311 | @item @code{character-quote} @tab @code{/} | ||
| 1312 | @item @code{comment-start} @tab @code{<} | ||
| 1313 | @item @code{comment-end} @tab @code{>} | ||
| 1314 | @item @code{string-delimiter} @tab @code{|} | ||
| 1315 | @item @code{comment-delimiter} @tab @code{!} | ||
| 1316 | @end multitable | ||
| 1317 | |||
| 1318 | For details, @pxref{Syntax Class Table}. Please note that | ||
| 1319 | @code{(syntax punctuation)} is @emph{not} equivalent to the character class | ||
| 1320 | @code{punctuation}.@* | ||
| 1321 | Corresponding string regexp: @samp{\s@var{code}} | ||
| 1322 | |||
| 1323 | @item @code{(category @var{category})} | ||
| 1324 | @cindex @code{category} in rx | ||
| 1325 | Match a character in category @var{category}, which is either one of | ||
| 1326 | the names below or its category character. | ||
| 1327 | |||
| 1328 | @multitable {@code{vowel-modifying-diacritical-mark}} {Category character} | ||
| 1329 | @headitem Category name @tab Category character | ||
| 1330 | @item @code{space-for-indent} @tab space | ||
| 1331 | @item @code{base} @tab @code{.} | ||
| 1332 | @item @code{consonant} @tab @code{0} | ||
| 1333 | @item @code{base-vowel} @tab @code{1} | ||
| 1334 | @item @code{upper-diacritical-mark} @tab @code{2} | ||
| 1335 | @item @code{lower-diacritical-mark} @tab @code{3} | ||
| 1336 | @item @code{tone-mark} @tab @code{4} | ||
| 1337 | @item @code{symbol} @tab @code{5} | ||
| 1338 | @item @code{digit} @tab @code{6} | ||
| 1339 | @item @code{vowel-modifying-diacritical-mark} @tab @code{7} | ||
| 1340 | @item @code{vowel-sign} @tab @code{8} | ||
| 1341 | @item @code{semivowel-lower} @tab @code{9} | ||
| 1342 | @item @code{not-at-end-of-line} @tab @code{<} | ||
| 1343 | @item @code{not-at-beginning-of-line} @tab @code{>} | ||
| 1344 | @item @code{alpha-numeric-two-byte} @tab @code{A} | ||
| 1345 | @item @code{chinese-two-byte} @tab @code{C} | ||
| 1346 | @item @code{greek-two-byte} @tab @code{G} | ||
| 1347 | @item @code{japanese-hiragana-two-byte} @tab @code{H} | ||
| 1348 | @item @code{indian-two-byte} @tab @code{I} | ||
| 1349 | @item @code{japanese-katakana-two-byte} @tab @code{K} | ||
| 1350 | @item @code{strong-left-to-right} @tab @code{L} | ||
| 1351 | @item @code{korean-hangul-two-byte} @tab @code{N} | ||
| 1352 | @item @code{strong-right-to-left} @tab @code{R} | ||
| 1353 | @item @code{cyrillic-two-byte} @tab @code{Y} | ||
| 1354 | @item @code{combining-diacritic} @tab @code{^} | ||
| 1355 | @item @code{ascii} @tab @code{a} | ||
| 1356 | @item @code{arabic} @tab @code{b} | ||
| 1357 | @item @code{chinese} @tab @code{c} | ||
| 1358 | @item @code{ethiopic} @tab @code{e} | ||
| 1359 | @item @code{greek} @tab @code{g} | ||
| 1360 | @item @code{korean} @tab @code{h} | ||
| 1361 | @item @code{indian} @tab @code{i} | ||
| 1362 | @item @code{japanese} @tab @code{j} | ||
| 1363 | @item @code{japanese-katakana} @tab @code{k} | ||
| 1364 | @item @code{latin} @tab @code{l} | ||
| 1365 | @item @code{lao} @tab @code{o} | ||
| 1366 | @item @code{tibetan} @tab @code{q} | ||
| 1367 | @item @code{japanese-roman} @tab @code{r} | ||
| 1368 | @item @code{thai} @tab @code{t} | ||
| 1369 | @item @code{vietnamese} @tab @code{v} | ||
| 1370 | @item @code{hebrew} @tab @code{w} | ||
| 1371 | @item @code{cyrillic} @tab @code{y} | ||
| 1372 | @item @code{can-break} @tab @code{|} | ||
| 1373 | @end multitable | ||
| 1374 | |||
| 1375 | For more information about currently defined categories, run the | ||
| 1376 | command @kbd{M-x describe-categories @key{RET}}. For how to define | ||
| 1377 | new categories, @pxref{Categories}.@* | ||
| 1378 | Corresponding string regexp: @samp{\c@var{code}} | ||
| 1379 | @end table | ||
| 1380 | |||
| 1381 | @subsubheading Zero-width assertions | ||
| 1382 | |||
| 1383 | These all match the empty string, but only in specific places. | ||
| 1384 | |||
| 1385 | @table @asis | ||
| 1386 | @item @code{line-start}, @code{bol} | ||
| 1387 | @cindex @code{line-start} in rx | ||
| 1388 | @cindex @code{bol} in rx | ||
| 1389 | Match at the beginning of a line.@* | ||
| 1390 | Corresponding string regexp: @samp{^} | ||
| 1391 | |||
| 1392 | @item @code{line-end}, @code{eol} | ||
| 1393 | @cindex @code{line-end} in rx | ||
| 1394 | @cindex @code{eol} in rx | ||
| 1395 | Match at the end of a line.@* | ||
| 1396 | Corresponding string regexp: @samp{$} | ||
| 1397 | |||
| 1398 | @item @code{string-start}, @code{bos}, @code{buffer-start}, @code{bot} | ||
| 1399 | @cindex @code{string-start} in rx | ||
| 1400 | @cindex @code{bos} in rx | ||
| 1401 | @cindex @code{buffer-start} in rx | ||
| 1402 | @cindex @code{bot} in rx | ||
| 1403 | Match at the start of the string or buffer being matched against.@* | ||
| 1404 | Corresponding string regexp: @samp{\`} | ||
| 1405 | |||
| 1406 | @item @code{string-end}, @code{eos}, @code{buffer-end}, @code{eot} | ||
| 1407 | @cindex @code{string-end} in rx | ||
| 1408 | @cindex @code{eos} in rx | ||
| 1409 | @cindex @code{buffer-end} in rx | ||
| 1410 | @cindex @code{eot} in rx | ||
| 1411 | Match at the end of the string or buffer being matched against.@* | ||
| 1412 | Corresponding string regexp: @samp{\'} | ||
| 1413 | |||
| 1414 | @item @code{point} | ||
| 1415 | @cindex @code{point} in rx | ||
| 1416 | Match at point.@* | ||
| 1417 | Corresponding string regexp: @samp{\=} | ||
| 1418 | |||
| 1419 | @item @code{word-start} | ||
| 1420 | @cindex @code{word-start} in rx | ||
| 1421 | Match at the beginning of a word.@* | ||
| 1422 | Corresponding string regexp: @samp{\<} | ||
| 1423 | |||
| 1424 | @item @code{word-end} | ||
| 1425 | @cindex @code{word-end} in rx | ||
| 1426 | Match at the end of a word.@* | ||
| 1427 | Corresponding string regexp: @samp{\>} | ||
| 1428 | |||
| 1429 | @item @code{word-boundary} | ||
| 1430 | @cindex @code{word-boundary} in rx | ||
| 1431 | Match at the beginning or end of a word.@* | ||
| 1432 | Corresponding string regexp: @samp{\b} | ||
| 1433 | |||
| 1434 | @item @code{not-word-boundary} | ||
| 1435 | @cindex @code{not-word-boundary} in rx | ||
| 1436 | Match anywhere but at the beginning or end of a word.@* | ||
| 1437 | Corresponding string regexp: @samp{\B} | ||
| 1438 | |||
| 1439 | @item @code{symbol-start} | ||
| 1440 | @cindex @code{symbol-start} in rx | ||
| 1441 | Match at the beginning of a symbol.@* | ||
| 1442 | Corresponding string regexp: @samp{\_<} | ||
| 1443 | |||
| 1444 | @item @code{symbol-end} | ||
| 1445 | @cindex @code{symbol-end} in rx | ||
| 1446 | Match at the end of a symbol.@* | ||
| 1447 | Corresponding string regexp: @samp{\_>} | ||
| 1448 | @end table | ||
| 1449 | |||
| 1450 | @subsubheading Capture groups | ||
| 1451 | |||
| 1452 | @table @code | ||
| 1453 | @item (group @var{rx}@dots{}) | ||
| 1454 | @cindex @code{group} in rx | ||
| 1455 | @itemx (submatch @var{rx}@dots{}) | ||
| 1456 | @cindex @code{submatch} in rx | ||
| 1457 | Match the @var{rx}s, making the matched text and position accessible | ||
| 1458 | in the match data. The first group in a regexp is numbered 1; | ||
| 1459 | subsequent groups will be numbered one higher than the previous | ||
| 1460 | group.@* | ||
| 1461 | Corresponding string regexp: @samp{\(@dots{}\)} | ||
| 1462 | |||
| 1463 | @item (group-n @var{n} @var{rx}@dots{}) | ||
| 1464 | @cindex @code{group-n} in rx | ||
| 1465 | @itemx (submatch-n @var{n} @var{rx}@dots{}) | ||
| 1466 | @cindex @code{submatch-n} in rx | ||
| 1467 | Like @code{group}, but explicitly assign the group number @var{n}. | ||
| 1468 | @var{n} must be positive.@* | ||
| 1469 | Corresponding string regexp: @samp{\(?@var{n}:@dots{}\)} | ||
| 1470 | |||
| 1471 | @item (backref @var{n}) | ||
| 1472 | @cindex @code{backref} in rx | ||
| 1473 | Match the text previously matched by group number @var{n}. | ||
| 1474 | @var{n} must be in the range 1--9.@* | ||
| 1475 | Corresponding string regexp: @samp{\@var{n}} | ||
| 1476 | @end table | ||
| 1477 | |||
| 1478 | @subsubheading Dynamic inclusion | ||
| 1479 | |||
| 1480 | @table @code | ||
| 1481 | @item (literal @var{expr}) | ||
| 1482 | @cindex @code{literal} in rx | ||
| 1483 | Match the literal string that is the result from evaluating the Lisp | ||
| 1484 | expression @var{expr}. The evaluation takes place at call time, in | ||
| 1485 | the current lexical environment. | ||
| 1486 | |||
| 1487 | @item (regexp @var{expr}) | ||
| 1488 | @cindex @code{regexp} in rx | ||
| 1489 | @itemx (regex @var{expr}) | ||
| 1490 | @cindex @code{regex} in rx | ||
| 1491 | Match the string regexp that is the result from evaluating the Lisp | ||
| 1492 | expression @var{expr}. The evaluation takes place at call time, in | ||
| 1493 | the current lexical environment. | ||
| 1494 | |||
| 1495 | @item (eval @var{expr}) | ||
| 1496 | @cindex @code{eval} in rx | ||
| 1497 | Match the rx form that is the result from evaluating the Lisp | ||
| 1498 | expression @var{expr}. The evaluation takes place at macro-expansion | ||
| 1499 | time for @code{rx}, at call time for @code{rx-to-string}, | ||
| 1500 | in the current global environment. | ||
| 1501 | @end table | ||
| 1502 | |||
| 1503 | @node Rx Functions | ||
| 1504 | @subsubsection Functions and macros using @code{rx} regexps | ||
| 1505 | |||
| 1506 | @defmac rx rx-expr@dots{} | ||
| 1507 | Translate the @var{rx-expr}s to a string regexp, as if they were the | ||
| 1508 | body of a @code{(seq @dots{})} form. The @code{rx} macro expands to a | ||
| 1509 | string constant, or, if @code{literal} or @code{regexp} forms are | ||
| 1510 | used, a Lisp expression that evaluates to a string. | ||
| 1511 | @end defmac | ||
| 1512 | |||
| 1513 | @defun rx-to-string rx-expr &optional no-group | ||
| 1514 | Translate @var{rx-expr} to a string regexp which is returned. | ||
| 1515 | If @var{no-group} is absent or nil, bracket the result in a | ||
| 1516 | non-capturing group, @samp{\(?:@dots{}\)}, if necessary to ensure that | ||
| 1517 | a postfix operator appended to it will apply to the whole expression. | ||
| 1518 | |||
| 1519 | Arguments to @code{literal} and @code{regexp} forms in @var{rx-expr} | ||
| 1520 | must be string literals. | ||
| 1521 | @end defun | ||
| 1522 | |||
| 1523 | The @code{pcase} macro can use @code{rx} expressions as patterns | ||
| 1524 | directly; @pxref{rx in pcase}. | ||
| 1525 | @end ifnottex | ||
| 1526 | |||
| 954 | @node Regexp Functions | 1527 | @node Regexp Functions |
| 955 | @subsection Regular Expression Functions | 1528 | @subsection Regular Expression Functions |
| 956 | 1529 | ||
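
The new Rx Functions node distinguishes the `rx` macro, which is translated when the code is expanded, from the function `rx-to-string`, which translates a form at run time. The sketch below illustrates that difference. It assumes an Emacs that provides the `literal` form documented in this change; all `my-` names are hypothetical.

```elisp
(require 'rx)

;; The block-comment regexp from the new manual text, translated by
;; the `rx' macro into a string regexp.
(defconst my-c-comment-regexp
  (rx "/*"                              ; Initial /*
      (zero-or-more
       (or (not (any "*"))              ; Either non-*,
           (seq "*" (not (any "/")))))  ; or * followed by non-/
      (one-or-more "*")                 ; At least one star,
      "/")                              ; and the final /
  "Regexp matching a C block comment.")

;; `rx-to-string' is an ordinary function, so the rx form can be
;; built at run time, here from a list of keyword strings.
(defun my-keyword-regexp (keywords)
  "Return a regexp matching any of KEYWORDS as a whole symbol."
  (rx-to-string `(seq symbol-start (or ,@keywords) symbol-end)))

;; `literal' splices in a string computed at call time; characters
;; such as `.' in PREFIX are matched literally.
(defun my-starts-with-p (prefix string)
  "Return non-nil if STRING begins with PREFIX."
  (string-match-p (rx string-start (literal prefix)) string))
```

For example, `(string-match my-c-comment-regexp "x /* hi */ y")` returns 2, the index where the comment starts, and `(my-starts-with-p "a.b" "axb")` returns nil because the dot is not treated as a wildcard.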