author Mattias Engdegård 2019-07-04 13:01:52 +0200
committer Mattias Engdegård 2019-07-07 11:49:21 +0200
commit ac1ad3e49abd57a3e39b817864ea379354119d08
tree 6b0a410e8fcc047fe666af06f5a03796431a9925 /doc/lispref
parent b39f5e6c9c50b3153c4e7bfac9219f14da73e4d1
Describe the rx notation in the elisp manual (bug#36496)
The additions are excluded from the print version to avoid making it
thicker.

* doc/lispref/elisp.texi (Top): New menu entry.
* doc/lispref/searching.texi (Regular Expressions): New menu entry.
(Regexp Example): Add rx form of the example.
(Rx Notation, Rx Constructs, Rx Functions): New nodes.
* doc/lispref/control.texi (pcase Macro): Describe the rx pattern.
Diffstat (limited to 'doc/lispref')
-rw-r--r-- doc/lispref/control.texi 25
-rw-r--r-- doc/lispref/elisp.texi 3
-rw-r--r-- doc/lispref/searching.texi 573
3 files changed, 601 insertions, 0 deletions
diff --git a/doc/lispref/control.texi b/doc/lispref/control.texi
index e308d68b75d..de6cd9301ff 100644
--- a/doc/lispref/control.texi
+++ b/doc/lispref/control.texi
@@ -618,6 +618,31 @@ To present a consistent environment (@pxref{Intro Eval})
618to @var{body-forms} (thus avoiding an evaluation error on match), 618to @var{body-forms} (thus avoiding an evaluation error on match),
619if any of the sub-patterns let-binds a set of symbols, 619if any of the sub-patterns let-binds a set of symbols,
620they @emph{must} all bind the same set of symbols. 620they @emph{must} all bind the same set of symbols.
621
622@ifnottex
623@anchor{rx in pcase}
624@item (rx @var{rx-expr}@dots{})
625Matches strings against the regexp @var{rx-expr}@dots{}, using the
626@code{rx} regexp notation (@pxref{Rx Notation}), as if by
627@code{string-match}.
628
629In addition to the usual @code{rx} syntax, @var{rx-expr}@dots{} can
630contain the following constructs:
631
632@table @code
633@item (let @var{ref} @var{rx-expr}@dots{})
634Bind the symbol @var{ref} to a submatch that matches
635@var{rx-expr}@enddots{}. @var{ref} is bound in @var{body-forms} to
636the string of the submatch or nil, but can also be used in
637@code{backref}.
638
639@item (backref @var{ref})
640Like the standard @code{backref} construct, but @var{ref} can here
641also be a name introduced by a previous @code{(let @var{ref} @dots{})}
642construct.
643@end table
644@end ifnottex
645
621@end table 646@end table
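@ifnottex
As a minimal sketch of the @code{rx} pattern described above, the
following function uses @code{let} to bind two submatches.  The name
@code{my-parse-version} and the @samp{MAJOR.MINOR} input format are
assumptions made purely for illustration.

@example
@group
(require 'rx)   ; Harmless if the rx pcase pattern is autoloaded.

;; Hypothetical helper, for illustration only.
(defun my-parse-version (s)
  "Return (MAJOR MINOR) if S looks like \"MAJOR.MINOR\", else nil."
  (pcase s
    ((rx bos (let major (+ digit)) "." (let minor (+ digit)) eos)
     ;; MAJOR and MINOR are bound to the submatch strings.
     (list (string-to-number major) (string-to-number minor)))
    (_ nil)))

(my-parse-version "27.1")   ; @result{} (27 1)
(my-parse-version "27")     ; @result{} nil
@end group
@end example
@end ifnottex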
622 647
623@anchor{pcase-example-0} 648@anchor{pcase-example-0}
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index e18759654d9..c86f7f3dfbf 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
@@ -1298,6 +1298,9 @@ Regular Expressions
1298 1298
1299* Syntax of Regexps:: Rules for writing regular expressions. 1299* Syntax of Regexps:: Rules for writing regular expressions.
1300* Regexp Example:: Illustrates regular expression syntax. 1300* Regexp Example:: Illustrates regular expression syntax.
1301@ifnottex
1302* Rx Notation:: An alternative, structured regexp notation.
1303@end ifnottex
1301* Regexp Functions:: Functions for operating on regular expressions. 1304* Regexp Functions:: Functions for operating on regular expressions.
1302 1305
1303Syntax of Regular Expressions 1306Syntax of Regular Expressions
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index ef1cffc446f..f95c9bf976e 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -254,6 +254,9 @@ it easier to verify even very complex regexps.
254@menu 254@menu
255* Syntax of Regexps:: Rules for writing regular expressions. 255* Syntax of Regexps:: Rules for writing regular expressions.
256* Regexp Example:: Illustrates regular expression syntax. 256* Regexp Example:: Illustrates regular expression syntax.
257@ifnottex
258* Rx Notation:: An alternative, structured regexp notation.
259@end ifnottex
257* Regexp Functions:: Functions for operating on regular expressions. 260* Regexp Functions:: Functions for operating on regular expressions.
258@end menu 261@end menu
259 262
@@ -359,6 +362,7 @@ is a postfix operator, similar to @samp{*} except that it must match the
359preceding expression either once or not at all. For example, 362preceding expression either once or not at all. For example,
360@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. 363@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
361 364
365@anchor{Non-greedy repetition}
362@item @samp{*?}, @samp{+?}, @samp{??} 366@item @samp{*?}, @samp{+?}, @samp{??}
363@cindex non-greedy repetition characters in regexp 367@cindex non-greedy repetition characters in regexp
364These are @dfn{non-greedy} variants of the operators @samp{*}, @samp{+} 368These are @dfn{non-greedy} variants of the operators @samp{*}, @samp{+}
@@ -951,6 +955,575 @@ Finally, the last part of the pattern matches any additional whitespace
951beyond the minimum needed to end a sentence. 955beyond the minimum needed to end a sentence.
952@end table 956@end table
953 957
958@ifnottex
959In the @code{rx} notation (@pxref{Rx Notation}), the regexp could be written
960
961@example
962@group
963(rx (any ".?!") ; Punctuation ending sentence.
964 (zero-or-more (any "\"')]@}")) ; Closing quotes or brackets.
965 (or line-end
966 (seq " " line-end)
967 "\t"
968 " ") ; Two spaces.
969 (zero-or-more (any "\t\n "))) ; Optional extra whitespace.
970@end group
971@end example
972
973Since @code{rx} regexps are just S-expressions, they can be formatted
974and commented as such.
975@end ifnottex
976
977@ifnottex
978@node Rx Notation
979@subsection The @code{rx} Structured Regexp Notation
980@cindex rx
981@cindex regexp syntax
982
983 As an alternative to the string-based syntax, Emacs provides the
984structured @code{rx} notation based on Lisp S-expressions. This
985notation is usually easier to read, write and maintain than regexp
986strings, and can be indented and commented freely. It requires a
987conversion into string form since that is what regexp functions
988expect, but that conversion typically takes place during
989byte-compilation rather than when the Lisp code using the regexp is
990run.
991
992 Here is an @code{rx} regexp@footnote{It could be written much
993more simply with non-greedy operators (how?), but that would make the
994example less interesting.} that matches a block comment in the C
995programming language:
996
997@example
998@group
999(rx "/*" ; Initial /*
1000 (zero-or-more
1001 (or (not (any "*")) ; Either non-*,
1002 (seq "*" ; or * followed by
1003 (not (any "/"))))) ; non-/
1004 (one-or-more "*") ; At least one star,
1005 "/") ; and the final /
1006@end group
1007@end example
1008
1009@noindent
1010or, using shorter synonyms and written more compactly,
1011
1012@example
1013@group
1014(rx "/*"
1015 (* (| (not (any "*"))
1016 (: "*" (not (any "/")))))
1017 (+ "*") "/")
1018@end group
1019@end example
1020
1021@noindent
1022In conventional string syntax, it would be written
1023
1024@example
1025"/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
1026@end example
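
As a quick check of the regexp above, the @code{rx} form can be passed
to @code{string-match} like any other regexp.  This is only a sketch:
the variable name @code{my-c-comment-regexp} and the sample string are
made up for illustration.

@example
@group
;; Hypothetical variable holding the block-comment regexp.
(defvar my-c-comment-regexp
  (rx "/*"
      (* (| (not (any "*"))
            (: "*" (not (any "/")))))
      (+ "*") "/"))

(let ((s "int x; /* a * b **/ y"))
  (and (string-match my-c-comment-regexp s)
       (match-string 0 s)))   ; @result{} "/* a * b **/"
@end group
@end example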
1027
1028The @code{rx} notation is mainly useful in Lisp code; it cannot be
1029used in most interactive situations where a regexp is requested, such
1030as when running @code{query-replace-regexp} or in variable
1031customization.
1032
1033@menu
1034* Rx Constructs:: Constructs valid in rx forms.
1035* Rx Functions:: Functions and macros that use rx forms.
1036@end menu
1037
1038@node Rx Constructs
1039@subsubsection Constructs in @code{rx} regexps
1040
1041The various forms in @code{rx} regexps are described below. The
1042shorthand @var{rx} represents any @code{rx} form, and @var{rx}@dots{}
1043means one or more @code{rx} forms. Where the corresponding string
1044regexp syntax is given, @var{A}, @var{B}, @dots{} are string regexp
1045subexpressions.
1046@c With the new implementation of rx, this can be changed from
1047@c 'one or more' to 'zero or more'.
1048
1049@subsubheading Literals
1050
1051@table @asis
1052@item @code{"some-string"}
1053Match the string @samp{some-string} literally. There are no
1054characters with special meaning, unlike in string regexps.
1055
1056@item @code{?C}
1057Match the character @samp{C} literally.
1058@end table
1059
1060@subsubheading Sequence and alternative
1061
1062@table @asis
1063@item @code{(seq @var{rx}@dots{})}
1064@cindex @code{seq} in rx
1065@itemx @code{(sequence @var{rx}@dots{})}
1066@cindex @code{sequence} in rx
1067@itemx @code{(: @var{rx}@dots{})}
1068@cindex @code{:} in rx
1069@itemx @code{(and @var{rx}@dots{})}
1070@cindex @code{and} in rx
1071Match the @var{rx}s in sequence. Without arguments, the expression
1072matches the empty string.@*
1073Corresponding string regexp: @samp{@var{A}@var{B}@dots{}}
1074(subexpressions in sequence).
1075
1076@item @code{(or @var{rx}@dots{})}
1077@cindex @code{or} in rx
1078@itemx @code{(| @var{rx}@dots{})}
1079@cindex @code{|} in rx
1080Match exactly one of the @var{rx}s, trying from left to right.
1081Without arguments, the expression will not match anything at all.@*
1082Corresponding string regexp: @samp{@var{A}\|@var{B}\|@dots{}}.
1083@end table
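
A small sketch of @code{seq} and @code{or} working together; the
sample strings are invented for illustration.

@example
@group
(let ((re (rx bos (or "cat" "dog") (seq "-" "food") eos)))
  (list (string-match-p re "dog-food")      ; One alternative matches.
        (string-match-p re "bird-food")))   ; No alternative matches.
;; @result{} (0 nil)
@end group
@end example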
1084
1085@subsubheading Repetition
1086
1087Normally, repetition forms are greedy, in that they attempt to match
1088as many times as possible. Some forms are non-greedy; they try to
1089match as few times as possible (@pxref{Non-greedy repetition}).
1090
1091@table @code
1092@item (zero-or-more @var{rx}@dots{})
1093@cindex @code{zero-or-more} in rx
1094@itemx (0+ @var{rx}@dots{})
1095@cindex @code{0+} in rx
1096Match the @var{rx}s zero or more times. Greedy by default.@*
1097Corresponding string regexp: @samp{@var{A}*} (greedy),
1098@samp{@var{A}*?} (non-greedy)
1099
1100@item (one-or-more @var{rx}@dots{})
1101@cindex @code{one-or-more} in rx
1102@itemx (1+ @var{rx}@dots{})
1103@cindex @code{1+} in rx
1104Match the @var{rx}s one or more times. Greedy by default.@*
1105Corresponding string regexp: @samp{@var{A}+} (greedy),
1106@samp{@var{A}+?} (non-greedy)
1107
1108@item (zero-or-one @var{rx}@dots{})
1109@cindex @code{zero-or-one} in rx
1110@itemx (optional @var{rx}@dots{})
1111@cindex @code{optional} in rx
1112@itemx (opt @var{rx}@dots{})
1113@cindex @code{opt} in rx
1114Match the @var{rx}s once or an empty string. Greedy by default.@*
1115Corresponding string regexp: @samp{@var{A}?} (greedy),
1116@samp{@var{A}??} (non-greedy).
1117
1118@item (* @var{rx}@dots{})
1119@cindex @code{*} in rx
1120Match the @var{rx}s zero or more times. Greedy.@*
1121Corresponding string regexp: @samp{@var{A}*}
1122
1123@item (+ @var{rx}@dots{})
1124@cindex @code{+} in rx
1125Match the @var{rx}s one or more times. Greedy.@*
1126Corresponding string regexp: @samp{@var{A}+}
1127
1128@item (? @var{rx}@dots{})
1129@cindex @code{?} in rx
1130Match the @var{rx}s once or an empty string. Greedy.@*
1131Corresponding string regexp: @samp{@var{A}?}
1132
1133@item (*? @var{rx}@dots{})
1134@cindex @code{*?} in rx
1135Match the @var{rx}s zero or more times. Non-greedy.@*
1136Corresponding string regexp: @samp{@var{A}*?}
1137
1138@item (+? @var{rx}@dots{})
1139@cindex @code{+?} in rx
1140Match the @var{rx}s one or more times. Non-greedy.@*
1141Corresponding string regexp: @samp{@var{A}+?}
1142
1143@item (?? @var{rx}@dots{})
1144@cindex @code{??} in rx
1145Match the @var{rx}s once or an empty string. Non-greedy.@*
1146Corresponding string regexp: @samp{@var{A}??}
1147
1148@item (= @var{n} @var{rx}@dots{})
1149@cindex @code{=} in rx
1150@itemx (repeat @var{n} @var{rx})
1151Match the @var{rx}s exactly @var{n} times.@*
1152Corresponding string regexp: @samp{@var{A}\@{@var{n}\@}}
1153
1154@item (>= @var{n} @var{rx}@dots{})
1155@cindex @code{>=} in rx
1156Match the @var{rx}s @var{n} or more times. Greedy.@*
1157Corresponding string regexp: @samp{@var{A}\@{@var{n},\@}}
1158
1159@item (** @var{n} @var{m} @var{rx}@dots{})
1160@cindex @code{**} in rx
1161@itemx (repeat @var{n} @var{m} @var{rx}@dots{})
1162@cindex @code{repeat} in rx
1163Match the @var{rx}s at least @var{n} but no more than @var{m} times. Greedy.@*
1164Corresponding string regexp: @samp{@var{A}\@{@var{n},@var{m}\@}}
1165@end table
1166
1167The greediness of some repetition forms can be controlled using the
1168following constructs. However, it is usually better to use the
1169explicit non-greedy forms above when such matching is required.
1170
1171@table @code
1172@item (minimal-match @var{rx})
1173@cindex @code{minimal-match} in rx
1174Match @var{rx}, with @code{zero-or-more}, @code{0+},
1175@code{one-or-more}, @code{1+}, @code{zero-or-one}, @code{opt} and
1176@code{optional} using non-greedy matching.
1177
1178@item (maximal-match @var{rx})
1179@cindex @code{maximal-match} in rx
1180Match @var{rx}, with @code{zero-or-more}, @code{0+},
1181@code{one-or-more}, @code{1+}, @code{zero-or-one}, @code{opt} and
1182@code{optional} using greedy matching. This is the default.
1183@end table
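
The difference between greedy and non-greedy repetition can be seen in
the following sketch; the sample string is invented for illustration.

@example
@group
(let ((s "<a><b>"))
  (list
   ;; Greedy: take as many characters as possible before a ">".
   (and (string-match (rx "<" (* nonl) ">") s)
        (match-string 0 s))
   ;; Non-greedy: stop at the first ">".
   (and (string-match (rx "<" (*? nonl) ">") s)
        (match-string 0 s))
   ;; minimal-match makes zero-or-more behave like *?.
   (and (string-match
         (rx "<" (minimal-match (zero-or-more nonl)) ">") s)
        (match-string 0 s))))
;; @result{} ("<a><b>" "<a>" "<a>")
@end group
@end example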
1184
1185@subsubheading Matching single characters
1186
1187@table @asis
1188@item @code{(any @var{set}@dots{})}
1189@cindex @code{any} in rx
1190@itemx @code{(char @var{set}@dots{})}
1191@cindex @code{char} in rx
1192@itemx @code{(in @var{set}@dots{})}
1193@cindex @code{in} in rx
1194@cindex character class in rx
1195Match a single character from one of the @var{set}s. Each @var{set}
1196is a character, a string representing the set of its characters, a
1197range or a character class (see below). A range is either a
1198hyphen-separated string like @code{"A-Z"}, or a cons of characters
1199like @code{(?A . ?Z)}.
1200
1201Note that hyphen (@code{-}) is special in strings in this construct,
1202since it acts as a range separator. To include a hyphen, add it as a
1203separate character or single-character string.@*
1204Corresponding string regexp: @samp{[@dots{}]}
1205
1206@item @code{(not @var{charspec})}
1207@cindex @code{not} in rx
1208Match a character not included in @var{charspec}. @var{charspec} can
1209be an @code{any}, @code{syntax} or @code{category} form, or a
1210character class.@*
1211Corresponding string regexp: @samp{[^@dots{}]}, @samp{\S@var{code}},
1212@samp{\C@var{code}}
1213
1214@item @code{not-newline}, @code{nonl}
1215@cindex @code{not-newline} in rx
1216@cindex @code{nonl} in rx
1217Match any character except a newline.@*
1218Corresponding string regexp: @samp{.} (dot)
1219
1220@item @code{anything}
1221@cindex @code{anything} in rx
1222Match any character.@*
1223Corresponding string regexp: @samp{.\|\n} (for example)
1224
1225@item character class
1226@cindex character class in rx
1227Match a character from a named character class:
1228
1229@table @asis
1230@item @code{alpha}, @code{alphabetic}, @code{letter}
1231Match alphabetic characters. More precisely, match characters whose
1232Unicode @samp{general-category} property indicates that they are
1233alphabetic.
1234
1235@item @code{alnum}, @code{alphanumeric}
1236Match alphabetic characters and digits. More precisely, match
1237characters whose Unicode @samp{general-category} property indicates
1238that they are alphabetic or decimal digits.
1239
1240@item @code{digit}, @code{numeric}, @code{num}
1241Match the digits @samp{0}--@samp{9}.
1242
1243@item @code{xdigit}, @code{hex-digit}, @code{hex}
1244Match the hexadecimal digits @samp{0}--@samp{9}, @samp{A}--@samp{F}
1245and @samp{a}--@samp{f}.
1246
1247@item @code{cntrl}, @code{control}
1248Match any character whose code is in the range 0--31.
1249
1250@item @code{blank}
1251Match horizontal whitespace. More precisely, match characters whose
1252Unicode @samp{general-category} property indicates that they are
1253spacing separators.
1254
1255@item @code{space}, @code{whitespace}, @code{white}
1256Match any character that has whitespace syntax
1257(@pxref{Syntax Class Table}).
1258
1259@item @code{lower}, @code{lower-case}
1260Match anything lower-case, as determined by the current case table.
1261If @code{case-fold-search} is non-nil, this also matches any
1262upper-case letter.
1263
1264@item @code{upper}, @code{upper-case}
1265Match anything upper-case, as determined by the current case table.
1266If @code{case-fold-search} is non-nil, this also matches any
1267lower-case letter.
1268
1269@item @code{graph}, @code{graphic}
1270Match any character except whitespace, @acronym{ASCII} and
1271non-@acronym{ASCII} control characters, surrogates, and codepoints
1272unassigned by Unicode, as indicated by the Unicode
1273@samp{general-category} property.
1274
1275@item @code{print}, @code{printing}
1276Match whitespace or a character matched by @code{graph}.
1277
1278@item @code{punct}, @code{punctuation}
1279Match any punctuation character. (At present, for multibyte
1280characters, anything that has non-word syntax.)
1281
1282@item @code{word}, @code{wordchar}
1283Match any character that has word syntax (@pxref{Syntax Class Table}).
1284
1285@item @code{ascii}
1286Match any @acronym{ASCII} character (codes 0--127).
1287
1288@item @code{nonascii}
1289Match any non-@acronym{ASCII} character (but not raw bytes).
1290@end table
1291
1292Corresponding string regexp: @samp{[[:@var{class}:]]}
1293
1294@item @code{(syntax @var{syntax})}
1295@cindex @code{syntax} in rx
1296Match a character with syntax @var{syntax}, being one of the following
1297names:
1298
1299@multitable {@code{close-parenthesis}} {Syntax character}
1300@headitem Syntax name @tab Syntax character
1301@item @code{whitespace} @tab @code{-}
1302@item @code{punctuation} @tab @code{.}
1303@item @code{word} @tab @code{w}
1304@item @code{symbol} @tab @code{_}
1305@item @code{open-parenthesis} @tab @code{(}
1306@item @code{close-parenthesis} @tab @code{)}
1307@item @code{expression-prefix} @tab @code{'}
1308@item @code{string-quote} @tab @code{"}
1309@item @code{paired-delimiter} @tab @code{$}
1310@item @code{escape} @tab @code{\}
1311@item @code{character-quote} @tab @code{/}
1312@item @code{comment-start} @tab @code{<}
1313@item @code{comment-end} @tab @code{>}
1314@item @code{string-delimiter} @tab @code{|}
1315@item @code{comment-delimiter} @tab @code{!}
1316@end multitable
1317
1318For details, @pxref{Syntax Class Table}. Please note that
1319@code{(syntax punctuation)} is @emph{not} equivalent to the character class
1320@code{punctuation}.@*
1321Corresponding string regexp: @samp{\s@var{code}}
1322
1323@item @code{(category @var{category})}
1324@cindex @code{category} in rx
1325Match a character in category @var{category}, which is either one of
1326the names below or its category character.
1327
1328@multitable {@code{vowel-modifying-diacritical-mark}} {Category character}
1329@headitem Category name @tab Category character
1330@item @code{space-for-indent} @tab space
1331@item @code{base} @tab @code{.}
1332@item @code{consonant} @tab @code{0}
1333@item @code{base-vowel} @tab @code{1}
1334@item @code{upper-diacritical-mark} @tab @code{2}
1335@item @code{lower-diacritical-mark} @tab @code{3}
1336@item @code{tone-mark} @tab @code{4}
1337@item @code{symbol} @tab @code{5}
1338@item @code{digit} @tab @code{6}
1339@item @code{vowel-modifying-diacritical-mark} @tab @code{7}
1340@item @code{vowel-sign} @tab @code{8}
1341@item @code{semivowel-lower} @tab @code{9}
1342@item @code{not-at-end-of-line} @tab @code{<}
1343@item @code{not-at-beginning-of-line} @tab @code{>}
1344@item @code{alpha-numeric-two-byte} @tab @code{A}
1345@item @code{chinese-two-byte} @tab @code{C}
1346@item @code{greek-two-byte} @tab @code{G}
1347@item @code{japanese-hiragana-two-byte} @tab @code{H}
1348@item @code{indian-two-byte} @tab @code{I}
1349@item @code{japanese-katakana-two-byte} @tab @code{K}
1350@item @code{strong-left-to-right} @tab @code{L}
1351@item @code{korean-hangul-two-byte} @tab @code{N}
1352@item @code{strong-right-to-left} @tab @code{R}
1353@item @code{cyrillic-two-byte} @tab @code{Y}
1354@item @code{combining-diacritic} @tab @code{^}
1355@item @code{ascii} @tab @code{a}
1356@item @code{arabic} @tab @code{b}
1357@item @code{chinese} @tab @code{c}
1358@item @code{ethiopic} @tab @code{e}
1359@item @code{greek} @tab @code{g}
1360@item @code{korean} @tab @code{h}
1361@item @code{indian} @tab @code{i}
1362@item @code{japanese} @tab @code{j}
1363@item @code{japanese-katakana} @tab @code{k}
1364@item @code{latin} @tab @code{l}
1365@item @code{lao} @tab @code{o}
1366@item @code{tibetan} @tab @code{q}
1367@item @code{japanese-roman} @tab @code{r}
1368@item @code{thai} @tab @code{t}
1369@item @code{vietnamese} @tab @code{v}
1370@item @code{hebrew} @tab @code{w}
1371@item @code{cyrillic} @tab @code{y}
1372@item @code{can-break} @tab @code{|}
1373@end multitable
1374
1375For more information about currently defined categories, run the
1376command @kbd{M-x describe-categories @key{RET}}. For how to define
1377new categories, @pxref{Categories}.@*
1378Corresponding string regexp: @samp{\c@var{code}}
1379@end table
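
The following sketch exercises a few of the single-character forms
above.  The sample strings are invented; each call returns the index of
the first match.

@example
@group
(list
 ;; A hyphen given as a separate character is matched literally.
 (string-match-p (rx bos (+ (any "a-z" ?-)) eos) "foo-bar")
 ;; A cons of characters denotes a range, like "0-9".
 (string-match-p (rx (any (?0 . ?9))) "abc7")
 ;; not negates a character set.
 (string-match-p (rx (not (any "0-9"))) "123a")
 ;; A character class can be combined with other sets inside any.
 (string-match-p (rx (any digit "_")) "ab_1"))
;; @result{} (0 3 3 2)
@end group
@end example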
1380
1381@subsubheading Zero-width assertions
1382
1383These all match the empty string, but only in specific places.
1384
1385@table @asis
1386@item @code{line-start}, @code{bol}
1387@cindex @code{line-start} in rx
1388@cindex @code{bol} in rx
1389Match at the beginning of a line.@*
1390Corresponding string regexp: @samp{^}
1391
1392@item @code{line-end}, @code{eol}
1393@cindex @code{line-end} in rx
1394@cindex @code{eol} in rx
1395Match at the end of a line.@*
1396Corresponding string regexp: @samp{$}
1397
1398@item @code{string-start}, @code{bos}, @code{buffer-start}, @code{bot}
1399@cindex @code{string-start} in rx
1400@cindex @code{bos} in rx
1401@cindex @code{buffer-start} in rx
1402@cindex @code{bot} in rx
1403Match at the start of the string or buffer being matched against.@*
1404Corresponding string regexp: @samp{\`}
1405
1406@item @code{string-end}, @code{eos}, @code{buffer-end}, @code{eot}
1407@cindex @code{string-end} in rx
1408@cindex @code{eos} in rx
1409@cindex @code{buffer-end} in rx
1410@cindex @code{eot} in rx
1411Match at the end of the string or buffer being matched against.@*
1412Corresponding string regexp: @samp{\'}
1413
1414@item @code{point}
1415@cindex @code{point} in rx
1416Match at point.@*
1417Corresponding string regexp: @samp{\=}
1418
1419@item @code{word-start}
1420@cindex @code{word-start} in rx
1421Match at the beginning of a word.@*
1422Corresponding string regexp: @samp{\<}
1423
1424@item @code{word-end}
1425@cindex @code{word-end} in rx
1426Match at the end of a word.@*
1427Corresponding string regexp: @samp{\>}
1428
1429@item @code{word-boundary}
1430@cindex @code{word-boundary} in rx
1431Match at the beginning or end of a word.@*
1432Corresponding string regexp: @samp{\b}
1433
1434@item @code{not-word-boundary}
1435@cindex @code{not-word-boundary} in rx
1436Match anywhere but at the beginning or end of a word.@*
1437Corresponding string regexp: @samp{\B}
1438
1439@item @code{symbol-start}
1440@cindex @code{symbol-start} in rx
1441Match at the beginning of a symbol.@*
1442Corresponding string regexp: @samp{\_<}
1443
1444@item @code{symbol-end}
1445@cindex @code{symbol-end} in rx
1446Match at the end of a symbol.@*
1447Corresponding string regexp: @samp{\_>}
1448@end table
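
A brief sketch of how some of these assertions behave, assuming the
standard syntax table (in which letters have word syntax); the sample
strings are invented for illustration.

@example
@group
(list
 ;; The "cat" inside "concat" is not at a word start.
 (string-match-p (rx word-start "cat" word-end) "concat cat")
 ;; bos and eos anchor to the whole string, not to lines.
 (string-match-p (rx bos "dog" eos) "cat\ndog")
 ;; bol and eol also match next to newlines inside the string.
 (string-match-p (rx bol "dog" eol) "cat\ndog"))
;; @result{} (7 nil 4)
@end group
@end example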
1449
1450@subsubheading Capture groups
1451
1452@table @code
1453@item (group @var{rx}@dots{})
1454@cindex @code{group} in rx
1455@itemx (submatch @var{rx}@dots{})
1456@cindex @code{submatch} in rx
1457Match the @var{rx}s, making the matched text and position accessible
1458in the match data. The first group in a regexp is numbered 1;
1459subsequent groups will be numbered one higher than the previous
1460group.@*
1461Corresponding string regexp: @samp{\(@dots{}\)}
1462
1463@item (group-n @var{n} @var{rx}@dots{})
1464@cindex @code{group-n} in rx
1465@itemx (submatch-n @var{n} @var{rx}@dots{})
1466@cindex @code{submatch-n} in rx
1467Like @code{group}, but explicitly assign the group number @var{n}.
1468@var{n} must be positive.@*
1469Corresponding string regexp: @samp{\(?@var{n}:@dots{}\)}
1470
1471@item (backref @var{n})
1472@cindex @code{backref} in rx
1473Match the text previously matched by group number @var{n}.
1474@var{n} must be in the range 1--9.@*
1475Corresponding string regexp: @samp{\@var{n}}
1476@end table
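
The group forms can be sketched as follows; the date string and the
other inputs are invented for illustration.

@example
@group
;; Groups are numbered from 1 in order of appearance.
(let ((s "2019-07-04"))
  (and (string-match (rx (group (= 4 digit)) "-"
                         (group (= 2 digit)) "-"
                         (group (= 2 digit)))
                     s)
       (list (match-string 1 s)
             (match-string 2 s)
             (match-string 3 s))))
;; @result{} ("2019" "07" "04")

;; group-n assigns the numbers explicitly.
(let ((s "ab"))
  (and (string-match (rx (group-n 2 "a") (group-n 1 "b")) s)
       (list (match-string 1 s) (match-string 2 s))))
;; @result{} ("b" "a")

;; backref matches the same text again: here, a doubled letter.
(string-match-p (rx (group (any "a-z")) (backref 1)) "hello")
;; @result{} 2
@end group
@end example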
1477
1478@subsubheading Dynamic inclusion
1479
1480@table @code
1481@item (literal @var{expr})
1482@cindex @code{literal} in rx
1483Match the literal string that is the result from evaluating the Lisp
1484expression @var{expr}. The evaluation takes place at call time, in
1485the current lexical environment.
1486
1487@item (regexp @var{expr})
1488@cindex @code{regexp} in rx
1489@itemx (regex @var{expr})
1490@cindex @code{regex} in rx
1491Match the string regexp that is the result from evaluating the Lisp
1492expression @var{expr}. The evaluation takes place at call time, in
1493the current lexical environment.
1494
1495@item (eval @var{expr})
1496@cindex @code{eval} in rx
1497Match the rx form that is the result from evaluating the Lisp
1498expression @var{expr}. The evaluation takes place at macro-expansion
1499time for @code{rx}, at call time for @code{rx-to-string},
1500in the current global environment.
1501@end table
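
The three forms above might be used as in the following sketch.  The
function @code{my-exact-name-p}, the variable @code{number-re} and the
sample strings are hypothetical, and the last example assumes the
standard syntax table.

@example
@group
;; Hypothetical helper: match NAME exactly, even if it contains
;; characters that are special in string regexps.
(defun my-exact-name-p (name candidate)
  (and (string-match-p (rx bos (literal name) eos) candidate) t))

(my-exact-name-p "a.b" "a.b")   ; @result{} t
(my-exact-name-p "a.b" "axb")   ; @result{} nil

;; regexp splices in an existing string regexp at call time.
(let ((number-re "[0-9]+"))
  (string-match-p (rx "v" (regexp number-re) eos) "rev42"))
;; @result{} 2

;; eval computes an rx form when the rx macro is expanded.
(string-match-p
 (rx symbol-start (eval (cons 'or '("if" "else"))) symbol-end)
 "x else y")
;; @result{} 2
@end group
@end example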
1502
1503@node Rx Functions
1504@subsubsection Functions and macros using @code{rx} regexps
1505
1506@defmac rx rx-expr@dots{}
1507Translate the @var{rx-expr}s to a string regexp, as if they were the
1508body of a @code{(seq @dots{})} form. The @code{rx} macro expands to a
1509string constant, or, if @code{literal} or @code{regexp} forms are
1510used, a Lisp expression that evaluates to a string.
1511@end defmac
1512
1513@defun rx-to-string rx-expr &optional no-group
1514Translate @var{rx-expr} to a string regexp which is returned.
1515If @var{no-group} is absent or nil, bracket the result in a
1516non-capturing group, @samp{\(?:@dots{}\)}, if necessary to ensure that
1517a postfix operator appended to it will apply to the whole expression.
1518
1519Arguments to @code{literal} and @code{regexp} forms in @var{rx-expr}
1520must be string literals.
1521@end defun
1522
1523The @code{pcase} macro can use @code{rx} expressions as patterns
1524directly; @pxref{rx in pcase}.
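
When the @code{rx} form is only known at run time,
@code{rx-to-string} can build the regexp from a Lisp list.  A minimal
sketch, in which the helper @code{my-words-regexp} and its inputs are
hypothetical and the standard syntax table is assumed:

@example
@group
;; Hypothetical helper: build a regexp matching any of WORDS,
;; a non-empty list of strings, as a whole word.
(defun my-words-regexp (words)
  (rx-to-string
   (list 'seq 'word-start (cons 'or words) 'word-end)
   t))   ; NO-GROUP: do not bracket the result.

(string-match-p (my-words-regexp '("foo" "bar")) "a bar b")
;; @result{} 2
@end group
@end example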
1525@end ifnottex
1526
954@node Regexp Functions 1527@node Regexp Functions
955@subsection Regular Expression Functions 1528@subsection Regular Expression Functions
956 1529