author Ken Raeburn 2017-05-30 04:45:56 -0400
committer Ken Raeburn 2017-07-31 01:12:54 -0400
commit cd0966b33c1fe975520e85e0e7af82c09e4754dc
tree 38b5e45900a470123bee96b2be96783022190b76
parent f6793d25e8d6c2597d37d9fa65bdcb66cce8fcbe
; admin/notes/big-elc: Notes on this experimental branch.
-rw-r--r-- admin/notes/big-elc 313
1 files changed, 313 insertions, 0 deletions
diff --git a/admin/notes/big-elc b/admin/notes/big-elc
new file mode 100644
index 00000000000..c63e84da731
--- /dev/null
+++ b/admin/notes/big-elc
@@ -0,0 +1,313 @@
“Big elc file” startup approach -*- mode: org; coding: utf-8 -*-

These notes discuss the design and implementation status of the “big
elc file” approach for saving and loading the Lisp environment.

* Justification

The original discussion in which the idea arose was on the possible
elimination of the “unexec” mechanism, which is troublesome to
maintain.

The CANNOT_DUMP support, when it isn’t suffering bit-rot, does allow
for loading all of the Lisp code from scratch at startup. However,
doing so is rather slow.

Stefan Monnier suggested (and implemented) loading the Lisp
environment via loadup.el, as we do now in the “unexec” world, and
writing out a single Lisp file with all of the resulting function and
variable settings in it. Then a normal Emacs invocation can load this
one Lisp file, instead of dozens, and complex data structures can
simply be read, instead of constructed at run time.

It turned out to be desirable for a couple of other files to be loaded
at run time as well, but the one big file loads most of the settings.

* Implementation

** Saving the Lisp environment

In loadup.el, we iterate over the obarray, collecting names of faces
and coding systems and such for later processing. Each symbol’s
function, variable, and property values get turned into the
appropriate fset, set-default, or setplist calls. Calls to defvar and
make-variable-buffer-local may be generated as well. The resulting
forms are all emitted as part of one large “progn” form, so that the
print-circle support can correctly cross-link references to objects in
a way that the reader will reconstruct.
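
As a rough illustration of the dump step described above, the
following Python toy model walks a table of symbols and emits one
fset, set-default, or setplist form per populated slot, all wrapped in
a single progn. It is not the loadup.el code: the Symbol record, the
dump_forms helper, the skip set, and the sample symbol names are all
invented for this sketch, and slot values are assumed to be available
already as printed Lisp text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    """Hypothetical stand-in for one obarray entry; slots hold Lisp text."""
    name: str
    function: Optional[str] = None   # printed function definition, if any
    value: Optional[str] = None      # printed default value, if any
    plist: Optional[str] = None      # printed property list, if any


def dump_forms(symbols, skip=()):
    """Return one big (progn ...) form that restores every symbol slot."""
    forms = []
    for sym in symbols:
        if sym.name in skip:         # e.g. variables in use while reading
            continue
        if sym.function is not None:
            forms.append(f"(fset '{sym.name} {sym.function})")
        if sym.value is not None:
            forms.append(f"(set-default '{sym.name} {sym.value})")
        if sym.plist is not None:
            forms.append(f"(setplist '{sym.name} '{sym.plist})")
    return "(progn\n  " + "\n  ".join(forms) + ")"


# Sample data; every name here is made up for the example.
table = [
    Symbol("my-var", value="42", plist="(risky-local-variable t)"),
    Symbol("my-fn", function="'(lambda (x) x)"),
    Symbol("some-reader-state", value="t"),
]
dumped = dump_forms(table, skip={"some-reader-state"})
print(dumped)
```

Keeping everything inside one form matters for the real dump because
“#N=” labels are only resolved within a single read, so objects shared
between two symbols can only be cross-linked if both settings are part
of the same form.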

A few variables are explicitly skipped because they’re in use during
the read process, or they’re intended to be reinitialized when Emacs
starts up. Some others are skipped for now because they’re not
printable objects.

Most of the support for the unexec path is present, but ignored or
commented out. This keeps diffs (and merging) simpler.

*** charsets, coding systems, and faces

Some changes to charset and coding system support were made so that
when a definition is created for a new name, a property gets attached
to the symbol with the relevant parameters so that we can write out
enough information to reconstruct the definition after reading it
back.

After the main definitions are written out, we emit additional forms
to fix up charset definitions, face specs, and so on. These don’t
have to worry about cross-linked data structures, so breaking them out
into separate forms keeps things simpler.

*** deferred loading

The standard category table is huge if written out, so we load
international/characters indirectly via dumped.elc instead. We could
perhaps suppress the variables and functions defined in
international/characters from being output with the rest of the Lisp
environment. That information should be available via the load
history. We would be assuming that no other loaded Lisp code alters
the variables’ values; any modified function values will be overridden
by the defalias calls.

Advice attached to a subr can’t be written out and read back in
because of the “#<subr...>” syntax; uniquify attaches advice to
rename-buffer, so loading of uniquify is deferred until loading
dumped.elc, or until we’ve determined that we’re not dumping at all.

*** efficient symbol reading

The symbol parser is not terribly fast. It reads one character at a
time (which involves reading one or more bytes, and figuring out the
possible encoding of a multibyte character) and has to figure out
where the end of the symbol is; then the obarray needs to be scanned
to see if the symbol is already present.

It turns out that the “#N#” processing is faster. So now there’s a
new option to the printer that will use this form for symbols that
show up more than once. Parsing “#24#” and doing the hash table
lookup works out better than parsing “setplist” and scanning the
obarray over and over, though it makes it much harder for a human to
read.
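
To make the trade-off concrete, here is a small Python model of the
idea: symbols that occur more than once are printed as a label, and
the reader resolves each later “#N#” with a single hash lookup. It is
only a sketch. The function names are hypothetical, the input is a
flat list of symbol names rather than real Lisp, and it assumes the
first occurrence is written as “#N=symbol”, as the standard label
syntax does for other objects.

```python
from collections import Counter


def print_symbols(symbols):
    """Print a flat sequence of symbol names, labelling repeated ones."""
    counts = Counter(symbols)
    labels = {}                      # symbol name -> label number
    out = []
    for name in symbols:
        if counts[name] == 1:
            out.append(name)                       # unique: print as-is
        elif name not in labels:
            labels[name] = len(labels) + 1
            out.append(f"#{labels[name]}={name}")  # first use defines label
        else:
            out.append(f"#{labels[name]}#")        # later uses refer back
    return " ".join(out)


def read_symbols(text):
    """Inverse of print_symbols: resolve each #N# through a hash table."""
    table = {}                       # label number -> symbol name
    result = []
    for token in text.split():
        if token.startswith("#") and token.endswith("#"):
            result.append(table[int(token[1:-1])])  # one dict lookup
        elif token.startswith("#") and "=" in token:
            label, name = token[1:].split("=", 1)
            table[int(label)] = name
            result.append(name)
        else:
            result.append(token)
    return result


names = ["setplist", "foo", "setplist", "bar", "setplist", "foo"]
printed = print_symbols(names)
print(printed)
print(read_symbols(printed) == names)
```

The printed text is shorter and cheaper to parse, at the cost noted
above: a reader has to look up what each number stands for.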

** Loading the Lisp environment

The default action to invoke on startup is now to load
“../src/dumped.elc”. For experimentation that name works fine, but
for installation it’ll probably be something like just “dumped.elc”,
found via the load path.

New primitives are needed to deal with Emacs data that is not purely
Lisp data structures:

 + internal--set-standard-syntax-table
 + define-charset-internal
 + define-coding-system-internal

*** Speeding up the reader

Reading a very large Lisp file (over a couple of megabytes) is still
slow.

While it seems unavoidable that loading a Lisp environment at run time
will be at least slightly slower than having that environment be part
of the executable image when the process is launched, we want to keep
the process startup time acceptably fast. (No, that’s not a precisely
defined goal.)

So, a few changes have been made to speed up reading the large Lisp
file. Some of them may be generally applicable, even if the
big-elc-file approach isn’t adopted. Others may be too specific to
this use case to warrant the additional code.

 + Avoiding substitution recursion for #N# forms when the new object
   is a cons cell.
 + Using hash tables instead of lists for forms to substitute.
 + Avoiding circular object checks in some cases.
 + Handling substitution into a list iteratively instead of
   recursively. (This one was more about making performance analysis
   easier for certain tools than directly improving performance.)
 + Special-casing reading from a file: avoiding repeated checks of
   the type of input source and the associated dispatching to the
   appropriate support routines, hard-coding the file-based calls,
   and streamlining the input blocking and unblocking.
 + Avoiding string allocation when reading symbols already in the
   obarray.
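
Two of the items above (hash tables for the forms to substitute, and
walking a list iteratively) can be sketched in a few lines of Python.
This is a toy model, not the lread.c code: Cons, Placeholder, and
substitute are names made up for the example, and the “seen” set is a
simplification of whatever circularity checks the real reader
performs.

```python
class Cons:
    """Minimal cons cell for the sketch."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


class Placeholder:
    """Stands for a '#N#' reference read before '#N=' was complete."""
    def __init__(self, label):
        self.label = label


def substitute(tree, replacements, seen=None):
    """Replace placeholders in place; loop on cdr, recurse only on car.

    `replacements` maps id(placeholder) -> real object, so each check is
    a hash lookup instead of a scan of an association list.
    """
    seen = set() if seen is None else seen
    cell = tree
    while isinstance(cell, Cons) and id(cell) not in seen:
        seen.add(id(cell))           # never walk the same cell twice
        if id(cell.car) in replacements:
            cell.car = replacements[id(cell.car)]
        elif isinstance(cell.car, Cons):
            substitute(cell.car, replacements, seen)
        if id(cell.cdr) in replacements:
            cell.cdr = replacements[id(cell.cdr)]
            break                    # do not walk into the substituted object
        cell = cell.cdr
    return tree


# Build (0 1 ... 49999 . <placeholder>) and close it into a circular list.
hole = Placeholder(1)
head = Cons(0)
tail = head
for i in range(1, 50000):
    tail.cdr = Cons(i)
    tail = tail.cdr
tail.cdr = hole
substitute(head, {id(hole): head})
print(tail.cdr is head)
```

A version that recursed on the cdr slot would exceed Python’s default
recursion limit on a list this long; the loop keeps the stack depth
proportional to the nesting depth rather than to the list length.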

* Open Issues

** CANNOT_DUMP, purify-flag

The branch has been rebased onto a recent enough “master” version that
CANNOT_DUMP works fairly well on GNU/Linux systems. The branch has
now been updated to set CANNOT_DUMP unconditionally, to disable the
unexec code. As long as dumped.elc does all the proper initialization
like the old loadup.el did, that should work well.

The regular CANNOT_DUMP build does not work on macOS, at least in the
otherwise-normal Nextstep, self-contained-app mode; it seems to be a
load-path problem. See bug #27760.

Some code still looks at purify-flag, including eval.c requiring that
it be nil when autoloading. So we still let the big progn set its
value.

** Building and bootstrapping

The bootstrap process assumes it needs to build the emacs executable
twice, with different environments based on whether stuff has been
byte-compiled.

In this branch, the executables should be the same, but the dumped
Lisp files will be different. Ideally we should build the executable
only once, and dump out different environment files. Possibly this
means that instead of “bootstrap-emacs” we should invoke something
like:

  ../path/to/emacs --no-loadup -l ../path/to/bootstrap-dump.elc ...

It might also make sense for bootstrap-dump.elc to include the byte
compiler, and to byte-compile the byte compiler (and other
COMPILE_FIRST stuff) in memory before dumping.

Re-examine whether the use of build numbers makes sense, if we’re not
rewriting the executable image.

** installation

Installing this version of Emacs hasn’t been tested much.

** offset builds (srcdir=… or /path/to/configure …)

Builds outside of the source tree (where srcdir is not the root of the
build tree) have not been tested much, and don’t currently work.

The first problem, at least while bootstrapping: “../src/dumped.elc”
is relative to $lispdir which is in the source tree, so Emacs doesn’t
find the dumped.elc file that’s in the build tree.

Moving dumped.elc under $lispdir would be inappropriate since the
directory is in the source tree and the file content is specific to
the configuration being built. We could create a “lisp” directory in
the build tree and write dumped.elc there, but since we don’t
currently have such a directory, that’ll mean some changes to the load
path computation, which is already pretty messy.
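
The mismatch can be shown with plain path arithmetic. The Python
snippet below is only an illustration: the directory names are
hypothetical, resolve_dump is a made-up helper, and it assumes the
dump is written to the “src” directory of the build tree.

```python
import posixpath


def resolve_dump(lispdir, relative="../src/dumped.elc"):
    """Resolve the dump file name against the Lisp directory."""
    return posixpath.normpath(posixpath.join(lispdir, relative))


# Hypothetical layout: sources in /src/emacs, objects in /build/emacs.
srcdir = "/src/emacs"
builddir = "/build/emacs"

looked_up = resolve_dump(posixpath.join(srcdir, "lisp"))
written = posixpath.join(builddir, "src", "dumped.elc")

print(looked_up)             # where Emacs looks
print(written)               # where the build put the file
print(looked_up == written)  # differs unless srcdir and builddir coincide
```

With an in-tree build the two directories are the same, which is why
the relative name works there.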

** Unhandled aspects of environment saving

*** unprintable objects

global-buffers-menu-map has its cdr slot set to nil, but this seems to
get fixed up at run time, so simply omitting it may be okay.

advertised-signature-table has several subr entries. Perhaps we could
filter those out, dump the rest, and then emit additional code to
fetch the subr values via their symbol names and insert them into the
hash after its initial creation.

Markers and overlays that aren’t associated with buffers are replaced
with newly created ones. This only works for variables with these
objects as their values; markers or overlays contained within lists or
elsewhere wouldn’t be fixed up, and any sharing of these objects would
be lost, but there don’t appear to be any such cases.

Any obarrays will be dumped in an incomplete form. We can’t
distinguish them from vectors that contain symbols and zeros.
(Possible fix someday: Make obarrays their own type.) As a special
case of this, though, we do look for abbrev tables, and generate code
to recreate them at load time.

*** make-local-variable

Different flavors of locally-bound variables are hard to distinguish
and may not all be saved properly.

*** defvaralias

For variable aliases, we emit a defvaralias command and skip the
default-value processing; we keep the property list processing and the
rest. Is there anything else that needs to be changed?

*** documentation strings

We call Snarf-documentation at load time, because it’s the only way to
get documentation pointers for Lisp subrs loaded. That may be
addressable in other ways, but for the moment it’s outside the scope
of this branch.

Since we do call Snarf-documentation at load time, we can remove the
doc strings in DOC from dumped.elc, but we have to be a little careful
because not all of the pre-loaded Lisp doc strings wind up in DOC.
The easy way to do that, of course, is to scan DOC and, for each doc
entry we find, remove the documentation from the live Lisp data before
dumping. So, Snarf-documentation now takes an optional argument to
tell it to do that; that cut about 22% of the size of dumped.elc at
the time.
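
The scan-and-strip step might look roughly like the Python model
below. It is a sketch under stated assumptions, not the
Snarf-documentation implementation: doc_entries and strip_docs are
hypothetical helpers, the sample names and doc strings are invented,
and the DOC layout is assumed to be a “\x1f” byte, a one-letter kind
(F or V), the name, a newline, and then the doc text.

```python
def doc_entries(doc_text):
    """Yield (kind, name) for each entry in a DOC-like string.

    Assumed layout: each entry starts with '\\x1f', then a one-letter
    kind ('F' function, 'V' variable), the name, a newline, the text.
    """
    for chunk in doc_text.split("\x1f")[1:]:
        header, _, _body = chunk.partition("\n")
        if header[:1] in ("F", "V"):
            yield header[0], header[1:]


def strip_docs(functions, variables, doc_text):
    """Clear doc strings that DOC already provides; return chars saved."""
    saved = 0
    for kind, name in doc_entries(doc_text):
        table = functions if kind == "F" else variables
        if table.get(name):
            saved += len(table[name])
            table[name] = None       # to be fetched from DOC after loading
    return saved


# Invented sample data: one function doc exists only in memory.
doc = "\x1fFmy-preloaded-fn\nDoc kept in DOC.\x1fVmy-var\nVariable doc."
functions = {
    "my-preloaded-fn": "Doc kept in DOC.",
    "my-helper": "Doc only in memory.",
}
variables = {"my-var": "Variable doc."}

saved = strip_docs(functions, variables, doc)
print(saved)
print(functions)
```

Driving the removal from the entries actually found in DOC is what
keeps doc strings like the one for my-helper, which never made it into
DOC, from being lost.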

There are still a bunch of doc strings winding up in dumped.elc from
various sources; see bug #27748. (Not mentioned in the bug report:
Compiled lambda forms get “(fn N)” style doc strings in their bytecode
representations too. But because we key on function names, there’s no
way to accommodate them in the DOC file.)

*** locations of definitions

C-h v shows variables as having been defined by dumped.elc, not by the
original source file.

** coding system definitions

We repeatedly iterate over coding system names, trying to reload each
definition, and postponing those that fail. We should be able to work
out the dependencies between them and construct an order that requires
only one pass. (Is it worth it?)
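
The difference between the two strategies can be modelled in Python
(3.9 or later, for graphlib). This is a sketch only: the dependency
table and its names are invented, and “loading” a definition is
reduced to checking that its prerequisites are already present.

```python
from graphlib import TopologicalSorter


def load_with_retries(deps):
    """Current approach: keep retrying definitions until all succeed."""
    loaded, pending, passes = [], list(deps), 0
    while pending:
        passes += 1
        postponed = []
        for name in pending:
            if all(d in loaded for d in deps[name]):
                loaded.append(name)
            else:
                postponed.append(name)   # a prerequisite is still missing
        if len(postponed) == len(pending):
            raise ValueError(f"unresolvable: {postponed}")
        pending = postponed
    return loaded, passes


def load_in_one_pass(deps):
    """Alternative: compute a dependency order first, then load once."""
    return list(TopologicalSorter(deps).static_order())


# Hypothetical table: name -> names that must be defined before it.
deps = {
    "alias-c": ["derived-b"],
    "derived-b": ["base-a"],
    "base-a": [],
}
order, passes = load_with_retries(deps)
print(order, passes)
print(load_in_one_pass(deps))
```

In this small case the retry loop needs three passes where a sorted
order needs one; whether that saving is noticeable at startup is the
open question raised above.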

Fix coding-system-list; it seems to have duplicates now.

** error reporting

If dumped.elc can’t be found, Emacs will quietly exit with exit
code 42. Unfortunately, when running in X mode, it’s difficult for
Lisp code to print any messages to standard error when quitting. But
we need to quit, at least in tty mode (do we in X mode?), because
interactive usage requires some definitions provided only by the Lisp
environment.

** garbage collection

The dumped .elc file contains a very large Lisp form with most of the
definitions in it. If reading it always causes the garbage collector
to be invoked during startup, that guarantees some minimum additional
delay before the user will be able to interact with Emacs.

More clever heuristics for when to do GC are probably possible, but
outside the scope of this branch. For now, gc-cons-threshold has been
raised, arbitrarily, to a value that seems to allow for loading
“dumped.elc” on GNU/Linux without GC during or immediately after.

** load path setting

Environment variable support may be broken.

** little niceties

Maybe we should rename the file, so that we display “Loading
lisp-environment...” during startup.

** bugs?

The default value of charset-map-path is set based on the build tree
(or source tree?), so reverting via customize would probably result in
a bogus value. This bug exists in the master version as well when
using unexec; in CANNOT_DUMP mode (when the Lisp code is only loaded
from the installed tree) it doesn’t seem to be a problem.

** other changes

Changes from previous revisions that were dropped due to merge
conflicts, and may be reinstated later:

 + In lread.c, substitute in cons iteratively (on “cdr” slot) instead
   of recursively.
 + In lread.c, change “seen” list to hash table.
 + In lread.c, add a separate read1 loop specialized for file reading,
   with input blocking manipulated only when actually reading from the
   file, not when just pulling the next byte from a buffer.