author Eli Zaretskii 2013-12-05 22:59:23 +0200
committer Eli Zaretskii 2013-12-05 22:59:23 +0200
commit 0cd7a14e577cae9c0713d1cfa549cfca3f0ca06c
tree 31828931b3b0e4b1bf40e68e96f2601967c1266f /src
parent a22205d67caa1cbf666a703d2bc26afd5a2704b6
Added commentary about the overall design and its limitations.
Diffstat (limited to 'src')
-rw-r--r-- src/w32.c 92
1 files changed, 92 insertions, 0 deletions
diff --git a/src/w32.c b/src/w32.c
index 7d1ebebc68b..47c4f04b152 100644
--- a/src/w32.c
+++ b/src/w32.c
@@ -1290,6 +1290,98 @@ w32_valid_pointer_p (void *p, int size)
 
 
 
+/* Here's an overview of how the Windows build supports file names
+   that cannot be encoded by the current system codepage.
+
+   From the POV of Lisp and layers of C code above the functions here,
+   Emacs on Windows pretends that its file names are encoded in UTF-8;
+   see encode_file and decode_file in coding.c. Any file name that is
+   passed as a unibyte string to C functions defined here is assumed
+   to be in UTF-8 encoding. Any file name returned by functions
+   defined here must be in UTF-8 encoding, with only a few exceptions
+   reserved for a couple of special cases. (Be sure to use
+   MAX_UTF8_PATH for char arrays that store UTF-8 encoded file names,
+   as they can be much longer than MAX_PATH!)
+
+   The UTF-8 encoded file names cannot be passed to system APIs, as
+   Windows does not support that. Therefore, they are converted
+   either to UTF-16 or to the ANSI codepage, depending on the value of
+   w32-unicode-filenames, before calling any system APIs or CRT library
+   functions. The default value of that variable is determined by the
+   OS on which Emacs runs: nil on Windows 9X and t otherwise, but the
+   user can change that default (although I don't see why she would
+   want to).
+
+   The 4 functions defined below, filename_to_utf16, filename_to_ansi,
+   filename_from_utf16, and filename_from_ansi, are the workhorses of
+   these conversions. They rely on Windows native APIs
+   MultiByteToWideChar and WideCharToMultiByte; we cannot use
+   functions from coding.c here, because they allocate memory, which
+   is a bad idea on the level of libc, which is what the functions
+   here emulate. (If you worry about performance due to constant
+   conversion back and forth from UTF-8 to UTF-16, then don't: first,
+   it was measured to take only a few microseconds on a not-so-fast
+   machine, and second, that's exactly what the ANSI APIs we used
+   before do anyway, because they are just thin wrappers around the
+   Unicode APIs.)
+
+   The variables file-name-coding-system and default-file-name-coding-system
+   still exist, but are actually used only when a file name needs to
+   be converted to the ANSI codepage. This happens all the time when
+   w32-unicode-filenames is nil, but can also happen from time to time
+   when it is t. Otherwise, these variables have no effect on file-name
+   encoding when w32-unicode-filenames is t; this is similar to
+   selection-coding-system.
+
+   This arrangement works very well, but it has a few gotchas:
+
+   . Lisp code that encodes or decodes file names manually should
+     normally use 'utf-8' as the coding-system on Windows,
+     disregarding file-name-coding-system. This is a somewhat
+     unpleasant consequence, but it cannot be avoided. Fortunately,
+     very few Lisp packages need to do that.
+
+     More generally, passing to library functions (e.g., fopen or
+     opendir) file names already encoded in the ANSI codepage is
+     explicitly *verboten*, as all those functions, as shadowed and
+     emulated here, assume they will receive UTF-8 encoded file names.
+
+     For the same reasons, no CRT function or Win32 API can be called
+     directly in Emacs sources, without either converting the file
+     name from UTF-8 to either UTF-16 or ANSI codepage, or going
+     through some shadowing function defined here.
+
+   . File names passed to external libraries, like the image libraries
+     and GnuTLS, need special handling. These libraries generally
+     don't support UTF-16 or UTF-8 file names, so they must get file
+     names encoded in the ANSI codepage. To facilitate using these
+     libraries with file names that are not encodable in the ANSI
+     codepage, use the function ansi_encode_filename, which will try
+     to use the short 8+3 alias of a file name if that file name is
+     not encodable in the ANSI codepage. See image.c and gnutls.c for
+     examples of how this should be done.
+
+   . Running subprocesses in non-ASCII directories and with non-ASCII
+     file arguments is limited to the current codepage (even though
+     Emacs is perfectly capable of finding an executable program file
+     even in a directory whose name cannot be encoded in the current
+     codepage). This is because the command-line arguments are
+     encoded _before_ they get to the w32-specific level, and the
+     encoding is not known in advance (it doesn't have to be the
+     current ANSI codepage), so w32proc.c functions cannot re-encode
+     them in UTF-16. This should be fixed, but will also require
+     changes in cmdproxy. The current limitation is not terribly bad
+     anyway, since very few, if any, Windows console programs that are
+     likely to be invoked by Emacs support UTF-16 encoded command
+     lines.
+
+   . For similar reasons, server.el and emacsclient are also limited
+     to the current ANSI codepage for now.
+
+*/
+
+
+
 /* Converting file names from UTF-8 to either UTF-16 or the ANSI
    codepage defined by file-name-coding-system. */
 
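
The commentary above says the conversion functions must not allocate memory and that they convert UTF-8 file names to UTF-16 before calling system APIs. The following is only a portable sketch of that idea, not the code in w32.c: the real functions rely on MultiByteToWideChar and WideCharToMultiByte, whereas this illustration decodes UTF-8 by hand so that it can be compiled anywhere. The names sketch_utf8_to_utf16 and SKETCH_MAX_PATH are hypothetical and do not exist in Emacs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the Windows MAX_PATH constant (260).  */
#define SKETCH_MAX_PATH 260

/* Decode the NUL-terminated UTF-8 string SRC into UTF-16 code units
   stored in DST, which has room for DST_LEN units including the
   terminating zero.  Return the number of units written, excluding
   the terminator, or -1 if SRC is malformed or does not fit.  The
   caller supplies the buffer, so no memory is allocated here.  This
   sketch does not reject overlong encodings.  */
static int
sketch_utf8_to_utf16 (const char *src, uint16_t *dst, size_t dst_len)
{
  const unsigned char *p = (const unsigned char *) src;
  size_t n = 0;

  assert (src != NULL && dst != NULL);
  if (dst_len == 0)
    return -1;

  while (*p)
    {
      uint32_t cp;
      int extra;

      /* The lead byte determines how many continuation bytes follow.  */
      if (*p < 0x80)
        { cp = *p; extra = 0; }
      else if ((*p & 0xE0) == 0xC0)
        { cp = *p & 0x1F; extra = 1; }
      else if ((*p & 0xF0) == 0xE0)
        { cp = *p & 0x0F; extra = 2; }
      else if ((*p & 0xF8) == 0xF0)
        { cp = *p & 0x07; extra = 3; }
      else
        return -1;              /* stray continuation or invalid lead byte */
      p++;

      for (; extra > 0; extra--, p++)
        {
          if ((*p & 0xC0) != 0x80)
            return -1;          /* truncated or malformed sequence */
          cp = (cp << 6) | (*p & 0x3F);
        }

      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;              /* outside Unicode or a lone surrogate */

      if (cp >= 0x10000)
        {
          /* Needs a surrogate pair plus room for the terminator.  */
          if (n + 2 >= dst_len)
            return -1;
          cp -= 0x10000;
          dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
          dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
      else
        {
          if (n + 1 >= dst_len)
            return -1;
          dst[n++] = (uint16_t) cp;
        }
    }

  dst[n] = 0;
  return (int) n;
}
```

A caller would typically declare a stack array such as uint16_t name_w[SKETCH_MAX_PATH], pass it together with its size, and treat a return value of -1 as "this name cannot be converted" instead of passing a partial result to a system API.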