% pmjintro.tex --- a brief introduction to poor man's Japanese
\parskip2pt plus 4pt minus 1.5pt
\hsize=5in\vsize=7in\hoffset=1in\voffset=1in
\font\logo=logo10
\input pmJ
\def\pmJ{{\tt pmJ}{\jfcG\char252\jfcb\char220\jfbI\char236}}
% uhhm, we are defining and using \pmJ outside a \beginJapanese
% environment, that is why we are spelling it out that way
% unless you may have a conflicting language in use elsewhere in the
% document, you can make one big \beginJapanese . . . \endJapanese
% surrounding everything.
\def\mf{{\logo METAFONT}}
\font\bigfont=cmr17
\font\russ=wncyr10

\centerline{{\bigfont Poor Man's Japanese in \TeX}}
\vskip.15in
\centerline{Thomas Ridgeway}
\centerline{Humanities and Arts Computing Center}
\centerline{University of Washington}
\vskip.1in

This little document is intended as an illustration of and preliminary
guide to {\it poor man's Japanese} [\pmJ]. \pmJ\ is a temporary expedient
for printing texts with Japanese via \TeX\ until such time as we have a
well-thought-out system (and fonts) for handling Japanese.
\footnote{@}{In J\TeX\ there exists such a system, but for many individuals
and institutions the relative cost of fonts/unavailability of free fonts has
made installing J\TeX\ not seem advantageous.
It is also inconvenient to have to install another separate \TeX\ system
for the purpose unless you are doing Japanese in a big way.}
\pmJ\ has a small number of virtues; among these are:

\item{$\bullet$}\pmJ\ is available now
\item{$\bullet$}\pmJ\ is available free of charge
\item{$\bullet$}\pmJ\ works with standard \TeX3.x
\item{$\bullet$}\pmJ\ is device-independent (although it must be admitted
that you will get relatively more pleasing results on less sophisticated
printers)
\item{$\bullet$}\pmJ\ is sufficiently simple-minded you will not need to be
a rocket scientist to make modifications for your own use
\item{$\bullet$}\pmJ\ uses fonts mechanically faked-up from dot matrix
quality fonts; if you want you can incorporate new characters of your own
design using a bitmap font editor which {\it may} be easier to use than
\mf.\footnote{\ddag}{Need we point out that the presence of user-defined
character codes in a document will make the {\tt .tex} file non-standard?
Such a document will not generate correct output on a system which does not
have the user-defined characters defined (!or has {\it different}
user-defined characters defined!).}

{\narrower
For those who might be previously unacquainted with \TeX, let us put forth a
few of the advantages of \TeX\ which are inherited by \pmJ: it can run on
almost any modern computer; it can print to any of a large number of
printers/typesetters; it can print (if desired and suitably instructed)
elaborate formatting of the text; it is available at very low cost, or free
(which is an instance of very low cost).

}%end narrower

The disadvantages of \pmJ\ are at least equally compelling:

\item{$\bullet$}\pmJ\ is crude and unlikely to ever be greatly improved as
regards the quality of its Japanese.
\item{$\bullet$}\pmJ\ uses emulation-in-\mf\ techniques to produce a
dot-matrix font on your output device.
No matter how talented your printer may be, the Japanese output by \pmJ\ will
look like it was stripped in from a dot-matrix printer.
\item{$\bullet$}\pmJ\ uses a large number of fonts which occupy a finite and
measurable quantity of disk space for each font. Your screen previewer had
{\it better} be able to use the same resolution your printer uses, or you
may experience a little tightness around the disk. You probably will not
want the fonts in multiple magsteps (to a not inconsiderable extent they
will just get uglier if you enlarge them).
\item{$\bullet$}\pmJ\ does not do slanted, bold, or other fancy Japanese
although one could in principle. However, insofar as even a minimal
installation requires 88 fonts {\it for Japanese alone}, we must at this
point quote The Master:

{\narrower\narrower
the format of a {\it char\_node} allows for up to 256 different fonts and up
to 256 characters per font; $\ldots$ \TeX\ intended for oriental languages
will need even more than 256 $\times$ 256 possible characters when we
consider different sizes and styles of type.\footnote{*}{Donald E. Knuth,
{\it \TeX : The Program}. Addison Wesley, Reading MA: 1986. (Volume B of
{\it Computers and Typesetting}). \S 134, p.~57.\par
To ease somewhat the problem of the number of fonts used, the wjisxx fonts
which \pmJ\ uses are generated with a $\backslash$magstep2 variant included
in the same font. The enlarged version will be printed any time we have
signalled $\backslash$bigJtrue. I.e., ordinary
\beginJapanese ^^b2^^a5 versus \bigJtrue ^^b2^^a5.\endJapanese}

}%end narrower

\item{$\bullet$}\pmJ\ is based on \TeX\ (hooray!), not on J\TeX\ (hooray!)
nor any not-known-to-me to-be-extant Foo\TeX\ (huh?) which might know how to
set type vertically, handle thousands of fonts, make doughnuts, and do all
the other things that we might want software to do. \pmJ, therefore, does
not set type vertically, handle thousands of fonts, emulate Spads or
Spitfires, or any of the other real neat stuff.
\item{$\bullet$}\pmJ\ may take a long time to set up if you need to make
your own fonts. Preparing {\it one} set of fonts on a NeXT computer for use
at 400 dpi required a \mf\ run of over 37 hours.%
%Normally the fontmaking process attempts to generate one font at a time. The \mf\ source code
%files for the fonts occupied 17 megabytes of disk space when all were
%stored.
\footnote{\dag}{If one looks at the simple-minded brute-force \mf\ code
which the \pmJ\ fonts use, one may roughly estimate that a staggering number
of computations (probably even more than eight) are needed to emulate a
dot-matrix font through this method. There might, indeed, be a better way to
do it. On the other hand, Pac-Man worldwide has consumed several hundred
million times the number of CPU cycles consumed by compiling \pmJ\ fonts, so
we are not exactly talking about wasting a scarce resource, are we? Some of
you thought I was going to slam NeXTs here, didn't you? I'm not that kind of
guy.}
You will not ordinarily wish to make up the fonts more than once, assuming a
relative absence of masochism in your personality makeup, nor will you
probably want to keep the \mf\ source code permanently {\it since you can
mechanically regenerate it.}

\bigskip
\centerline{How to use \pmJ}
\medskip

Assume a text editor capable of editing text in Shift-JIS-encoded Japanese
characters. Assume a text editor capable of editing text in ``normal'' Latin
characters.

Use the first text editor to prepare a plain text file containing the
Japanese text; {\it plain text file} means that there are no tricky codes in
the text file which are intended to be used only by the program that
produced the file---``ASCII file'' frequently conveys what we mean by
``plain text file'', but by virtue of having encoded Japanese, your file
will not be ASCII.

Use the second text editor to enter the Latin-letter strings providing
instructions to \TeX; (if your first text editor is o.k.
for Latin too, you can continue using it for this step! --- but watch out,
the JIS and Shift-JIS encoding schemes have some Latin characters
(graphemes) included in the two-byte code space: we need ``normal'' Latin
(one-byte coded) text).

On a separate line before the first line of Japanese, put a line saying
``$\backslash$input pmJ $\backslash$beginJapanese''.\footnote{**}{We
{\it really} do need to have the $\backslash$beginJapanese on a line
separate from and preceding the first Japanese character to be printed under
its control. Absent strict adherence to this principle all sorts of
unJapanese looking things will happen.}
pmJ.tex is intended to be the name of a file containing the basic set of
instructions for \pmJ. ``$\backslash$beginJapanese'' is a \pmJ-defined
magic-word signifying that everything until an ``$\backslash$endJapanese''
is to be interpreted as Japanese, if possible.

Using whatever text editor you want, incorporate text in whatever other
language you want. Other languages will conflict with Japanese in \pmJ\ only
if they use character codes in the range 160 to 254; such languages must be
separated from Japanese by enclosing the Japanese in
``$\backslash$beginJapanese'' ``$\backslash$endJapanese'' structures.

Create a new last line saying ``$\backslash$bye''. Put a blank line between
paragraphs in the text, and use any other \TeX\ commands with which you may
be familiar to further adorn your text.

Save the file, and say the appropriate words to run \TeX\ on your file,
preview it, print it, or whatever.

\centerline{What can go wrong}

Well, for one, you may not have an editor for entering Japanese. [Sorry;
\pmJ\ has nothing to offer you if you have no way to compose your text].

Or, for another, your editor may not use Shift-JIS-encoding. [Easy: convert
your text to Shift-JIS-encoding! (How?
That's not my problem; or, to be a {\it little} more gracious about it,
there are some converters out there, see if you can find one $\ldots$) There
is a converter TO\_SJIS to translate from NEC, NEW-JIS and OLD-JIS (as
described below) to what we want; this converter should be included with
pmJ. There are other encoding schemes, however, of which TO\_SJIS knows
nothing].

You may not know ``any other \TeX\ commands with which you may further adorn
your text''\footnote{\S}{Careful study will reveal this phrase to be a
direct quotation of nothing except itself.} [So take a class].

Something else goes wrong. [Fix it].

You don't have a copy of \pmJ\null. [Get one].

\centerline{How to get \pmJ}

From a friend is good. If you have no friend who has \pmJ\ and is prepared
to make a copy for you $\ldots$ (that was supposed to be phrased as
delicately as possible) $\ldots$ you may get it through an
{\it Established Channel}. There are a number of ``\TeX\ distributions''
which may elect to incorporate \pmJ\ in their offerings of \TeX\ and
\TeX-related software. You will very likely be able to get a copy from the
NWCSC, distributors of \TeX\ for Unix systems; they will have to charge you
a fee, however, as even though they are non-profit they still have to pay
their employees, pay phone bills, and all the rest of the drag with which I
am sure you are familiar. You might also be able to get it electronically
and free-of-charge via {\bf ftp} from some place or other, but I couldn't
tell you where.

As I write, November 30, 1990, none of these alternatives which I have
suggested will actually work, and there is no way that you can get \pmJ\ short
of enrolling in a course of study at the University of Washington, having a
friend who has done so, or making me a really attractive offer.

\centerline{Are we really going to show any Japanese or what?}

\beginJapanese
Well, o.k.! Included below are extracts from
{\it Electronic Transfer of Japanese} by Ken R.
Lunde [klunde@vms.macc.wisc.edu or klunde@wiscmacc]. It discusses some of
the varieties of encoding Japanese as alluded to above. If for some reason
discernable to neither of us you are reading this and are as unfamiliar with
computer representations of Japanese as I am, the text will be informative
in its own right, entirely apart from serving as an example of Japanese.

\centerline{PART 3: DIFFERENCES BETWEEN THE THREE 7-BIT JAPANESE CODES}

(KANJI-IN will be abbreviated KI; KANJI-OUT, KO)

{\tt
\newskip\superskip\tabskip=\superskip
\halign to \hsize{#\hfil\tabskip=0pt & \quad #\hfil & \quad #\hfil & \quad #\hfil\tabskip=\superskip\cr
& KI & KO (JIS-ROMAN) & KO (ASCII)\cr
\cr
NEW-JIS (1983) & $<$ESC$>$\$B & $<$ESC$>$(J & $<$ESC$>$(B\cr
OLD-JIS (1978) & $<$ESC$>$\$@ & $<$ESC$>$(J & $<$ESC$>$(B\cr
NEC CODE & $<$ESC$>$K & $<$ESC$>$H & n/a\cr}
}% end \tt

NOTE \#1: The difference between KO (JIS-ROMAN) and KO (ASCII) is very
minor. Also, most terminals can emulate only one of the KO character sets no
matter which one it receives. For example, NINJA TERM (PDS) by Michiharu
Ariza from SRA only emulates KO-ASCII.

NOTE \#2: The difference between NEW-JIS and OLD-JIS is not significant.

The most commonly used code in Japan for communication is the NEW-JIS. NEC
CODE is the least used. The KI escape sequence tells Japanese terminals to
treat what follows as two-bytes per character. KO, on the other hand, tells
Japanese terminals to treat what follows as one-byte per character (back to
JIS-ROMAN or ASCII).

A two-byte per character coding system using 7-bit bytes (ASCII) can encode
up to 16,384 characters (128 by 128); however, the Japanese only use the 94
printable ASCII codes in their matrix, so it can only encode a maximum of
8,836 characters (94 by 94).
The NEW-JIS Japanese code includes 6,877 standard characters; 6,353 KANJI in
2 levels (level 1: 2,965 KANJI arranged by ON reading; level 2: 3,388 KANJI
arranged by radical), 86 KATAKANA, 83 HIRAGANA, 10 numerals, 52 English
characters, 147 symbols, 66 Russian characters, 48 Greek characters, and 32
line elements (for making charts). This was established in 1983.

PART 4: 7-BIT JAPANESE CODE REPRESENTATIONS

(This section contains Japanese text which can be read using appropriate
software and hardware.)

{\tt
\halign to \hsize{#\hfil\tabskip=0pt & #\hfil\tabskip=\superskip\cr
JAPANESE TEXT: これは日本語の文章です。\cr
ROMANIZED: KO RE WA NI HON GO NO BUN SHOU DE SU .\cr
RAW CODE (NO ESCAPE CODES): \${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#\cr
NEW-JIS: $<$ESC$>$\${}B\${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#$<$ESC$>$(J\cr
OLD-JIS: $<$ESC$>$\${}@\${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#$<$ESC$>$(J\cr
NEC CODE: $<$ESC$>$K\${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#$<$ESC$>$H\cr}
}% end \tt

Notice the correspondences between the ASCII characters and Japanese
characters, namely that two ASCII characters represent one Japanese
character; hence, Japanese characters consist of two bytes. For example, the
HIRAGANA symbol for ``RE" is represented by the two ASCII characters
``\${}" and ``l".

The following paragraphs are Japanese text represented in each of the three
major Japanese codes. If you are using a Japanese terminal, then you can
view the Japanese text using the procedures found in later sections of this
report.
NEW-JIS: (KANJI-IN: ``$<$ESC$>$\${}B"; KANJI-OUT: ``$<$ESC$>$(J")

　日本語の一文字を7ビット×2バイトのコードで表現する方法には新JIS、旧
JIS、NEC漢字の三種類があります。これらのコードを用いた文章では日本語
の前後に漢字イン、漢字アウトという二つのエスケープ・シーケンスを使用するこ
とにより、その中が日本語であることを示します。

OLD-JIS: (KANJI-IN: ``$<$ESC$>$\${}@"; KANJI-OUT: ``$<$ESC$>$(J")

　日本語の一文字を7ビット×2バイトのコードで表現する方法には新JIS、旧
JIS、NEC漢字の三種類があります。これらのコードを用いた文章では日本語
の前後に漢字イン、漢字アウトという二つのエスケープ・シーケンスを使用するこ
とにより、その中が日本語であることを示します。

NEC CODE: (KANJI-IN: ``$<$ESC$>$K"; KANJI-OUT: ``$<$ESC$>$H")

　日本語の一文字を7ビット×2バイトのコードで表現する方法には新JIS、旧
JIS、NEC漢字の三種類があります。これらのコードを用いた文章では日本語
の前後に漢字イン、漢字アウトという二つのエスケープ・シーケンスを使用するこ
とにより、その中が日本語であることを示します。

PART 5: 8-BIT JAPANESE CODES

These codes cannot be used reliably in the US since 7-bit paths will strip
off the 8th bit leaving garbage. These codes are used primarily for internal
processing of Japanese. The names in the parentheses below are other names
for the same code (i.e., SHIFT-JIS is also called MS-KANJI).

EXAMPLES:

SHIFT-JIS (MS KANJI)

EUC (AT\&T JIS)

SHIFT-JIS: The two-byte 8-bit Japanese code's implementation is quite unlike
that for the two-byte 7-bit code as was described above. I will explain how
the two-byte 8-bit code works for the sake of completeness. The
two-byte-per-character mode is initiated when the Japanese terminal receives
a character from the ASCII 8-bit extension (i.e., any printable from
128-255). This character will be treated as the first byte of an expected
two-byte sequence. The following character, which may be ANY printable ASCII
character (including those in the 8-bit extension), is treated as the second
byte to complete the two-byte sequences. In summary, the first byte in the
two-byte 8-bit Japanese code must be a character which is part of the 8-bit
extension (value 128-255), and the second byte can be ANY character.
The main difference is that the two-byte 8-bit Japanese code does not use
KANJI-IN or KANJI-OUT escape sequences to shift to and from the
two-byte-per-character mode; the two-byte 7-bit code as was discussed above
does implement KANJI-IN and KANJI-OUT escape sequences.

\noindent[End of Lunde extract.]
\endJapanese

If a little is good, let's try some more!
\input jtexsamp
\bye