% pmjintro.tex --- a brief introduction to poor man's Japanese
\parskip2pt plus 4pt minus 1.5pt
\hsize=5in\vsize=7in\hoffset=1in\voffset=1in
\font\logo=logo10
\input pmJ

\def\pmJ{{\tt pmJ}{\jfcG\char252\jfcb\char220\jfbI\char236}}
% Note: we are defining and using \pmJ outside a \beginJapanese
% environment; that is why we spell it out with explicit fonts and codes.
% Unless a conflicting language is in use elsewhere in the
% document, you can put one big \beginJapanese . . . \endJapanese
% around everything.

\def\mf{{\logo METAFONT}}
\font\bigfont=cmr17
\font\russ=wncyr10

\centerline{{\bigfont Poor Man's Japanese in \TeX}}
\vskip.15in
\centerline{Thomas Ridgeway}
\centerline{Humanities and Arts Computing Center}
\centerline{University of Washington}
\vskip.1in

This little document is intended as an illustration of and preliminary
guide to {\it poor man's 
Japanese} [\pmJ].  \pmJ\ is a temporary expedient for
printing texts with Japanese via \TeX\ until such time as we have a
well-thought-out system (and fonts) for handling Japanese.\footnote{@}{In J\TeX\ there
exists such a system, but for many individuals and institutions the
relative cost of fonts (or the unavailability of free ones) has made installing
J\TeX\ seem not worth the trouble.  It is also inconvenient to install a separate \TeX\ system for the purpose unless you are doing Japanese in a big way.}

\pmJ\ has a small number of virtues; among these are:

\item{$\bullet$}\pmJ\ is available now

\item{$\bullet$}\pmJ\ is available free of charge

\item{$\bullet$}\pmJ\ works with standard \TeX3.x

\item{$\bullet$}\pmJ\ is device-independent (although it must be admitted that you
will get relatively more pleasing results on less sophisticated printers)

\item{$\bullet$}\pmJ\ is sufficiently simple-minded that you will not need to be a rocket
scientist to make modifications for your own use

\item{$\bullet$}\pmJ\ uses fonts mechanically faked up from dot-matrix-quality fonts; if you
want, you can incorporate new characters of your own design using a bitmap
font editor which {\it may} be easier to use than \mf.\footnote{\ddag}{Need we point out that the presence of user-defined character codes in a document will make the {\tt .tex} file non-standard?  Such a document will not generate correct output
on a system which does not have the user-defined characters defined (!or has {\it different} user-defined characters defined!).}

{\narrower For those unfamiliar with \TeX, let us put forth a few of
the advantages of \TeX\ which are inherited by \pmJ: it can run on almost any
modern computer; it can print to any of a large number of printers/typesetters;
it can print (if desired and suitably instructed) elaborate formatting of the text;
it is available at very low cost, or free (which is an instance of very low cost).

}%end narrower

The disadvantages of \pmJ\ are at least equally compelling:

\item{$\bullet$}\pmJ\ is crude and unlikely to ever be greatly improved as regards the quality of its Japanese.

\item{$\bullet$}\pmJ\ uses emulation-in-\mf\ techniques to produce a dot-matrix
font on your output device.  No matter how talented your printer may be,
the Japanese output by \pmJ\ will look like it was stripped in from
a dot-matrix printer.


\item{$\bullet$}\pmJ\ uses a large number of fonts, each of which occupies a finite and
measurable quantity of disk space.  Your screen previewer had 
{\it better} be able to use the same resolution your printer uses, or you
may experience a little tightness around the disk.  You probably will not want
the fonts in multiple magsteps 
(to a not inconsiderable extent they will just get uglier if you enlarge them).


\item{$\bullet$}\pmJ\ does not do slanted, bold, or other fancy Japanese, although in
principle it could.  However, insofar as even a minimal installation requires
88 fonts {\it for Japanese alone}, we must at this point quote The Master:

{\narrower\narrower the format of a {\it char\_node} allows for up to 256
different fonts and up to 256 characters per font; $\ldots$ \TeX\ intended for oriental languages
will need even more than 256 $\times$ 256 possible characters when we consider different
sizes and styles of type.\footnote{*}{Donald E. Knuth, {\it \TeX : The Program}.
Addison-Wesley, Reading, MA, 1986 (Volume B of {\it Computers and Typesetting}), \S 134, p.~57.\par
To ease somewhat the problem of the number of fonts used, the wjisxx fonts 
which \pmJ\ uses are generated with a $\backslash$magstep2 variant included
in the same font.  The enlarged version will be printed any time we have
signalled $\backslash$bigJtrue.  Compare ordinary \beginJapanese
^^b2^^a5 with
\bigJtrue ^^b2^^a5.\endJapanese}

}%end narrower

\item{$\bullet$}\pmJ\ is based on \TeX\ (hooray!), not on  J\TeX\  (hooray!) nor
any not-known-to-me to-be-extant Foo\TeX\ (huh?) which might
know how to set type vertically, handle thousands of fonts, make 
doughnuts, and do all the other things that we might want software to do.
\pmJ, therefore, does not set type vertically, handle thousands of fonts,
emulate Spads or Spitfires, or any of the other real neat stuff.

\item{$\bullet$}\pmJ\ may take a long time to set up if you need to make your
own fonts.  Preparing {\it one} set of fonts on a NeXT computer for use
at 400 dpi required a \mf\ run of over 37 hours.%
%Normally the fontmaking process attempts to generate one font at a time. The \mf\ source code
%files for the fonts occupied 17 megabytes of disk space when all were 
%stored.
\footnote{\dag}{If one looks at the simple-minded brute-force \mf\ code
which the \pmJ\ fonts use, one may roughly estimate that a staggering number
of computations (probably even more than eight) are needed to emulate a dot-matrix font through this method.
There might, indeed, be a better way to do it.  On the other hand, Pac-Man worldwide
has consumed several hundred million times the number of CPU cycles consumed
by compiling \pmJ\ fonts, so we are not exactly talking about wasting a scarce
resource, are we?

Some of you thought I was going to slam NeXTs here, didn't you?  I'm not that kind of guy.}  You will not ordinarily wish to make up the fonts more
than once, assuming a relative absence of masochism in your personality makeup, nor will you probably want to keep the \mf\ source code permanently
{\it since you can mechanically regenerate it.}  
\bigskip
\centerline{How to use \pmJ}
\medskip
Assume a text editor capable of editing text in Shift-JIS-encoded
Japanese characters.  Assume a text editor capable of editing text in ``normal''
Latin characters.  Use the first text editor to prepare a plain text file
containing the Japanese text; {\it plain text file} means that there are no
tricky codes in the file intended for use only by the
program that produced it.  ``ASCII file'' frequently conveys what we
mean by ``plain text file'', but because it contains encoded Japanese, your
file will not be ASCII.  Use the second text editor to enter the Latin-letter
strings that give instructions to \TeX.  (If your first text editor handles
Latin too, you can keep using it for this step; but watch out: the JIS
and Shift-JIS encoding schemes include some Latin characters (graphemes) in the two-byte code
space, and we need ``normal'' one-byte Latin text.)  On a separate line
before the first line of Japanese, put a line saying ``$\backslash$input pmJ $\backslash$beginJapanese''.\footnote{**}{We {\it really} do need to have the $\backslash$beginJapanese on a line separate from and preceding the first Japanese character 
to be printed under its control.  Absent strict adherence to this principle all sorts 
of unJapanese looking things will happen.}
{\tt pmJ.tex} is the file containing the basic set of
instructions for \pmJ.  ``$\backslash$beginJapanese'' is a \pmJ-defined magic word
signifying that everything until an ``$\backslash$endJapanese'' is to be interpreted
as Japanese, if possible.  Using whatever text editor you want, incorporate text in
whatever other language you want.  Other languages will conflict with Japanese in \pmJ\ only if they
use character codes in the range 160 to 254; such languages must be kept apart from
the Japanese by enclosing the Japanese in a ``$\backslash$beginJapanese'' $\ldots$ ``$\backslash$endJapanese''
pair.  Create a new last line saying ``$\backslash$bye''.  Put a blank line
between paragraphs in the text, and use any other \TeX\ commands with which you
may be familiar to further adorn your text.  Save the file, and say the appropriate
words to run \TeX\ on your file, preview it, print it, or whatever.  
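
Taken together, these steps yield an input file with roughly the following shape.  This is only a minimal sketch, not a file distributed with \pmJ: the italic lines in parentheses are placeholders of our own, and the only parts that come from the instructions above are the four control sequences $\backslash$input, $\backslash$beginJapanese, $\backslash$endJapanese, and $\backslash$bye.
% Illustrative skeleton only; the italic placeholder lines are hypothetical.
\medskip
{\tt\halign{\quad#\hfil\cr
$\backslash$input pmJ $\backslash$beginJapanese\cr
{\it (Japanese text in Shift-JIS, beginning on this line)}\cr
{\it (a blank line between paragraphs)}\cr
$\backslash$endJapanese\cr
{\it (text in other languages, further \TeX\ commands)}\cr
$\backslash$bye\cr}}
\medskip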

\centerline{What can go wrong}

Well, for one, you may not have an editor for entering Japanese.  [Sorry; \pmJ\
has nothing to offer you if you have no way to compose your text].  Or,
for another, your editor may not use Shift-JIS-encoding.  [Easy: convert your text
to Shift-JIS-encoding!  (How?  That's not my problem; or, to be a {\it little} more gracious
about it, there are some converters out there, see if you can find one $\ldots$)  There is a converter TO\_SJIS
to translate from NEC, NEW-JIS and OLD-JIS (as described below) to what we want; this converter should be
included with \pmJ.  There are other encoding schemes, however, of which TO\_SJIS knows nothing].
You may not know ``any other \TeX\ commands with which you
may further adorn your text''\footnote{\S}{Careful study will reveal this phrase to be a direct
quotation of nothing except itself.}  [So take a class].  Something else
goes wrong. [Fix it].  You don't have a copy of \pmJ\null.  [Get one].

\centerline{How to get \pmJ}

From a friend is good.  If you have no friend who has \pmJ\ and is prepared to make
a copy for you $\ldots$ (that was supposed to be phrased as delicately as possible) 
$\ldots$ you may get it through an {\it Established Channel}.  There are a number
of ``\TeX\ distributions'' which may elect to incorporate \pmJ\ in their offerings
of \TeX\ and \TeX-related software.  You will very likely be able to get a copy
from the NWCSC, distributors of \TeX\ for Unix systems; they will have to charge you
a fee, however: even though they are non-profit, they still have to pay their employees, pay phone bills, and all
the rest of the drag with which I am sure you are familiar.  You might also be able to get it
electronically and free-of-charge via {\bf ftp} from some place or other, but I
couldn't tell you where.  As I write, November 30, 1990, none of the alternatives I
have suggested will actually work, and there is no way for you to get \pmJ\ short
of enrolling in a course of study at the University of Washington, having a friend
who has done so, or making me a really attractive offer.

\centerline{Are we really going to show any Japanese or what?}
\beginJapanese
Well, o.k.!  Included below are extracts from 
{\it Electronic Transfer of Japanese} by Ken R. Lunde [klunde@vms.macc.wisc.edu  or  klunde@wiscmacc].
It discusses some of the Japanese encoding schemes alluded to above.
If for some reason discernible to neither of us you are reading this and are
as unfamiliar with computer representations of Japanese as I am, the text will
be informative in its own right, entirely apart from serving as an example of Japanese.  

\centerline{PART 3: DIFFERENCES BETWEEN THE THREE 7-BIT JAPANESE CODES}
 
(KANJI-IN will be abbreviated KI; KANJI-OUT, KO)
{\tt
\newskip\superskip \global\superskip=0pt plus 1fil \tabskip=\superskip
\halign to \hsize{#\hfil\tabskip=0pt & \quad #\hfil & \quad #\hfil & \quad #\hfil\tabskip=\superskip\cr 
                 & KI       & KO (JIS-ROMAN) & KO (ASCII)\cr
\cr
NEW-JIS (1983)   & $<$ESC$>$\$B  & $<$ESC$>$(J        & $<$ESC$>$(B\cr
OLD-JIS (1978)   & $<$ESC$>$\$@  & $<$ESC$>$(J        & $<$ESC$>$(B\cr
NEC CODE         & $<$ESC$>$K   & $<$ESC$>$H         &   n/a\cr}
}% end \tt
 
NOTE \#1: The difference between KO (JIS-ROMAN) and KO (ASCII) is very minor.
Also, most terminals can emulate only one of the KO character sets no matter
which one they receive. For example, NINJA TERM (PDS) by Michiharu Ariza from
SRA only emulates KO-ASCII.
 
NOTE \#2: The difference between NEW-JIS and OLD-JIS is not significant.
 
     The most commonly used code in Japan for communication is the NEW-JIS.
NEC CODE is the least used.
 
     The KI escape sequence tells Japanese terminals to treat what follows as
two-bytes per character. KO, on the other hand, tells Japanese terminals to
treat what follows as one-byte per character (back to JIS-ROMAN or ASCII).
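
To make the escape sequences concrete, here is a worked example of our own (it is not part of Lunde's report).  It shows the two kanji romanized NI HON from the sample sentence in Part~4, bracketed by the NEW-JIS KANJI-IN and the JIS-ROMAN KANJI-OUT sequences from the table above, with every byte written in hexadecimal.
% Worked example added for illustration; byte values are plain ASCII codes.
\medskip
{\tt\halign{\quad#\hfil&\quad#\hfil\cr
1B 24 42 & KANJI-IN: $<$ESC$>$\${}B\cr
46 7C    & NI:  F$\vert$\cr
4B 5C    & HON: K$\backslash$\cr
1B 28 4A & KANJI-OUT: $<$ESC$>$(J\cr}}
\medskip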
 
     A two-byte per character coding system using 7-bit bytes (ASCII) can
encode up to 16,384 characters (128 by 128); however, the Japanese only use
the 94 printable ASCII codes in their matrix, so it can only encode a maximum
of 8,836 characters (94 by 94).
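
If ``printable'' is taken to mean the codes 33 (the exclamation point) through 126 (the tilde), which excludes the space (32) and DEL (127), the counts work out as follows:
$$126-33+1=94,\qquad 128\times128=16{,}384,\qquad 94\times94=8{,}836.$$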
 
     The NEW-JIS Japanese code includes 6,877 standard characters; 6,353 KANJI
in 2 levels (level 1: 2,965 KANJI arranged by ON reading; level 2: 3,388 KANJI
arranged by radical), 86 KATAKANA, 83 HIRAGANA, 10 numerals, 52 English
characters, 147 symbols, 66 Russian characters, 48 Greek characters, and 32
line elements (for making charts). This was established in 1983.
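
As a quick arithmetic check on these figures (ours, not Lunde's), the two kanji levels and the other categories do sum to the stated total, which fits within the $94\times94$ matrix:
$$\eqalign{2965+3388&=6353,\cr
6353+86+83+10+52+147+66+48+32&=6877,\cr
8836-6877&=1959\ \hbox{cells left over.}\cr}$$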
 
 
\centerline{PART 4: 7-BIT JAPANESE CODE REPRESENTATIONS}
 
(This section contains Japanese text which can be read using appropriate
software and hardware.)

{\tt
\tabskip=\superskip
\halign to \hsize{#\hfil\tabskip=0pt & \quad #\hfil\tabskip=\superskip\cr 
JAPANESE TEXT:         & これは日本語の文章です。\cr
ROMANIZED:             & KO RE WA NI HON GO NO BUN SHOU DE SU .\cr
RAW CODE (NO ESCAPES): & \${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#\cr
NEW-JIS:               & $<$ESC$>$\${}B\${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#$<$ESC$>$(J\cr
OLD-JIS:               & $<$ESC$>$\${}@\${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#$<$ESC$>$(J\cr
NEC CODE:              & $<$ESC$>$K\${}3\${}l\${}OF$\vert$K$\backslash$8l\${}NJ8$>$O\${}G\${}9!\#$<$ESC$>$H\cr}
}% end \tt
 
     Notice the correspondences between the ASCII characters and Japanese
characters, namely that two ASCII characters represent one Japanese
character; hence, Japanese characters consist of two bytes. For example, the
HIRAGANA symbol for ``RE'' is represented by the two ASCII characters ``\${}'' and
``l''.
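
If you read each ASCII character of the NEW-JIS line above (with the escape sequences removed) as its numeric value, you get the two-byte code of every character in the sample sentence.  The hexadecimal values and the layout below are ours.  The labels follow the ROMANIZED line, with ``.'' standing for the Japanese full stop.
% Derived by hand from the ASCII codes of the raw characters; illustrative only.
\medskip
{\tt\halign{\quad#\hfil&&\quad#\hfil\cr
KO  & \${}3         & 24 33 & NO   & \${}N & 24 4E\cr
RE  & \${}l         & 24 6C & BUN  & J8    & 4A 38\cr
WA  & \${}O         & 24 4F & SHOU & $>$O  & 3E 4F\cr
NI  & F$\vert$      & 46 7C & DE   & \${}G & 24 47\cr
HON & K$\backslash$ & 4B 5C & SU   & \${}9 & 24 39\cr
GO  & 8l            & 38 6C & .    & !\#   & 21 23\cr}}
\medskip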
 
     The following paragraphs are Japanese text represented in each of the
three major Japanese codes. If you are using a Japanese terminal, then you
can view the Japanese text using the procedures found in later sections of
this report.
 
NEW-JIS: (KANJI-IN: ``$<$ESC$>$\${}B''; KANJI-OUT: ``$<$ESC$>$(J'')
 
　日本語の一文字を７ビット×２バイトのコードで表現する方法には新ＪＩＳ、旧
ＪＩＳ、ＮＥＣ漢字の三種類があります。これらのコードを用いた文章では日本語
の前後に漢字イン、漢字アウトという二つのエスケープ・シーケンスを使用するこ
とにより、その中が日本語であることを示します。
 
OLD-JIS: (KANJI-IN: ``$<$ESC$>$\${}@''; KANJI-OUT: ``$<$ESC$>$(J'')
 
　日本語の一文字を７ビット×２バイトのコードで表現する方法には新ＪＩＳ、旧
ＪＩＳ、ＮＥＣ漢字の三種類があります。これらのコードを用いた文章では日本語
の前後に漢字イン、漢字アウトという二つのエスケープ・シーケンスを使用するこ
とにより、その中が日本語であることを示します。
 
NEC CODE: (KANJI-IN: ``$<$ESC$>$K''; KANJI-OUT: ``$<$ESC$>$H'')
 
　日本語の一文字を７ビット×２バイトのコードで表現する方法には新ＪＩＳ、旧
ＪＩＳ、ＮＥＣ漢字の三種類があります。これらのコードを用いた文章では日本語
の前後に漢字イン、漢字アウトという二つのエスケープ・シーケンスを使用するこ
とにより、その中が日本語であることを示します。
 
 
\centerline{PART 5: 8-BIT JAPANESE CODES}
 
     These codes cannot be used reliably in the US since 7-bit paths will strip
off the 8th bit leaving garbage. These codes are used primarily for internal
processing of Japanese. The names in the parentheses below are other names for
the same code (i.e., SHIFT-JIS is also called MS-KANJI).
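
A worked example of our own shows what ``garbage'' means here.  The Shift-JIS code for the hiragana KO is the byte pair 130, 177 (82~B1 in hexadecimal).  A 7-bit path keeps only the low seven bits of each byte, that is, the value modulo 128:
$$130\bmod128=2,\qquad 177\bmod128=49,$$
so the character arrives as the control character with code 2 followed by the ASCII digit {\tt 1} (code 49).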
 
     EXAMPLES: SHIFT-JIS (MS-KANJI); EUC (AT\&T JIS)
 
SHIFT-JIS:
 
     The two-byte 8-bit Japanese code's implementation is quite unlike that
for the two-byte 7-bit code as was described above. I will explain how
the two-byte 8-bit code works for the sake of completeness. The two-byte-per-character
mode is initiated when the Japanese terminal receives a character
from the ASCII 8-bit extension (i.e., any printable from 128--255).
This character will be treated as the first byte of an expected two-byte
sequence. The following character, which may be ANY printable ASCII character
(including those in the 8-bit extension), is treated as the second byte
to complete the two-byte sequence. In summary, the first byte in the two-byte
8-bit Japanese code must be a character which is part of the
8-bit extension (value 128--255), and the second byte can be ANY
character. The main difference is that the two-byte 8-bit Japanese code
does not use KANJI-IN or KANJI-OUT escape sequences to shift to and from the
two-byte-per-character mode; the two-byte 7-bit code as was discussed above
does implement KANJI-IN and KANJI-OUT escape sequences.
 
\noindent[End of Lunde extract.]
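
For readers curious about how the 7-bit and 8-bit codes are related, the sketch below gives the standard arithmetic mapping from a JIS pair $(j_1,j_2)$, with $33\le j_1,j_2\le126$, to a Shift-JIS pair $(s_1,s_2)$.  All values are in decimal.  It is offered only as an illustration.  We have not checked it against the TO\_SJIS converter mentioned earlier, which may work differently.
% Illustrative formula; not taken from Lunde's report or from TO_SJIS.
$$\eqalign{s_1&=\left\lfloor{j_1+1\over2}\right\rfloor
  +\cases{112,&if $j_1\le94$,\cr 176,&if $j_1\ge95$;\cr}\cr
\noalign{\smallskip}
s_2&=j_2+\cases{31,&if $j_1$ is odd and $j_2\le95$,\cr
  32,&if $j_1$ is odd and $j_2\ge96$,\cr
  126,&if $j_1$ is even.\cr}\cr}$$
For example, KO (raw code \${}3) gives $(36,51)\mapsto(18+112,\,51+126)=(130,177)$, and NI (raw code F$\vert$) gives $(70,124)\mapsto(35+112,\,124+126)=(147,250)$; in hexadecimal these are 82~B1 and 93~FA.  The formula also shows that first bytes fall only in 129--159 and 224--239, and second bytes only in 64--126 or 128--252.  That is a narrower set than the simplified description above suggests.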

\endJapanese
If a little is good, let's try some more!

\input jtexsamp
\bye