========================================================================= SI_TEX.CHA Change file for SISISI-hyphenation in TeX 3.1 16.7.1991 (c) for SISISI-hyphenation: Wilhelm Barth, Heinrich Nirschl (c) for TeX changes: Heini Hofstaedter, Harald Mueller additional changes until 10. October 1991 HH ========================================================================= This is the change file for inclusion of SISISI into TeX V.3.14. There are three main changes in this file (which are of course quite intermingled): 1) the replacement of Liang's hyphenation by SISISI. 2) the change of the interface of the actual hyphenation to the post-hyphenation and the line-break algorithm. This change is necessary because in original TeX, only the information "hyphen - no hyphen" can be transmitted to the line break algorithm because of two "bottlenecks": 1. Hyphenation --> Post-Hyphenation: Here a hyphen at position i is marked by the 1-bit information odd(hyf[i]). 2. Post-Hyphenation --> Line-Breaking: Here a hyphen is marked by the existence of a discretionary node. From SISISI, there must be more information flowing to the line-break algorithm: a. Is a hyphen a main hyphen or a "nebenhyphen"? To this end, a change is made such that a penalty node is allowed as the first node in the prebreak list of a discretionary node. This penalty node contains the penalty associated with this specific hyphen. For "nebenhyphens", this value is set by an additional primitive "\nebenhyphenpenalty". Instead of the hyf-array, an array hyfpen is used that contains the actual penalties; the old "if odd(hyf[i])" now is converted ti "if hyfpen[i] <> inf_penalty". b. A hyphen may be more complicated than a single -: ck-hyphens and triple-consonant-hyphens may be present. To include these, another array hyf_disc is introduced that contains ready-made discretionary nodes for such hyphens. At first, we simply wanted to include these discretionary during the post-hyphenation into the new node list. But then Heini felt that the impossiblity of ligatures with parts of such a discretionary node was not acceptable; therefore he introdiced the "x-mechanism" (characterized by many variables starting with "x") and changed the reconstitute function so that more than the single hyphen_char could be introduced into the reconstitution process. 3) "Dirty" changes for the hyphenation of words with umlauts. These changes are necessary because the umlauts in TeX are not (necessarily) real characters, but something else (see definition of umlauts in plain.tex or german.tex). Ordinarily, the hyphenation only accepts sequences of letters, ligatures (which are expanded to their single characters), and so-called "implicit kerns". An umlaut, however, is usually represented as a letter character and another character consisting of two dots ("dieresis character"), which are connected by explicit kerns (not implicite ones!). Therefore, words containing umlauts are not hyphenable words in TeX, which is not acceptable for german texts. Therefore, we devised a "bypass" for umlauts as follows: An umlaut is now represented as a special discretionary node (so-called "dirty umlaut") as follows: \discretionary{\kern4321sp m\kernsp\kernsp\kern-1em}% {}% {} where the 4321sp is a "marker" for the "dirty umlaut", the "m" letter is used to derive the font of the umlaut, is the font number that should be associated with this umlaut (the font numbers of the umlauts are given below); and is used to cancel the preceding kerns, i.e.\ = - (4321 + ). In other words, the prebreak list contains the information that this is an umlaut (albeit a "dirty" one) and some more information needed for the hyphenation, the postbreak list is empty, and the main list contains the umlaut proper. By a special if-statement in the line-break algorithm, a hyphenation at a discretionary node of this format is suppressed, so that always the umlaut of the main list is printed. The font numbers for the lowercase umlauts (which are given in the second \kern node) must be: ae 228 oe 246 ue 252 The codes for the uppercase umlauts can be chosen arbitrarily. In order to get the hyphenation for these "dirty umlauts" running, one must include lccode assignments of the form \lccode= for each of the new letters (see example below). A typical declaration of the "dirty umlauts" runs like this: %%%%%%%%%%%%%%%%%%%%% declaration of dirty umlauts %%%%%%%%%%%%%%%%%%%%%% \def\A"{\discretionary{\kern 4321sp m\kern 196sp\kern-4517sp\kern-1em}{}{\"A}} \def\a"{\discretionary{\kern 4321sp m\kern 228sp\kern-4549sp\kern-1em}{}{\"a}} \def\O"{\discretionary{\kern 4321sp m\kern 214sp\kern-4535sp\kern-1em}{}{\"O}} \def\o"{\discretionary{\kern 4321sp m\kern 246sp\kern-4567sp\kern-1em}{}{\"o}} \def\U"{\discretionary{\kern 4321sp m\kern 220sp\kern-4541sp\kern-1em}{}{\"U}} \def\u"{\discretionary{\kern 4321sp m\kern 252sp\kern-4573sp\kern-1em}{}{\"u}} \lccode196=228 % uppercase umlaut A \lccode214=246 % uppercase umlaut O \lccode220=252 % uppercase umlaut U \lccode228=228 % uppercase umlaut a \lccode246=246 % uppercase umlaut o \lccode252=252 % uppercase umlaut u %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% When (if?) one day N.Schwarz's DC/EC fonts arrive, the changes of the third kind can be eliminated. How long this will take ... well, wait and see. But when they arrive, it's not longer possible to choose the codes for uppercase umlauts arbitrarily. The codes for uppercase umlauts will then be those, shown in the example above. ========================================================================= At first, we add a control sequence \SISISI. @x \def\PASCAL{Pascal} @y \def\SISISI{{\sl SISISI}} \def\PASCAL{Pascal} \let\maybe=\iffalse % print only changed modules @z ------------------------------------------------------------------------- @x @d banner=='This is TeX, Version 3.14' {printed when \TeX\ starts} @y @d banner=='This is TeX, Version 3.14 (+SISISI 1.0)' @z ------------------------------------------------------------------------- A number of procedures of SISISI are declared after the TeX program. As these procedures are used in the hyphenation, of course, they must be declared as `forward'. In web2c gibt es keine forward's !!!! %@x %procedure initialize; {this procedure gets things started properly} %@y %@ %procedure initialize; {this procedure gets things started properly} %@z ------------------------------------------------------------------------- %@x %@d pool_name_length=8 %@y %@d pool_name_length=11 % {number of characters of ``pool_f_name''} %@z %see the following change of the pool file name "pool_f_name" ------------------------------------------------------------------------- @x @!pool_name='TeXformats:TEX.POOL '; @y @!pool_name='TeXformats:SITEX3.POOL '; @z ------------------------------------------------------------------------- @x else bad_pool('! I can''t read TEX.POOL.') @.I can't read TEX.POOL@> @y else bad_pool('! I can''t read SITEX3.POOL.'); @.I can't read TEX.POOL@> @z ========================================================================= To have all messages, coming from SI_TEX, displayed on screen and written to the log-file simultanously, we make the following, additional definitions. temp nicht mehr vorhanden, in log vendert. @x @d wlog_ln(#)==write_ln(log_file,#) @d wlog_cr==write_ln(log_file) @y @d wlog_ln(#)==write_ln(log_file,#) @d wlog_cr==write_ln(log_file) @d si_put(#) == @+ begin if (file_offset<>0)or(term_offset<>0) then print_ln; if selector<>log_only then wterm(#); if (selector=log_only)or(selector=term_and_log) then wlog(#); end @d si_put_ln(#) == @+ begin if (file_offset<>0)or(term_offset<>0) then print_ln; if selector<>log_only then wterm_ln(#); if (selector=log_only)or(selector=term_and_log) then wlog_ln(#); end @d si_put_cr == print_ln @d dbg_put(#) == {} @d dbg_put_ln(#) == {} @d dbg_put_cr == {} @z ========================================================================= We add a new primitive \nebenhyphenpenalty for setting the penalty of a minor hyphen: @x @d int_pars=55 {total number of integer parameters} @d count_base=int_base+int_pars {256 user \.{\\count} registers} @y @d neben_hyphen_penalty_code=55 {penalty for break after neben-hyphen} @d si_debug_code=56 {value is treated as an set of active debugging sections} @d supress_debug_mark_code=57 {same as above; if a section number is set, then the output of the section number and the debugging output banner of this section is supressed} @d int_pars=58 {total number of integer parameters} @d count_base=int_base+int_pars {256 user \.{\\count} registers} @z ------------------------------------------------------------------------- @x @= @y @d neben_hyphen_penalty==int_par(neben_hyphen_penalty_code) @d si_debug(#) == if odd(int_par(si_debug_code) div #) then begin if not odd(int_par(supress_debug_mark_code)div #) then begin si_put_cr; si_put_ln(' <* debugging output section #',#:#,' *>'); end; {marking supression} { dig_count entfernt !!! } @d gubed_si == end @f si_debug == begin @f gubed_si == end @f gubed == nil @= @z ------------------------------------------------------------------------- @x othercases print("[unknown integer parameter!]") @y neben_hyphen_penalty_code:print_esc("nebenhyphenpenalty"); si_debug_code:print_esc("sidebug"); supress_debug_mark_code:print_esc("supressdebugmark"); othercases print("[unknown integer parameter!]") @z ------------------------------------------------------------------------- @x @ @= @y primitive("nebenhyphenpenalty",assign_int,int_base+neben_hyphen_penalty_code);@/ @!@:neben_hyphen_penalty_}{\.{\\nebenhyphenpenalty} primitive@> primitive("sidebug",assign_int,int_base+si_debug_code); primitive("supressdebugmark",assign_int,int_base+supress_debug_mark_code); @ @= @z ========================================================================= As now hyphens may have different penalties, we must somehow associate a penalty with a discretionary node. We do this by allowing a penalty node as the first node in the prebreak list of a discretionary node. The following changes are in two case statements that check the nodes in the sublists of a discretionary node; we add an empty case for penalty nodes (two more case statements follow a little bit later). @x othercases confusion("disc1") @y penalty_node: ; othercases confusion("disc1") @z ------------------------------------------------------------------------- @x othercases confusion("disc2") @y penalty_node: ; othercases confusion("disc2") @z ========================================================================= MAIN CHANGE 2) The following change is in the line-break section, where try_break is given the penalty stored in the penalty node. This change is the reason for all other changes concerning penalties. Moreover, in this change a line break is suppressed if the discretionary node is actually a "dirty umlaut" (via "if not is_dirty_umlaut(cur_p) then ... try_break"). @x begin s:=pre_break(cur_p); disc_width:=0; if s=null then try_break(ex_hyphen_penalty,hyphenated) else begin repeat @; s:=link(s); until s=null; act_width:=act_width+disc_width; try_break(hyphen_penalty,hyphenated); act_width:=act_width-disc_width; end; r:=replace_count(cur_p); s:=link(cur_p); while r>0 do begin @; decr(r); s:=link(s); end; prev_p:=cur_p; cur_p:=s; goto done5; end @y begin s:=pre_break(cur_p); disc_width:=0; if s=null then try_break(ex_hyphen_penalty,hyphenated) else if (type(s)=penalty_node) and (link(s)=null) then begin flush_node_list(s); s := null; try_break(ex_hyphen_penalty,hyphenated); end else if not is_dirty_umlaut(cur_p) then begin if type(s)=penalty_node then s:=link(s); repeat @; s:=link(s); until s=null; act_width:=act_width+disc_width; if type(pre_break(cur_p))=penalty_node then begin try_break(penalty(pre_break(cur_p)),hyphenated) end else begin try_break(hyphen_penalty,hyphenated); end; act_width:=act_width-disc_width; end; r:=replace_count(cur_p); s:=link(cur_p); while r>0 do begin @; decr(r); s:=link(s); end; prev_p:=cur_p; cur_p:=s; goto done5; end @z ========================================================================= Two more case statements. @x othercases confusion("disc3") @y penalty_node: ; othercases confusion("disc3") @z ------------------------------------------------------------------------- @x othercases confusion("disc4") @y penalty_node: ; othercases confusion("disc4") @z ========================================================================= There's a little problem, converting a device independent (``.dvi'') to a post script file (``.ps''). I think that this program part could cause this problem. so let's see... !a <-- = --> @x With this, i want to try to reproduce this error. Because, after i had inserted this change, that error didn`t occur again! @= q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false; if q<>null then {|q| cannot be a |char_node|} if type(q)=glue_node then begin delete_glue_ref(glue_ptr(q)); glue_ptr(q):=right_skip; subtype(q):=right_skip_code+1; add_glue_ref(right_skip); goto done; end else begin if type(q)=disc_node then @ else if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0; end else begin q:=temp_head; while link(q)<>null do q:=link(q); end; @; done: !a <-- = --> @y @= q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false; if q<>null then {|q| cannot be a |char_node|} if type(q)=glue_node then begin delete_glue_ref(glue_ptr(q)); glue_ptr(q):=right_skip; subtype(q):=right_skip_code+1; add_glue_ref(right_skip); goto done; end else begin if type(q)=disc_node then begin hyfdeb_b:=false; if is_dirty_umlaut(q) then begin si_debug(64) hyfdeb_b:=true; gubed_si; si_put_cr; si_put_ln(' ??? There's a line break at a dirty umlaut!?'); end; if not hyfdeb_b then begin @ end; end else if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0; end else begin q:=temp_head; while link(q)<>null do q:=link(q); end; @; done: !a <-- = --> @z ========================================================================= If the break actually occurs, the penalty node is made a separate list with only one node, which is immediately flushed. @x begin s:=pre_break(q); link(q):=s; while link(s)<>null do s:=link(s); pre_break(q):=null; q:=s; end @y begin s:=pre_break(q); if type(s)=penalty_node then begin s:=link(s); link(pre_break(q)):= null; flush_node_list(pre_break(q)); end; link(q):=s; while link(s)<>null do s:=link(s); pre_break(q):=null; q:=s; end @z ========================================================================= The type `array[0..65] of 0..256' is needed more often, so we make it an explicit type (see big SISISI change at end of change file). hc is now declared of this type. @x @!hc:array[0..65] of 0..256; {word to be hyphenated} @!hn:small_number; {the number of positions occupied in |hc|} @!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result} @y @!hc:a_word; {word to be hyphenated} @!hn:small_number; {the number of positions occupied in |hc|} @!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result} { some variables for debugging output } @!hyfdeb_i,hyfdeb_j: integer; @!hyfdeb_b:boolean; @!si_p, si_q, si_r, si_s: pointer; @z ------------------------------------------------------------------------- Our hyphenation routines need even more "more local variables". @x @ Hyphenation routines need a few more local variables. @y @!dirty_umlaut:array [-1..64] of pointer; {the [-1] field is used, if a whatsit node occurs inbetween a word. This can happen due to Partl's changes in ``german.sty'' } @ Hyphenation routines need a few more local variables. @z ------------------------------------------------------------------------- @x @!j:small_number; {an index into |hc| or |hu|} @y @!qq:pointer; @!j:small_number; {two indices into |hc| or |hu|} @!jj:integer; @z ========================================================================= The dirty_umlaut array must be initialized for each word. And we enable watching the node list, representing the word to be hyphenated. @x if s<>null then begin @null then begin si_debug(1) { hyfdeb_i:=depth_threshold; depth_threshold:=100; hyfdeb_j:=breadth_max; breadth_max:=100; } show_node_list(si_p); { depth_threshold:=hyfdeb_i; breadth_max:=hyfdeb_j; } gubed_si; @null then begin depth_threshold:=100; {show all levels of a list} breadth_max:=200; {show all nodes of one level of a list} si_r:=link(si_q); link(si_q):=null; end else begin (* take the values, set by the user, for this variables! *) breadth_max:=show_box_breadth; depth_threshold:=show_box_depth; end; show_node_list(si_p); si_put_cr; if si_q <> null then link( si_q ) := si_r; depth_threshold:=hyfdeb_i; breadth_max:=hyfdeb_j; gubed_si; end @z ========================================================================= MAIN CHANGE 1) Here, we add the SISISI-procedure abteilen for use in hyphenate and another label done1. @x procedure hyphenate; label common_ending,done,found,found1,found2,not_found,exit; var @@; @y @ procedure hyphenate; label common_ending,done,done1,found,found1,found2,not_found,exit; var @@; @z ========================================================================= MAIN CHANGE 3) The prehyphenation loop must now also skip over the "dirty umlauts". @x loop@+ begin if is_char_node(s) then begin c:=qo(character(s)); hf:=font(s); end else if type(s)=ligature_node then if lig_ptr(s)=null then goto continue else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q); end else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue else if type(s)=whatsit_node then begin @; goto continue; end else goto done1; if lc_code(c)<>0 then if (lc_code(c)=c)or(uc_hyph>0) then goto done2 @y loop@+ begin if is_char_node(s) then begin c:=qo(character(s)); hf:=font(s); end else if type(s)=ligature_node then begin if lig_ptr(s)=null then goto continue else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q); end end else if is_dirty_umlaut(s) then begin c:=get_char_of(s); hf:=get_font_of(s); end else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue else if type(s)=whatsit_node then begin @; goto continue; end else goto done1; if lc_code(c)<>0 then if (lc_code(c)=c)or(uc_hyph>0) then goto done2 @z ========================================================================= MAIN CHANGE 2) The interface variables for the hyphenation are extended: hyfpen[i] holds the actual penalty of a hyphen at position i, hyf_disc[i] contains a "special discretionary" for triple-consonant-hyphens and ck-hyphens. hc_si is the word as given to the SISISI algorithm; hyf_si is returned from SISISI and contains markers for the hyphens. After this change, there are many changes where `odd(hyf[i])' is simply replaced by `hyfpen[i]<>inf_penalty'. Another procedure to be changed is `reconstitute': It now has to handle special dicretionary nodes. For a detailed feeling about the code changes in reconstitute, either (1) look through it bit by bit, or (2) ask Heini Hofstaedter. @x end else if type(s)=ligature_node then @ @y end else if is_dirty_umlaut(s) then begin if get_font_of(s)<>hf then goto done3; c:=get_char_of(s); if lc_code(c)=0 then goto done3; if hn=63 then goto done3; hb:=s; incr(hn); hu[hn]:=c; hc[hn]:=lc_code(c); dirty_umlaut[hn]:=s; for jj:=1 to replace_count(s) do s:=link(s); end else if type(s)=whatsit_node then begin @; dirty_umlaut[-1]:=s; end else if type(s)=ligature_node then @ @z ========================================================================= MAIN CHANGE 3) Also in the conversion loop (where the node list is shuffled into the hc array), we must skip a "dirty umlaut". @x loop@+ begin if not(is_char_node(s)) then case type(s) of ligature_node: do_nothing; @y loop@+ begin if not(is_char_node(s)) then if is_dirty_umlaut(s) then begin for jj:=1 to replace_count(s)-1 do s:=link(s); end else case type(s) of ligature_node: do_nothing; @z ========================================================================= Well, there are a lot of new variables that are necessary for the new hyphenation. Most of them are necessary for Heini's change in the reconstitute function, where now discretionary nodes for hyphens with more than one character (e.g. the german ck-hyphens or three-consonant-hyphens) can be inserted. @x @ @= @!i,@!j,@!l:0..65; {indices into |hc| or |hu|} @!q,@!r,@!s:pointer; {temporary registers for list manipulation} @!bchar:halfword; {right boundary character of hyphenated word, or |non_char|} @y @!hyfpen:array [0..64] of integer; {penalties for possible discretionary hyphens} @!hyf_si:trennvektor; @!hc_si:wort; @!hyf_disc:array [0..64] of pointer; @ @= @!i,@!j,@!l,@!kk:0..65; {indices into |hc| or |hu|} @!q,@!r,@!s,@!qq,@!qr,@!hold_r:pointer; {temporary registers for list manipulation} @!bchar:halfword; {right boundary character of hyphenated word, or |non_char|} @!xchg_char:array [0..10] of 0..256; {type of |hu| and |hc|} @!add_char, {additional characters when hyphenating e.g.\ Schiffahrt $\rightarrow$ Schif{\sl f}-fahrt: here |add_char| = 1} @!xchg_i,xchg_k,xchg_hu: 0..65; {type like |i|, |j| and |l| } @!spec_hyf: pointer; {special hyphen; causes different spelling of the word, when it is hyphenated at this point} @!si_j:integer; @z ========================================================================= An odd->inf_penalty change. @x for j:=l_hyf to hn-r_hyf do if odd(hyf[j]) then goto found1; @y for j:=l_hyf to hn-r_hyf do if hyfpen[j]<>inf_penalty then goto found1; @z ========================================================================= The new variable hold_r is initialized (the kern node is a dummy node, as far as I can see). @x if not is_char_node(hb) then if type(hb)=ligature_node then if odd(subtype(hb)) then @y hold_r := new_kern(1235); link(hold_r):=r; if not is_char_node(hb) then if type(hb)=ligature_node then if odd(subtype(hb)) then @z ========================================================================= Initialize some variables if the list starts with a "dirty umlaut". @x else if type(ha)=ligature_node then if font(lig_char(ha))<>hf then goto found2 else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1); @y else if is_dirty_umlaut(ha) then begin if get_font_of(ha) <> hf then goto found2 else begin init_list:=ha; init_lig := false; hu[0]:=get_char_of(ha); end end else if type(ha)=ligature_node then if font(lig_char(ha))<>hf then goto found2 else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1); @z ========================================================================= MAIN CHANGE 3) unhinge all "dirty umlauts". @x common_ending: flush_node_list(r); @; flush_list(init_list) @y common_ending: qq:=hold_r; for kk:={1}j to hn do if dirty_umlaut[kk] <> null then begin while link(qq) <> dirty_umlaut[kk] do qq := link(qq); qr := link(qq); for i:=1 to replace_count(qr) do if qr<>null then qr:=link(qr) else begin {test:} si_put_ln('...??..X --> ????? !'); end; {:test} if qr<>null then begin link(qq):=link(qr); link(qr):=null; end else link(qq):=null; end; if dirty_umlaut[-1]<>null then begin qq:=hold_r; while (link(qq)<>dirty_umlaut[-1])and(link(qq)<>null) do qq:=link(qq); if link(qq)=dirty_umlaut[-1] then begin link(qq):=link(dirty_umlaut[-1]); link(dirty_umlaut[-1]):=null; end else begin si_put(' *** ERROR: node list is damaged! '); si_put_ln('(post hyphenation - unhinge a whatsit node)'); end; end; {} r:=hold_r; while link(r)<>null do begin if (mem_min<=link(r))and(link(r)<=mem_max) then r:=link(r) else begin wlog_cr; wterm_cr; wlog_ln('???$ - ERROR in node list!'); wterm_ln('???$ - ERROR in node list!'); kk:=depth_threshold; l:=breadth_max; depth_threshold:=100; breadth_max:=100; show_node_list(hold_r); link(r):=null; depth_threshold:=kk; breadth_max:=l; end; end; {} flush_node_list(hold_r); @; flush_list(init_list) @z ------------------------------------------------------------------------ Another odd->inf_penalty change (this time only textual). @x If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]| is odd and such that the result of |reconstitute| would have been different if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed| to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero. @y If there exists an index |k| in the range $j\le k\le m$ such that |hyfpen[k]<>inf_penalty| and such that the result of |reconstitute| would have been different if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed| to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero. @z ------------------------------------------------------------------------- In the reconstitute function, we set the boundary value n immediately before the next "dirty umlaut" by a simple for-loop starting at the original n. @x begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null; @y @!ii:small_number; begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null; for ii:=n downto j do if dirty_umlaut[ii] <> null then n:=ii-1; @z ------------------------------------------------------------------------- Another odd->inf_penalty change. @x if odd(hyf[j]) then cur_rh:=hchar@+else cur_rh:=non_char; @y if hyfpen[j]<>inf_penalty then cur_rh:=hchar@+else cur_rh:=non_char; @z ========================================================================= A first change in the reconstitute function. @x else begin if hcharinf_penalty then begin hyphen_passed:=j; hchar:=non_char; end; if op_byte(q)inf_penalty change. @x repeat l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1; if hyphen_passed=0 then begin link(s):=link(hold_head); while link(s)>null do s:=link(s); if odd(hyf[j-1]) then begin l:=j; hyphen_passed:=j-1; link(hold_head):=null; end; end; if hyphen_passed>0 then @; until j>hn; link(s):=q @ In this repeat loop we will insert another discretionary if |hyf[j-1]| is odd, when both branches of the previous discretionary end at position |j-1|. @y if dirty_umlaut[-1]<>null then begin {hook in the previous saved whatsit node before the first character of the hyphenated word} link(s):=dirty_umlaut[-1]; s:=link(s); end; repeat if dirty_umlaut[j]<>null then begin link(s):=dirty_umlaut[j]; while link(s)>null do s:=link(s); link(hold_head):=null; j:=j+1; hyphen_passed:=0; if hyfpen[j-1]<>inf_penalty then begin l:=j; hyphen_passed:=j-1; si_debug(16) si_put_ln(' *** append a hyphen immediatly after a dirty umlaut!'); gubed_si; end; end else begin l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1; if hyphen_passed=0 then begin link(s):=link(hold_head); while link(s)>null do s:=link(s); if hyfpen[j-1]<>inf_penalty then begin l:=j; hyphen_passed:=j-1; link(hold_head):=null; end; end; end; {if dirty_umlaut[j]<>null then-else} if hyphen_passed>0 then @; until j>hn; link(s):=q; si_q:=s; {save the last node of the hyphenated word for debugging output} for j:=0 to 64 do if hyf_disc[j]<>null then begin flush_node_list( hyf_disc[j]); end @ In this repeat loop we will insert another discretionary if |hyfpen[j-1]<>inf_penalty|, when both branches of the previous discretionary end at position |j-1|. @z ========================================================================= Another odd->inf_penalty change. @x hyphen_passed:=j-1; link(hold_head):=null; until not odd(hyf[j-1]) @y hyphen_passed:=j-1; link(hold_head):=null; until (hyfpen[j-1]=inf_penalty) @z ========================================================================= Here comes the big reconstitution action (cheers to Heini!). @x @= minor_tail:=null; pre_break(r):=null; hyf_node:=new_character(hf,hyf_char); if hyf_node<>null then begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node); end; while l<=i do begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1; if link(hold_head)>null then begin if minor_tail=null then pre_break(r):=link(hold_head) @y As the value of |hyfpen[i]| is necessary for creating a penalty node, we must create this node before we ``kill'' the hyphenation information in |hyfpen[i]| by assigning |inf_penalty|. |minor_tail| contains the pointer to the penalty node or |null|, if no such node has been created. @= if (hyfpen[i]<>inf_penalty) and (hyfpen[i]<>hyphen_penalty) then minor_tail:=new_penalty(hyfpen[i]) else minor_tail:=null; hyfpen[i]:=inf_penalty; pre_break(r):=minor_tail; hyf_node:=new_character(hf,hyf_char); {Exchange |pre_break|-list of the selfbuilt |disc_node| with |hu|} xchg_k:=i; if hyf_disc[xchg_k]<>null then begin spec_hyf:=pre_break(hyf_disc[xchg_k]); add_char:=subtype(spec_hyf); xchg_i:=1; repeat spec_hyf:=link(spec_hyf); if spec_hyf<>null then begin xchg_char[xchg_i]:=qo(character(spec_hyf)); incr(xchg_i); end;{if} until spec_hyf=null; if hyf_node<>null then begin xchg_char[xchg_i]:=hyf_char; free_avail(hyf_node); end else decr(xchg_i); xchg_hu:=i-xchg_i+add_char+2; {this is the index of |hu| where the first character is replaced by a character of the |pre_break|-list of |spec_hyf|} for xchg_k:=1 to xchg_i do begin xchg_char[0]:=hu[xchg_hu]; hu[xchg_hu]:=xchg_char[xchg_k]; xchg_char[xchg_k]:=xchg_char[0]; incr(xchg_hu); end; {|for|} xchg_k:=i; i:=xchg_hu-1; xchg_hu:=xchg_hu-xchg_i; end {|if hyf_disc[]<>null|} else if hyf_node<>null then begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node); end; while l<=i do begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1; if link(hold_head)>null then begin if minor_tail=null then pre_break(r):=link(hold_head) @z ------------------------------------------------------------------------- ... continued: @x if hyf_node<>null then begin hu[i]:=c; {restore the character in the hyphen position} l:=i; decr(i); end @y if hyf_disc[xchg_k]<>null then begin { Restore the original word in |hu| } i:=xchg_k; l:=i+1; for xchg_k:=1 to xchg_i do begin hu[xchg_hu]:=xchg_char[xchg_k]; incr(xchg_hu); end;{for} end else if hyf_node<>null then begin hu[i]:=c; {restore the character in the hyphen position} l:=i; decr(i); end @z ========================================================================= MAIN CHANGE 1) Finally, we come to the SISISI hyphenation. At first, we define two variables: @x @ Assuming that these auxiliary tables have been set up properly, the @y @!funktioniert: boolean; @!zerleg: integer; @ Assuming that these auxiliary tables have been set up properly, the @z ------------------------------------------------------------------------- And off we go; at first only slowly, ... @x @= @y {\sl Here, the German Hyphenation of W.Barth and H.Nirschl has been added!} @= @z ------------------------------------------------------------------------- ... but now we kill all of Liang's code and replace it by some re- and back-en- and decoding, a call to abteilen, and the construction of the special discretionaries. Then we go right into the SISISI hyphenation. ========================================================================= @x if trie_char(cur_lang+1)<>qi(cur_lang) then return; {no patterns for |cur_lang|} hc[0]:=0; hc[hn+1]:=0; hc[hn+2]:=256; {insert delimiters} for j:=0 to hn-r_hyf+1 do begin z:=trie_link(cur_lang+1)+hc[j]; l:=j; while hc[l]=qo(trie_char(z)) do begin if trie_op(z)<>min_quarterword then @; incr(l); z:=trie_link(z)+hc[l]; end; end; found: for j:=0 to l_hyf-1 do hyf[j]:=0; for j:=0 to r_hyf-1 do hyf[hn-j]:=0 @ @= begin v:=trie_op(z); repeat v:=v+op_start[cur_lang]; i:=l-hyf_distance[v]; if hyf_num[v]>hyf[i] then hyf[i]:=hyf_num[v]; v:=hyf_next[v]; until v=min_quarterword; end @ The exception table that is built by \TeX's \.{\\hyphenation} primitive is @y for j:=0 to hn do hc_si[j] := ktab[hc[j]]; si_debug(2) si_put(' hc:'); for j:=0 to hn do si_put( xchr[hc[j]] ); si_put_ln(':'); si_put(' hc_si:'); for j:=0 to hn do si_put( xchr[rev_tab[hc_si[j]]] ); si_put_ln(':'); gubed_si; si_debug(4) si_put_ln('< decimal character codes >'); si_put(' hc:'); for j:=0 to hn do begin si_put(hc[j]:3,'-'); if ((j+1) mod 16)=0 then begin si_put_cr; si_put(' '); end; end; si_put_cr; si_put(' hc_si:'); for j:=0 to hn do begin si_put( rev_tab[hc_si[j]]:3,'-' ); if ((j+1) mod 16)=0 then begin si_put_cr; si_put(' '); end; end; si_put_cr; si_put_cr; gubed_si; abteilen(hc_si,hyf_si,hn,funktioniert,zerleg); for j:=0 to 64 do begin hyf_disc[j]:=null; hyfpen[j]:=inf_penalty; end; if funktioniert then for j:=0 to hn do begin if hyf_si[j]=neben then hyfpen[j] := neben_hyphen_penalty else if hyf_si[j]=haupt then hyfpen[j] := hyphen_penalty else if hyf_si[j]=nebenck then begin hyfpen[j+1] := neben_hyphen_penalty; s:=get_node(small_node_size); {discretionary node erzeugen} type(s):=disc_node; hyf_disc[j+1]:=s; {|pre_break|-Liste: } pre_break(s):=new_penalty(neben_hyphen_penalty); s:=pre_break(s); subtype(s):=0; { number of additional characters } link(s):=new_character(hf,hu[j+2]); {'k' oder'K'} link(link(s)):=null; s:=hyf_disc[j+1]; post_break(s):=null; link(s):=null; end else if hyf_si[j]=haupt3 then begin hyfpen[j-1]:=hyphen_penalty; s:=get_node(small_node_size); {discretionary node erzeugen} type(s):=disc_node; hyf_disc[j-1]:=s; {|pre_break|-Liste: } pre_break(s):=new_penalty(hyphen_penalty); s:=pre_break(s); subtype(s):=1; link(s):=new_character(hf,hu[j]); { Dreifachkonsonant } link(link(s)):=null; {|post_break|- und |link|-Liste: } s:=hyf_disc[j-1]; post_break(s):=null; link(s):=null; end end; {|if funktioniert|} si_debug(8) if funktioniert then begin for j:=0 to hn do if hyfpen[j]<>inf_penalty then begin if hyf_disc[j]<>null then begin if subtype(pre_break(hyf_disc[j]))=0 then si_put( xchr[hu[j+1]], '-' ) else {if subty...} si_put( xchr[hu[j+1]], xchr[hu[j+1]], '=' ); { i hope so } end else begin si_put( xchr[hu[j]] ); if hyfpen[j]=hyphen_penalty then si_put('=') else si_put('-'); end; end else begin si_put( xchr[hu[j]] ); end; end else begin si_put_ln(' no hyphens were found in this word!'); end; si_put_cr; gubed_si; goto done1; found: hyfpen[0]:=inf_penalty; for j:=1 to hn do if odd(hyf[j]) then hyfpen[j]:=hyphen_penalty else hyfpen[j]:=inf_penalty; done1: @ Das folgende Modul ersetzt die gesamte Liang'sche Trennung. Aufgabe dieses Modules ist es, ein in hc (ein Array of 0..255 ) liegendes Wort nach Trennstellen zu untersuchen und die gefundenen Trennstellen in hyf (ebenfalls ein Array of neben...) an den entsprechenden Stellen einzutragen. Die genaue Beschreibung der Vorgangsweise findet man im INSTITUTSBERICHT NR. 26, TECHNISCHE UNIVERSIT\"AT WIEN, INSTITUT F\"UR PRAKTISCHE INFORMATIK. Alle Prozeduren f\"ur die Silbentrennung \SISISI, die bei jedem Start von TeX aufgerufen werden. @ Dieser Modul beinhaltet alle f\"ur die neue Silbentrennung |abteilen| n\"otigen Initialisierungen bzw. Prozeduren, die nur einmal am Anfang ausgef\"uhrt werden sollen. @= inittrennen; @ @= maxwl = 64; { Maximale Wortlaenge } sonderzeichen= 0; bst_a= 1; bst_b= 2; bst_c= 3; bst_d= 4; bst_e= 5; bst_f= 6; bst_g= 7; bst_h= 8; bst_i= 9; bst_j=10; bst_k=11; bst_l=12; bst_m=13; bst_n=14; bst_o=15; bst_p=16; bst_q=17; bst_r=18; bst_s=19; bst_t=20; bst_u=21; bst_v=22; bst_w=23; bst_x=24; bst_y=25; bst_z=26; bst_ae=27; bst_oe=28; bst_ue=29; bst_sz=30; trennzeichen=31; keine = 1; neben = 2; nebenck = 3; haupt = 4 ; haupt3 = 5; tabsize = 8191; { 2**13 - 1 } spec_letter_ID = 4321; { const definitionen } vok_a = 1; vok_e = 2; vok_i = 3; vok_o = 4; vok_u = 5; vok_ae = 6; vok_oe = 7; vok_ue = 8; vok_ie = 9; vok_ai = 10; vok_ei = 11; vok_au = 12; vok_aeu= 13; vok_eu = 14; vok_y = 15; kons = 16; @ @= string80 = packed array [0..80] of eight_bits; buchstabe = eight_bits; a_word = array[0..maxwl] of 0..256; wort = a_word; sperrvektor=array [0..maxwl] of boolean; dudt=array [0..maxwl] of boolean;{frueher kein type !!} { trennung = (keine, neben, nebenck, haupt, haupt3); } trennvektor = array [0..maxwl] of integer; { Trennstelle bei i ==> zwischen Buchstabe i und i+1 wird getrennt } ktabtype = packed array [0..255] of buchstabe; verdschlue = 0..8388647; { 2**23 - 1 } { -PRAK!- unsigned long in C } varr = array [0..maxwl] of verdschlue; intarr = array [0..maxwl] of integer; { darf nicht als var Parameter uebergeben werden !!} infotype = record { frueher ein variantenRecord } ausnahme:boolean; untrennbar:boolean; { erste :0..7; zweite:0..7; } erste :integer; zweite:integer; endung,vorsilbe,stamm,ehervor,stammallein:boolean; end; infobyte = eight_bits; { darf nicht als var Parameter uebergeben werden !!} tableentry=record tabv:verdschlue; tabb:infobyte; frei:boolean; wiederfrei:boolean; end; hashelement = packed array [0..3] of eight_bits; htabtype = packed array [0..tabsize] of hashelement; @ @= ktab : ktabtype; htab : htabtype; anzahl : integer; hashfile : alpha_file; kombtab : array [1..5, 1..5] of integer; konsonant : array [1 .. 30] of boolean; {array [bst_a .. bst_sz]} rev_tab: array[0..255]of integer; @ Prozedurdefinitionen function dig_count( v: integer): integer; { calculates the number of decimal digits of ``v'' } var x, e, c: integer; begin x := v div 10; e:=1; c:=1; while e<=x do begin e := e * 10; incr(c); end; dig_count:=c; end; @= procedure hashfunk (var w:wort; anfang,ende:integer; var v:varr; var ind,k:intarr ); { -PRAK!- in C long statt integer } var vacc, indacc, kacc : integer; i : integer; begin vacc := 31415; indacc := 152; kacc := 271; dbg_put_ln(' * HFNC ='); for i := anfang to ende do begin dbg_put(' * HFNC ',i-anfang:3,' A'); vacc := (vacc * 31 + w [i]) mod 8388648; dbg_put('B'); v[i] := vacc; dbg_put('C'); indacc := (indacc * 33 + w[i]) mod (tabsize+1); dbg_put('D'); ind[i] := indacc; dbg_put('E'); kacc := (kacc * 15 + w[i]) mod ((tabsize+1) div 2); dbg_put('F'); k[i] := 2*kacc + 1; dbg_put_ln('G'); end; {|for|} end; {|hashfunk|} procedure unpackinfo ( b:infobyte; { var i:infotype } var iausnahme :boolean; var iuntrennbar :boolean; var ierste :integer; var izweite :integer; var iendung :boolean; var ivorsilbe :boolean; var istamm :boolean; var iehervor :boolean; var istammallein:boolean ); begin { with i do begin } if odd(b) then begin { 87654321 } iausnahme:= true; b:= b div 2; { 08765432 } if odd(b) then iuntrennbar:= true else iuntrennbar:= false; b:=b div 2; { 00876543 } ierste := b mod 8; { 00876xxx } izweite := b div 8; { 00000876 } end else begin iausnahme:= false; { 87654321 } b:= b div 2; { 08765432 } if odd(b) then iendung:= true else iendung:= false; b:= b div 2; { 00876543 } if odd(b) then ivorsilbe:= true else ivorsilbe:= false; b:= b div 2; { 00087654 } if odd(b) then istamm:= true else istamm:= false; b:= b div 2; { 00008765 } if odd(b) then iehervor:= true else iehervor:= false; b:= b div 2; { 00000876 } if odd(b) then istammallein:= true else istammallein:= false; end; {|if-then-else|} { end; |with|} end; {|unpackinfo|} procedure unpackentry( h:hashelement; { var e:tableentry } var etabv:verdschlue; var etabb:infobyte; var efrei:boolean; var ewiederfrei:boolean ); begin { with e do begin } dbg_put_cr; dbg_put(' ==> UPCKNTY 1'); etabv:= h[0]+h[1]*256+h[2]*256*256; dbg_put(' 2'); if odd(h[3]) then begin { Ausnahme } dbg_put(' 3 A'); etabb:= h[3]; dbg_put('1'); efrei:= false; ewiederfrei:= false; end else begin dbg_put(' 3 B'); etabb:= h[3] mod 64; dbg_put('1'); { h[3]:=h[3] div 64;} {-PRAK!- in C ein var-Paramter, darf daher nicht verndert werden. } dbg_put('2'); { durch folgende 2 Abfragen ersetzt. -PRAK!- } if odd(h[3] div 64) then efrei := true else efrei := false; if odd(h[3] div 128) then ewiederfrei := true else ewiederfrei := false; end; dbg_put_ln('- 4'); { end; |with|} end; {|unpackentry|} procedure hashsuch ( v:verdschlue; ind,k : integer; { var i:infotype; } var iausnahme :boolean; var iuntrennbar :boolean; var ierste :integer; var izweite :integer; var iendung :boolean; var ivorsilbe :boolean; var istamm :boolean; var iehervor :boolean; var istammallein:boolean; var g:boolean ); { sucht Eintragung in der Hashtabelle, g ist TRUE falls gefunden } var gef,ngef:boolean; j:integer; entry : tableentry; begin gef := false; ngef := false; j := ind; repeat unpackentry(htab[j],entry.tabv,entry.tabb,entry.frei,entry.wiederfrei); if entry.frei then ngef := true else if (v = entry.tabv) and not entry.wiederfrei then begin unpackinfo(entry.tabb,iausnahme,iuntrennbar,ierste,izweite, iendung,ivorsilbe,istamm,iehervor,istammallein); gef := true; end else j := (j+k) mod (tabsize + 1) until gef or ngef; g := gef; end; {|hashsuch|} procedure hashload ( var status:integer ); { einlesen der Hash-Tabelle von file fn } var i : integer; e : tableentry; begin { Orginal sitex3.cha -PRAK!- for i:=1 to file_name_size do name_of_file:= ' '; name_of_file[1]:='h'; name_of_file[2]:='f'; name_of_file[3]:='3'; name_length := 3; anzahl := 0; if a_open_in(hashfile) then begin } { geaendert !! -PRAK!- } for i:=1 to file_name_size do name_of_file[i]:= ' '; name_of_file[1]:='h'; name_of_file[2]:='f'; name_of_file[3]:='3'; name_length := 3; anzahl := 0; { -PRAK!- texinputpath fr C notwendig } if a_open_in(hashfile, TEX_INPUT_PATH) then begin for i := 0 to tabsize do begin { -PRAK!- in C Probleme mit den typen beim einlesen } read (hashfile, htab[i][0]); { htab = packed array [0..tabsize] of } read (hashfile, htab[i][1]); { hashelement, und } read (hashfile, htab[i][2]); { hashelement = packed array 0..3 } read (hashfile, htab[i][3]); unpackentry (htab[i],e.tabv,e.tabb,e.frei,e.wiederfrei); if not e.frei and not e.wiederfrei then anzahl := anzahl + 1; end; {|for|} status:= 0; a_close(hashfile); end else status:= 1; end; {|hashload|} procedure inittrennen; var ch: 0..255; bst: buchstabe; i, status: integer; is_iniTeX: boolean; begin { Konversionstabelle |ktab| initialisieren } { ======================================== } for ch := 0 to 255 do begin { if (ch in [" ",",",";",":","-","/","?","!","(",")",".","""","'","^"]) then } case ch of " ",",",";",":","-","/","?","!","(",")",".","""","'","^": if ch="." then ktab[ch]:= "." else ktab[ch] := trennzeichen { else ktab[ch] := sonderzeichen geht nicht !!} end; { case } if (ktab[ch] <> ".") and (ktab[ch] <> trennzeichen) then ktab[ch] := sonderzeichen; end; { for } for bst := bst_a to bst_z do begin ktab[ord('A')+ord(bst)-ord(bst_a)]:=bst; ktab[ord('a')+ord(bst)-ord(bst_a)]:=bst end; { Umlaute und scharfes s - f\"ur ``production \TeX version'' } ktab[255]:=bst_sz; ktab[246]:=bst_oe; ktab[252]:=bst_ue; ktab[228]:=bst_ae; { Umlaute und scharfes s - f\"ur ``initialization \TeX version'' } ktab["1"]:=bst_sz; ktab["2"]:=bst_ue; ktab["3"]:=bst_oe; ktab["4"]:=bst_ae; ktab[25]:=bst_sz; {?-wieso? HH} {testing:} for i:=0 to 255 do rev_tab[ktab[i]]:=i; rev_tab[ktab[255]]:=223; xchr[223]:=chr(223); xchr[228]:=chr(228); xchr[246]:=chr(246); xchr[252]:=chr(252); xchr[255]:=chr(223); xchr[ 25]:=chr(223); {:end of testing} { Kombinationstabelle |kombtab| initialisieren } { ============================================ } { Diese Tabelle muss symetrisch sein, kombtab[a,b] = kombtab[b,a] } kombtab[ keine, keine] := keine; kombtab[ keine, neben] := keine; kombtab[ keine,nebenck] := keine; kombtab[ keine, haupt] := keine; kombtab[ keine, haupt3] := keine; kombtab[ neben, keine] := keine; kombtab[ neben, neben] := neben; kombtab[ neben,nebenck] := keine; kombtab[ neben, haupt] := neben; kombtab[ neben, haupt3] := keine; kombtab[nebenck, keine] := keine; kombtab[nebenck, neben] := keine; kombtab[nebenck,nebenck] := nebenck; kombtab[nebenck, haupt] := keine; kombtab[nebenck, haupt3] := keine; kombtab[ haupt, keine] := keine; kombtab[ haupt, neben] := neben; kombtab[ haupt,nebenck] := keine; kombtab[ haupt, haupt] := haupt; kombtab[ haupt, haupt3] := keine; kombtab[ haupt3, keine] := keine; kombtab[ haupt3, neben] := keine; kombtab[ haupt3,nebenck] := keine; kombtab[ haupt3, haupt] := keine; kombtab[ haupt3, haupt3] := haupt3; { Konsonantentabelle initialisieren } for bst:=bst_a to bst_sz do konsonant[bst] := true; konsonant[bst_a] := false; konsonant[bst_e] := false; konsonant[bst_i] := false; konsonant[bst_o] := false; konsonant[bst_u] := false; konsonant[bst_ae] := false; konsonant[bst_oe] := false; konsonant[bst_ue] := false; { ================== Hashtabelle |htab| initialisieren ====================== Die Prozedur ``hashload'' darf im Initialisierungslauf von TeX auf keinen Fall aufgerufen werden. Deshalb wird zun\"achst festgestellt, ob dieser Programmlauf ein Initialisierungslauf ist oder nicht. Abh\"angig davon wird ``hashload'' nur aufgerufen, wenn dies ein ``Arbeits''-lauf (production-version) von \TeX ist. =============================================================================} is_iniTeX:=false; init is_iniTeX:=true; tini if not is_iniTeX then begin hashload(status); if status <> 0 then begin si_put_cr; si_put_ln (' * * * * * * * * * *!'); si_put_ln('**** ERROR: Can not read hash-file!'); si_put_cr; jump_out; end; end; {|not iniTeX|} end; {|inittrennen|} procedure packinfo ( i:infotype; var b : infobyte ); { Byte: /zweite/zweite/zweite/erste/erste/erste/untrennbar/ausnahm/ } { bzw.: /0 /0 /stall /eherv/stamm/vors /endung /0 / } begin b:=0; dbg_put_cr; { with i do begin } if i.ausnahme then begin dbg_put(' ### PCKINF 1A utr:'); if i.untrennbar then dbg_put('Y') else dbg_put('N'); dbg_put_ln(',',ord(i.untrennbar):3,' 1.',i.erste:3,' 2.',i.zweite:3); b:=1+2*ord(i.untrennbar)+4*i.erste+32*i.zweite ; dbg_put('B'); end else begin dbg_put(' ### PCKINF 2A'); b:= 2*ord(i.endung)+4*ord(i.vorsilbe)+8*ord(i.stamm)+16*ord(i.ehervor) +32*ord(i.stammallein); dbg_put('B'); end; { end; with } dbg_put_ln('-3'); end; {|packinfo|} procedure packentry( e:tableentry; var h:hashelement ); { /infobyte+frei,wiederfrei/verdschhigh/verdschmiddle/verdschllow/ } begin { with e do begin } dbg_put_cr; dbg_put(' +++ PCKNTY A'); h[0]:= e.tabv mod 256; { 87654321abcdefghxxxxxxxx} dbg_put('B'); e.tabv:= e.tabv div 256; { 0000000087654321abcdefgh} dbg_put('C'); h[1]:= e.tabv mod 256; { 0000000087654321xxxxxxxx} dbg_put('D'); h[2]:= e.tabv div 256; { 000000000000000087654321} dbg_put('E'); if odd(e.tabb) then h[3]:= e.tabb { Ausnahme !! } else begin dbg_put('F1'); h[3]:=e.tabb+64*ord(e.frei)+128*ord(e.wiederfrei); { wfxxxxxx } dbg_put('-2'); end; dbg_put_ln('G'); { end; |with|} end; {|packentry|} procedure hashetr ( var w:wort; laenge:integer; i:infotype; var g:boolean ); var v1: varr; ind1, k1: intarr; v: verdschlue; ind, k, j, e: integer; gef,ngef:boolean; entry: tableentry; helem: hashelement; begin dbg_put_ln(' ** HTR 1'); if anzahl >= tabsize then begin dbg_put_ln(' ** HTR 2a'); g := false; end else begin dbg_put_ln(' ** HTR 2b'); if (anzahl < 10) or (anzahl mod 100 = 0) then begin si_put_cr; si_put(' -Anzahl:',anzahl,'. '); end; dbg_put_ln(' ** HTR 3'); hashfunk(w,1,laenge,v1,ind1,k1); dbg_put_ln(' ** HTR 4'); v := v1[laenge]; ind := ind1[laenge]; k := k1[laenge]; gef := false; ngef := false; repeat dbg_put_ln(' ** HTR 5 A'); unpackentry(htab[ind],entry.tabv,entry.tabb,entry.frei,entry.wiederfrei); dbg_put_ln(' ** HTR B'); if entry.frei or entry.wiederfrei then begin anzahl := anzahl + 1; ngef := true; entry.tabv := v; dbg_put_ln(' ** HTR C1'); packinfo (i,entry.tabb); dbg_put_ln(' ** HTR D1'); entry.frei := false; entry.wiederfrei := false; dbg_put_ln(' ** HTR E1'); packentry(entry,helem); dbg_put_ln(' ** HTR F1'); htab[ind] := helem; { -PRAK!- in C memcpy() } end else begin dbg_put_ln(' ** HTR C2'); gef := v = entry.tabv; if not gef then ind := (ind+k) mod (tabsize + 1); dbg_put_ln(' ** HTR D2'); end; until gef or ngef; dbg_put_ln(' ** HTR 6'); g := ngef; end end; {|hashetr|} procedure hashempty; var i:integer; elem : tableentry; he : hashelement; begin { with elem do begin } elem.tabv:=0; elem.tabb := 0; { !!!!!! Irgend ein Affe hat diese Initialisierung vergessen. ( Und dies ist eine Beleidigung fuer jeden Affen ) HM } elem.frei := true; elem.wiederfrei := false; { end; } packentry (elem, he); for i := 0 to tabsize do htab[i] := he; { -PRAK!- in C memcpy() } anzahl := 0; end; {|hashempty|} procedure hashsave (var status:integer); { abspeichern der Hash-Tabelle auf file } var i: integer; begin for i:=1 to file_name_size do name_of_file[i]:=' '; name_of_file[1]:='h'; name_of_file[2]:='f'; name_of_file[3]:='3'; name_length:=3; if a_open_out(hashfile) then begin for i := 0 to tabsize do begin write (hashfile, htab[i][0]:4); write (hashfile, htab[i][1]:4); write (hashfile, htab[i][2]:4); write (hashfile, htab[i][3]:4); write_ln (hashfile); end; status:=0; a_close(hashfile); end else status:=1; end; {|hashsave|} function hashsize :integer; begin hashsize:=anzahl; end; procedure zahl(line:string80;var i:integer; ende:integer; var j:integer); var aus: boolean; begin j:=0; repeat if i<=ende then if ktab[line[i]]="." then i:=i+1 else aus:=true else aus:=true until aus; repeat if i<=ende then begin { if line[i] in ["0".."9"] then begin } case line[i] of "0","1","2","3","4","5","6","7","8","9": begin j:=10*j+line[i]-"0";i:=i+1 end; { else aus:=true geht nicht !!! } end; if (line[i] <> "0") and (line[i] <> "1") and (line[i] <> "2") and (line[i] <> "3") and (line[i] <> "4") and (line[i] <> "5") and (line[i] <> "6") and (line[i] <> "7") and (line[i] <> "8") and (line[i] <> "9") then aus :=true; end else aus:=true until aus; end; {|zahl|} procedure infobau(line: string80; anfang,ende: integer; { var inform: infotype;} var informausnahme :boolean; var informuntrennbar :boolean; var informerste :integer; var informzweite :integer; var informendung :boolean; var informvorsilbe :boolean; var informstamm :boolean; var informehervor :boolean; var informstammallein:boolean; var g: boolean); var i,j:integer;ok,fehler:boolean; b:buchstabe; begin g:=false; { with inform do begin } { initialisierung } informausnahme:=false; informendung:=false; informvorsilbe:=false; informehervor:=false; informstamm:=false; informstammallein:=false; i:=anfang; ok:=false; fehler:=false; repeat if i<=ende then if line[i]="." then i:=i+1 else ok:=true else fehler:=true until ok or fehler; if not fehler then begin b:=ktab[line[i]]; if b = bst_a then begin {ausnahme} informausnahme:=true; i:=i+1; if ktab[line[i]] = bst_u then begin i := i + 1; informuntrennbar := true; end else informuntrennbar := false; zahl(line,i,ende,j); if j=0 then begin informerste:=7; informzweite:=7; { Record-komponente ``zweite'' wird } { initialisiert, damit beim packen keine } { undefinierten Ergebnisse entstehen (die u.U. zu } { RUN-TIME-ERRORS f\"uhren k\"onnen!) } end else begin j:=j-2; if (j>=0) and (j<=6) then informerste:=j else fehler:=true; zahl(line,i,ende,j); if j=0 then informzweite:=7 else begin j:=j-informerste-4; if (j>=0) and (j<=6) then informzweite:=j else fehler:=true; end; end; if not fehler then g:=true; end {|if b = bst_a |} else begin ok:=false; repeat case b of {bst_v} 22: informvorsilbe:=true; {bst_e} 5: informendung:=true; {bst_s} 19: informstamm:=true; {bst_b} 2: informehervor:=true; {bst_t} 20: informstammallein:=true; {trennzeichen} 31:; othercases fehler:=true endcases; if i=ende then ok:=true else begin i:=i+1; b:=ktab[line[i]] end until ok or fehler; if not fehler then g := (informvorsilbe or informendung or informstamm) and (not informehervor or informvorsilbe and informendung) and (not informstammallein or informstamm); end; {|if b = bst_a then-else|} end {|if not fehler|} { end |with inform|} end; {|infobau|} procedure eintragen (line: string80; l: integer ); var i, laenge: integer; inform: infotype; w: wort; g: boolean; begin laenge := 1; while line [laenge] <> "." do laenge := laenge + 1; laenge := laenge -1; if laenge >= 1 then begin if laenge <= maxwl then begin infobau(line,laenge+1,l, inform.ausnahme,inform.untrennbar, inform.erste,inform.zweite,inform.endung, inform.vorsilbe,inform.stamm,inform.ehervor, inform.stammallein,g); if not g then begin si_put_cr; si_put(' Info falsch: ') end else begin for i:= 1 to laenge do w[i]:=ktab[line[i]]; hashetr(w,laenge,inform,g); if not g then begin si_put_cr; si_put(' Tabellenfehler: '); end; end; if not g then begin for i:=1 to l do si_put( xchr[line[i]] ); si_put_cr; end end {|if laenge <= maxwl|} else begin si_put_cr; si_put_ln(' zu langes Wort: '); for i:=1 to l do si_put( xchr[line[i]] ); si_put_cr; end end {|if laenge >= 1|} else begin si_put_cr; si_put_ln(' Falsche Zeile: '); for i:=1 to l do si_put( xchr[line[i]] ); si_put_cr; end end; {|eintragen|} {---------------------------------------------------------------------------} { naechsterbst -> naechsterbst(i,tr,dudstop,b,w) } procedure naechsterbst(var i:integer;var tr:trennvektor; var dudstop:sperrvektor;var b:buchstabe; var w:wort); begin if (i>0) and (tr[i]=keine) and not dudstop[i] then begin b:=w[i];i:=i-1 end else b:=trennzeichen end; {---------------------------------------------------------------------------} { duden(tr) -> duden(tr,w,tr1,dudtr,dud,dudstop,laenge,ok,zerlegungen) } procedure duden (tr:trennvektor; var w:wort; var tr1 :trennvektor; var dudtr:trennvektor; dud :dudt; { array [0..maxwl] of boolean; } var dudstop:sperrvektor; laenge : integer; var ok :boolean; var zerlegungen:integer ); { traegt in dudtr die Trennstellen nach Dudenregeln ein } { frueher mtype laut = (.....) } var i:integer; j:integer; zust:integer; aus:boolean; tre:integer; letzte:integer; l,lalt:integer;{ vormals laut } b:buchstabe; begin { duden } i:=laenge; for j := 0 to laenge do dudtr[j] := tr[j]; while (i>0) and not dud [i] do i:=i-1; while i>0 do begin zust:=1; aus:=false; letzte := 0; l:=kons; b:=w[i]; i:=i-1; repeat {naechsten Laut lesen} { procedure nlaut; } begin { nlaut } tre:=i+1; lalt:=l; case b of {bst_a } 1:begin l:=vok_a; naechsterbst(i,tr,dudstop,b,w) end; {bst_o } 15:begin l:=vok_o; naechsterbst(i,tr,dudstop,b,w) end; {bst_ue} 29:begin l:=vok_ue; naechsterbst(i,tr,dudstop,b,w) end; {bst_oe} 28:begin l:=vok_oe; naechsterbst(i,tr,dudstop,b,w) end; {bst_ae} 27:begin l:=vok_ae; naechsterbst(i,tr,dudstop,b,w) end; {bst_e } 5:begin l:=vok_e; naechsterbst(i,tr,dudstop,b,w); if b = bst_i then begin l:=vok_ie; naechsterbst(i,tr,dudstop,b,w); if b = bst_e then begin {ungetbst = } i := i + 1; l:=vok_e; b := bst_i; end; end end; {bst_i} 9:begin l:=vok_i; naechsterbst(i,tr,dudstop,b,w); if b = bst_e then begin l:=vok_ei; naechsterbst(i,tr,dudstop,b,w) end else if b = bst_a then begin l:=vok_ai; naechsterbst(i,tr,dudstop,b,w) end end; {bst_u} 21:begin l:=vok_u; naechsterbst(i,tr,dudstop,b,w); if b = bst_e then begin l:=vok_eu; naechsterbst(i,tr,dudstop,b,w) end else if b = bst_a then begin l:=vok_au; naechsterbst(i,tr,dudstop,b,w) end else if b = bst_ae then begin l:=vok_aeu; naechsterbst(i,tr,dudstop,b,w) end else if b = bst_q then begin l:=kons; naechsterbst(i,tr,dudstop,b,w) end end; {bst_t} 20:begin l:=kons; naechsterbst(i,tr,dudstop,b,w); if b = bst_s then naechsterbst(i,tr,dudstop,b,w) end; {bst_h} 8:begin l:=kons; naechsterbst(i,tr,dudstop,b,w); if b = bst_c then begin naechsterbst(i,tr,dudstop,b,w); if b = bst_s then naechsterbst(i,tr,dudstop,b,w) end else if b = bst_p then naechsterbst(i,tr,dudstop,b,w) else if b = bst_t then naechsterbst(i,tr,dudstop,b,w) end; {bst_b, bst_c, bst_d, bst_f, bst_g, bst_j, bst_k, bst_l, bst_m, bst_n, bst_p, bst_q, bst_r, bst_s, bst_v, bst_w, bst_x, bst_z, bst_sz } 2,3,4,6,7,10,11,12,13,14,16,17,18,19,22,23,24,26,30 : begin l:=kons; naechsterbst(i,tr,dudstop,b,w) end; {bst_y} 25:begin l:=vok_y; naechsterbst(i,tr,dudstop,b,w); end; {trennzeichen,sonderzeichen} 31,0 :aus:=true end; end; { nlaut } if not aus then {Naechster Zustand} case zust of 1:if l=kons then zust:=2 else zust:=4; 2:if l<>kons then zust:=3; 3:if l=kons then zust:=6 else if l<>lalt then begin zust:=4; letzte:=tre; dudtr[letzte]:=neben end; 4:if l=kons then zust:=6 else zust:=3; 6:if l=kons then begin zust:=2; letzte:=tre; if (w[tre]=bst_c) and (w[tre+1] = bst_k) then dudtr[letzte]:=nebenck else dudtr[letzte]:=neben end else begin zust:=4; letzte:=tre; dudtr[letzte]:=neben end end until aus; if zust=2 then dudtr[letzte]:=keine else if zust=4 then { if not (lalt in [vok_ai,vok_ei,vok_au,vok_aeu,vok_eu,vok_ie]) then } if (lalt < vok_ie) or (lalt > vok_eu) then dudtr[letzte]:=keine; while (i>0) and not dud[i] do i:=i-1 end; end { duden }; {---------------------------------------------------------------------------} { procedure trennen (zustand : integer; anfang: integer; spv:sperrvektor; var ok1 : boolean); } procedure trennen ( { trennen } zustand : integer; anfang : integer; spv : sperrvektor; var ok1 : boolean; { abteilen } var tr :trennvektor; var dudtr:trennvektor; var zerlegungen:integer ; var dud:dudt; { array [0..maxwl] of boolean; } var v:varr; var ind,k:intarr; var dudstop:sperrvektor; { abteilen aufruf } var w:wort; var tr1:trennvektor; laenge : integer ); { Zustand : 1 ... Es kann nur Vorsilbe oder Stamm kommen, rechts muss noch ein Stamm kommen. 2 ... Es kann Endung, Vorsilbe oder Stamm kommen. 3 ... Es kann nur Vorsilbe oder Stamm kommen, im linken Teil muessen noch Trennstellen eingefuegt werden, rechts muss noch ein Stamm kommen. 4 ... Es kann nur Vorsilbe oder Stamm kommen, im linken Teil muessen noch Trennstellen eingefuegt werden. 5 ... Trennstelle mit entfallenem Konsonanten } var gef, nichtok:boolean; schnitt:integer; inform:infotype; i:integer; ok_stamm, ok_vor, ok_end, ok_help:boolean; tri:integer; { p_stamm } i1,tre:integer; ok : boolean; stop_ptr :integer; spvtmp :sperrvektor; { lokale Kopie des Sperrvektors, fr die bersetzung nach C notwendig. -PRAK!- } { procedure p_endung(var endok:boolean); } { procedure p_vorsilbe(var vorok:boolean); } { procedure p_stamm(var stok:boolean); } { procedure p_stammallein (var staok:boolean); } begin { trennen } spvtmp := spv; ok1:=false; if anfang = laenge + 1 then { Ende des Wortes erreicht} if (zustand = 2) or (zustand=4) then {erfolgreich zerschnitten } begin dud[anfang-1]:=true; {duden(tr);} duden(tr,w,tr1,dudtr,dud,dudstop,laenge,ok,zerlegungen); for i := laenge downto 1 do begin if dudtr[i-1] = haupt3 then begin dudtr[i-1] := keine; tri := haupt3; end else tri := dudtr[i]; if zerlegungen = 0 then tr1[i] := tri else tr1[i] := kombtab[tr1[i],tri]; end; zerlegungen:=zerlegungen+1; ok1:= true end else { kein Stamm im Wortrest ok1 = false } else { restlichen Teil zerschneiden } begin hashfunk (w,anfang,laenge,v,ind,k); schnitt := laenge; nichtok := false; repeat { links abschneiden } repeat if spv[schnitt] then { Schnittstelle gesperrt } gef:=false else begin hashsuch (v[schnitt],ind[schnitt],k[schnitt], inform.ausnahme,inform.untrennbar, inform.erste,inform.zweite,inform.endung, inform.vorsilbe,inform.stamm,inform.ehervor, inform.stammallein,gef); end; if gef then spv[schnitt]:=true else schnitt := schnitt - 1 until gef or (schnitt = anfang - 1); if gef then { teilwortanfang ist bekanntes wort } begin if not inform.ausnahme then begin if inform.endung and (zustand=2) then begin { --- p_endung(ok_end); --- } ok_end:=false; if (w[anfang]=bst_c) or (w[anfang]=bst_h) then begin dud[anfang-1]:=true; tr[anfang-1]:=neben end; { Fugen 's' bevorzugen } if schnitt < laenge then if (w[schnitt+1] = bst_s) and not spv[schnitt+1] then begin spv[schnitt+1]:=true; trennen (2, schnitt+2, spv, ok_help, tr,dudtr,zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_end := ok_help end; trennen (2, schnitt+1, spv, ok_help, tr,dudtr,zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_end := ok_end or ok_help; {eingetragene Trennstellen loeschen} tr[anfang-1]:=keine; dud[anfang-1]:=false; { --- p_endung(ok_end) - ende --- } ok1 := ok1 or ok_end end else ok_end:=false; if inform.vorsilbe then begin { --- p_vorsilbe(ok_vor); --- } { Trennstelle(n) vor der Vorsilbe eintragen } if zustand = 1 then tr[anfang-1]:=neben else begin dud[anfang-1]:=true; if zustand = 5 then tr[anfang-1] := haupt3 else tr[anfang-1]:=haupt; end; { Trennstellen innerhalb der Vorsilbe eintragen } dud[schnitt]:=true; { Rest des Wortes zerschneiden } trennen(1, schnitt+1, spv, ok_vor, tr,dudtr,zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); {eingetragene Trennstellen loeschen} tr[anfang-1]:=keine; dud[anfang-1]:=false; dud[schnitt]:=false; { --- p_vorsilbe(ok_vor) - ende --- } ok1 := ok1 or ok_vor end else ok_vor := false; if inform.stamm and not ok_vor then begin if not inform.stammallein then begin { --- p_stamm(ok_stamm); --- } ok_stamm := false; stop_ptr := 0; if zustand = 1 then tr[anfang-1]:=neben else begin dud[anfang-1]:=true; if zustand = 5 then tr[anfang-1] := haupt3 else tr[anfang-1]:=haupt end; {ev. Ausnahmetrennstellen eintragen} if inform.ausnahme then begin if inform.erste<>7 then begin tre:=anfang+inform.erste+1; tr[tre]:=neben; if inform.zweite<>7 then begin tre:=tre+inform.zweite+2; if inform.untrennbar then begin stop_ptr := tre; dudstop[stop_ptr] := true; end else tr[tre]:=neben end else begin if inform.untrennbar then begin tr[tre] := keine; stop_ptr := tre; dudstop[tre]:=true; end end end else begin {inform.erste = 7} stop_ptr := schnitt-1; dudstop[stop_ptr] := true end end; { Fugen 's' bevorzugen } if schnitt < laenge then if (w[schnitt+1] = bst_s) and not spv[schnitt+1] then begin spv [schnitt+1] := true; trennen (2, schnitt+2, spv, ok_help,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_stamm := ok_help end; { Sonderbehandlung eines in der Wortfuge entfallenen mitlauts } if (schnitt >=2) and (schnitt < laenge) then if konsonant[w[schnitt]] and not konsonant[w[schnitt+1]] and (w[schnitt] = w[schnitt-1]) then begin trennen(5, schnitt, spv, ok_help,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_stamm := ok_stamm or ok_help; end; { restlichen Teil zerschneiden } trennen (2, schnitt+1, spv, ok_help,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_stamm := ok_stamm or ok_help; { eingetragene Trennstellen entfernen } if inform.ausnahme then for i1:=anfang-1 to schnitt do begin tr[i1]:=keine; dud[i1]:=false; dudstop[stop_ptr] := false; end else begin tr[anfang-1]:=keine; dud[anfang-1]:=false; end; { --- p_stamm(ok_stamm) - ende --- } ok1 := ok1 or ok_stamm; end else if (anfang=1) and (schnitt=laenge) then begin { --- p_stammallein(ok_stamm); --- } dud[anfang-1]:=true; tr[anfang-1]:=haupt; trennen (4, schnitt+1, spv, ok_stamm,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); {eingetragene Trennstellen loeschen} tr[anfang-1]:=keine; dud[anfang-1]:=false; { --- p_stammallein(ok_stamm) - ende --- } ok1 := ok1 or ok_stamm; end; end end else begin { --- p_stamm(ok_stamm); --- } ok_stamm := false; stop_ptr := 0; if zustand = 1 then tr[anfang-1]:=neben else begin dud[anfang-1]:=true; if zustand = 5 then tr[anfang-1] := haupt3 else tr[anfang-1]:=haupt end; {ev. Ausnahmetrennstellen eintragen} if inform.ausnahme then begin if inform.erste<>7 then begin tre:=anfang+inform.erste+1; tr[tre]:=neben; if inform.zweite<>7 then begin tre:=tre+inform.zweite+2; if inform.untrennbar then begin stop_ptr := tre; dudstop[stop_ptr] := true; end else tr[tre]:=neben end else begin if inform.untrennbar then begin tr[tre] := keine; stop_ptr := tre; dudstop[tre]:=true; end end end else begin {inform.erste = 7} stop_ptr := schnitt-1; dudstop[stop_ptr] := true; end; end; { Fugen 's' bevorzugen } if schnitt < laenge then if (w[schnitt+1] = bst_s) and not spv[schnitt+1] then begin spv [schnitt+1] := true; trennen (2, schnitt+2, spv, ok_help,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_stamm := ok_help end; { Sonderbehandlung eines in der Wortfuge entfallenen mitlauts } if (schnitt >=2) and (schnitt < laenge) then if konsonant[w[schnitt]] and not konsonant[w[schnitt+1]] and (w[schnitt] = w[schnitt-1]) then begin trennen(5, schnitt, spv, ok_help,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_stamm := ok_stamm or ok_help; end; { restlichen Teil zerschneiden } trennen (2, schnitt+1, spv, ok_help,tr,dudtr, zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); ok_stamm := ok_stamm or ok_help; { eingetragene Trennstellen entfernen } if inform.ausnahme then for i1:=anfang-1 to schnitt do begin tr[i1]:=keine; dud[i1]:=false; dudstop[stop_ptr] := false; end else begin tr[anfang-1]:=keine; dud[anfang-1]:=false; end; { --- p_stamm(ok_stamm) - ende --- } ok1 := ok1 or ok_stamm; end; {naechsten Schleifendurchlauf vorbereiten} schnitt := schnitt - 1; nichtok := schnitt = anfang - 1 end else { kein bekanntes Wort am Beginn des Teilwortes } nichtok := true; until nichtok end; spv := spvtmp; { spv wiederherstellen, fr die C Version. -PRAK!- } end { trennen }; {----------------------------------------------------------------------------} procedure abteilen ( var w:wort; var tr1:trennvektor; laenge : integer; var ok:boolean; var zerlegungen:integer ); label 99; { kommt in den outer Block } { mtype sperrvektor=array [0..maxwl] of boolean; } var i:integer; dud:dudt; { array [0..maxwl] of boolean; } v:varr; ind,k:intarr; spv,dudstop:sperrvektor; tr, dudtr:trennvektor; begin { abteilen } tr1[0] := keine; for i:=0 to laenge do begin tr[i]:=keine; spv[i]:=false; dudstop[i]:=false; dud[i]:=false; if i > 0 then if (w[i] = trennzeichen) or (w[i] = sonderzeichen) then begin zerlegungen := 0; ok := false; goto 99; end; end; zerlegungen:=0; trennen(3, 1, spv, ok, tr,dudtr,zerlegungen,dud,v,ind,k, dudstop,w,tr1,laenge); for i:=0 to laenge do if tr1[i]=nebenck then begin tr1[i]:=keine; { c-k wird zu -ck da sp\"ater dement- } tr1[i-1]:=nebenck; {sprechend getrennt wird } end; 99: end;{ abteilen } {----------------------------------------------------------------------------} {identification number for dirty umlauts; see description in change file} function is_dirty_umlaut(p:pointer):boolean; var res:boolean; q:pointer; begin res:=false; if p<>null then if type(p)=disc_node then if pre_break(p)<>null then if link(pre_break(p))<>null then if link(link(pre_break(p)))<>null then begin q:=pre_break(p); if type(q)=penalty_node then q:=link(q); if (type(q)=kern_node)and(width(q)=spec_letter_ID)and is_char_node(link(q))and(type(link(link(q)))=kern_node) then res:=true; end; is_dirty_umlaut:=res; end; {|is_dirty_umlaut|} function get_char_of(s:pointer):integer; var q:pointer; begin q:=pre_break(s); if type(q)=penalty_node then q:=link(q); get_char_of:=width(link(link(q))); end; {|get_char_of|} function get_font_of(s:pointer):integer; var q:pointer; begin q:=pre_break(s); if type(q)=penalty_node then q:=link(q); get_font_of:=font(link(q)); end; {|get_font_of|} {----------------------------------------------------------------------------} @ Ende ... @z ========================================================================= @x @ Now let's go back to the easier problem, of building the linked trie. When \.{INITEX} has scanned the `\.{\\patterns}' control sequence, it calls on |new_patterns| to do the right thing. @= procedure new_patterns; {initializes the hyphenation pattern data} label done, done1; var k,@!l:small_number; {indices into |hc| and |hyf|} @!digit_sensed:boolean; {should the next digit be treated as a letter?} @!v:quarterword; {trie op code} @!p,@!q:trie_pointer; {nodes of trie traversed during insertion} @!first_child:boolean; {is |p=trie_l[q]|?} @!c:ASCII_code; {character being inserted} begin if trie_not_ready then begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}} @; end else begin print_err("Too late for "); print_esc("patterns"); help1("All patterns must be given before typesetting begins."); error; link(garbage):=scan_toks(false,false); flush_list(def_ref); end; end; @y @ We do not build a trie (wether a linked nor a packed). Instead, we create a hash table for the \SISISI-hyphenation algorithm. But we still use the data provided by the `\.{\\patterns}' primitive. @= procedure new_patterns; {initializes the hyphenation pattern data} label done; var k: small_number; {index into |zeile|} @!zeile:string80; { Silbe Type <,> als Eintrag der Hashtabelle } @!status:integer; @!c:ASCII_code; {character being inserted} begin if trie_not_ready then begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}} hashempty; { create the empty hash table } @; hashsave(status); if status<>0 then begin wterm_ln('Fehler bei hash save !!!!!'{?}); wlog_ln( 'Fehler bei hash save !!!!!'{?}); end; end else begin print_err("Too late for "); print_esc("patterns"); help1("All patterns must be given before typesetting begins."); error; link(garbage):=scan_toks(false,false); flush_list(def_ref); end; end; @z -------------------------------------------------------------------------- @x @ Novices are not supposed to be using \.{\\patterns}, so the error messages are terse. (Note that all error messages appear in \TeX's string pool, even if they are used only by \.{INITEX}.) @= k:=0; hyf[0]:=0; digit_sensed:=false; loop@+ begin get_x_token; case cur_cmd of letter,other_char:@; spacer,right_brace: begin if k>0 then @; if cur_cmd=right_brace then goto done; k:=0; hyf[0]:=0; digit_sensed:=false; end; othercases begin print_err("Bad "); print_esc("patterns"); @.Bad \\patterns@> help1("(See Appendix H.)"); error; end endcases; end; done: @y @ Here we build the hash table for the \SISISI-hyphenation. @= for k:=0 to maxwl do zeile[k]:=" "; k:=0; loop@+ begin get_x_token; { \SISISI Worttabelle hat das Format 'Text.Type,'. Der Beistrich wird eigens abgefragt, da er in TeX umdefiniert werden kann. } if (cur_cmd = right_brace)or(cur_chr = ",") then begin { Ein Eintrag der Worttabelle - letzter (|cur_cmd|=|right_brace|) - gelesen } if k>0 then eintragen(zeile,k); { ein Eintrag vorhanden } if cur_cmd = right_brace then goto done; k := 0; { hyf[] ist \"uberfl\"ussig, da die Trenninformation in zeile mitverpackt ist. } end else { Das Wort ist noch nicht fertig gelesen } if (cur_cmd = letter) or (cur_cmd = other_char) or (cur_cmd = spacer) or (cur_chr = ".") then begin @; end else begin print_err("Bad "); print_esc("patterns"); @.Bad \\patterns@> help1("(See Appendix H.)"); error; end; end; {|loop|} done: @z ----------------------------------------------------------------------------- @x @ @= if digit_sensed or(cur_chr<"0")or(cur_chr>"9") then begin if cur_chr="." then cur_chr:=0 {edge-of-word delimiter} else begin cur_chr:=lc_code(cur_chr); if cur_chr=0 then begin print_err("Nonletter"); @.Nonletter@> help1("(See Appendix H.)"); error; end; end; if k<63 then begin incr(k); hc[k]:=cur_chr; hyf[k]:=0; digit_sensed:=false; end; end else if k<63 then begin hyf[k]:=cur_chr-"0"; digit_sensed:=true; end @y @ @= if (cur_chr<>" ")then begin { Das Leerzeichen wird \"uberlesen! } if (cur_chr <> ".") then begin { der Punkt trennt die Silbe von der Silbenart und bleibt erhalten } if (cur_chr<"0")or(cur_chr>"9") then begin cur_chr:=lc_code(cur_chr); if cur_chr=0 then begin print_err("Nonletter"); @.Nonletter@> help1("(See Appendix H.)"); error; cur_chr:=128; end; end; end; if k<63 then begin incr(k); zeile[k]:=cur_chr; end; end; @z ---------------------------------------------------------------------------- @x @ When the following code comes into play, the pattern $p_1\ldots p_k$ appears in |hc[1..k]|, and the corresponding sequence of numbers $n_0\ldots n_k$ appears in |hyf[0..k]|. @= begin @; q:=0; hc[0]:=cur_lang; while l<=k do begin c:=hc[l]; incr(l); p:=trie_l[q]; first_child:=true; while (p>0)and(c>so(trie_c[p])) do begin q:=p; p:=trie_r[q]; first_child:=false; end; if (p=0)or(c; q:=p; {now node |q| represents $p_1\ldots p_l$} end; if trie_o[q]<>min_quarterword then begin print_err("Duplicate pattern"); @.Duplicate pattern@> help1("(See Appendix H.)"); error; end; trie_o[q]:=v; end @y @ Well, that was everything about the initialization of the \SISISI hash table (at this place). All following sections, describing the initialization of the trie are replaced by ``empty sections'' in order to have same section numbering in SI\TeX as in \TeX. @z ---------------------------------------------------------------------------- @x @ @= begin if trie_ptr=trie_size then overflow("pattern memory",trie_size); @:TeX capacity exceeded pattern memory}{\quad pattern memory@> incr(trie_ptr); trie_r[trie_ptr]:=p; p:=trie_ptr; trie_l[p]:=0; if first_child then trie_l[q]:=p@+else trie_r[q]:=p; trie_c[p]:=si(c); trie_o[p]:=min_quarterword; end @y @ ``Empty section''. @z ------------------------------------------------------------------------------ @x @ @= if hc[1]=0 then hyf[0]:=0; if hc[k]=0 then hyf[k]:=0; l:=k; v:=min_quarterword; loop@+ begin if hyf[l]<>0 then v:=new_trie_op(k-l,hyf[l],v); if l>0 then decr(l)@+else goto done1; end; done1: @y @ Further ``empty section''. @z @x @ Finally we put everything together: Here is how the trie gets to its final, efficient form. The following packing routine is rigged so that the root of the linked tree gets mapped into location 1 of |trie|, as required by the hyphenation algorithm. This happens because the first call of |first_fit| will ``take'' location~1. @= procedure init_trie; var @!p:trie_pointer; {pointer for initialization} @!j,@!k,@!t:integer; {all-purpose registers for initialization} @!r,@!s:trie_pointer; {used to clean up the packed |trie|} @!h:two_halves; {template used to zero out |trie|'s holes} begin @; if trie_root<>0 then begin first_fit(trie_root); trie_pack(trie_root); end; @y @ Because we do not use the trie, we fill it with dummy data. But the main reason, why we do not remove this procedure is, that it is called and the used variables are referenced somewhere in \TeX. @= procedure init_trie; var @!p:trie_pointer; {pointer for initialization} @!j,@!k,@!t:integer; {all-purpose registers for initialization} @!r,@!s:trie_pointer; {used to clean up the packed |trie|} @!h:two_halves; {template used to zero out |trie|'s holes} begin trie_root:=0; @z =========================================================================