Index of /archives/text/CTAN/support/word2x

Name               Last modified      Size
wordwrap.cc        1998-10-08 03:12   1.3K
word6.h            1998-10-08 03:12   796
word2x.cc          1999-08-06 09:14   7.1K
word2x.1           1998-10-08 03:12   3.0K
word2x             1999-08-06 10:06   316K
use_getopt.h       1998-10-08 03:12   210
usdate.cc          1998-10-08 03:12   1.1K
ukdate.cc          1998-10-08 03:12   1.1K
tune.h             1998-10-08 03:12   808
transcript         1999-08-06 10:06   3.9K
text-table.h       1998-10-08 03:12   453
text-table.cc      1998-10-08 03:12   6.7K
text-fmt.cc        1998-10-08 03:12   6.3K
tblock.h.orig      1998-10-08 03:12   813
tblock.h           1998-10-08 03:12   791
tblock.cc          1998-10-08 03:12   4.3K
strip.h            1999-08-06 09:09   2.9K
strip.cc.orig      1998-10-08 03:12   13K
strip.cc           1999-05-09 23:50   13K
shrink_width.cc    1998-10-08 03:12   748
scan_num.cc        1998-10-08 03:12   1.0K
rtest2.cc          1998-10-08 03:12   1.8K
rtest2             1999-08-06 10:06   128K
reader.h           1998-10-08 03:12   5.5K
reader.cc          1998-10-08 03:12   24K
part_num_probe.c   1998-10-08 03:12   782
num_unit_probe.c   1998-10-08 03:12   1.3K
nullproc.cc        1998-10-08 03:12   320
map_chars.cc       1998-10-08 03:12   864
liboutfmt.a        1999-08-06 10:06   151K
lib.h              1998-11-30 20:44   2.5K
latex-table.h      1998-10-08 03:12   527
latex-table.cc     1998-10-08 03:12   6.0K
latex-fmt.cc       1998-10-08 03:12   12K
latex-embed.cc     1998-10-08 03:12   10K
interface.h        1998-10-08 03:12   1.4K
install-sh         1998-10-08 03:12   4.7K
html-table.h       1998-10-08 03:12   520
html-table.cc      1999-08-06 09:12   5.8K
html-fmt.cc        1998-10-08 03:12   11K
html-embed.cc      1998-10-08 03:12   7.2K
getopt1.c          1998-10-08 03:12   4.3K
getopt.h           1998-10-08 03:12   4.4K
getopt.c           1998-10-08 03:12   21K
fmt-latex.h        1998-10-08 03:12   920
fmt-html.h         1998-10-08 03:12   1.0K
fifo.h             1999-08-06 09:09   4.1K
fake_link          1998-10-08 03:12   95
dedate.cc          1998-11-30 20:41   854
deL1date.cc        1998-11-30 20:41   793
deHTMLdate.cc      1998-11-30 20:41   915
configure.in       1999-08-06 09:12   2.1K
configure          1999-08-06 09:12   66K
config.sub         1999-08-06 09:07   19K
config.h.top       1998-10-08 03:12   44
config.h.in        1998-10-08 03:12   2.2K
config.h.bot       1998-10-08 03:12   122
config.guess       1999-08-06 09:07   27K
compat.c           1998-10-08 03:12   1.1K
col-align.cc       1998-10-08 03:12   2.0K
alloca.c           1998-10-08 03:12   13K
README             1999-08-06 09:20   7.3K
Makefile.linux     1998-10-08 03:12   1.4K
Makefile.in        1998-12-29 06:17   3.0K
INTERNALS          1998-10-08 03:12   7.1K

$Id: README,v 1.7 1997/04/13 02:59:59 dps Exp $

What is new in version 0.005 of word2x

Update version to 0.005
Fix version number bug
Update config.guess and config.sub
Re-generate configure.in with newer autoconf
Fix various ANSI violations that g++ 2.95 disliked


What is new in version 0.004 of word2x

Squashed a stupid bug in word2x_junk_filter::filter_junk which ignored
the last character read.

Added German support from word2x port EX2.


What was new in version 0.003 of word2x

word2x-0.003 is word2x-0.002 with a major bug in strip.cc
eliminated. word2x-0.002 was 0.001 retro-fitted with some quite new
junk filtering code with lots of tunable parameters (i.e. all of
tune.h).  This code is extracted from the evolving, and currently
incomplete, source tree of the next major release. (When that release
happens I will stop supporting or maintaining any 0.00x versions.)

The major change is much better junk filtering, losing less text and
throwing out more junk; unicode documents should now work. Increasing
numbers of problem documents which have OLE junk in places that break
the code are appearing. Splitting the document with lls (from the
LAOLA package) and attacking the WordDocument stream works; sometime I
will have a usable library that can do this automagically. (In
word2x-0.002 you have a very good chance of tickling the strip.cc bug
and its (buggy) bug trap.)

Please send documents that still cause problems after the suggested
work-around to word2x@duncan.telstar.net. The immediate fix is to try
one of the other two programs. (Free software people are prepared to
co-operate with the "competition"). There are links to all the
"competition" I know of on the word2x home page at
http://word2x.alcom.co.uk (hosted by alcom.co.uk free of charge,
despite the fact charges normally apply).


Installing word2x

You need a C++ compiler and a version of make that understands how to
make .o files from .cc files, for example GNU make. Ideally you have
getopt_long already in your C library, but you might not. If you do
not, set GETOPT to gopt.o in the Makefile. getopt_long is the version
supplied by the Free Software Foundation in glibc-1.09.

If your make does not know how, add a rule. For GNU make the rule is

%.o: %.cc
	$(CPP) $(CPPFLAGS) -c -o $@ $<

Please note that a warning about a contravariance violation is normal.

As this program only recently escaped, YMMV. The main reason for its
escape was the incessant irritation that comp.os.linux.misc posters
express about word .doc files (IMHO this is justified). I wrote this
program for myself and my word problem; I let it run wild in the hope
that it is useful for others. [I now know it is helping some people.]

Further information on other converters is available in the list of
converters available via <http://www.kfa-juelich.de/isr/1/texconv.html>
(word2x seems to have a monopoly on converters from word to latex not
requiring word and available on non-MS platforms).

The program has been compiled on (the first two by me personally):

Linux 2.1.30 (Unix)
SunOS (Unix)
DEC Alpha AXP under OSF/1 (Unix) 
IBM SP/2 (RS6000) under AIX (Unix) [SP/2s are heavy computing power...]

It is known not to compile with

Borland C++ 3.1 (PC version).

If anyone manages to compile it on a PC, please tell
Duncan Simpson <dps@duncan.telstar.net> and
W.Hennings <W.Hennings@kfa-juelich.de>

Limited flat (linear) memory might be lethal, esp. if your system
lacks alloca. If you have not learned to steal what is free then you
can send money (preferably UK funds), postcards, etc. to the author at

Frax House, Kingston Bagpuize, OXON OX13 5AW

or for the next couple of years

Flat 6, 93 Westridge Road, Southampton

I neither suggest that you do donate nor that you do not donate.


SunOS

Can be problematic. Setting LD to ./sunos_link and defining add
produced a binary that worked for me, with one warning about
strncasecmp. I guess Sun's ld is incompatible with g++ or something;
using ar and ranlib, aka the sunos_link shell script, works. The
configuration script hopefully does this stuff for you.


Reported bugs

On some platforms it misses the first 3/4 of a page. If you are
afflicted get out your copy of hexdump and adjust the start offset in
word2x to the correct value. This should be fixed now.
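
If you do end up hunting for the offset by hand, a rough heuristic is
to look for the first long run of printable bytes. The sketch below is
only that: first_text_run is a hypothetical helper, not part of
word2x, and it assumes 8-bit text (it will not find unicode text).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Return the offset of the first run of at least `min_run` consecutive
// text-like bytes in `data`, or std::string::npos if there is none.
// Heuristic only; hypothetical helper, not part of word2x.
std::string::size_type first_text_run(const std::string& data,
                                      std::string::size_type min_run)
{
    assert(min_run > 0);
    std::string::size_type start = 0, len = 0;
    for (std::string::size_type i = 0; i < data.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        // Count printable ASCII, plus CR and tab, as text.
        if (std::isprint(c) || c == '\r' || c == '\t') {
            if (len == 0)
                start = i;          // a new run begins here
            if (++len >= min_run)
                return start;       // run is long enough: report it
        } else {
            len = 0;                // junk byte: the run is broken
        }
    }
    return std::string::npos;
}
```

Read the file into a std::string in binary mode, call the function
with a generous min_run (a few dozen, say), and compare the result
with what hexdump shows before trusting it.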

Copyright

This program is (c) D.P.Simpson 1997. The program is licensed under the
GPL version 2, or any later version (at your option). This means DOS
people must distribute source as per the GPL.

The stuff I did not write is:

config.guess and config.sub come from GNU autoconf and are thus
(c) The Free Software Foundation.

getopt.c, getopt1.c and getopt.h are (c) The Free Software Foundation.
I am fairly sure the LGPL requires these files to be distributed as
well.

alloca.c is almost certainly also (c) The Free Software Foundation.

install-sh is probably (c) The X consortium.


Introductory propaganda

Despite the fact that open formats like rtf are good and widely
available, far too many idiots seem to insist on using word .doc
format. This program is an attempt to limit the damage this causes
users of non-microsoft systems and text processing systems, for
example LaTeX.

It is designed to be retargetable and to avoid some of the travesties
of proper typesetting committed by word, which is hobbled by the lack
of ligatures in TrueType fonts (and, to some extent, the lack of
different design sizes). There is quite a large amount of guesswork
from context to reduce the impact of my not understanding a document
the way word does. One even sees interesting things like
<Paragraph mode> 550* <eqn> \F(foo, bar) <end eqn> * 42 * (pixels per em)
<End paragraph>
which is not too good! There may be multiple bits of alternating roman
and equation, multiple items of text in brackets, etc. Fortunately the
reader converts these, in two stages, to a single maths insert. Maths
inserts with embedded newlines get rendered as eqnarray* in LaTeX
mode. All maths is just deleted in text mode (would someone like to
add this support?).

LaTeX mode sees the equation example above as <eqn insert> 550
* \F(foo, bar * baz) * 42 * (pixels per em) <end eqn insert> and
renders it as
% Some comments omitted for brevity
$$550 \times {\text{foo} \over \text{bar} \times \text{baz} } \times 42
\times \text{(pixels per em)}$$
which looks a lot better than word's own version, which uses awful
stars instead of proper times signs.
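
To see that rendering for yourself, the fragment can be dropped into
a minimal document. The preamble here is my assumption, not something
taken from word2x's output: \text needs amsmath (or similar), and
amsmath may warn about the plain TeX \over while still typesetting it.

```latex
\documentclass{article}
\usepackage{amsmath} % assumed: supplies \text
\begin{document}
$$550 \times {\text{foo} \over \text{bar} \times \text{baz} } \times 42
\times \text{(pixels per em)}$$
\end{document}
```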

Text mode implements tables with real columns, unlike catdoc. Long
entries are folded automatically and there is some semi-intelligent
width reduction.  Hyphenation is not supported, so if someone insists
on using supercalifragilistic... then an overlong line might result
(anyone care to fix this? I thought it was just overkill to implement
the hyphenation algorithm along with all the rest).
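
To make the overlong-line case concrete, here is a minimal sketch of
folding without hyphenation. It is a plain greedy word wrap written
for illustration; fold_entry is a hypothetical name and this is not
the code in wordwrap.cc.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Fold one table entry into lines of at most `width` characters.
// Greedy word wrap with no hyphenation: a single word longer than
// `width` is left whole on its own line, so that line overflows.
// Hypothetical helper for illustration; not taken from wordwrap.cc.
std::vector<std::string> fold_entry(const std::string& text,
                                    std::size_t width)
{
    assert(width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        if (line.empty()) {
            line = word;            // first word goes in, fit or not
        } else if (line.size() + 1 + word.size() <= width) {
            line += ' ';            // word fits after a single space
            line += word;
        } else {
            lines.push_back(line);  // line is full: start a new one
            line = word;
        }
    }
    if (!line.empty())
        lines.push_back(line);
    return lines;
}
```

Folding "a long table entry" to width 10 gives three lines, all within
the width. Folding "use supercalifragilistic here" to width 8 also
gives three lines, but the middle one is 20 characters wide, which is
exactly the overlong line described above.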

Apart from the pictures and a little trailing junk the code does a
good job on the TrueType documents. The readme generates some error
messages about extra ^Us, among other things, due to a lack of
understanding of some of the inserts used in some documents. Anyone
who can decode more types of insert, please tell me about it and
preferably send a patch so I can avoid extra programming (got too much
real work to be doing).

If someone wishes to contribute *roff output I would include it. Extra
understanding of equations would also be gratefully received, as the
examples in the TrueType docs are rather limited. Bibliography support
and anything else you can tell me about will also be gratefully
listened to.

Duncan (-: