README 30 October 1992 by Paul Leyland (pcl@black.ox.ac.uk)

This directory (/wordlists) contains a number of sub-directories each
containing compressed wordlists by subject.  The "Random" directory
is a catch-all.

Beware.  There are rather a lot of words in total; there are even
quite a few thousand duplicates.
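
A minimal sketch of one way to fold several lists into a single
duplicate-free list follows.  It assumes the files have already been
uncompressed and hold one word per line; the function name
merge_unique is made up for illustration and is not part of this
archive.

```python
from typing import Iterable, List


def merge_unique(wordlists: Iterable[Iterable[str]]) -> List[str]:
    """Merge several wordlists into one sorted list with no duplicates.

    Each wordlist is any iterable of lines (an open text file works).
    Assumption: one word per line; blank lines are ignored.
    """
    seen = set()
    for wordlist in wordlists:
        for line in wordlist:
            word = line.strip()
            if word:  # skip blank lines
                seen.add(word)
    return sorted(seen)


# Small in-memory example; real use would pass open file objects.
merged = merge_unique([["apple\n", "pear\n"], ["pear\n", "\n", "fig\n"]])
```

The comparison is exact and case-sensitive, so "Pear" and "pear"
would both be kept; whether that is what you want depends on how the
lists will be used.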

If anyone has wordlists of languages and topics not present here,
please drop a line to pcl@black.ox.ac.uk, telling me how I can get
hold of them.

All the individual files in the Klein lists are here (not the
aggregated all-words), as are many foreign language dictionaries and
words from a number of technical and leisure fields.

Henk Smit, enk@cs.vu.nl, from whom I acquired many of these lists, writes:

=======================================

  These are the dictionaries I have found, their sizes, and where I got
  them from.

	Dutch:
		178429 words, 1998881 bytes, 779056 bytes compressed.
		This list is made out of some smaller lists:
			het Groene Boekje (available at donau.et.tudelft.nl)
			TeX dutch wordlist (available at archive.cs.ruu.nl)
			local additions at de Vrije Universiteit (cs.vu.nl)

	German:
		There are two lists, germanl.Z and words.german.Z.
		germanl.Z: 27342 words, ? bytes, 137591 bytes compressed.
		words.german.Z: 160086 words, 2060734 bytes, 761528 bytes compressed.
		both from ftp.informatik.tu-muenchen.de:/pub/doc/dict

	Italian:
		60453 words, 561982 bytes, 217241 bytes compressed.
		David Vincenzetti <vince@ghost.unimi.it>
		ghost.unimi.it:/pub/voc.Z

	Norwegian:
		61843 words, 589234 bytes, 258162 bytes compressed.
		Anders Ellefsrud <anders@ifi.uio.no>
		ftp.ifi.uio.no:/pub/dicts/norwegian-words.Z

	Swedish:
		23688 words, 200853 bytes, 96169 bytes compressed.

	Finnish:
		280475 words, 3340963 bytes, 1329070 bytes compressed.
		ftp.uu.net:/doc/dictionaries/Finnish

	Japanese:
		115600 words, 935022 bytes, 403986 bytes compressed.
		ftp.waseda.ac.jp:/pub/security/wordlists

	names/Family-Names.Z and names/Given-Names.Z:
		Family-Names: 13484 names, 106780 bytes, 57749 bytes compressed.
		Given-Names: 8608 names, 60271 bytes, 31136 bytes compressed.
		Andrew Macpherson <A.Macpherson@stl.stc.co.uk>
		available on bnrgate.bnr.co.uk.

	names/names.french.Z and names/names.hp.Z:
		names.french: 702 names, 5315 bytes, 3023 bytes compressed.
		names.hp: 44554 names, 430014 bytes, 188971 bytes compressed.
		Dan Kegel <dank@blacks.jpl.nasa.gov>
		available on blacks.jpl.nasa.gov:/pub/security/wordlists

	names/surnames.finnish.Z:
		713 names, 4488 bytes, 2428 bytes compressed.
		ftp.uu.net:/doc/dictionaries/Finnish
=======================================


Here's the 0-Index file from another repository.  All the following
files are here somewhere.  This lot was originally collected together
by Don Olivier, don@hsph.harvard.edu, but has since diffused around
the world to several ftp archives.


=======================================
Antworth	@ Big dictionary, includes many inflected forms
CIS		@ Words and names from Current Index to Statistics (partial)
CRL.words	@ Dictionary from Center for Research in Lexicography
Congress	@ Names and nicknames of U. S. Congressmen
Domains		@ Internet domains
Dosref		@ Words from the DOS Technical Reference Manual
Ethnologue	@ Words from the "Ethnologue Database"
Ftpsites	@ Anonymous ftp sites
Jargon		@ Words from the Jargon File
Koran		@ Words from the Koran
LCarrol		@ Words from AliceIW, AliceTTLG, Snark
Movies		@ Characters, actors, and titles from thousands of movies
Paradise.Lost	@ Words from P. L. (a touch of class)
Python		@ Words and names from M. P. scripts
Roget.words	@ Words from 1911 R's Thesaurus
Trek		@ Words and names from Star Trek plot summaries
Unabr.dict	@ A big unabridged dictionary
World.factbook	@ Words, names, many acronyms from the CIA World Factbook
Zipcodes	@ All U. S. post offices (except the last half of Alaska)

Words in /usr/dict/words deleted from all these lists
Words in Dan Klein's suite of lists deleted from several of them
    (that's why "klingon" doesn't appear in "Trek")

=======================================
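
The deletions described at the end of the index amount to removing
from each list every word already present in a base list such as
/usr/dict/words.  A minimal sketch, assuming uncompressed files with
one word per line and exact, case-sensitive matching (how the
original deletion was actually done is not stated here); the name
remove_known is hypothetical.

```python
from typing import Iterable, List


def remove_known(words: Iterable[str], known: Iterable[str]) -> List[str]:
    """Return the entries of `words` that do not occur in `known`.

    Both arguments are iterables of lines (an open text file works).
    The original order of `words` is preserved; blank lines are dropped.
    """
    known_set = {line.strip() for line in known if line.strip()}
    result = []
    for line in words:
        word = line.strip()
        if word and word not in known_set:
            result.append(word)
    return result


# Small in-memory example standing in for a topic list and a base list.
trek = ["klingon\n", "warp\n", "tribble\n"]
base = ["klingon\n"]
remaining = remove_known(trek, base)
```

With a base list that contains "klingon", that word drops out of the
result, which mirrors the note about the "Trek" list above.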