Text Analysis Tools

ALLC Session - Text Analysis Tools: Architectures and protocols

Type of Proposal	Panel session
Title	Text Analysis Tools: Architectures and protocols
Chair	Harold Short
Affiliation	King's College London
Email	harold.short@kcl.ac.uk
Contact address	Centre for Computing in the Humanities, King's College London, Strand, London WC2R 2LS, U.K.
Fax	+44 (0)171 848-2980
Phone	+44 (0)171 848-2739
Author	John Bradley and Tom Horton
Author	Manfred Thaller

There has been considerable concern in recent years at the absence of text analysis tools that can take advantage of the intellectual effort that goes into encoding. Thus, while there are now large collections of encoded texts, with the number continuing to grow rapidly, there has been little corresponding development of a 'next generation' of tools, successors to OCP, TACT and others, that would enable humanities scholars to exploit fully the intellectual investment represented by these collections. In most cases, browsing and searching are the limit of what can readily be done.

This concern has resulted in a number of meetings, discussions and initiatives on both sides of the Atlantic over a number of years. There have been papers, panels or BOF sessions at many ACH-ALLC conferences, going back at least to Georgetown in 1995. Two meetings were held at Princeton, organised by Susan Hockey when she was at CETH. Following the ALLC-ACH 98 conference in Debrecen, the 'ELTA' software initiative was launched to promote discussion and the exchange of ideas. Related to this, a series of meetings and workshops has been organised during the past year, aimed at establishing a framework and a set of open specifications that would enable collaborative tool development to proceed.

There appears to be a general perception that commercial software developers are unlikely to create systems in this area that meet scholarly needs and are both (a) affordable and (b) capable of the kind of modification and tuning that will enable scholars to push these approaches to the limits that are necessary. It seems likely therefore that the most effective model will be one that enables scholars and information professionals in higher education to develop software independently but collaboratively.

The papers in this panel session arise in part from some collaborative work carried out in Europe, focused on a workshop in Bergen in October 1998, hosted by Manfred Thaller, which brought together humanities computing (and historical computing) specialists and computational linguists. In part they arise from a workshop held at King's College London in January 1999, which brought together European and North American specialists. In both workshops the emphasis was on exploring underlying models and structures rather than on applications or user needs. This session is also informed by recent transatlantic collaborative activity aimed at initiating practical development work.

After the presentation of the two formal papers, there will be a brief report on the ELTA initiative. It is intended that there should be a good amount of time for open discussion before the close of the session. The URL for ELTA is:

http://www.cse.fau.edu/~tom/elta/