European Regional Science Association

ERSA 2003 Congress

Abstracts

The abstract for paper number 371:

Jean-Claude Thill, Department of Geography, University at Buffalo, The State University of New York, New York, USA; William A. Kretzschmar, Department of English, University of Georgia, Georgia, USA
Exploratory Analysis of English Dialect Features in the Middle and South Atlantic U.S. States by Self-Organizing Maps

Conventional methods of quantitative spatio-linguistic analysis of variation in word usage and pronunciation have proved ill-suited to many existing datasets, such as the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS), for several reasons, including sparse data, skewed distributions and high dimensionality. In this paper, we propose using a neural network model, the Self-Organizing Map (SOM). As a data-mining method, SOM reduces the dimensionality of multidimensional datasets, identifies latent organizing rules, and classifies surveyed cases into larger groupings.
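To make the SOM idea concrete, the following is a minimal sketch of a generic Kohonen training loop in plain Python. It is not the authors' implementation. It assumes that each informant is coded as a binary vector (1 = variant recorded), and the grid size, decay schedules, column labels and toy data are all hypothetical. Each case is matched to its best-matching node, and that node and its grid neighbours are pulled towards the case, so that similar answer profiles end up on the same or adjacent nodes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose codebook vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular self-organizing map on equal-length numeric feature vectors.

    Returns a dict mapping grid coordinates (row, col) to codebook vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen input case.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Squared distance measured on the map grid, not in feature space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weight
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])      # pull the node towards the input case
            step += 1
    return weights


# Hypothetical toy data, NOT LAMSAS records: one row per informant, 1 = variant recorded.
# Columns: "pail", "bucket", "feature_x", "feature_y" (the last two are placeholders).
data = [
    [1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1],
    [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0],
]
som = train_som(data, rows=2, cols=2, epochs=50)
bmus = [best_matching_unit(som, x) for x in data]
print(bmus)
```

Identical answer profiles always map to the same node, so nodes (or clusters of adjacent nodes) serve as candidate feature bundles. A real analysis would more likely rely on a dedicated SOM package and would tune grid size, initialisation and decay schedules to the data.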

Our analysis consists of three separate but related facets, implemented in a Windows environment. First, SOM models of linguistic attributes taken independently (models of lexical variants, models of pronunciation variants, models of grammatical constructions) are trained to generate linguistic feature bundles. Second, a geographic information system (GIS) of LAMSAS data is designed and implemented. The digital map produced by the GIS can be dynamically linked to the output map of the SOM application, allowing interactive analysis of associations between linguistic feature bundles and geographic clusters. The GIS also supports selective linking based on the personal and social characteristics of survey respondents. Finally, graphical displays of personal and social variables (histograms and scatter plots) are generated and dynamically linked to the output SOM map.
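The dynamic linking between the SOM output map and the GIS map can be thought of as a two-way index between map nodes and respondents. The sketch below illustrates that idea under stated assumptions: the respondent schema (`id`, `county`, `age`, `features`), the hand-set codebook standing in for a trained SOM, and all function names are hypothetical, not taken from the authors' software. The optional `keep` filter mimics selective linking on personal and social characteristics.

```python
from collections import defaultdict


def best_matching_unit(codebook, x):
    """Return the SOM node whose codebook vector is closest to x (squared Euclidean)."""
    return min(codebook, key=lambda n: sum((w - v) ** 2 for w, v in zip(codebook[n], x)))


def index_by_node(codebook, respondents):
    """Group respondent records by the SOM node each one maps to."""
    by_node = defaultdict(list)
    for r in respondents:
        by_node[best_matching_unit(codebook, r["features"])].append(r)
    return by_node


def counties_for_node(by_node, node, keep=lambda r: True):
    """SOM -> GIS: counties to highlight when a node is selected.

    `keep` filters respondents on personal/social attributes (selective linking).
    """
    return sorted({r["county"] for r in by_node.get(node, []) if keep(r)})


def nodes_for_counties(by_node, counties):
    """GIS -> SOM: nodes to highlight when a set of counties is selected."""
    return sorted(n for n, rs in by_node.items() if any(r["county"] in counties for r in rs))


# Hand-set codebook standing in for a trained 1x2 map (hypothetical values).
codebook = {(0, 0): [0.95, 0.05, 0.1, 0.9], (0, 1): [0.1, 0.9, 0.95, 0.05]}
respondents = [  # hypothetical informants, not LAMSAS records
    {"id": 1, "county": "County A", "age": "old", "features": [1, 0, 0, 1]},
    {"id": 2, "county": "County B", "age": "young", "features": [1, 0, 0, 1]},
    {"id": 3, "county": "County C", "age": "old", "features": [0, 1, 1, 0]},
]
by_node = index_by_node(codebook, respondents)
print(counties_for_node(by_node, (0, 0)))                                    # ['County A', 'County B']
print(counties_for_node(by_node, (0, 0), keep=lambda r: r["age"] == "old"))  # ['County A']
print(nodes_for_counties(by_node, {"County C"}))                             # [(0, 1)]
```

The same index could drive the linked histograms and scatter plots: selecting a node yields a subset of respondent records whose social variables can then be summarised.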

The paper discusses the methodological challenges of applying this approach to the LAMSAS data. We also discuss results of the analysis, particularly in relation to the following linguistic and dialectological questions:

1. What are the salient linguistic feature bundles that emerge to synthesize the dialectological reality of the Middle and South Atlantic States?
2. How are linguistic features (whether words, grammatical constructions, pronunciations, or combinations of these) distributed over geographic areas, if not in relatively uniform patterns of complementary distribution (“pail” in one area and “bucket” in another)?
3. Which personal and social variables, such as sex, age, ethnicity, occupation, education, and community type (urban/rural), correlate with linguistic feature bundles?

Unfortunately, the full paper has not been submitted.