SPA Conference session: Ambiguous vocabulary in identifier names

One-line description: An exploration of the sources and relevance of ambiguity in identifier name vocabulary.
Session format: Workshop (75 mins)
Abstract: Identifier names communicate the concepts and ideas represented in source code. As with any form of communication, identifier names are not always clear, and the vocabulary used is sometimes ambiguous. Identifier naming conventions typically give general advice, such as typographical rules, but do not give detailed advice on selecting names. The academic literature contains examples of rules for identifier naming schemes designed to reduce ambiguity, and reports of research showing that some developers introduce ambiguous vocabulary into identifier names in both closed and open source software. Where ambiguous vocabulary is used in identifier names, additional effort is required to understand the source code, which increases the maintenance effort. Furthermore, program comprehension tools that construct models from identifier names may provide misleading information when that vocabulary is ambiguous.

This session explores the sources of ambiguity in identifier name vocabulary and the relevance of ambiguous vocabulary to software developers, and to maintainers in particular. The workshop aims to establish what kinds of ambiguous vocabulary are significant obstacles to program comprehension, and when, as in any form of communication, ambiguous vocabulary might be considered an occupational hazard, or even a source of amusement. Participants are encouraged to bring (anonymised) examples of ambiguous identifier names to the workshop.
Audience background: The session is aimed at software developers, and could also be of interest to those involved in requirements engineering.
Benefits of participating: An understanding of where ambiguous identifier name vocabulary is an obstacle to program comprehension.

A more reflective identifier naming practice.
Materials provided: First exercise: worksheets consisting of a prose description of a system, and paper on which to develop a diagram that models the description.

Second exercise: a UML class diagram of a system that participants will be expected to annotate.
Process: Groups of between 4 and 6 participants will each undertake two exercises.

The first exercise explores the vocabulary that individuals bring to the software development process and the way in which teams compromise to develop a common vocabulary. Two basic systems will be specified for the workshop, and there will be two descriptions of each system using different vocabulary. Each group will work on one description only. The vocabulary used in each description will contain ambiguities, both polysemes and synonyms. The aims of the exercise are: to observe whether participants adopt the vocabulary of the description, resolving any ambiguities, or whether they adopt another vocabulary; and to encourage participants to reflect on the choices they made, and to appreciate the influences on their decisions, including their knowledge of the domain and naming conventions.
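
To make the two kinds of ambiguity concrete, the following minimal Python sketch shows how synonyms and polysemes might surface in a small identifier name vocabulary. The identifier names, the sense table and the function names are all hypothetical, invented for illustration; they are not taken from the workshop materials or from any real system.

```python
import re
from collections import defaultdict

# Hypothetical identifier names, invented for this illustration.
IDENTIFIERS = [
    "customerList", "clientCount",     # "customer" and "client": one concept?
    "removeClient", "deleteCustomer",  # "remove" and "delete": one action?
    "fileHandle", "handleRequest",     # "handle": a noun here, a verb there
]

# Hypothetical hand-written table of the concepts each word may denote.
# In practice a team would have to agree on such a table themselves.
SENSES = {
    "customer": {"buyer"},
    "client": {"buyer"},
    "remove": {"delete"},
    "delete": {"delete"},
    "handle": {"resource reference", "process an event"},
}


def split_identifier(name):
    """Split a camelCase identifier name into lower-case words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [part.lower() for part in parts]


def vocabulary(identifiers):
    """Collect the set of words used across all identifier names."""
    return {word for name in identifiers for word in split_identifier(name)}


def find_synonyms(words):
    """Return concepts that are named by more than one word."""
    by_concept = defaultdict(set)
    for word in words:
        for concept in SENSES.get(word, ()):
            by_concept[concept].add(word)
    return {c: sorted(ws) for c, ws in by_concept.items() if len(ws) > 1}


def find_polysemes(words):
    """Return words that may denote more than one concept."""
    return sorted(w for w in words if len(SENSES.get(w, ())) > 1)


words = vocabulary(IDENTIFIERS)
print("Synonyms:", find_synonyms(words))
print("Polysemes:", find_polysemes(words))
```

In this sketch the synonyms split one concept across two words, while the polyseme uses one word for two concepts; which of these matters in practice is the kind of question the exercise is intended to raise.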

The second, longer exercise will look at the importance of ambiguity in an existing identifier name vocabulary. Groups will work on a UML class diagram to identify vocabulary used in identifier names that they think may need to be revised. The intention of the exercise is for participants to understand from fellow group members the extent to which ambiguity may inhibit their understanding of source code, and what kinds of revision might make a positive contribution to program comprehension.

Both exercises will be followed by a discussion in which participants will be encouraged to comment on the reasons for the decisions made during each exercise. Given the subjective nature of the topic, dissenting voices from each group will also be encouraged to express their opinions.
Detailed timetable:
00:00 - 00:05 : Introduction
00:05 - 00:20 : First exercise
00:20 - 00:30 : Discussion of first exercise
00:30 - 00:55 : Second exercise
00:55 - 01:10 : Discussion of second exercise
01:10 - 01:15 : Wrap up
Outputs: A set of web pages recording each exercise, a synopsis of the output of each group during the exercises and a summary of each discussion.
1. Simon Butler
Computing Department, The Open University
2. Helen Sharp
The Open University