From Letters to Words: Efficient Stroke-based Word Completion for Trackball Text Entry

Jacob O. Wobbrock1,2 and Brad A. Myers2

1The Information School
University of Washington
Mary Gates Hall, Box 352840
Seattle, Washington 98195-2840
wobbrock@ischool.washington.edu

2Human-Computer Interaction Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213 USA
bam@cs.cmu.edu

ABSTRACT

We present a major extension to our previous work on Trackball EdgeWrite--a unistroke text entry method for trackballs--by taking it from a character-level technique to a word-level one. Our design is called stroke-based word completion, and it enables efficient word selection as part of the stroke-making process. Unlike most word completion designs, which require users to select words from a list, our technique allows users to select words by performing a fluid crossing gesture. Our theoretical model shows this word-level design to be 45.0% faster than our prior model for character-only strokes. A study with a subject with spinal cord injury comparing Trackball EdgeWrite to the onscreen keyboard WiViK, both using word prediction and completion, shows that Trackball EdgeWrite is competitive with WiViK in speed (12.09 vs. 11.82 WPM) and accuracy (3.95% vs. 2.21% total errors), but less visually tedious and ultimately preferred. The results also show that word-level Trackball EdgeWrite is 46.5% faster and 36.7% more accurate than our subject's prior peak performance with character-level Trackball EdgeWrite, and 75.2% faster and 40.2% more accurate than his prior peak performance with his preferred on-screen keyboard. An additional evaluation of the same subject over a two-month field deployment shows a 43.9% reduction in unistrokes due to stroke-based word completion in Trackball EdgeWrite.

Categories and Subject Descriptors

H.5.2 [Information interfaces and presentation]: User interfaces -- Input devices and strategies. K.4.2 [Computers and society]: Social issues -- assistive technologies for persons with disabilities.

General Terms

Design, Experimentation, Human Factors, Theory.

Keywords

Word prediction and completion, word-level text entry, text input, goal crossing, unistrokes, gestures, trackballs, Fitts' law, Hick-Hyman law, Steering law, Zipf's law, EdgeWrite, WiViK.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASSETS'06, October 22-25, 2006, Portland, Oregon, USA. Copyright 2006 ACM 1-59593-290-9/06/0010...$5.00.

Figure 1. Trackball EdgeWrite with stroke-based word completion. A "t" has been made, and the current stroke is an "h", so "th-" completions are currently offered.

1. INTRODUCTION

Although many technologies exist for alternative computer access [7], studies show that less than 60% of people who need access solutions actually use them [9]. Furthermore, at least 35% of purchased solutions are never adopted [8]. Voice recognition systems, in particular, are subject to high abandonment rates [15]. Reasons cited for these failures include the high cost of devices, device complexity, and the need for extensive customization. Of prime importance, then, is simplicity in both the design of devices and in the process of adoption [8].

In an effort to provide a simpler desktop access technology, we previously introduced a method of desktop text entry for use with trackballs called Trackball EdgeWrite [30]. Unlike most trackball text entry methods, which require users to mouse over an onscreen keyboard, Trackball EdgeWrite uses unistroke gestures, which allow users to write "by feel" rather than "by sight." The result is a faster and less tedious method of trackball text entry for people who already use trackballs but cannot touch-type on a physical keyboard. Such users may have repetitive stress injuries, spinal cord injuries, arthritis, or some neuromuscular disorders.

Despite Trackball EdgeWrite's initial success compared to onscreen keyboards, it has been, until now, constrained to character-level entry. Character-level entry is a limitation of many text entry systems, and it caps the speeds users can reach. To address this problem, we introduce a major extension to Trackball EdgeWrite that takes it from a character-level method to a word-level one. Our extension is called stroke-based word completion (Figure 1), and it allows users to complete words using strokes instead of selecting words from a list. Stroke-based word completion is a generalization of a more specific technique of ours intended for styli [29] which relied extensively on "pigtail loops" as in-stroke delimiters. Trackball EdgeWrite, on the other hand, uses no loops, relying only on straight line segments, and includes context-based word predictions along with fixed frequency-based word completions.

To our knowledge, this is the first word prediction and completion system that is stroke-based instead of pointing-based, and therefore designed to leverage feel rather than sight. This gives Trackball EdgeWrite significant advantages over on-screen keyboards. Our results for a subject with spinal cord injury show that stroke-based word completion provides a 46.5% increase in speed (8.25 vs. 12.09 WPM), a 36.7% decrease in errors (6.24% vs. 3.95% total errors), and a 43.9% reduction in strokes compared to his prior peak performance with character-level Trackball EdgeWrite. Furthermore, word-level Trackball EdgeWrite is 75.2% faster (6.90 vs. 12.09 WPM) and 40.2% more accurate (6.60% vs. 3.95% total errors) than our subject's prior peak performance with his own preferred on-screen keyboard, which he has used for 15 years. Now he uses Trackball EdgeWrite with stroke-based word completion instead.

2. RELATED WORK

2.1 Trackball Mousing

Most studies show that for able-bodied users, trackballs are slower and less accurate than conventional mice for pointing, dragging, and steering [2,18,19]. However, relative to other devices, trackballs perform reasonably well for short straight ballistic movements when crossing a goal. Capitalizing on goal crossing was the rationale behind the Trackball EdgeWrite design.

Despite their inferior performance compared to mice for able-bodied users, trackballs are preferred by many people [10,33]. Reasons include that trackballs do not require the wrist or forearm to elevate. They also do not require much space, making them suitable for placement in a user's lap or on a wheelchair tray. Rolling a trackball requires little strength, and if clutching is necessary, one must only lift one's finger or hand, not the device itself. Furthermore, trackballs are simple, readily available, robust, and cheap. A benefit of having an integrated text entry method for trackballs is that users who already use trackballs for mousing do not have to switch to other devices when entering text.

2.2 Trackball Text Entry

Prior trackball text entry methods have mostly used on-screen keyboards. Although on-screen keyboards are easy to learn, they have many drawbacks. For instance, they exacerbate mouse travel to and from a document. They introduce a second focus-of-attention such that a user's eyes cannot remain on his or her document [4]. They also require repeated target acquisitions for which trackballs are not well suited. Furthermore, they are visually fatiguing, equivalent to typing in a "hunt-and-peck" fashion. Finally, they consume screen real estate, considerably reducing one's visual workspace and increasing the need for window management. Note that although word prediction systems may increase speed, they do not alleviate these concerns.

In contrast to on-screen keyboards, a few relevant stroke-based text entry methods have been devised. Besides Trackball EdgeWrite [30], two other methods can be used with trackballs. One is Dasher [28], which allows any pointing device to enter text by moving through expanding letter regions whose sizes correspond to a letter's likelihood of entry. However, Dasher can be overwhelming because its letter regions continually rush toward the user. Another method is MDITIM [14], which defines letters using only the four cardinal directions. A drawback is that MDITIM's strokes generally do not resemble Roman letters.

2.3 Word-level Text Entry Methods

Researchers have noted that character-level entry is inherently limited [34]. As a result, recent attention has shifted to word-level techniques, in which single strokes or operations produce entire words. Cirrin [23] and Quikwriting [24] are two such techniques. In both designs, a person moves a stylus through fixed letter regions arranged around the periphery of a circle or square. These techniques are word-level in the sense that whole words are made in single (rather long) strokes, but each character within the word must still be acquired by the stylus.

An innovative approach to word-level stroking is SHARK [34], which presents a stylus keyboard over which strokes can be made. The shapes of these strokes are determined by the arrangement of letters on the keyboard. Users can gradually ramp from tapping words to stroking them, enabling higher speeds. This emphasis on gradual learning has been preserved in Trackball EdgeWrite's stroke-based word completion, since users can still stroke individual characters as they always have.

2.4 Word Prediction and Completion

A common approach to enhancing text input rates is to use word prediction and completion to populate a list with candidate words. Users select from the list to enter entire words or suffixes. Although the number of user actions is reduced, numerous studies show that additional perceptual and cognitive processes often make such systems slower instead of faster [11,16,25]. These findings highlight the challenge of designing effective word prediction and completion systems.

In the case of trackball text entry with an on-screen keyboard, candidate words appear as additional mouse targets, which further exacerbate mouse travel and the need for accurate target pointing. Although Anson et al. (2005) reported that word prediction and completion improved entry rates with on-screen keyboards, subjects reported high frustration because they disliked looking from their document to the word list and "felt that searching through the word list was tedious and distracting" [4]. With stroke-based word completion in Trackball EdgeWrite, we overcome the drawbacks of visually intensive word selection by providing a gestural alternative that performs as well or better.

3. STROKE-BASED WORD COMPLETION

3.1 Brief Overview of Trackball EdgeWrite

Until now, Trackball EdgeWrite has provided a character-level means of writing with a trackball. Trackball EdgeWrite allows users to "pulse" the trackball toward the four corners of a virtual EdgeWrite square (Figure 2). As users connect these corners in patterns similar to Roman characters, letters are produced [32]. Segmentation between letters is achieved when force (i.e. motion) ceases on the trackball.

Figure 2. EdgeWrite gestures for "s" and "o". The stroke segments are arcs determined by the corners entered.

Figure 3. In pulsing to corners, users are performing goal crossings. This figure shows three crossings for "a". The first crossing determines the gesture's initial corner.

The fundamental concept underlying Trackball EdgeWrite is crossing [1] (Figure 3). Goal crossing contrasts with pointing, for which one must acquire an area target and remain within it. With crossing, one must only pass a goal line--like a football player scoring a touchdown--regardless of how fast one moves. The demand for accuracy can be lessened with crossing instead of pointing because one does not have to remain inside a target [3].

In Trackball EdgeWrite, when a user pulses the trackball towards a corner, he or she is performing a crossing task. Although the mouse cursor is invisible, it is actually crossing the circumference of a circle. The angle at which this occurs determines the intended corner. The benefit of this type of motion is that no targets must be pointed at and the mouse can move arbitrarily fast--all that matters are the angles formed by pulses on the trackball.
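The crossing test described above can be sketched in a few lines. This is a minimal illustration only, not the method used in Trackball EdgeWrite: the text does not give the circle's radius or the angular boundaries between corners, so the function name `corner_from_pulse`, the 20-pixel radius, and the simple quadrant-based mapping are all assumptions made for the example. It considers only a pulse that starts at the center of the square.

```python
import math

def corner_from_pulse(dx: float, dy: float, radius: float = 20.0):
    """Map a cursor displacement from the circle's center to a corner.

    Screen coordinates are assumed, so y grows downward. The radius and
    the quadrant boundaries are hypothetical values for illustration.
    """
    # The goal line is the circle's circumference: nothing happens until
    # the invisible cursor has crossed it, however fast it is moving.
    if math.hypot(dx, dy) < radius:
        return None
    # Only the direction of the crossing matters, not the distance.
    horizontal = "right" if dx >= 0 else "left"
    vertical = "lower" if dy >= 0 else "upper"
    return f"{vertical}-{horizontal}"
```

Because the result depends only on the direction of travel, an overshoot such as `corner_from_pulse(300, -400)` resolves to the same corner as a short pulse in that direction, which is the property that lets the cursor move arbitrarily fast.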

In Trackball EdgeWrite, to switch from mousing to writing, users can "capture" the mouse by pressing a button. Alternatively, a user can dwell in the corner of the desktop screen. When captured, the mouse cursor becomes invisible and subsequent trackball motion creates strokes within a revealed EdgeWrite square, sending text to the active application (e.g. Microsoft Word). When the user is done writing, a button press or dedicated stroke releases the cursor to resume mousing. Note that no buttons are necessary for writing--an advantage for motor-impaired users.

3.2 Word Completion in Trackball EdgeWrite

3.2.1 Interaction Design

A key aspect of our design is that character-level stroking remains unchanged from the prior version of Trackball EdgeWrite. This is important for two reasons: (1) it allows current users to remain effective with the software, and (2) it allows users to gradually ramp up to using word completions at their own pace.

When a user strokes, candidate words are shown at the four corners of the EdgeWrite square (Figure 4). In order to provide completions, the current stroke is recognized after each corner is entered. We call this continuous recognition feedback, which also displays the current stroke result in the center of the EdgeWrite square. Thus, the user knows what his or her stroke will produce before the stroke is segmented by a slight pause. If the user slips, he or she can simply restart the stroke before segmenting using a feature called nonrecognition retry [30].

Figure 4. As the user makes an "h", the system recognizes an "i" and "v" along the way, offering English frequency-based completions as each corner is entered.

After a stroke is segmented and a letter is produced, the user can continue stroking letters or, alternatively, make a short gesture to select a word. This gesture is a singular motion from the center of the EdgeWrite square to the corner containing the desired word.

In the event of an erroneous completion, the user can make a backspace stroke along the bottom of the square, undoing the selection and restoring the completions as they appeared before. This makes completions quickly undoable.

An important aspect of this design is that the same completion is always shown in the same corner for the same prefix. This is because completions are based only on English word frequencies, not on context. This consistency is important for enabling users to rely on the positions of words. For example, after stroking a "t", the word "the" is always shown in the lower-right corner (Figure 1). Thus, users can come to rely on the position of "the" and stroke it by feel rather than by sight. Zipf's law for language says that a small percentage of words make up a large percentage of written language [34,35], so an increase in the entry rates of a few words can produce an overall speed gain. The consistency of word positions also may reduce cognitive load as motor performance comes to dominate.

In addition to showing frequency-based word completions, Trackball EdgeWrite also shows context-dependent word predictions after a word ends. Word predictions are, by definition, contextual and thus cannot be stroked by feel.

3.2.2 Language Coverage

Trackball EdgeWrite's design for stroke-based word completion avoids high perceptual search times by showing only four words at a time, generally less than most word completion systems [16]. But how useful are only four words? To answer this question, we wrote a computer program to calculate the amount of language coverage obtained for 1-5 letter prefixes showing only four frequency-based completions per letter (Figure 5). We used Kucera-Francis frequencies for the 17,805 most common English words [17]. According to the graph, users have a 49.0% chance of seeing their intended word after just one letter! After two letters, this climbs to 70.8%. After three, it's 89.3%. This is the Zipf's law effect [35].
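A coverage calculation of this kind might look like the following sketch. It is not the authors' program: the function name `coverage_by_prefix_length` is hypothetical, and the example assumes that a word counts as covered once it is among the four most frequent words sharing the prefix typed so far, with ties broken alphabetically.

```python
from collections import defaultdict

def coverage_by_prefix_length(freqs, max_len=5, shown=4):
    """Return cumulative frequency-weighted coverage after 1..max_len letters.

    freqs maps each word to its frequency count. A word is covered at
    prefix length k if it is one of the `shown` most frequent words
    starting with its own first k letters.
    """
    total = sum(freqs.values())
    # Group words under every prefix of up to max_len letters.
    by_prefix = defaultdict(list)
    for word in freqs:
        for k in range(1, min(len(word), max_len) + 1):
            by_prefix[word[:k]].append(word)
    # Find the shortest prefix length at which each word is offered.
    first_seen = {}
    for prefix, words in by_prefix.items():
        top = sorted(words, key=lambda w: (-freqs[w], w))[:shown]
        for w in top:
            k = len(prefix)
            if w not in first_seen or k < first_seen[w]:
                first_seen[w] = k
    # Sum the frequency mass of the words offered by each prefix length.
    return [
        sum(f for w, f in freqs.items()
            if first_seen.get(w, max_len + 1) <= k) / total
        for k in range(1, max_len + 1)
    ]
```

Run over a real frequency list, the first three entries of the returned list would correspond to the one-, two-, and three-letter coverage figures quoted above.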

As explained elsewhere [29], one can achieve a slightly higher language coverage by not re-showing the same word completions once they have been shown for a given word being entered. For example, when "t" is written, "the" is one possible completion. If an "h" is written next, should "the" be re-shown? Or, since the user did not select "the", should a different word be shown in its place? In Trackball EdgeWrite, we implement the former, re-showing completions, because users sometimes miss the initial appearance of the word they want, entering more letters than necessary. If completions are not re-shown, then this behavior costs them their chance to select their desired word.

Figure 5. Coverage of the 17,805 most common English words [17] based on 1-5 letter prefixes and four frequency-based completions shown per entered letter.

3.3 Implementation

Our word prediction and completion system has four main components: (1) a vocabulary list of words and frequencies, (2) an optional user-defined vocabulary list, (3) a trigram list with trigram frequencies, and (4) an adaptive bigram cache that stores a user's words at runtime. The first and second provide "fixed" frequency-based word completions as words are being made. The third and fourth provide context-dependent word predictions after a word has been completed (i.e. after a SPACE has been entered).

The vocabulary list is stored in an alphabetically sorted array enabling binary search for fast lookups. Each array slot contains a word string and the word's frequency count. This is all the data necessary for fixed frequency-based word completions. Also in each slot is a hash table whose keys are word indices and whose values are a list of word indices. The slot's word string represents the first word of a trigram, its hash table keys represent second words, and its hash table list values represent third words. These data structures allow fast lookups for both fixed completions and context-dependent predictions.

When a letter is entered, words that match the current prefix are gathered from the vocabulary list. If a user-defined vocabulary list is loaded, its words with matching prefixes are also gathered. These words are then sorted in a separate list according to their frequencies. The top four words are then offered as completions. Since frequencies are based on English, these four completions will always be the same for a given prefix.

When four frequency-based words are retrieved from the language model, they are assigned to corners such that the highest priority word is given the corner in which the current stroke resides. The two adjacent corners receive the next two words, and the lowest priority word is placed at the diagonal away from the stroke's current corner. Once a word has been shown, it is stored in a hash table along with its corner and a half-life. If a word is shown again, it will be shown in the same corner as it was before. If the word goes unused for a while, it will "decay" and be eligible for reassignment. If a collision occurs with two words vying for the same corner, the highest priority word wins.
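The completion lookup and corner assignment described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the toy vocabulary, the names `top_completions` and `assign_corners`, and the choice of which adjacent corner receives the second word are all hypothetical, and the sketch leaves out the user-defined vocabulary, the per-word corner memory, and decay.

```python
from bisect import bisect_left

# Toy vocabulary with made-up counts, kept in alphabetical order.
VOCAB = sorted([("that", 10), ("the", 50), ("then", 4), ("they", 6),
                ("this", 8), ("to", 20), ("up", 3)])
WORDS = [word for word, _ in VOCAB]

def top_completions(prefix, limit=4):
    """Return the `limit` most frequent words that start with `prefix`."""
    # Binary search finds the first word that is >= the prefix; all
    # matching words follow it contiguously in the sorted array.
    start = bisect_left(WORDS, prefix)
    matches = []
    for word, freq in VOCAB[start:]:
        if not word.startswith(prefix):
            break
        matches.append((word, freq))
    # Sort the gathered words by descending frequency.
    matches.sort(key=lambda wf: (-wf[1], wf[0]))
    return [word for word, _ in matches[:limit]]

# Corners in clockwise order: list neighbours are adjacent corners and
# the entry two steps away is the diagonal.
CORNERS = ["upper-left", "upper-right", "lower-right", "lower-left"]

def assign_corners(words, current_corner):
    """Place words by priority: current corner, two adjacent, diagonal."""
    i = CORNERS.index(current_corner)
    order = [CORNERS[i], CORNERS[(i + 1) % 4],
             CORNERS[(i - 1) % 4], CORNERS[(i + 2) % 4]]
    return dict(zip(order, words))
```

For example, `assign_corners(top_completions("t"), "lower-right")` places "the" in the lower-right corner, matching the behavior described for Figure 1, although the placement of the remaining three words here is only one possible convention.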

When a SPACE is entered, context-dependent predictions are offered. The most recent two words are used to look up possible third word predictions. The first word is found in the vocabulary array using binary search. The second word's index, which was found when the word was entered, is hashed upon in the first word's hash table. The value returned, if any, is a list of possible third words. The top four are shown as predictions.
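The trigram lookup can be illustrated with a small sketch. The names `index_of`, `TRIGRAMS`, and `predict` are hypothetical, the word list is a toy example, and a single dictionary keyed by slot index stands in for the hash table that the text says lives inside each vocabulary slot.

```python
from bisect import bisect_left

# Toy vocabulary array, alphabetically sorted for binary search.
WORDS = ["a", "in", "states", "the", "united", "way"]

def index_of(word):
    """Binary-search the vocabulary; return the slot index or None."""
    i = bisect_left(WORDS, word)
    return i if i < len(WORDS) and WORDS[i] == word else None

# TRIGRAMS[first][second] -> indices of third words, most frequent first.
TRIGRAMS = {
    index_of("in"): {index_of("the"): [index_of("united"), index_of("way")]},
    index_of("the"): {index_of("united"): [index_of("states")]},
}

def predict(first, second, limit=4):
    """Return up to `limit` third-word predictions for a two-word context."""
    i, j = index_of(first), index_of(second)
    if i is None or j is None:
        return []  # out-of-vocabulary context: no trigram predictions
    thirds = TRIGRAMS.get(i, {}).get(j, [])
    return [WORDS[k] for k in thirds[:limit]]
```

Storing word indices rather than strings keeps each trigram entry small, which matters for a list of several hundred thousand trigrams.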

Predictions also come from an adaptive bigram cache. The cache holds recent bigrams so that when a user enters a previously used word, words that followed it can be offered as predictions. The cache is a list maintained in priority order such that when a new bigram is entered or an old bigram reused, it is placed at the top. Unlike the trigrams, the adaptive bigram cache accommodates out-of-vocabulary words, enabling the prediction of last names from first names, etc.
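A move-to-front list is one simple way to realize such a cache. The sketch below is an assumption-laden illustration: the class name `BigramCache`, its method names, and the capacity limit are hypothetical, since the text does not say how large the cache is or how entries are evicted.

```python
class BigramCache:
    """Recently used bigrams, kept in priority order (most recent first)."""

    def __init__(self, capacity=100):
        self.capacity = capacity  # hypothetical size limit
        self.bigrams = []

    def record(self, first, second):
        """Add a new bigram, or move a reused one, to the top of the list."""
        pair = (first, second)
        if pair in self.bigrams:
            self.bigrams.remove(pair)
        self.bigrams.insert(0, pair)
        # Drop the oldest entries once the list exceeds its capacity.
        del self.bigrams[self.capacity:]

    def predict(self, word, limit=4):
        """Return words that recently followed `word`, most recent first."""
        return [second for first, second in self.bigrams
                if first == word][:limit]
```

Because the cache stores raw strings rather than vocabulary indices, it can hold words that the fixed vocabulary lacks, such as a surname recorded after a first name.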

The English vocabulary list and trigrams were built by parsing 850MB of news articles from the Wall Street Journal, Ziff Davis, Los Angeles Times, and Associated Press. This parsing was carried out with the CMU-Cambridge Statistical Language Modeling toolkit [6]. Our own custom parsers then pared down the toolkit's results, keeping 20,000 of the most common words, and only trigrams that occurred 20+ times. After certain abbreviations were removed, the result was a 258KB vocabulary list of 19,122 words with frequency counts totaling 132,701,943. The maximum frequency count was for the word "the" at 7,686,122, or 5.79%. Our trigram list is 10.6MB and contains 517,988 trigrams with frequency counts totaling 40,230,622. The maximum frequency count is for the trigram "the United States" at 46,947, or 0.12%. Although we used news articles, our procedure could easily be re-run over other corpora (e.g. email).

Our stroke-based word prediction and completion system is part of an EdgeWrite library (DLL) that can be used with any .NET language. The library is built in C# and provides full EdgeWrite text entry in a few lines of code. Its API comes fully documented and is available for free at .

3.4 Theoretical Model

In our original discussion of Trackball EdgeWrite [30], we calculated a theoretical upper-bound speed based on the Steering law [1]. Using Fitts' coefficients based on prior studies, we calculated "perfect" character entry in Trackball EdgeWrite to be 23.1 WPM. Although this speed is probably unachievable, it is reasonable as an upper-bound in light of expert speeds with other unistroke systems [20].

We now extend this theoretical model to incorporate frequency-based word completions. Using the same Fitts' coefficients and formulae for calculating individual character speeds as before [30], we wrote a computer program to calculate WPM assuming that each completion is selected when it appears. We did this for all words in Trackball EdgeWrite's list of 19,122 words, a list large enough to contain most words used in everyday English.

We can calculate the speed Scps for our corpus using Equation 1:

$$S_{cps} = \sum_{w \in C} \left( \frac{|w| + 1}{T_w} \times F_w \times 1000 \right) \tag{1}$$

Here, Scps is the weighted speed of text entry in characters per second (CPS), w is a word in corpus C with length |w|, Tw is the time to write word w in milliseconds (ms), and Fw is the frequency of word w, normalized such that the frequencies of all words in C sum to 1.00. The "+1" in the numerator is for the space that is added after a completion is selected, and the "×1000" converts from characters per ms (CPMS) to CPS.

To calculate Tw in ms for each word in the corpus, we need to calculate the time Tl to perform each letter l in wp, where wp is the minimum prefix of w that will show w as a completion (1 ≤ |wp| ≤ |w|). To this we add Tselect, the time to select the completion itself (Equation 2). Note that part of the time included in Tl and Tselect is the segmentation time after a letter or completion is made. As in our prior model, we use a segmentation time of 150 ms. Readers interested in the calculation of Tl itself are directed to the prior model [30].

$$T_w = \sum_{l \in w_p} T_l + T_{select} \tag{2}$$

For words which themselves are prefixes of at least four other more common words (e.g. "ad"), there is no such wp that will show w as a completion. For these words, w must be entered fully along with a trailing space, which is modeled by Equation 3:

$$T_w = \sum_{l \in w} T_l + T_{space} \tag{3}$$

To convert Scps in Equation 1 from CPS to WPM, we use the standard definition of 5 characters per word:

$$S_{wpm} = S_{cps} \times \frac{60\ \text{sec}}{1\ \text{min}} \times \frac{1\ \text{word}}{5\ \text{chars}} \tag{4}$$

Using Equations 1-4, our model yields an upper-bound text entry rate of 52.5 WPM. This is 127% faster than the 23.1 WPM obtainable with only character-level strokes. As before, this result is unachievable by a real user. It represents perfect entry, lacking considerations for hesitation, cognitive processes, visual search, slips, or mistakes. Still, it is useful as an upper-bound for theoretical comparisons with prior models.
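The weighted-speed calculation and the conversion to WPM can be sketched as below. The function names `weighted_cps` and `cps_to_wpm` are hypothetical, and the sketch takes each word's writing time as given rather than deriving it from the Fitts' law coefficients of the prior model [30].

```python
def weighted_cps(words):
    """Frequency-weighted speed in characters per second.

    `words` is an iterable of (word, time_ms, frequency) triples, where
    time_ms is the time to write the word and the frequencies sum to 1.
    """
    # The "+ 1" counts the space appended after a completion is selected,
    # and the factor of 1000 converts characters per ms to characters per s.
    return sum((len(word) + 1) / time_ms * freq * 1000
               for word, time_ms, freq in words)

def cps_to_wpm(cps):
    """Convert characters per second to WPM at five characters per word."""
    return cps * 60 / 5
```

With two made-up entries, ("the", 1000 ms, 0.5) and ("to", 600 ms, 0.5), the two words contribute 2.0 and 2.5 CPS respectively, for a total of 4.5 CPS, or 54 WPM. These times are invented for the arithmetic and are not taken from the model.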

For a better estimate, we can enrich our model by adding a term for visual search time based on the Hick-Hyman law [12,13]. This term Tn is added after the entry of every letter and represents the time it takes for a user to find their word amidst n choices, where n is the number of completions offered for the current prefix (0 ≤ n ≤ 4). Using the rationale from [26], our formula for Tn in ms is:

$$T_n = 0.2 \times \log_2(n) \times 1000 \tag{5}$$
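To give a feel for the size of this term, the short sketch below evaluates it for each possible number of completions. The function name `visual_search_ms` is hypothetical. Since the logarithm is undefined at n = 0, the sketch assumes that no search time is charged when no completions are offered; the text does not state how that case is handled.

```python
import math

def visual_search_ms(n):
    """Visual search time in ms for n offered completions."""
    if n < 1:
        return 0.0  # assumption: nothing to search when nothing is shown
    return 0.2 * math.log2(n) * 1000
```

Under this formula a single completion costs no search time, two completions cost 200 ms, three cost about 317 ms, and a full set of four costs 400 ms per letter entered.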

Incorporating the Hick-Hyman law, Equations 2-3 become:

$$T_w = \sum_{l \in w_p} \left( T_l + T_n \right) + T_{select} \tag{6}$$

$$T_w = \sum_{l \in w} \left( T_l + T_n \right) + T_{space} \tag{7}$$

Using Equations 5-7, our result drops 36.2% from 52.5 WPM to 33.5 WPM. This is a more realistic result. Note, however, that even with the addition of visual search time, this result still represents perfect entry. This result is 45.0% faster than the 23.1 WPM result from the character-level model [30].

A limitation of this model is that it does not account for word prediction. However, modeling word prediction is more difficult because it depends on context, including the user's adaptive cache of recent words. Such a model is therefore beyond the current scope.

4. EMPIRICAL VALIDATION

In order to empirically test stroke-based word prediction and completion in Trackball EdgeWrite, we conducted two evaluations with a 15-year trackball veteran with a spinal cord injury. The first was a comparison to the WiViK on-screen keyboard. The second was an analysis of our subject's log files over two months of intermittent use.

4.1 Comparison to On-screen Keyboard

4.1.1 Subject

Our subject, whom we will call "Jim," has had a spinal cord injury for over 15 years and has used a trackball for about as long. Although he also uses voice recognition, he is often dissatisfied with it and, until recently, has relied on an on-screen keyboard as a complementary method. His on-screen keyboard of choice has been the Microsoft Accessibility Keyboard, which does not have word prediction or completion. However, the keyboard does have useful visual feedback when hovering over keys, which Jim relies on to enter text since he cannot reliably click. About six months ago, Jim stopped using on-screen keyboards in favor of Trackball EdgeWrite, even before it had word completion capabilities.

Jim's best prior performance with character-level Trackball EdgeWrite was 8.25 WPM with 6.24% total errors, and with his onscreen keyboard was 6.90 WPM with 6.60% total errors. However, these entry rates seemed to be plateaus. Our goal was therefore to see how Jim's speeds would compare when these methods were given word prediction and completion.

4.1.2 Apparatus

Since Jim's preferred on-screen keyboard does not have word prediction, we configured the popular WiViK on-screen keyboard to match Jim's desired settings: 550 ms dwell, about 650×250 pixels in size, and no "dead space" between keys. For word prediction and completion, WiViK uses a program called WordQ, which we loaded with the "US Advanced" database containing 15,000 words. WiViK shows a vertical 6-item word list to the left of the keyboard. The same action for selecting a key selects a word in the word list--in Jim's case, by hovering for 550 ms.

Jim keeps his monitor set to 800×600 resolution. Test phrases [21] were randomly presented using the TextTest program, which creates XML log files that can be analyzed with the StreamAnalyzer program [31]. StreamAnalyzer produces results according to the measures in [27,31].

4.1.3 Procedure

The study was a single-subject 2-factor design, with factors for method (WiViK, EdgeWrite) and word prediction (on, off). Jim did the word prediction versions second within both methods. A coin
