How to Break Software: James A. Whittaker


Copyright © 2000 by James A. Whittaker. All rights reserved. Please do not print, copy or distribute without permission of the author. This paper is work-in-progress and some sections are incomplete. Please check with the author if it is out-of-date; there may be a newer version.

How to Break Software

James A. Whittaker

Abstract: This paper describes a number of methods (called “attacks”) to expose design and development flaws in software. The attacks are manual, exploratory tests designed and executed on the fly with little or no overhead. The attacks were conceived after studying hundreds of real software bugs and generalizing their causes and symptoms. Two semesters of refinement at the hands of software testing students at the Florida Institute of Technology have identified dozens of approaches for attacking software with the intent of finding bugs. The attacks have been very successful, resulting in hundreds of additional bugs (all found as a direct result of the attack strategies) in a very short period of time with little or no familiarity with the products involved. This paper describes a subset of the attacks and demonstrates their use to find real bugs in released products.

Introduction

What is it that makes good testers good? What instinct do they possess that guides them so reliably to bugs? Is this valuable talent teachable? These questions are the subject of this paper.

I believe that good testers are guided by more than instinct. Indeed, it appears that over the years many testers build an arsenal of standard attacks. Each time they are faced with a new testing problem they orchestrate their attacks and invariably find bugs. Although these attack strategies rarely get written down, they serve an important role in manual testing and in mentor-based training of new testers.

We have begun the process of documenting this arsenal by studying real testers and real bugs. In this paper, we explore a subset of the attack strategies that have resulted from this work. Our next challenge is to begin automating the attacks and to derive measures of their effectiveness.

Each attack falls into at least one of three general categories:

• Input/output attacks
• Data attacks
• Computation attacks

Within each category, specific types of attacks can be identified that yield very interesting software failures. In the sections that follow I describe a number of attack types from each category and include real bugs as examples. Each bug I demonstrate comes from products developed at Microsoft Corporation. This should not be viewed as an anti-Microsoft stance on my part. Indeed, their reign as the top software company on this planet makes them an obvious target. But do not assume that Microsoft products are more buggy than those of other vendors. The attacks described in this paper have broken software from many vendors on almost every conceivable operating platform. My experience indicates that developers write bugs at a fairly uniform rate regardless of their application domain, which operating system they use or whether or not they publish their source code. Unless, of course, they are web developers: very little strategy is required to break web software, which is pretty good at dying all on its own.

Attacking Input/Output

Attacks on input and output are what most testers call “black box” testing because no information about internal data or computation is required to pull them off. Indeed, this is the most common type of testing because looking at source code is tedious, time consuming and largely unproductive unless you really know what you are looking for (we’ll discuss what you should be looking for in the next two sections). I/O attacks include the 10 items listed below, organized by input value, input value combination and input order. Each attack is discussed next, with example bugs.

J. Whittaker is an associate professor and chair of software engineering at the Florida Institute of Technology ([email protected]). His research interests include the technical side of software engineering, specifically, coding, testing and dependability measurement. He generally steers clear of project management and software process.

How to Break Software

Page 1

6/12/00

Input/Output Attacks

Attacks by input value
1. Force all error messages to appear
2. Force default values to be assigned
3. Explore allowable character sets
4. Force output size to change
5. Overflow display areas
6. Force screen refresh problems

Attacks by input value combination
7. Force invalid output
8. Find inputs that cannot co-exist

Attacks by input order
9. Force invalid output
10. Repeat input sequences numerous times
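Attack 1 in the list above targets error-handling paths. As a minimal sketch of why those paths hide bugs, the Python below uses a hypothetical `Importer` class (not taken from any real product) whose `busy` flag stands in for an open file or allocated memory. Forcing the error message once leaves the flag set, so the next perfectly valid input fails.

```python
class Importer:
    """Hypothetical component: loads numeric records from text lines."""

    def __init__(self):
        self.records = []
        self.busy = False  # stands in for an open file or a held lock

    def load(self, lines):
        if self.busy:
            raise RuntimeError("importer is already busy")
        self.busy = True
        for line in lines:
            self.records.append(int(line))  # raises ValueError on bad input
        self.busy = False  # skipped whenever an exception is raised above


def handle_input(importer, lines):
    """Centralized error routine: report the problem and carry on."""
    try:
        importer.load(lines)
        return "ok"
    except (ValueError, RuntimeError) as exc:
        return f"error: {exc}"


importer = Importer()
first = handle_input(importer, ["1", "2", "oops"])  # forces the error message
second = handle_input(importer, ["3"])              # valid input, should succeed
print(first)
print(second)            # error: importer is already busy
print(importer.records)  # [1, 2]
```

The second call fails only because the first one was made to fail, and the partially loaded records `[1, 2]` are a second leftover side effect. A `try`/`finally` around the loop in `load` would be one way to repair the flag in this sketch.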

Attacks by Input Value

This class of attacks requires investigating behavior using only single inputs or input values (in the case of variable input). We attempt to find a single value that breaks the application even though many other values work just fine. There are many ways to choose values other than simply finding the boundary between acceptable and unacceptable input, particularly if you want to find bugs that developers will fix and not pass off as undocumented “features.” I begin with simple but sometimes difficult to achieve advice:

Make sure you see all the error messages. It’s amazingly difficult to make a program fail gracefully, and such difficulty usually means bugs. Some error messages are no-brainers: the program simply pauses execution to display the message and then continues with the next input or when a timer expires. However, other error messages result from an exception being thrown and an exception handler being executed. Exception handlers (or any centralized error routine) are problematic because the instruction pointer changes abruptly without corresponding changes to the data state. Suddenly, the exception handler is executing and all kinds of data problems can ensue: files could still be open, memory could still be allocated, data could remain uninitialized. When control once again returns to the main routine, it is hard to say at what point the error handler got called and what leftover side-effects might be waiting to trip up unwary developers: opening a file could fail because the file might already be open, or you might begin using data without it being properly initialized. If we ensure that we’ve seen all the error messages and the system still works well, we’ve done a huge service to our users (not to mention our maintenance developers).

Figure 1 shows an interesting bug my students found in Microsoft Word 2000 in which an error message appeared twice in a row for no particular reason. This bug was found when attacking the error handling routines by investigating single values of inputs.

Figure 1: Error Message Appears

Make sure you force the software to establish default values. Developers very often forget to establish proper default values when users enter data out of range or configure parameters improperly. Sometimes forcing defaults means doing nothing at all, an act that can trip up even good developers because it is so unexpected. For example, in Word 2000 the following dialog has an options menu that, when left unchanged, actually makes controls disappear when the dialog is redisplayed. Compare the dialog on the left with the one on the right. Notice any missing controls?
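A minimal Python sketch of the default-value attack follows. The `ViewSettings` class, the zoom range and the default of 100 are all hypothetical; the point is only the shape of the test, which feeds in a valid value, then nothing at all, then a valid value again followed by out-of-range and missing values, and checks that the default is re-established each time.

```python
DEFAULT_ZOOM = 100           # hypothetical default, in percent
ZOOM_RANGE = range(10, 501)  # hypothetical legal range: 10..500


class ViewSettings:
    def __init__(self):
        self.zoom = DEFAULT_ZOOM

    def set_zoom(self, raw):
        """Store a legal zoom value, otherwise fall back to the default."""
        try:
            value = int(raw)
        except (TypeError, ValueError):
            value = None
        if value is not None and value in ZOOM_RANGE:
            self.zoom = value
        else:
            self.zoom = DEFAULT_ZOOM  # the step developers tend to forget


settings = ViewSettings()
observed = []
for raw in ["250", "", "250", "9999", None]:
    settings.set_zoom(raw)
    observed.append(settings.zoom)
print(observed)  # [250, 100, 250, 100, 100]
```

If the `else` branch were missing, the empty and out-of-range inputs would silently leave the previous value of 250 in place, which is exactly the kind of failure this attack is meant to expose.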

Sometimes forcing defaults requires changing values from their initial settings once and then changing them a second time to an improper configuration. These back-to-back changes ensure that the default settings can be reestablished once they are changed to other valid values.

Explore allowable character sets for variable input. Some input values are simply problematic, particularly when you consider that special characters like $, %, #, quotation marks and so forth have special meaning in many programming languages and often require special handling when they are read as input. If the developer failed to consider this situation, these inputs may cause the program to fail when they are encountered.

Force output size to change by replacing large inputs with small inputs and vice versa. Focusing on the disposition of output is a lucrative and little-used technique to find bugs. The idea is to think of an output or behavior that would signify a bug and then try to come up with the inputs that will force that behavior to occur. One convenient attack along these lines is forcing output areas to be recomputed by changing the length of inputs and input strings. A good conceptual example is setting a clock to 9:59 and watching it roll over to 10:00. In the first case the display area is 4 characters long and in the second it is 5. Going the other way, we establish 12:59 (5 characters) and then watch the text shrink to 1:00 (4 characters). Too often developers write code that works for the initial case of a blank display area and are disappointed when the display area already has data in it and new data of a different size is used to replace it.

For example, “WordArt” in PowerPoint has an interesting problem. Suppose we enter a long string as shown below. Notice that the entire string doesn’t display because it is so long. But that’s not what is really important. Two things went on when the OK button was pressed. First, the routine computed the size of the output field needed, and then it populated the field with the text we entered. Now let’s edit the string and replace it with a single character.

Notice that the display area stays the same size despite the fact that only one character is inserted and the font size was not changed. Let’s pursue this further. If we edit the string again and type a multi-line string, the output is even more interesting. I think the point is made and we can move on to the next attack.

Make sure you explore the edges of display areas. This is another attack based on outputs that is very similar to the previous attack. However, instead of looking for ways to corrupt the area inside the display, we are going to concentrate on the area outside it. This time we are going to do things that we hope don’t require recalculation of the display boundaries but simply overflow them. Considering PowerPoint again, we can draw a textbox and fill it with a superscripted string. Changing the size of the superscript to a large font causes the top of the exponent to be truncated. This feature is demonstrated below in conjunction with the following related problem.

Try to force screen refresh problems. This is a major problem for users of modern windows-based GUIs. It is an even bigger problem for developers: refresh too often and you slow down your application; fail to refresh and you cause anything from minor annoyances (e.g., requiring the user to force a refresh) to major bugs (preventing the user from getting work done). The general idea in searching for refresh problems is to add, delete and move objects around on the screen. This causes the background object to redisplay, and if it doesn’t do so properly and in a timely fashion, you have just found the classic refresh bug. It is a good idea to vary the distance you move an object from its original location. Move it a little, then move it a lot; move it once or twice, then move it a dozen times. Continuing with the large superscript example from above, try moving it around on the screen a little at a time. Note the nasty refresh problem shown below.

Another recurring problem in Office 2000 associated with screen refresh is disappearing text. This is most annoying in Word just around the page boundaries.
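The clock example used earlier for the output-size attack can be sketched in a few lines of Python. This is an illustration only: the list of characters stands in for a display area, and `draw_naive` and `draw_cleared` are hypothetical routines, not code from any product discussed here. The naive routine works on a blank display and fails as soon as shorter output replaces longer output.

```python
def draw_naive(display, text):
    """Overwrite cells from the left without clearing what was there."""
    for i, ch in enumerate(text):
        display[i] = ch


def draw_cleared(display, text):
    """Blank the whole area first, then draw the new text."""
    display[:] = [" "] * len(display)
    draw_naive(display, text)


def show(display):
    return "".join(display).rstrip()


WIDTH = 5  # hypothetical display width in characters

naive = [" "] * WIDTH
draw_naive(naive, "12:59")
draw_naive(naive, "1:00")      # shorter output replaces longer output

cleared = [" "] * WIDTH
draw_cleared(cleared, "12:59")
draw_cleared(cleared, "1:00")

print(show(naive))    # 1:009
print(show(cleared))  # 1:00
```

The stray `9` is left over from the earlier, longer value. A test that only ever writes to a fresh display would never see it, which is why the attack alternates long and short inputs.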

Attacks by Input Value Combination

The second class of I/O attacks deals with multiple inputs that are processed together or that influence one another. For example, an API that can be called with two parameters requires selecting values for one parameter based on the value chosen for the other. Often it is the combination of values that was misprogrammed because of the complexity of the logic in the code.

Find input value combinations that cannot coexist. So which combinations are problematic? This is an issue still being actively researched, but an approach we have found to be especially effective is to determine an output you want to generate and then try to find input combinations that cause that output to occur.

Try to make the target application produce an invalid output. This is a very effective attack for testers who really understand their problem domain. For example, if you are testing a calculator and understand that some functions have a restricted range for their result, then trying to find input value combinations that force a result outside that range is a worthwhile effort. However, if you do not understand the mathematics, such an endeavor is likely to be a waste of time; you might even interpret an incorrect result as correct. Sometimes the window itself will give you clues about which inputs are interrelated. When this is the case, testers can experiment with ranges of values and try to violate the stated relationship.

Attacks by Input Order

Software inputs form a formal language. Individual inputs make up the alphabet of the language and strings of inputs constitute sentences of the language. Some such sentences should be prevented by the interface via enabling and disabling of controls and input fields, and this behavior can be tested by applying numerous strings of input and varying the order of inputs as much as possible.

Select input strings that force invalid output. This is a good strategy for identifying problematic input sequences just as it is a good strategy for finding problematic input combinations as described above. For example, when we noticed the disappearing text problem in Office 2000 we formulated an attack on the title text box on PowerPoint slides. The following series of screen shots shows how a specific sequence of inputs causes the text to disappear.

It is interesting to note that just rotating the text box 180 degrees does not reveal the bug. One must follow the sequence of rotate commands described: rotate 10° (or more) followed by 180°. Undoing the sequence of operations does not correct the problem either; each time one clicks outside the title area, the text disappears. The reason that input sequencing is such a bug-rich attack strategy is that many operations complete successfully but leave side-effects that cause future operations to fail. A thorough investigation of input sequences will expose many of these problems. Sometimes the input sequence doesn’t have to be particularly diverse in order to find a bug, as the next attack shows.
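Treating inputs as an alphabet and input sequences as sentences suggests a simple way to search for order-dependent failures: enumerate every sentence up to some length and check an invariant after each one. The Python sketch below does this for a hypothetical `TextBox` class with a deliberately planted bug modeled loosely on the rotate example above. It is an assumption-laden toy, not a reconstruction of how PowerPoint actually behaves.

```python
from itertools import product


class TextBox:
    """Hypothetical widget with a deliberately planted sequencing bug."""

    def __init__(self):
        self.angle = 0
        self.visible = True

    def rotate(self, degrees):
        previous = self.angle
        self.angle = (self.angle + degrees) % 360
        # Planted bug: a half turn applied to an already tilted box hides the text.
        if degrees == 180 and previous % 180 != 0:
            self.visible = False


def failing_sequences(inputs, max_length):
    """Try every input sequence up to max_length; return those that hide the text."""
    failures = []
    for length in range(1, max_length + 1):
        for sequence in product(inputs, repeat=length):
            box = TextBox()
            for degrees in sequence:
                box.rotate(degrees)
            if not box.visible:
                failures.append(sequence)
    return failures


failures = failing_sequences([10, 180], max_length=2)
print(failures)  # [(10, 180)]
```

Neither input fails on its own, and the reverse order `(180, 10)` is also fine; only one ordering exposes the side effect. The number of sequences grows exponentially with `max_length`, so in practice the alphabet and length have to be kept small or the sequences sampled.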

Repeat the same input or input sequence over and over again. This has the effect of gobbling resources and stressing an application’s stored data space, not to mention uncovering undesirable side-effects. Unfortunately, most applications are unaware of their own space and time limitations, and many developers like to assume that plenty of resources are always available. An example of this can be found in Word’s equation editor, which seems to be unaware that it can only handle 10 levels of nested brackets.

Attacking Data

Data is the lifeblood of software; if you manage to corrupt it, the software will eventually have to use the bad data, and what happens then may not be pretty. So it is worthwhile to understand how and where data values are established. Essentially, data is stored either by reading input and then storing it internally or by storing the result of some internal computation. So it is through supplying input and forcing computation that we enable data to flow through the application under test. The attacks on data follow from this simple fact, as outlined below.

Data Attacks

Attacks by variable value
1. Force incorrectly typed data to be stored
2. Force data values to exceed allowable range

Attacks by data element size
3. Overflow input buffers
4. Force too many values to be stored
5. Force too few values to be stored

Attacks by data access
6. Find alternate ways to modify the same data
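Attacks 4 and 5 in the list above can be illustrated with a small Python sketch. The `BoundedStack` class and the `attack` helper are hypothetical; they simply show the two probes a tester would apply to any structure that supports adding and removing values: one more add than the capacity allows, and one more remove than there were adds.

```python
class BoundedStack:
    """Hypothetical fixed-capacity stack that checks both of its size limits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, value):
        if len(self.items) >= self.capacity:
            raise OverflowError("stack is full")
        self.items.append(value)

    def pop(self):
        if not self.items:
            raise IndexError("stack is empty")
        return self.items.pop()


def attack(stack, adds, removes):
    """Apply `adds` pushes, then `removes` pops; report the first rejection."""
    try:
        for i in range(adds):
            stack.push(i)
        for _ in range(removes):
            stack.pop()
    except (OverflowError, IndexError) as exc:
        return type(exc).__name__
    return "no error"


n = 3
too_many = attack(BoundedStack(capacity=n), adds=n + 1, removes=0)
too_few = attack(BoundedStack(capacity=n), adds=n, removes=n + 1)
print(too_many)  # OverflowError
print(too_few)   # IndexError
```

Here both probes are rejected cleanly, which is the behavior a tester hopes to see. In an implementation that omits either check, the same two probes would surface as a crash, silent data corruption or a nonsensical value instead of a controlled error.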

Attacks by Variable Value

This class of attacks requires investigating the data type and allowable values associated with internally stored data objects. If one has access to the source then this information is readily available; however, significant type information can be determined through a little exploratory testing and attention to error messages.

Vary the data type used in input fields to find type mismatches. Entering characters where the program expects integers (and similar attacks) has long proven fruitful, but we have found that such attacks are less successful than before because of the ease with which type checking and type conversion are handled by modern programming languages.

Try to exceed allowable ranges of data values. Variable data that is stored is subject to the same attacks as variable data entered as input.

Attacks by Data Element Size

The second class of data attacks is aimed at overflowing and underflowing data structures. In other words, the attacks attempt to find data that violates the predetermined size constraints of data objects. The first such attack is the classic buffer overflow.

Try to overflow input buffers. The idea here is to enter long strings to overflow input buffers. This is a favorite attack of hackers because sometimes the application is still executing a process after it crashes. If a hacker attaches an executable string to the end of the long input string, the process may execute it. A buffer overflow in Word 2000 is one such exploitable bug. The bug, in the Find/Replace feature, is shown below. It is interesting to note that the Find field is properly constrained but the Replace field is not.

Force too many values to be stored in a data structure. Complex data structures such as arrays, matrices and lists are subject not only to attacks concerning the values stored but also the number of values stored.

Force a data structure to attempt storing too few values. When data structures allow information to be both added and removed, an often successful attack is to make n adds followed by or intermixed with n+1 removes.

Attacks by Data Access

My friend Alan Jorgensen likes the phrase “the right hand knoweth not what the left hand doeth” to describe this class of bugs. The idea is simple and developers leave themselves wide open to this attack: in most programs there are lots of ways to do almost anything. What this means to testers is that the same function can be invoked from numerous entry points, each of which must ensure that the initial conditions of the function are met. An excellent example is the crashing bug my student found in PowerPoint regarding the size of tabular data. The act of creating the table is constrained to a maximum size of 25×25. However, one can create such a table and then add rows and columns to it from another location in the program, crashing the application. The right hand knew better than to allow a 26×26 table but the left hand wasn’t aware of the rule.

Attacking Computation

Computation Attacks

Attacks by operand
1. Force computation with illegal operand
2. Find illegal operand combinations

Attacks by result
3. Force a computation result to be too large
4. Force a computation result to be too small

Attacks by feature interaction
5. Find features that share data poorly
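Attacks 3 and 4 in the list above can be made concrete with 2-byte signed integers, whose range is -32768 to 32767. Python integers do not overflow on their own, so this sketch uses `ctypes.c_int16` to stand in for a 2-byte storage location; the helper names are hypothetical. `ctypes` integer types perform no overflow checking, so an out-of-range result silently wraps, which is the failure a tester is hoping to provoke.

```python
import ctypes

INT16_MAX = 32767
INT16_MIN = -32768


def increment_int16(x):
    """Compute y = x + 1 and store y in a 2-byte signed integer."""
    return ctypes.c_int16(x + 1).value


def decrement_int16(x):
    """Compute y = x - 1 and store y in a 2-byte signed integer."""
    return ctypes.c_int16(x - 1).value


print(increment_int16(100))        # 101
print(increment_int16(INT16_MAX))  # -32768
print(decrement_int16(INT16_MIN))  # 32767
```

Every value except the two boundary values behaves correctly, so a test suite that never reaches the edge of the range would pass. The attack consists of steering the application until a stored operand sits exactly on the boundary and then forcing one more step.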

Attacks by Operand

This class of attacks requires investigating the data type and allowable values associated with operands in one or more internal computations. If one has access to the source then this information is obtainable. Otherwise, testers must do their best to determine what computation is taking place and what type of data is being used.

Try to make a computation occur with an illegal operand. Sometimes inputs or stored data are well within the legal boundaries but are illegal for some types of computation. Division by zero is a good example. Zero is a valid integer but invalid as the denominator of a division.

Try to find a combination of operands that cannot coexist. Computations that have more than one operand are subject not only to the above attack but also to potential operand conflict.

Attacks by Result

The second class of computation attacks is aimed at overflowing and underflowing data objects that store computation results.

Try to force the computation of a result that is too large to store. Even simple computations like y=x+1 are problematic around boundary values. If both x and y are 2-byte signed integers and x has the value 32767, then this computation will fail because the result, 32768, will overflow its storage.

Try to force the computation of a result that is too small to store. Same as above, but use y=x-1 and assign x the value -32768.

Attacks by Feature Interaction

The last attack category discussed in this paper is perhaps the granddaddy of them all and the one that separates testing novices from the pros: feature interaction. The problem here is nothing new: different application features share the same data space, and either through differing assumptions about the disposition of the data or through the generation of undesirable side-effects, the interaction of the two features causes the application to fail. But which features share data and could interpret it in conflicting ways is an open question in testing. Right now we are stuck with trial and error. So this example must suffice.

This example shows an unexpected result when combining footnotes and dual columns on a single page in Word 2000. The problem is that Word computes the page width of a footnote from the reference point of the note. Thus, if one has two footnotes on the same page, one referenced from a dual column location and one from a single column location, the single column footnote pushes the dual column footnote to the next page. Also pushed to the next page is any text between the note’s reference point and the bottom of the page. The following screen shots illustrate the problem vividly. Where is the second column of text? On the next page along with the footnote. Can you live with the document looking like this? You’ll have to unless you find a workaround (which means time spent away from preparing your document).

Conclusion

Simply going through the 21 attacks outlined above should exercise a great deal of an application’s functionality. Indeed, staging a successful attack usually means experimenting with dozens of possibilities and pursuing a number of dead ends. But just because some of this exploration doesn’t find bugs does not mean that it is not useful. First of all, the time spent using the application familiarizes testers with the range of possible functionality and leads to new ideas for additional attacks. Second, successful tests are good news! They indicate that a product is reliable, particularly if that set of tests is made up of malicious attacks as outlined above. If code can withstand this treatment, it may very well withstand whatever users can dish out.

Also, never underestimate the value of having a concrete goal in mind when you are testing. I’ve seen too many testers waste time poking at a keyboard or making random API calls hoping something breaks. Staging attacks means formulating clear goals, based specifically on things that could go wrong, and then designing the tests that investigate each goal. This way, every test has a purpose and progress can be readily monitored.

Finally, remember always that testing should be fun. The attack analogy supports this good-natured view of testing and adds a little more spice to a very enjoyable pastime. Happy hunting!

Want more information about Dr. Whittaker’s research? The following papers are available through published literature sources. Some of these are posted on http://se.fit.edu.

J. A. Whittaker, “What is software testing? And why is it so hard?” IEEE Software, 17, 1, pp. 70-79 (2000).
J. A. Whittaker and A. Jorgensen, “Why software fails,” ACM SIGSOFT Software Engineering Notes, 24, 4 (1999).
J. A. Whittaker, “Stochastic software testing,” The Annals of Software Engineering, 4, pp. 115-131 (1997).
J. A. Whittaker and M. G. Thomason, “A Markov chain model for statistical software testing,” IEEE Transactions on Software Engineering, 20, 10, pp. 812-824 (1994).
J. A. Whittaker and J. H. Poore, “Markov analysis of software specifications,” ACM Transactions on Software Engineering and Methodology, 3, 1, pp. 93-106 (1993).
J. A. Whittaker and M. Al-Ghafees, “Selecting software test data using black-box data flow information,” submitted to ACM Transactions on Software Engineering and Methodology.
J. A. Whittaker and J. M. Voas, “Toward a more reliable theory of software reliability,” submitted to IEEE Computer.
J. A. Whittaker, “Software’s invisible users,” submitted to IEEE Software.
