Monday, June 29, 2009

Scalability - Increasing The Size Of The Problem

Heuristics can shave some time off a lengthy computation, but they cannot generally find the best solution. And when the number of possibilities rises steeply, as it does in the TSP, the solution they do find may not be all that great. The ability of an algorithm to perform well as the amount of data rises is called scalability. An increase in data usually means an increase in the number of calculations, which results in a longer run time. To scale well, an algorithm's run time should not rise prohibitively as the data grows.

Consider an algorithm that searches for every instance of a certain word in a document. The simplest algorithm would examine one word, see if there is a match, and then move on. If the size of the document is doubled, then the number of words will be about twice as large, and the search algorithm will take twice as long to search, all other things being equal. A document four times as large would take four times as long. This increase is linear - the increase in run time is proportional to the increase in data.
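A short Python sketch makes this concrete; the function name and the sample documents are invented for the example, not taken from any particular program.

```python
def find_word(words, target):
    """Return the positions of every occurrence of target in a list of words."""
    positions = []
    for index, word in enumerate(words):   # one comparison per word
        if word == target:
            positions.append(index)
    return positions

small_doc = "the quick brown fox jumps over the lazy dog".split()
large_doc = small_doc * 2                  # twice the words, twice the comparisons

print(find_word(small_doc, "the"))         # [0, 6]
print(find_word(large_doc, "the"))         # [0, 6, 9, 15]
```

Because the loop touches each word exactly once, doubling the document doubles the number of comparisons, which is the linear behavior described above.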

Now consider the brute force approach to solving a TSP. The number of possible routes rises at an astonishing rate with even a slight increase in the number of cities, which is a much worse situation than in the search algorithm. If the input of the search algorithm goes from four words to eight, the time doubles; for a brute force TSP calculation, an increase from four cities to eight increases the time by a factor of 1,680, because the number of orderings to check jumps from 4! = 24 to 8! = 40,320. The search algorithm scales at a tolerable level, but the brute force TSP is a disaster.
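As a rough illustration, here is a brute force TSP in Python. The distance table is made up for the example; the point is that the code tries every ordering of the cities, which is exactly what makes the method collapse as the city count grows.

```python
from itertools import permutations

def brute_force_tsp(distances):
    """Try every ordering of the cities and return the shortest round trip."""
    cities = range(len(distances))
    best_route, best_length = None, float("inf")
    for route in permutations(cities):        # n! orderings for n cities
        length = sum(distances[route[i]][route[(i + 1) % len(route)]]
                     for i in range(len(route)))
        if length < best_length:
            best_route, best_length = route, length
    return best_route, best_length

# Four cities mean 4! = 24 orderings; eight cities would mean 8! = 40,320,
# a factor of 1,680 more work for the same simple loop.
distances = [[0, 2, 9, 10],
             [2, 0, 6, 4],
             [9, 6, 0, 3],
             [10, 4, 3, 0]]
print(brute_force_tsp(distances))
```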

Advances in computer technology have resulted in blazingly fast computers, yet even with these speedy machines, algorithm scalability is important. Linear increases, like the one the simple search algorithm displays, do not present much of a problem even with relatively large data sets, but this kind of tame behavior is the exception rather than the rule. The TSP is a notoriously difficult case, but there are many other problems that scale poorly. Even with heuristics and approximations, solving these problems when the data set is large takes an extraordinary amount of time, if it is possible at all. To the chagrin of researchers, many of the more interesting and important problems in science and engineering belong in this category.

There is another practical concern besides run time. Computers hold instructions and data in memory as the program runs. Although some computers have a huge amount of memory, it is still a limited resource that must be conserved. Scientists and engineers who share the use of the most advanced supercomputers are allotted only a certain amount of memory, and users who pay a supercomputer facility to use one of their machines are often charged for the memory as well as the time their programs consume.

Memory can also be an important factor in the run time of a program. Computer memory consists of fast devices such as random access memory (RAM) circuits that hold millions or billions of chunks of data known as bytes, so retrieving a piece of data or loading the next instruction takes little time. But if a program exceeds the amount of available memory, the computer must either stop or, as most do, use space on the hard disk to store the excess. Disk operations are extremely slow compared to memory, so this procedure consumes a great deal of time. A personal computer, for example, should have ample memory; otherwise, large programs such as video games will be uselessly slow even if the processor is the fastest one available.

The memory versus speed trade-off was more pressing back in the 1980s, when memory was measured in thousands of bytes rather than today's billions. Although memory is still an important factor in certain applications, the main issue in general is time. When a programmer examines an algorithm's efficiency, he or she is usually concerned with its speed.

An algorithm's complexity can be measured by its run time - this is called time complexity - or the amount of memory resources it uses, which is known as space complexity. But speed, like the amount of memory, depends on the computer - some computers are faster than others. A measure of run time will apply only to a certain machine rather than to all of them.

To avoid the need to analyze an algorithm for each kind of computer, a general measure, such as the number of steps required by the algorithm, can be used. The number of steps does not usually change very much from machine to machine, but it does depend on the size of the problem, as well as the specific outcome. For example, a search algorithm may end early if the target is found, truncating the rest of the search. In order to be as general as possible, an analysis of an algorithm usually considers the worst-case scenario, which requires the most steps. For the example of a search algorithm that quits when the target is found, the worst case is that the target is in the last piece of data, or is not found at all.
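To make the step-counting idea concrete, here is a hypothetical sketch of a search that reports how many steps it took; the data and the function name are invented for the example.

```python
def search_with_count(items, target):
    """Linear search that stops at the first match and reports the steps taken."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return steps          # best case: target found early, search truncated
    return steps                  # worst case: every item examined, no match

data = ["ant", "bee", "cat", "dog"]
print(search_with_count(data, "ant"))     # 1 step  (best case)
print(search_with_count(data, "dog"))     # 4 steps (target in the last position)
print(search_with_count(data, "elk"))     # 4 steps (target not found at all)
```

The step count is the same whether the code runs on a laptop or a supercomputer, which is why analyses are usually phrased this way, and the last two calls show the worst case that such an analysis would report.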
