有限状态自动机（Finite state automon）

来源：互联网发布：反转故事知乎神经病编辑：程序博客网时间：2024/04/27 14:20

A finite state automaton is a way of looking at a computer, a way of solving parsing problems and a way of designing special purpose hardware. The automaton at any time in is one of a finite number of internal states. It is a black box. You feed it input, usually a character at a time. It turns its internal crank and applies its rules and goes into a new state and accepts more input. Every once in a while it also produces some output. A simple parser might classify Java source as either java commands or comment. It would have states for having seen /,//, /*, /*… *, /*… */ etc.

ImplementationsBooksStrategyLinksSample Code

Implementations

Lexers and regex packages are tools for creating finite state automatons.

In the days of C, the state was stored as an integer 0 .. n. You had a complicated nest of switch statements that examined both the current state and the input to determine the next state.

In 1.4-, one way of writing finite state automata was to have a singleton class represent each possible state. There is a state variable that represents the current state. You feed the input to a standard method of the class’s interface and it computes the next state. That way states that are very similar can inherit default behaviours. You can have a static method to categorise the input and and a separate instance method to handle each categroy, or use a common method and a switch to dispatch to code based on input category to decide the next state. You don’t need any switch code based on current state. The dynamic method overriding features of Java handle that.

In Java version 1.5 or later, one way to write finite state automata is to use an enum constant to represent each state. A custom next method on each enum constant examines the input and calculates the next state. The various parsers used by JDisplay work using enums. The problem with this approach is you must usestatics for variables shared between states. You can’t instantiate several different finite state automata.

Finite state automata come in two flavours: DFA (Deterministic Finite Automaton) and NFA (Non-deterministic Finite Automaton). NFAs (Non-deterministic Finite Automatons) don’t always give the same answer.

You might wonder what use a non-deterministic program could be. Consider integration in two dimensions by the Monte Carlo method. All you need is a way of determining if a point is inside or outside the area you want to integrate. Then you generate random points and count how many are inside and how many outside. The ratio tells you the ratio of the area inside to outside which when applied to the total area gives you an approximation of the area of the region to integrate. You want a random pattern of test points. Any regular pattern could be thrown off by regularities in the shape of the region you are trying to integrate.

Strategy

Here is my strategy for writing finite state automata.

You need two classes, both enums. One has an enum for each category of character input you process and a method for categorising characters. The other has one enum for each state. Each enum constant implements an abstract method called nextState that takes the category of the next character and the next character as parameters and returns the enum constant for the next state. The nextState method may emit output.
Typically a nextState method will have a switch based on the category of the incoming chararacter.
You start by removing complications to the problem and working out the states and categories for just that simplification. Gradually add the complications which will usually require more categories and states.
If you see a combinatorial explosion in your number of states, factor your finite state automaton and track its state with two or more different state variables, often using simple auxiliary booleans booleans.
Write explicit descriptions of what it means when you are in each of your states. Proofread by focusing on each state and checking off the processing for each category of input, closing your eyes to everything else. It is amazing how easy it is to write code that works first time if you do this. Don’t use default to cover actual categories because it makes it too easy to forget a category.
Keep all your switch statements and states in standard order, usually alphabetical. This will make them easier to find, easier to compare and easier to check for inadvertent deletions.
Don’t implement any auxiliary routines you need, just write stubs. You may find as you work out the implementation, that you will make drastic changes to these methods. There is no point in implementing them until the design specification settles down.
Write some unit tests for your various auxiliary routines.
Put debug code in your main driver routine to dump the current state, the next character and the category of the next character. Run your automaton through on various test cases, just looking for anomalies.
Proof read your state machine by looking for symmetries and repeating patterns. Make sure deviations from the symmetries are intentional.
Test on a small document that contains all the corner (pathological cases) you can think of.
Test on real world documents. I forgot all about comments in my HTML (Hypertext MarkupLanguage) compactor finite state automaton on my first cut. I did not notice the problem until I started testing on real world documents.

Sample Code

Here is the source code for a the Compactor finite state automaton that removes excess white space fromHTML to make it more compact for transmission. Compactor: compacts a group of files, a single file or a string.

HTMLCharCategory: categorises characters.HTMLState: the finite state automaton that parse the HTML and decides which spaces and new view

0 0