As soon as the connection is made the application program transmits a query to the server and waits to receive the results sent back by the server. During the process the query has to undergo different stages before it produces the output.
Here is the short description of the stages of query processing.
Parse Stage:
The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree.
The rewrite system:
The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to the query tree. It performs the transformations given in the rule bodies.
The planner/optimizer:
The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor.
The executor:
The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan.
A detailed understanding of each step is as follows.
Select name, address from emp whre sno=1;
Once the session is established and query is accepted, the query undergoes parser stage.
The parser stage consists of two parts:
- Parser stage
The parser stage consists of two sub sections, One, the lexer(defined in the file scan.l) does pattern matching on the text: recognizing identifiers and other keywords, amongst other things. Each one is converted into a token. These tokens are then put through the parser(defined in the file gram.y) which assembles them into a parse tree.
i.e., Query -> Divide into tokens(scan.l) -> validate tokens for grammar rules(gram.y) -> If success, generate parse tree, if fail, return syntax error message.
- Transformation process.
During the transformation process, the parse tree undergoes semantic interpretation needed to understand which tables, functions, and operators are referenced by the query. It uses system catalogs here to do semantic check.
The data structure that is built to represent this information is called the query tree.
The parts of the query tree are
- the command type
- the range table
- the result relation
- the target list
- the qualification
- the join tree
- the others
The detailed description is shown with below example.,
i.e., parse tree -> semantic check using system catalogs -> query tree.
- Traffic Cop:
The traffic cop is the agent that is responsible for differentiating between simple and complex query commands. Transaction control commands such as BEGIN and ROLLBACK are simple enough so as to not need additional processing, whereas other commands such as SELECT and JOIN are passed on to the rewriter. This discrimination reduces the processing time by performing minimal optimization on the simple commands, and devoting more time to the complex ones.
- Rewrite system/rule system:
The rule system modifies queries to take rules into consideration, and then passes the modified query to the query planner for planning and execution.
For example.,
For the query SELECT * FROM shoelace;
The query tree should be something like this
SELECT shoelace.sl_name, shoelace.sl_avail,
shoelace.sl_color, shoelace.sl_len,
shoelace.sl_unit, shoelace.sl_len_cm
FROM shoelace shoelace;
and this is given to the rule system. The rule system walks through the range table and checks if there are rules for any relation. When processing the range table entry for shoelace (the only one up to now) it finds the _RETURN rule with the query tree:
SELECT s.sl_name, s.sl_avail,
s.sl_color, s.sl_len, s.sl_unit,
s.sl_len * u.un_fact AS sl_len_cm
FROM shoelace old, shoelace new,
shoelace_data s, unit u
WHERE s.sl_unit = u.un_name;
- Planner/Optimizer:
The task of the planner/optimizer is to create an optimal execution plan. The planner uses the internal statistics, determines the estimated cost and is sent to executor.
- Executor
The executor takes the plan created by the planner/optimizer and recursively processes it to extract the required set of rows.