How a DB Runs Your Query in the Back End
If we run a query like
SELECT * FROM Student WHERE age > 15;
how does the database actually access its files, B+ trees, indexes, and so on? A given query can be processed in multiple ways, and those ways have different costs: the time taken to run the query, to access the data, to process it, and so on.
Basic steps in query processing
- Query : the input SQL query
- Parser and Translator : translates the SQL into a relational-algebra expression
- Relational-algebra expression : there can be one or many equivalent relational-algebra expressions for the same query
- Optimizer : decides how the relational-algebra expression can be made efficient; to optimise it, it uses statistics about the data, such as historical information on which attributes are used most, how many tuples actually exist, and so on
- Execution plan : based on the optimizer's result we get the execution plan with the least cost; it spells out what actions to take in terms of accessing the different indexes and different B+ tree nodes
- Evaluation engine : fetches the data from the B+ trees and performs the joining, merging and sorting operations specified in the execution plan
- Output : the results of the SQL query
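To make the Parser and Translator step concrete: for the example query at the top, the translated relational-algebra expression would be something like

σage>15(Student)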
Relational Algebra vs. Equivalent Expressions
Select Instructor.name
From Instructor
Where salary < 75000
Plan 1 : use the salary index to find and fetch only the records with salary < 75000
Plan 2 : scan the salary index (or the whole file) and discard every instructor with salary ≥ 75000
So, to find the best execution plan among such equivalents, we need to use optimization.
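One example of the equivalences the optimizer exploits: two selections in sequence can always be combined into a single conjunctive selection,

σθ1(σθ2(E)) ≡ σθ1∧θ2(E)

which lets it consider evaluating both conditions in one pass over the data.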
Query optimization
Out of all equivalent evaluation plans, we try to find the one with the minimum cost. Cost is estimated from statistical information in the database catalog: the number of tuples in each relation, the size of tuples, and the attributes on which conditions are frequently tested.
Measures of Query Cost
Cost is generally measured as the total elapsed time for answering a query.
- Disk access
- CPU usage
- Network Communication
Typically, disk access is the predominant cost, and it is also relatively easy to estimate. Looking at the storage structure, we typically have a magnetic disk where the head has to move to the correct cylinder to find the block where the record is located. This movement is called a seek.
Disk access measures :
- Number of seeks * average seek cost
- Number of blocks read * average block read cost
- Number of blocks written * average block write cost
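Putting these together, with tT for the average time to transfer one block and tS for the average seek time (the same notation used in the join-cost formulas later), an operation that performs S seeks and transfers b blocks costs roughly

b * tT + S * tS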
Usually the cost to write a block is greater than the cost to read a block, because data is normally read back after being written to ensure that the write was successful.
Warning : Here we ignore CPU costs for simplicity, but real systems do take CPU cost into account. These estimates also do not include the cost of writing the final output to disk.
Hints : Several algorithms can reduce disk I/O by using extra buffer space. For example, if a block has been read in and the next operation needs records that already exist in that block, and the block is still held in the buffer, we do not need to go back to disk and read it again.
The more buffer space we can provide, the better the performance becomes; but the memory required for the buffer grows, and it is often difficult to predict whether a particular block will still be in the buffer or not.
The amount of real memory available for the buffer depends on other concurrent queries and OS processes, and is known only during execution, so it is better to use a worst-case estimate.
Selection Algorithms
Linear search
Scan each file block and test all records to see whether they satisfy the selection condition.
If the selection is on a key attribute, the scan can stop as soon as the record is found. A minimal sketch follows the list below.
Linear search can be applied regardless of
- Selection condition
- Ordering of records in the file
- Availability of indices
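A minimal sketch of linear search in Python, assuming the file is handed to us as an iterable of blocks (each block a list of records); the record layout and block size here are illustrative only:

```python
def linear_search(blocks, predicate, key_attribute=False):
    """Scan each block in turn and test every record against the
    selection condition; needs no ordering and no index."""
    results = []
    for block in blocks:              # one block transfer per block
        for record in block:
            if predicate(record):
                results.append(record)
                if key_attribute:     # equality on a key: at most one match
                    return results
    return results

# Example: two blocks of Student records, selecting age > 15.
blocks = [[{"name": "Ann", "age": 14}, {"name": "Bo", "age": 17}],
          [{"name": "Cy", "age": 16}]]
print(linear_search(blocks, lambda r: r["age"] > 15))
```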
Binary Search
Binary search generally does not make sense, since the data is not stored consecutively (except when there is an index available), and binary search requires more seeks than an index search.
Index Scan (Search algorithms that use index)
Here we assume there is a B+ tree index that we can use to perform the selection.
If we use a primary index with an equality condition on the key, we retrieve the single record that satisfies the corresponding equality condition.
Retrieve Single record (Primary index, equality on key)
The height of the B+ tree is the number of index blocks that must be read, and each of those reads costs a transfer plus a seek, because the blocks are not consecutively located. One additional access is needed to read the data record itself, so the cost is (hi + 1) * (tT + tS).
Retrieve Multiple records (Primary index, equality on non key)
Because the file is ordered on the search key, the matching records sit in consecutive blocks, so after the index lookup we simply transfer those consecutive blocks.
Retrieve records on Secondary index (Secondary index, equality on non-key)
Retrieve a single record if the search key is a candidate key
Retrieve multiple records if the search key is not a candidate key; in that case each matching record may live in a different block, so each one can cost a separate I/O
Retrieve records on Comparison
Retrieve Records (Primary index, comparison)
For a condition like A ≥ v, it uses the B+ tree to find the first qualifying element, and after that it just transfers the records sequentially.
For a condition like A ≤ v, we can instead scan sequentially from the beginning of the file, following the primary-index order, until the condition fails; the index is not needed at all.
Retrieve Records (Secondary index, comparison)
In either case (for A ≤ v, scanning the index's leaf level from the beginning up to v; for A ≥ v, finding the first entry ≥ v and scanning forward), retrieve the records that are pointed to
- requires an I/O for each record
- Linear file scan may be cheaper
Select With Conjunctions
σθ1∧θ2∧ … ∧θn(Relation)
Retrieve records (conjunctive selection using one index)
- Select one condition θi, together with one of the access algorithms discussed so far, as the cheapest combination.
- Test the other conditions on each tuple after fetching it into the memory buffer.
Retrieve records (conjunctive selection using composite index)
- Use appropriate composite (multiple-key) index if available.
Retrieve records (conjunctive selection by intersection of identifiers)
- Requires indices with record pointers
- Use corresponding index for each condition and take intersection of all obtained sets of record pointers
- Then fetch the records from the file (a combined sketch for this and the disjunctive case follows the next section)
- If some conditions do not have appropriate indices, apply test in memory.
Select With Disjunctions
σθ1∨θ2∨ … ∨θn(Relation)
Retrieve records (disjunctive selection by union of identifiers)
- Applicable only if all the conditions have available indices; otherwise we need to use a linear scan
- Use corresponding index for each condition, and take union of all the obtained sets of record pointers
- After that we can fetch records from file.
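Both identifier-based strategies reduce to set operations on record pointers before any data blocks are fetched. A minimal sketch, assuming hypothetical rid (record-pointer) sets that one index lookup per condition would return:

```python
# Hypothetical rid (record-pointer) sets, one per condition's index lookup.
rids_theta1 = {3, 7, 9, 12}      # pointers to records satisfying θ1
rids_theta2 = {7, 9, 21}         # pointers to records satisfying θ2

conjunction = rids_theta1 & rids_theta2   # θ1 AND θ2 -> {7, 9}
disjunction = rids_theta1 | rids_theta2   # θ1 OR θ2  -> {3, 7, 9, 12, 21}

def fetch(rids, heap_file):
    """Fetch the pointed-to records; sorting the rids first keeps the
    block accesses in file order and reduces seeks."""
    return [heap_file[rid] for rid in sorted(rids)]
```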
Negation
- Use linear scan on file
- If very few records satisfy ¬θ, and an index is applicable to ¬θ, find the satisfying records using the index and fetch them from the file.
Sorting
If we build an index on the relation, we can use the index to read the relation in sorted order; this is just the in-order traversal behaviour of a B+ tree. However, it may lead to one disk-block access per tuple. If all the records fit into memory, we can use an in-memory algorithm like quicksort, but often that will not be the case, because relations are much bigger than memory. So we have to take recourse to the external sort-merge strategy. It is an old method but a very effective one.
External sort-merge
We certainly cannot read the whole relation into memory at once, so we take it in parts; say, in groups of 3 blocks, for example.
Merge pass 1
We sort the groups in memory: take as many records as fit into memory and sort them, producing sorted sublists of the original set. Then we merge them according to the merge strategy and write the result back to disk.
Merge pass 2
Now we have bigger sorted lists; these are called runs. We keep merging them, and depending on the actual size of the file relative to the memory available, we might have to do multiple passes until we get a single sorted output.
External sort-merge estimations (M = memory size in pages, i = run counter; a Python sketch of both phases follows the steps)
1. Create sorted runs. Repeatedly:
- Read M blocks of the relation into memory
- Sort the in-memory blocks
- Write the sorted data to run i; i++
2. Merge the runs (assuming the number of runs N < M):
- Use N blocks of memory to buffer the input runs, and 1 block to buffer the output. Read the first block of each run into its buffer page.
- Repeat : select the smallest record (first in sort order) among all buffer pages; write that record to the output buffer (if the output buffer is full, write it to disk); delete the record from its input buffer page; if the buffer page becomes empty, read the next block of that run into it.
- Until all input buffer pages are empty.
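A compact Python simulation of the two phases, assuming the whole relation arrives as a flat list and M is the number of records that fit in memory at once (heapq.merge plays the role of the N-way merge, so this sketch also assumes N < M):

```python
import heapq

def external_sort_merge(records, M):
    """Phase 1: split the relation into memory-sized chunks and sort
    each chunk into a run (on a real system each run is written back
    to disk).
    Phase 2: N-way merge of all runs, reading them lazily one element
    at a time, mirroring the one-buffer-page-per-run scheme."""
    runs = [sorted(records[i:i + M]) for i in range(0, len(records), M)]
    return list(heapq.merge(*runs))

print(external_sort_merge([9, 1, 7, 3, 8, 2, 6, 4, 5], M=3))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```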
Join Operations
Join operation algorithms
- Nested loop join
- Block nested loop join
- Indexed nested loop join
- Merge join
- Hash Join
Nested-Loop Join
Here we have two relations, r and s, and a condition θ, and we perform the θ-join.
The basic Cartesian product matches every record of r with every record of s. This can be done using nested for loops: for each tuple tr in r we try each tuple ts in s, take the (tr, ts) pair, and if it satisfies the condition θ we send it to the output; otherwise we discard it.
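As a sketch in Python (tuples represented however the caller likes; theta is any predicate over a pair):

```python
def nested_loop_join(r, s, theta):
    """θ-join by brute force: pair every tuple tr of r with every
    tuple ts of s and keep the pairs that satisfy θ."""
    result = []
    for tr in r:          # outer relation
        for ts in s:      # inner relation, rescanned for every tr
            if theta(tr, ts):
                result.append((tr, ts))
    return result
```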
Cost estimation
- nr is the number of records (tuples) in relation r
- br is the number of blocks holding relation r
For every tuple of relation r we have to access all the blocks of relation s. So in the worst case, where memory holds only one block of each relation, the cost is nr * bs + br block transfers plus nr + br seeks.
Optimization 1 :
If the smaller relation fits entirely into memory, use it as the inner relation; we then avoid the repeated reads, and each relation is read only once, for br + bs block transfers.
example:
If the smaller relation (student) fits entirely in memory, the cost estimate will be 500 block transfers.
Block Nested-Loop Join
A variant of the nested-loop join in which every block of the inner relation is paired with every block of the outer relation, so the inner relation is scanned once per outer block rather than once per outer tuple.
Cost : worst case br * bs + br block transfers (and 2 * br seeks); best case, when the smaller relation fits in memory, br + bs block transfers (and 2 seeks).
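A sketch assuming both relations are supplied as lists of blocks; the point is that the two outer loops now run over blocks, so the inner relation is re-read once per block of r rather than once per tuple of r:

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Pair every block of the outer relation with every block of the
    inner relation, then join the tuples inside the buffered pair."""
    result = []
    for Br in r_blocks:            # read each outer block once
        for Bs in s_blocks:        # inner read br times, not nr times
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append((tr, ts))
    return result
```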
Indexed nested loop join
Indexed lookups can replace file scans if the join is an equi-join or natural join and
an index is available on the inner relation's join attribute. We can even construct an index just to compute a join.
For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.
- Worst case: the buffer has space for only one page of r, and, for each tuple in r, we perform an index lookup on s.
- Cost of the join: br * (tT + tS) + nr * c
• where c is the cost of traversing the index and fetching all matching s tuples for one tuple of r
• c can be estimated as the cost of a single selection on s using the join condition.
If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation.
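A sketch of an equi-join where a plain Python dict stands in for the B+ tree index on s's join attribute (built here on the fly, matching the note that an index can be constructed just to compute a join):

```python
from collections import defaultdict

def indexed_nested_loop_join(r, s, r_key, s_key):
    """Probe an index on s once per tuple of r instead of scanning s."""
    index = defaultdict(list)          # stand-in for a B+ tree on s_key
    for ts in s:
        index[ts[s_key]].append(ts)
    result = []
    for tr in r:                        # one index lookup per r tuple
        for ts in index.get(tr[r_key], []):
            result.append((tr, ts))
    return result
```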
Duplicate Elimination
Duplicates can be eliminated by sorting or hashing: after sorting, identical tuples end up adjacent and can be dropped as they are found.
Projection
- Perform projection on each tuple
- Followed by duplicate elimination, as sketched below
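A sort-based sketch of the two steps, assuming tuples are dicts keyed by attribute name:

```python
def project_distinct(relation, attributes):
    """Project each tuple onto the given attributes, then eliminate
    duplicates by sorting so that equal tuples become adjacent."""
    projected = sorted(tuple(t[a] for a in attributes) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:   # drop adjacent duplicates
            result.append(row)
    return result

rows = [{"name": "Ann", "dept": "CS"}, {"name": "Ann", "dept": "CS"},
        {"name": "Bo", "dept": "EE"}]
print(project_distinct(rows, ["name"]))   # [('Ann',), ('Bo',)]
```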
Aggregation (Group by)
This can be implemented in a manner similar to duplicate elimination: sort or hash the tuples on the grouping attributes so that tuples of the same group come together, then apply the aggregate function to each group. A hash-based sketch follows.
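Instead of removing matching tuples (as in duplicate elimination), tuples that hash to the same group are folded into a running aggregate:

```python
from collections import defaultdict

def group_by_sum(relation, group_attr, agg_attr):
    """GROUP BY group_attr with SUM(agg_attr): one hash bucket per group."""
    groups = defaultdict(int)
    for t in relation:
        groups[t[group_attr]] += t[agg_attr]
    return dict(groups)

rows = [{"dept": "CS", "salary": 100}, {"dept": "EE", "salary": 90},
        {"dept": "CS", "salary": 120}]
print(group_by_sum(rows, "dept", "salary"))   # {'CS': 220, 'EE': 90}
```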