- Name: Yi-Ming Chang
- UFID: 83816537
- Email: [email protected]
- A. Introduction
- B. Functions ProtoTypes and Descriptions
- C. Design Details
- D. Appendix
Firstly, we'll going to show the usage on executing the program. After execute the makefile, we can run the program by the command below.
./bplustree [input_file name]
The result will be store into output_file.txt.
This project builds a B+ tree by using C++. So, there are 3 major functionalities list as below:
- Maintain B+ Tree property
- Arrangement of the tree node's interior values
- Command execution
Accordingly, we seperate the project into 3 files
- bPlusTree.cpp - build to maintain the B+ tree structure.
- treeNode.cpp - focusing on the interior of a tree node.
- main.cpp - execute the input command, and output the result.
Moreover, the project files only exist in the first directory, with header files showing the parameters and functions of each class. The file list is showed as below:
π¦ BplusTree
β£ π bPlusTree.cpp
β£ π bPlusTree.hpp
β£ π treeNode.cpp
β£ π treeNode.hpp
β£ π main.cpp
------
β£ π constant.hpp // constant stored
β£ π bplustree // exec file
β π makefile
This section show the functions input output parameters and simply describe how the process works to use these functions to build the B+ tree.
Function Names | input | return | description | |
---|---|---|---|---|
a | bPlusTree | int degree | None | tree constructure |
b | init | int degree | None | init tree parameters |
Function Names | input | return | description | |
---|---|---|---|---|
a | searchLeaf | int key | treeNode* | traverse to the leaf contains the key |
b | search | int key | int success | output the value of the key |
c | searchRange | int start, int finish | int success | output the values in the range |
Function Names | input | return | description | |
---|---|---|---|---|
a | insertion | int key, double value | int success | insert data {key, value} |
The progress of this function is using 2-(a) searchLeaf to find the spot to insert the data. After the node insert 8-(b) insertLeafNode) or 8-(a) insertIndexNode, there might cause a node overfull, a new sibling nod will be created (See detailed in 8.) which will trigger a propagation insert up to the upper level node. . To simplify the propagation. We shown the flow chart as below.
graph LR
A{overfull?} --> | NO | A1[done]
A --> | YES | B[split]
B --> C[mid key abd new nodes] --> |insert| C1[parent key] --> |organize| D[check parent] --> A
As the flow char shown, the middle key will be insert into the parent. However, if the parent node is missing, parent node will have to be created. According to the overfull situation, the organize contains key and child pointers re-arrangement in the parent node. So, if the split node is INDEX node, we not only have to organize the key but also have to set some of the children pointer correctly.
Function Names | input | return | description | |
---|---|---|---|---|
a | deletion | int key, double value | int | delete data {key, value} |
b | borrowFromIndex | treeNode* parent, treeNode* deficient | bool | borrow key from sibling index node |
c | borrowFromLeaf | treeNode* parent, treeNode* deficient | bool | borrow key from sibling leaf node |
d | combineWithIndex | treeNode* parent, treeNode* deficient | bool | combine with index node |
e | combineWithLeaf | treeNode* parent, treeNode* deficient | bool | combine with sibling leaf node |
f | getInvalidParentKeyIdx | treeNode* parent, iterator changNodeIt | int distance | return the deficient index location of the key and child |
Also using 2-(a) searchLeaf to find the spot to delete the data. After the interior LEAF node delete 9-(a) deleteLeafNode, the LEAF node might be deficient, which will trigger the a propagation process fixing the deficient node. The flow chart shows a brief of the fixing process.
graph LR
A{deficient?} --> | NO | A1[done]
A --> | YES | B{borrow?}
B --> | Yes | B1[done]
B --> | NO | C[combine] --> D[check parent] --> A
Most importantly, combining with sibling might cause the parent becoming deficient, the process might propogates from leaf to root.
Function Names | input | return | description | |
---|---|---|---|---|
a | getTreeDegree | None | int degree | return tree degree |
b | printLeafList | None | None | print out leaf double link list |
c | printTree | None | None | print out the whole tree by DSF |
d | getRoot | None | treeNode* | return tree root |
Function Names | input | return | description | |
---|---|---|---|---|
a | treeNode | int degree, int key, bool insert | None | construct INDEX node |
b | treeNode | int degree, int key, double value, bool insert | None | construct LEAF node |
Function Names | input | return | description | |
---|---|---|---|---|
a | searchIndexNode | int key | treeNode* | return the index node |
b | searchLeafNode | int key | pair<bool, double> | return the data |
Function Names | input | return | description | |
---|---|---|---|---|
a | insertIndexNode | treeNode*, pair<int,double> | pair<int, treeNode*> | insert key into index node |
b | insertLeafNode | treeNode*, pair<int,double>, list<treeNode*> &leafList | pair<int, treeNode*> | insert key into leaf node |
c | getMiddleKey | None | map<int, double> | return the middle key-value after insert |
d | copyAndDeleteKeys | treeNode *newNode, iterator start, iterator end | int success | split occurs, copy keys to new node, delete keys from old |
e | copyAndDeleteChilds | int key | int success | split occurs, copy childs to new node, delete childs from old |
When split occurs, a new sibling node will be create. Then we copy the proper keys and childs to the new node, and remove them from the old node.
graph LR
A{overfull?} --> | NO | A1[done]
A --> | YES | B[split]
B --> C[create sibling] --> C1[copyAndDelete] --> D[ return push key and new node]
Function Names | input | return | description | |
---|---|---|---|---|
a | deleteLeafNode | int key | bool isDeficient | delete the key in data |
Function Names | input | return | description | |
---|---|---|---|---|
a | getIsLeaf | None | bool isLeaf | return tree node is LEAF or INDEX |
b | getKeyPairs | None | map<int,double>& | return tree node's key-values |
c | getChildPointers | None | vector<treeNode*>& | return tree node's child pointers |
Function Names | input | return | description | |
---|---|---|---|---|
a | printNodeKeyValue | None | None | print the node key-values |
This secection I'm going to show the data structures I used in bPlusTree and treeNode classes and why I use it.
class bPlusTree {
private:
int degree;
int minPairsSize;
treeNode *root;
vector<treeNode*> tracePath;
list<treeNode*> leafList;
}
-
degree=m - tree is an m-way B+ tree
-
minPairsSize - The minimum number of pairs for all nodes, which is ceil(m/2) - 1
-
*root - point to the root of the tree.
-
Use vector as stack, as we traverse down from root to leaf we push the nodes in the path into stack.
-
Use to construct double link list in the bottom of the tree.
In this part, the crucial decision is the data structure on using map for keyPairs and vector for childPointers.
class treeNode {
private:
/**
* Maximum keyPairs = degree-1;
* Minimum keyPairs = ceil(degree/2)-1
*/
bool isLeaf;
int degree;
int maxPairsSize;
int minPairsSize;
map<int,double> keyPairs;
vector<treeNode*> childPointers;
-
degree=m - set node degree to m way
-
maxPairsSize - The maximum number of pairs in the map keyPairs
-
minPairsSize - The minimum number of pairs in the map keyPairs
-
The reason to use map to store the the key-values
- To make the insert key order up, the complexity will be only O(logn)
- With N number insertions, the complexity will be O(n)
Since the map in C++ is build in red-black tree, this result will be better than other std containers insertions, which most of them need to be sort in O(nlogn).
-
Using vector is easily to insert and erase and access, since we will often have to re-organize the child pointers.
Many database engine uses B+ tree as there structure, for example InnoDB, Microsoft SQL server, and SQLite. The reason the DBMS often uses B+ tree as the engine structure is because of the properties of the tree.
- Balance tree with same tree height for all data nodes
- Leaf node is using double link list, which supports random access as well as sequential access.
- If we also update the index node value always match the data key, we also can easily to get a subtree of data by fast traverse.