T-Tree、T*-Tree的理解与简单内存数据库的实现

欢子 2019-07-01

章节目录

  1. T*-tree的介绍
  2. T*-tree节点与C语言实现
  3. T*-tree的插入、删除、查找与旋转
  4. 实现简单的key-value内存数据库
  5. 附件、代码
  6. 待加入功能
  7. 参考文献

T-tree和T*-tree极为相似,他们的不同主要是T×-tree的节点结构比T-tree多了一个指向successor的指针位,指向successor的指针的存在减小了树的寻找和遍历的时间复杂度.
注:本文中关于ttree的head file :ttree.h和ttree_defs.h来源于Dan Kruchinin <[email protected]>,Github:dkruchinin/libttree

T*-tree的介绍

在计算机科学中,T-tree是一种二叉树,它有一个左子树和一个右子树,由主存储器数据库使用,例如Datablitz,EXtremeDB,MySQL Cluster,Oracle TimesTen和MobileLite。T树是一种平衡的索引树数据结构,针对索引和实际数据都完全保存在内存中的情况进行了优化,就像B树是针对面向块的辅助存储设备(如硬盘)上的存储而优化的索引结构。 T树寻求获得内存树结构(如AVL树)的性能优势,同时避免它们常见的大存储空间开销。T树不保留索引树节点本身内的索引数据字段的副本。 相反,它们利用了这样的事实:实际数据总是与索引一起在主存储器中,因此它们只包含指向实际数据字段的指针。
T-Tree、T*-Tree的理解与简单内存数据库的实现

虽然T树以前被广泛用于主存数据库,但最近的研究表明它们在现代硬件上的表现并不比B树好。主要原因是现代的内存和硬盘的速度差异越来越大了,内存访问速度比硬盘访问速度快,并且CPU的核心缓存容量也越来越大。

T-Tree、T*-Tree的理解与简单内存数据库的实现T-Tree、T*-Tree的理解与简单内存数据库的实现

T*-tree的节点结构与C语言实现

T树节点通常由指向父节点,左右子节点,有序数据指针数组和一些额外控制数据的指针组成。 具有两个子树的节点称为内部节点(internal nodes),没有子树的节点称为叶子节点(leaf nodes),而仅具有一个子树的节点称为半叶节点(half-leaf nodes)。 如果值在节点的当前最小值和最大值之间,则该节点称为(bounding node)。对于每个内部节点,它的左子树中会有一个叶子节点或半叶子节点中有一个predecessor(称为最大下限-GLB(Greatest Lower Bound)),还在右子树中包含其最大数据值的后继者(LUB-lower upper bound)的节点(包含GLB或LUP的节点或许距离参照内部节点的距离很远,但也有可能恰好相邻。正因为T-tree的每个节点是有序的,并且不同节点之间保证左子树的数据都比节点数据的最小值小,右子树的数据都比节点数据的最大值大,因此B-tree最左边的节点中最左边的数据是整棵树最小的数据,最右边的节点中的最大值是整棵树最大的数据。叶子和半叶节点中的数据元素量在1~最大元素量之间,内部节点将其占用保持在预定义的最小元素量和最大元素数之间。如图:
T-Tree、T*-Tree的理解与简单内存数据库的实现

T-tree,T-treenode的插入、删除、旋转和查找代码来自于:Github:dkruchinin/libttree

typedef struct ttree_node {
        struct ttree_node *parent;     /**< Pointer to node's parent */
        struct ttree_node *successor;  /**< Pointer to node's soccussor */
        union {
            struct ttree_node *sides[2];
            struct  {
                struct ttree_node *left;   /**< Pointer to node's left child  */
                struct ttree_node *right;  /**< Pointer to node's right child */
            };
        };
        union {
            uint32_t pad;
            struct {
                signed min_idx     :12;  /**< Index of minimum item in node's array */
                signed max_idx     :12;  /**< Index of maximum item in node's array */
                signed bfc         :4;   /**< Node's balance factor */
                unsigned node_side :4;  /**< Node's side(TNODE_LEFT, TNODE_RIGHT or TNODE_ROOT) */
            };
        };

T*-tree的插入、删除、查找与旋转

插入

int ttree_insert(Ttree *ttree, void *item)
{
    TtreeCursor cursor;

    /*
     * If the tree already contains the same key item has and
     * tree's wasn't allowed to hold duplicate keys, signal an error.
     */
    if (ttree_lookup(ttree, ttree_item2key(ttree, item), &cursor) && ttree->keys_are_unique) {
        return -1;
    }

    ttree_insert_at_cursor(&cursor, item);
    return 0;
}

void ttree_insert_at_cursor(TtreeCursor *cursor, void *item)
{
    Ttree *ttree = cursor->ttree;
    TtreeNode *at_node, *n;
    TtreeCursor tmp_cursor;
    void *key;

    TTREE_ASSERT(cursor->ttree != NULL);
    //TTREE_ASSERT(cursor->state == CURSOR_PENDING);
    key = ttree_item2key(ttree, item);

    n = at_node = cursor->tnode;
    if (!ttree->root) { /* The root node has to be created. */
        at_node = allocate_ttree_node(ttree);
        at_node->keys[first_tnode_idx(ttree)] = key;
        at_node->min_idx = at_node->max_idx = first_tnode_idx(ttree);
        ttree->root = at_node;
        tnode_set_side(at_node, TNODE_ROOT);
        ttree_cursor_open_on_node(cursor, ttree, at_node, TNODE_SEEK_START);
        return;
    }
    if (cursor->side == TNODE_BOUND) {
        if (tnode_is_full(ttree, n)) {
            /*
             * If node is full its max item should be removed and
             * new key should be inserted into it. Removed key becomes
             * new insert value that should be put in successor node.
             */
            void *tmp = n->keys[n->max_idx--];

            increase_tnode_window(ttree, n, &cursor->idx);
            n->keys[cursor->idx] = key;
            key = tmp;

            ttree_cursor_copy(&tmp_cursor, cursor);
            cursor = &tmp_cursor;

            /*
             * If current node hasn't successor and right child
             * New node have to be created. It'll become the right child
             * of the current node.
             */
            if (!n->successor || !n->right) {
                cursor->side = TNODE_RIGHT;
                cursor->idx = first_tnode_idx(ttree);
                goto create_new_node;
            }

            at_node = n->successor;
            /*
             * If successor hasn't any free rooms, new value is inserted
             * into newly created node that becomes left child of the current
             * node's successor.
             */
            if (tnode_is_full(ttree, at_node)) {
                cursor->side = TNODE_LEFT;
                cursor->idx = first_tnode_idx(ttree);
                goto create_new_node;
            }

            /*
             * If we're here, then successor has free rooms and key
             * will be inserted to one of them.
             */
            cursor->idx = at_node->min_idx;
            cursor->tnode = at_node;
        }

        increase_tnode_window(ttree, at_node, &cursor->idx);
        at_node->keys[cursor->idx] = key;
        cursor->state = CURSOR_OPENED;
        return;
    }

create_new_node:
    n = allocate_ttree_node(ttree);
    n->keys[cursor->idx] = key;
    n->min_idx = n->max_idx = cursor->idx;
    n->parent = at_node;
    at_node->sides[cursor->side] = n;
    tnode_set_side(n, cursor->side);
    cursor->tnode = n;
    cursor->state = CURSOR_OPENED;
    fixup_after_insertion(ttree, n, cursor);
}

删除

void *ttree_delete(Ttree *ttree, void *key)
{
    TtreeCursor cursor;
    void *ret;

    ret = ttree_lookup(ttree, key, &cursor);
    if (!ret) {
        return ret;
    }

    ttree_delete_at_cursor(&cursor);
    return ret;
}

void *ttree_delete_at_cursor(TtreeCursor *cursor)
{
    Ttree *ttree = cursor->ttree;
    TtreeNode *tnode, *n;
    void *ret;

    TTREE_ASSERT(cursor->ttree != NULL);
    TTREE_ASSERT(cursor->state == CURSOR_OPENED);
    tnode = cursor->tnode;
    ret = ttree_key2item(ttree, tnode->keys[cursor->idx]);
    decrease_tnode_window(ttree, tnode, &cursor->idx);
    cursor->state = CURSOR_CLOSED;
    if (UNLIKELY(cursor->idx > tnode->max_idx)) {
        cursor->idx = tnode->max_idx;
    }

    /*
     * If after a key was removed, T*-tree node contains more than
     * minimum allowed number of items, the proccess is completed.
     */
    if (tnode_num_keys(tnode) > min_tnode_entries(ttree)) {
        return ret;
    }
    if (is_internal_node(tnode)) {
        int idx;

        /*
         * If it is an internal node, we have to recover number
         * of items from it by moving one item from its successor.
         */
        n = tnode->successor;
        idx = tnode->max_idx + 1;
        increase_tnode_window(ttree, tnode, &idx);
        tnode->keys[idx] = n->keys[n->min_idx++];
        if (UNLIKELY(cursor->idx > tnode->max_idx)) {
            cursor->idx = tnode->max_idx;
        }
        if (!tnode_is_empty(n) && is_leaf_node(n)) {
            return ret;
        }

        /*
         * If we're here, then successor is either a half-leaf
         * or an empty leaf.
         */
        tnode = n;
    }
    if (!is_leaf_node(tnode)) {
        int items, diff;

        n = tnode->left ? tnode->left : tnode->right;
        items = tnode_num_keys(n);

        /*
         * If half-leaf can not be merged with a leaf,
         * the proccess is completed.
         */
        if (items > (ttree->keys_per_tnode - tnode_num_keys(tnode))) {
            return ret;
        }

        if (tnode_get_side(n) == TNODE_RIGHT) {
            /*
             * Merge current node with its right leaf. Items from the leaf
             * are placed after the maximum item in a node.
             */
            diff = (ttree->keys_per_tnode - tnode->max_idx - items) - 1;
            if (diff < 0) {
                memcpy(tnode->keys + tnode->min_idx + diff,
                       tnode->keys + tnode->min_idx, sizeof(void *) *
                       tnode_num_keys(tnode));
                tnode->min_idx += diff;
                tnode->max_idx += diff;
                if (cursor->tnode == tnode) {
                    cursor->idx += diff;
                }
            }
            memcpy(tnode->keys + tnode->max_idx + 1, n->keys + n->min_idx,
                   sizeof(void *) * items);
            tnode->max_idx += items;
        }
        else {
            /*
             * Merge current node with its left leaf. Items the leaf
             * are placed before the minimum item in a node.
             */
            diff = tnode->min_idx - items;
            if (diff < 0) {
                register int i;

                for (i = tnode->max_idx; i >= tnode->min_idx; i--) {
                    tnode->keys[i - diff] = tnode->keys[i];
                }

                tnode->min_idx -= diff;
                tnode->max_idx -= diff;
                if (cursor->tnode == tnode) {
                    cursor->idx -= diff;
                }
            }

            memcpy(tnode->keys + tnode->min_idx - items, n->keys + n->min_idx,
                   sizeof(void *) * items);
            tnode->min_idx -= items;
        }

        n->min_idx = 1;
        n->max_idx = 0;
        tnode = n;
    }
    if (!tnode_is_empty(tnode)) {
        return ret;
    }

    /* if we're here, then current node will be removed from the tree. */
    n = tnode->parent;
    if (!n) {
        ttree->root = NULL;
        free(tnode);
        return ret;
    }

    n->sides[tnode_get_side(tnode)] = NULL;
    fixup_after_deletion(ttree, tnode, NULL);
    free(tnode);
    return ret;
}

查找

void *ttree_lookup(Ttree *ttree, void *key, TtreeCursor *cursor)
{
    TtreeNode *n, *marked_tn, *target;
    int side = TNODE_BOUND, cmp_res, idx;
    void *item = NULL;
    enum ttree_cursor_state st = CURSOR_PENDING;

    /*
     * Classical T-tree search algorithm is O(log(2N/M) + log(M - 2))
     * Where N is total number of items in the tree and M is a number of
     * items per node. In worst case each node on the path requires 2
     * comparison(with its min and max items) plus binary search in the last
     * node(bound node) excluding its first and last items.
     *
     * Here is used another approach that was suggested in
     * "Tobin J. Lehman , Michael J. Carey, A Study of Index Structures for
     * Main Memory Database Management Systems".
     * It reduces O(log(2N/M) + log(M - 2)) to true O(log(N)).
     * This algorithm compares the search
     * key only with minimum item in each node. If search key is greater,
     * current node is marked for future consideration.
     */
    target = n = ttree->root;
    marked_tn = NULL;
    idx = first_tnode_idx(ttree);
    if (!n) {
        goto out;
    }
    while (n) {
        target = n;
        cmp_res = ttree->cmp_func(key, tnode_key_min(n));
        if (cmp_res < 0)
            side = TNODE_LEFT;
        else if (cmp_res > 0) {
            marked_tn = n; /* mark current node for future consideration. */
            side = TNODE_RIGHT;
        }
        else { /* ok, key is found, search is completed. */
            side = TNODE_BOUND;
            idx = n->min_idx;
            item = ttree_key2item(ttree, tnode_key_min(n));
            st = CURSOR_OPENED;
            goto out;
        }

        n = n->sides[side];
    }
    if (marked_tn) {
        int c = ttree->cmp_func(key, tnode_key_max(marked_tn));

        if (c <= 0) {
            side = TNODE_BOUND;
            target = marked_tn;
            if (!c) {
                item = ttree_key2item(ttree, tnode_key_max(target));
                idx = target->max_idx;
                st = CURSOR_OPENED;
            }
            else { /* make internal binary search */
                struct tnode_lookup tnl;

                tnl.key = key;
                tnl.low_bound = target->min_idx + 1;
                tnl.high_bound = target->max_idx - 1;
                item = lookup_inside_tnode(ttree, target, &tnl, &idx);
                st = (item != NULL) ? CURSOR_OPENED : CURSOR_PENDING;
            }

            goto out;
        }
    }

    /*
     * If we're here, item wasn't found. So the only thing
     * needs to be done is to determine the position where search key
     * may be placed to. If target node is not empty, key may be placed
     * to its min or max positions.
     */
    if (!tnode_is_full(ttree, target)) {
        side = TNODE_BOUND;
        idx = ((marked_tn != target) || (cmp_res < 0)) ?
            target->min_idx : (target->max_idx + 1);
        st = CURSOR_PENDING;
    }

out:
    if (cursor) {
        ttree_cursor_open_on_node(cursor, ttree, target, TNODE_SEEK_START);
        cursor->side = side;
        cursor->idx = idx;
        cursor->state = st;
    }

    return item;
}

旋转

static void __rotate_single(TtreeNode **target, int side)
{
    TtreeNode *p, *s;
    int opside = opposite_side(side);

    p = *target;
    TTREE_ASSERT(p != NULL);
    s = p->sides[side];
    TTREE_ASSERT(s != NULL);
    tnode_set_side(s, tnode_get_side(p));
    p->sides[side] = s->sides[opside];
    s->sides[opside] = p;
    tnode_set_side(p, opside);
    s->parent = p->parent;
    p->parent = s;
    if (p->sides[side]) {
        p->sides[side]->parent = p;
        tnode_set_side(p->sides[side], side);
    }
    if (s->parent) {
        if (s->parent->sides[side] == p)
            s->parent->sides[side] = s;
        else
            s->parent->sides[opside] = s;
    }

    *target = s;
}

/*
 * There are two cases of single rotation possible:
 * 1) Right rotation (side = TNODE_LEFT)
 *         [P]             [L]
 *        /  \            /  \
 *      [L]  x1    =>   x2   [P]
 *     /  \                 /  \
 *    x2  x3               x3  x1
 *
 * 2) Left rotation (side = TNODE_RIHGT)
 *      [P]                [R]
 *     /  \               /  \
 *    x1  [R]      =>   [P]   x2
 *       /  \          /  \
 *     x3   x2        x1  x3
 */
static void rotate_single(TtreeNode **target, int side)
{
    TtreeNode *n;

    __rotate_single(target, side);
    n = (*target)->sides[opposite_side(side)];

    /*
     * Recalculate balance factors of nodes after rotation.
     * Let X was a root node of rotated subtree and Y was its
     * child. After single rotation Y is new root of subtree and X is its child.
     * Y node may become either balanced or overweighted to the
     * same side it was but 1 level less.
     * X node scales at 1 level down and possibly it has new child, so
     * its balance should be recalculated too. If it still internal node and
     * its new parent was not overwaighted to the opposite to X side,
     * X is overweighted to the opposite to its new parent side,
     * otherwise it's balanced. If X is either half-leaf or leaf,
     * balance racalculation is obvious.
     */
    if (is_internal_node(n)) {
        n->bfc = (n->parent->bfc != side2bfc(side)) ? side2bfc(side) : 0;
    }
    else {
        n->bfc = !!(n->right) - !!(n->left);
    }

    (*target)->bfc += side2bfc(opposite_side(side));
    TTREE_ASSERT((abs(n->bfc < 2) && (abs((*target)->bfc) < 2)));
}

/*
 * There are two possible cases of double rotation:
 * 1) Left-right rotation: (side == TNODE_LEFT)
 *      [P]                     [r]
 *     /  \                    /  \
 *   [L]  x1                [L]   [P]
 *  /  \          =>       / \    / \
 * x2  [r]                x2 x4  x3 x1
 *    /  \
 *  x4   x3
 *
 * 2) Right-left rotation: (side == TNODE_RIGHT)
 *      [P]                     [l]
 *     /  \                    /  \
 *    x1  [R]               [P]   [R]
 *       /  \     =>        / \   / \
 *      [l] x2             x1 x3 x4 x2
 *     /  \
 *    x3  x4
 */
static void rotate_double(TtreeNode **target, int side)
{
    int opside = opposite_side(side);
    TtreeNode *n = (*target)->sides[side];

    __rotate_single(&n, opside);

    /*
     * Balance recalculation is very similar to recalculation after
     * simple single rotation.
     */
    if (is_internal_node(n->sides[side])) {
        n->sides[side]->bfc = (n->bfc == side2bfc(opside)) ? side2bfc(side) : 0;
    }
    else {
        n->sides[side]->bfc =
            !!(n->sides[side]->right) - !!(n->sides[side]->left);
    }

    TTREE_ASSERT(abs(n->sides[side]->bfc) < 2);
    n = n->parent;
    __rotate_single(target, side);
    if (is_internal_node(n)) {
        n->bfc = ((*target)->bfc == side2bfc(side)) ? side2bfc(opside) : 0;
    }
    else {
        n->bfc = !!(n->right) - !!(n->left);
    }

    /*
     * new root node of subtree is always ideally balanced
     * after double rotation.
     */
    TTREE_ASSERT(abs(n->bfc) < 2);
    (*target)->bfc = 0;
}

static void rebalance(Ttree *ttree, TtreeNode **node, TtreeCursor *cursor)
{
    int lh = left_heavy(*node);
    int sum = abs((*node)->bfc + (*node)->sides[opposite_side(lh)]->bfc);

    if (sum >= 2) {
        rotate_single(node, opposite_side(lh));
        goto out;
    }

    rotate_double(node, opposite_side(lh));

    /*
     * T-tree rotation rules difference from AVL rules in only one aspect.
     * After double rotation is done and a leaf became a new root node of
     * subtree and both its left and right childs are half-leafs.
     * If the new root node contains only one item, N - 1 items should
     * be moved into it from one of its childs.
     * (N is a number of items in selected child node).
     */
    if ((tnode_num_keys(*node) == 1) &&
        is_half_leaf((*node)->left) && is_half_leaf((*node)->right)) {
        TtreeNode *n;
        int offs, nkeys;

        /*
         * If right child contains more items than left, they will be moved
         * from the right child. Otherwise from the left one.
         */
        if (tnode_num_keys((*node)->right) >= tnode_num_keys((*node)->left)) {
            /*
             * Right child was selected. So first N - 1 items will be copied
             * and inserted after parent's first item.
             */
            n = (*node)->right;
            nkeys = tnode_num_keys(n);
            (*node)->keys[0] = (*node)->keys[(*node)->min_idx];
            offs = 1;
            (*node)->min_idx = 0;
            (*node)->max_idx = nkeys - 1;
            if (!cursor) {
                goto no_cursor;
            }
            else if (cursor->tnode == n) {
                if (cursor->idx < n->max_idx) {
                    cursor->tnode = *node;
                    cursor->idx = (*node)->min_idx +
                        (cursor->idx - n->min_idx + 1);
                }
                else {
                    cursor->idx = first_tnode_idx(ttree);
                }
            }
        }
        else {
            /*
             * Left child was selected. So its N - 1 items
             * (starting after the min one)
             * will be copied and inserted before parent's single item.
             */
            n = (*node)->left;
            nkeys = tnode_num_keys(n);
            (*node)->keys[ttree->keys_per_tnode - 1] =
                (*node)->keys[(*node)->min_idx];
            (*node)->min_idx = offs = ttree->keys_per_tnode - nkeys;
            (*node)->max_idx = ttree->keys_per_tnode - 1;
            if (!cursor) {
                goto no_cursor;
            }
            else if (cursor->tnode == n) {
                if (cursor->idx > n->min_idx) {
                    cursor->tnode = *node;
                    cursor->idx = (*node)->min_idx + (cursor->idx - n->min_idx);
                }
                else {
                    cursor->idx = first_tnode_idx(ttree);
                }
            }

            n->max_idx = n->min_idx++;
        }

no_cursor:
        memcpy((*node)->keys + offs,
               n->keys + n->min_idx, sizeof(void *) * (nkeys - 1));
        n->keys[first_tnode_idx(ttree)] = n->keys[n->max_idx];
        n->min_idx = n->max_idx = first_tnode_idx(ttree);
    }

out:
    if (ttree->root->parent) {
        ttree->root = *node;
    }
}

实现简单的key-value内存数据库

实现简单的key-value内存数据库,用hashtable来链接key-value的关系。key不光插入到ttree中,而且还存到hash-table中。
hash_table采用了macro:hash-table(uthash.h)`uthash.h的帮助文档:macro:uthash.h帮助文档
hashkey-value对的插入:
插入之前先HASH_FIND_INT看看,key-value存不存在,如果不存在则可以插,存在的话不能插入。

void add_user(int user_id, char *name) {
    struct my_struct *s;
    HASH_FIND_INT(users, &user_id, s);  /* id already in the hash? */
    if (s==NULL) {
      s = (struct my_struct *)malloc(sizeof *s);
      s->id = user_id;
      HASH_ADD_INT( users, id, s );  /* id: name of key field */
    }
    strcpy(s->name, name);
}

解析存有key-value格式的文件

找到一个key-value格式的实例:xlarge.del
fopen()读取文件,读完之后 fclose()关闭。
因为待会儿要用strtok来拆开每一行,所以malloc个file_line

FILE * fp;
    fp = fopen("/home/vory/programing/c/key_value_mmdb/xlarge.del","r");
    file_line = malloc(1000 * sizeof(char));
    memset(file_line, 0, 1000 * sizeof(char));
    ......
    
    fclose(fp);

fgets获取每一行:

char * buf;
    buf = malloc(sizeof(input));
    memset(buf,0,sizeof(input));

    while(fgets(input ,256,fp)){

        strcpy(buf,input);
        ......
        
    }

strtok切割每一行为若干字符串。strtok将目标字符串的符合delim中的元素全部替换为'0',strtok用了之后,原来的目标代码就被破坏了,因此,新malloc一个接受复制的字符串,将目标字符串strcpy()之后,对复制的字符串进行strtok()操作。用完之后free()。

strcpy(buf,source_string);
        
        token = strtok(buf,delim);
//        printf("%s\n",token);
        parameter[parametercount] = malloc(sizeof(input));
        strcpy(parameter[parametercount],token);
        parametercount++;
        token = strtok(NULL,delim);
//        printf("%s\n",token);
        parameter[parametercount] = malloc(sizeof(input));
        strcpy(parameter[parametercount],token);

 实例的xlarge.del 文件的内容大概是这样:每一行分成两部分,KEY 和VALUE被逗号隔开着。

41231234,"Teenage Caveman"
3061234,"Banger Sisters, The"
18861234,"Hope Floats"
29381234,"No Looking Back"
1288,"Erotic Confessions: Volume 8"
2954,"Normal Life"
43901234,"Utomlyonnye solntsem"
20801234,"Island of Dr. Moreau, The"
3019,"One Hell of a Guy"
6712341,"Adventures of Pluto Nash, The"
33031234,"Pronto"
34701234,"Ripper, The"
106612341,"Devotion"
39481234,"Starship Troopers"
32381234,"Polish Wedding"
30551234,"Oscar and Lucinda"
42391,"Tomcats"
1661123411,"Gojira ni-sen mireniamu"
10611234,"Devil in a Blue Dress"
61612341,"Bully"
102612341,"Defenders: Taking the First, The"
1650,"Go Fish"
43512341,"Black Rose of Harlem"

strtok不光会把第一个delim(',')给标记成'0',而且会把所有的','都给标记为'0',当遇到这种行的时候,value就出问题了,得改。

6712341,"Adventures of Pluto Nash, The"
这行有俩逗号,strtok会把俩逗号都标记为'\0',得改。

解析从文件读来的每一行:每一行最多解析为2个参数([KEY] [VALUE])。

void parse_file(char * string){
    char * buf;
    char * delim;
    delim = NULL;
    delim = ",";
    char * token;
    token = NULL;
    
    buf = malloc(1000*sizeof(char));
    memset(buf,0, 1000*sizeof(char));
    
    if (!parf0){
        parf0 =malloc(500*sizeof(char));
    }
    
    memset(parf0,0, 500*sizeof(char));
    
    if (!parf1){
        parf1 =malloc(500*sizeof(char));        
    }
    
    memset(parf1,0, 500*sizeof(char));

    strcpy(buf, string);
    token = strtok(buf, delim);
    if(token != NULL) {
        strcpy(parf0, token);
    }
    token = strtok(NULL, delim);
    if (token != NULL){
        strcpy(parf1, token);
    }
    free(buf);
}

上面的解析从文件每一行的key-value会导致value不全。2019年3月20号,修改如下:

void parse_file(char * string){
    char * buf;
    char * delim;
    delim = NULL;
    delim = ",";
    char * token;
    token = NULL;

    buf = malloc(1000*sizeof(char));
    memset(buf,0, 1000*sizeof(char));

    if (!parf0){
        parf0 =malloc(500*sizeof(char));
    }
    
    memset(parf0,0, 500*sizeof(char));
    if (!parf1){
        parf1 =malloc(500*sizeof(char));
    }
    
    memset(parf1,0, 500*sizeof(char));
    strcpy(buf, string);
    strcpy(parf1 , strstr(buf,delim));


    token = strtok(buf, delim);
    if(token != NULL) {
        strcpy(parf0, token);
    }
    int fori = 0;
    for (fori = 0;fori < strlen(parf1) - 2;fori++){
        parf1[fori] = parf1[fori + 2];
    }
    
    parf1[fori -2] = '\0';
    free(buf);
}

即首先利用char * strstr()来返回第一个带有','的字符串,然后再用strtok标记逗号为'0',最后写一个简单的循环,让parf1的所有字符向前覆盖两个字符,将末尾字符赋'0',修改完成。

strtol将字符型的数据转换成long int型:
long int strtol(const char nptr,char *endptr,int base);
strtol不仅可以识别十进制整数,还可以识别其它进制的整数,取决于base参数,base为10,则识别十进制整数。

all_items[bcount].key = strtol(parf0,NULL ,10);
        bcount++;
        hash_user_id = strtol(parf0,NULL ,10);
        strcpy(hash_name,parf1);

解析从命令行输入的命令([COMMANDS] [KEY] [VALUE])

从文件读取每一行是 [KEY] [VALUE]的格式,但是从命令行读取是[COMMANDS] [KEY] [VALUE]的格式。
我将copy_string,par1,par2,par3定义在了别处。
这样就可以解析从stdin输入的句子了,被空格隔开的句子最多分为发出3路给par1,par2,par3。如果stdin输入的句子只包含了一个空格(也就是含有[COMMANDS] [KEY]的结构)则只能被分发为2路给par1和par2.
malloc()之后要free(),我在别处free() 了。

void parseinput(char *string){
    char * delim;
    delim = " ";
    copy_string = malloc(150*sizeof(char));
    memset(copy_string,0,150* sizeof(char));
    char * token;
    token = NULL;
    
    par1 = malloc(50*sizeof(char));
    par2 = malloc(50*sizeof(char));
    par3 = malloc(50*sizeof(char));
    memset(par1,0,50*sizeof(char));
    memset(par2,0,50*sizeof(char));
    memset(par3,0,50*sizeof(char));

    strcpy(copy_string,string);
    printf("%s is copystring .\n ",copy_string);
    printf("%s is string . \n",string);

    token = strtok(copy_string,delim);
    if (token != NULL){
        printf("%s is token1 \n",token);
        strcpy(par1,token);
    }

    token = strtok(NULL,delim);

    if (token != NULL){
        printf("%s is token2 \n",token);

        strcpy(par2,token);
    }
    token = strtok(NULL,delim);
    if (token != NULL){
        printf("%s is token3 \n",token);

        strcpy(par3,token);
    }
    free(copy_string);
}

设置枚举COMMANDS,配合switch语句来对不同COMMANDS进行处理

enum commands{
    FIND=0,
    INSERT,
    DELETE,
    PRINT,
};
switch (COMMANDS) {
            case FIND:
//                printf("NOW IN FIND SWITCH \n");
                finds = find_user(strtol(par2, NULL, 10));
                if (finds != NULL) {

                    strcpy(out_buff, finds->name);
                } else {
                    strcpy(out_buff, "key  NOT FOUND!");
                }
                printf("FIND OUT PUT IS :%s\n", out_buff);
                break;
            case INSERT:
                printf("RECEIVE INSERT\n");
                finds = find_user(strtol(par2, NULL, 10));
                if (finds == NULL) {

                    printf("The key you want to insert doesn't in MMDB\n .......Inerting now......\n");
                } else {
                    printf( "key already EXIST!!!\n!");
                    break;
                }
                *insertkey = strtol(par2, NULL, 10);
                printf("inserkey = %ld\n", *insertkey);
                ret = ttree_insert(&ttree, insertkey);

                strcpy(hash_name, par3);
                hash_user_id = strtol(par2, NULL, 10);
                add_user(hash_user_id, hash_name);
                if (ret < 0) {

                    fprintf(stdout, "Failed to insert  key %ld!\n",  strtol(par2,NULL,100));

                }else{
                    printf("SUCCESSFULLY INSERTED %ld",hash_user_id);
                }

                ////insert to ttree ,& insert to hash_table////
                break;
            case DELETE:
                *insertkey = strtol(par2, NULL, 10);

                finds = find_user(*insertkey);
                if(finds == NULL){
                    printf("KEY DOESN'T EXIT\n");
                    break;

                }
                else{
                    printf("key  %ld deleted ! ", *insertkey);
                }

                ttree_delete(&ttree, &insertkey);
                delete_user(finds);
                res = ttree_delete(&ttree, insertkey);
                if (res == NULL) {
                    printf("Failed to delete item %ld on step ",strtol(par2, NULL, 10));
                }

                break;
            case PRINT:
                printf("go print\n");


                for (s = test1_users; s != NULL; s = s->hh.next) {


                    memset(printid, 0, 500 * sizeof(char));
                    memset(printname, 0, 500 * sizeof(char));
                    strcpy(printname, s->name);

                    sprintf(printidstring,"%ld",s->user_id);
                    strcpy(printid, printidstring);
                    strcat(print_very_long, printid);;
                    strcat(print_very_long, printname);
                }
                printf("%s",print_very_long);

                break;
            default:
                printf("this is default\n");
                strcpy(out_buff, "switch go to default");
                break;
        }

初始化T*-tree

#define ttree_init(ttree, num_keys, is_unique, cmpf, data_struct, key_field) _ttree_init(ttree, num_keys, is_unique, cmpf, offsetof(data_struct, key_field))

int __ttree_init(Ttree *ttree, int num_keys, bool is_unique, ttree_cmp_func_fn cmpf, size_t key_offs);
...........

    ret = ttree_init(&ttree, 8, false, __cmpfunc, struct item, key);
    if (ret < 0) {
        fprintf(stderr, "Failed to initialize T*-tree. [ERR=%d]\n", ret);
        free(all_items);
        exit(EXIT_FAILURE);
    }

将读取的每一行插入t*tree,并将key-value插入hashtable

在一个循环中解析每一行,当真个文件的所有行都读完则跳出循环。

while (fgets(file_line, 1000, fp)) {
        parse_file(file_line);
        all_items[bcount].key = strtol(parf0, NULL, 10);
        hash_name = malloc(500 * sizeof(char));
        memset(hash_name, 0, 500 * sizeof(char));
        hash_user_id = strtol(parf0, NULL, 10);
        strcpy(hash_name, parf1);
        s = find_user(hash_user_id);
        if (s == NULL) { add_user(hash_user_id, hash_name); }
        free(hash_name);
        memset(file_line, 0, 1000 * sizeof(char));
    }
    
    
    for (i = 0; i < num_keys; i++) {
        ret = ttree_insert(&ttree, &all_items[i]);
        if (ret < 0) {
            fprintf(stderr, "Failed to insert item %d with key %ld! [ERR=%d]\n", i, all_items[i].key, ret);
            free(all_items);
            exit(EXIT_FAILURE);
        }
    }

打印出t*tree的所有key

for (i = 0; i < num_keys; i++) {
        printf("%ld ", all_items[i].key);
    }

给t*tree的所有key排序

从小到大排序,递归实现。

printf("\nSorted keys:\n");
    printf("{ ");
    tnode = ttree_node_leftmost(ttree.root);
    while (tnode) {
        tnode_for_each_index(tnode, i) {

            printf("%d ", *(int *) tnode_key(tnode, i));
        }
        tnode = tnode->successor;
    }
    printf("}\n");

程序结束前free(),释放内存空间

free(par1);
free(par2);
free(par3);
fclose(fp);
ttree_destroy(&ttree);
free(all_items);

附件&代码

github代码

本文章所有的代码都在我的github里:Slarsar's github/key_value_mmdb
关于T*tree的参考代码:Github/bernardobreder/demo-ttree
参考的ttree的headfile:Github/dkruchinin/libttree
hashtable的headfile:Github/troydhanson/uthash

待加入功能

数据库操作日志......

从不同的目录地址读入数据库......

数据库备份......

TCP/IP 远程操作......

参考文献

[1].Tobin J. Lehman and Michael J. Carey. 1986. A Study of Index Structures for Main Memory Database Management Systems. In Proceedings of the 12th International Conference on Very Large Data Bases (VLDB '86), Wesley W. Chu, Georges Gardarin, Setsuo Ohsuga, and Yahiko Kambayashi (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 294-303.
[2].Kong-Rim Choi and Kyung-Chang Kim, "T*-tree: a main memory database index structure for real time applications," Proceedings of 3rd International Workshop on Real-Time Computing Systems and Applications, Seoul, South Korea, 1996, pp. 81-88.
doi: 10.1109/RTCSA.1996.554964
[3].wikipidia about T-tree
[4].An Open-source T*-tree Library

相关推荐