Binary Search Trees란 무엇인가?

* 해당 글은 학과 수업으로 배운 내용과 코드가 포함되어 있으며 개인적 공부 목적으로 업로드하였습니다.

n entries를 가진 다양한 dictionaries들의 시간 복잡도를 우리는 아래와 같이 정리할 수 있다.

	Unsorted Array	Array sorted by key	Linked list	Hash Table
lookup	O(n)	O(logn)	O(n)	O(1)
insert	O(1)	O(n)	O(1)	O(1)

위 표를 살펴보면 우리는 hash dictionary가 가장 efficient하게 보인다.

-Average: O(1)

-Amortized: O(1)

하지만 가장 작은 key를 가진 entry를 찾는 것 등의 operation은 O(n)의 cost가 필요하다

이버 공부에 우리의 목적은 lookup, insert 그리고 find_min에서 worst-case 복잡도가 항상 O(log n)으로 보장되도록 하는

data structure를 찾자

binary search

우리가 어떤 array에서 binary search로 특정 숫자를 찾으려고 할때 우리는 언제나 midpoin에서 시작한다.

예를 들어 sorted array가 [-2,0,4,7,12,19,22,42,65]라고 하자.

midpoint value가 12인데 이때 우리는 3가지 경우의 수를 맞이한다.

1) x=12 ( 더 찾아나갈 필요가 없다)

2) x<12

3) x>12

2) x <12라면 우리는 왼쪽 절반 array에서 다시 midpoint(=4)를 찾게 된다.

3) x>12 라면 우리는 오른쪽 절반 array에서 다시 midpoint(=42)를 찾게 된다.

2번째 케이스인 x<12라고 해보자.

우리는 midpoint (=4) 를 체크하게 된다.

2-1) x=4 ( 더 찾아나갈 필요가 없다.)

2-2) x<4라면 다음 midpoint(=0)

2-3) x>4라면 x = 7

2번째 케이스인 x<4라고 해보자

x=0이라면 더 찾아나갈 필요가 없고 x<0이라면 x=-2가 된다.

우리는 binary search를 통해서 볼 수 있는 이 모든 과정을 map할 수 있다.

array는 모든 element에 접근할 수 있도록 하지만 위의 tree에서는 그렇지 않다.

각 element를 two pointers로 pair up 해서 같은 access pattern을 얻을 수 있다. 따라서 직접적인 access는 가지지 못한다.

이 tree구조를 우리는 type declaration을 통해서 이 아이디어를 직접 구현할 수 있다.

typedef struct tree_node tree;
struct tree_node{
   tree* left;
   int data;
   tree* right;
}

이를 우리는 node라고 부른다.

그리고 memory에서 이 data arrangement를 우리는 tree라고 부른다.

tree를 코드로 구현할때 우리는 아래와 같이 구현할 수 있다.

tree* T = alloc(tree);
T->data = 12;
T->left = alloc(tree);
T->left->data = 4;
T->right = alloc(tree);
T->right->data = 42;
...

빈 left/right node는 어떻게 표현해야 할까?

-> 우리는 이를 NULL로 표현한다.

->각 left/right pointers는 NUll-terminated list가 될 것이다.

Searching

위의 tree구조에서 elemetn를 찾을 때 우리는 binary search 와 같은 step을 가지게 된다.

-> cost : O(log n)

Inserting

Insertion 역시 binary search와 같은 step을 가지면서 올바른 위치를 찾으면 insert하면 된다.

->cost: O(log n)

Finding the smallest key

가장 작은 key를 찾기 위해서는 우리는 계속해서 left nodes로 내려가면 된다

만약 array에 n개의 element가 있다면 우리는 가장 left node까지 O(log n)만큼 갈 수 있다.

-> cost: O(log n)

이 arrangement of data를 우리는 tree 라고 부른다.

-> 각 item은 node라고 부르며 node에 달려있는 tree의 일부분을 branch(subtree)라고 부른다

tree의 가장 윗부분에 있는 node를 root라고 부르며 가장 아래에 있는 node들을 leaves라고 부른다.

그리고 그 사이에 있는 모든 nodes를 inner nodes라고 부른다.

하나의 node에서 왼쪽의 node를 left child라고 부르며 오른쪽의 node를 right child이며 그 node는 parent라고 불리게 된다.

그림으로 종합해보면 아래와 같다.

Tree는 다양한 형태로 보여질 수 있다.

1) empty tree

->tree가 empty일 경우 EMPTY라고 표현한다.

2) root를 기준으로 왼쪽 tree와 오른쪽 tree가 있는 경우

(아래와 같이 그려진다)

우리는 이때에도 tree invariant를 적용할 수 있다. 두가지 케이스 모두 tree_node struct에서 data field가 NULL일 수 없다.

이를 통해 우리는 is_tree function을 recursive하게 쓸 수 있다.

-> base case: empty tree

-> recursive case: non-empty tree

bool is_tree(tree* T){
  //code for emtpy tree
  if (T==NULL) return true;
  
  //code for non-empty tree
  return is_tree(T->left) 
       && T->data !=NULL;
       && is_tree(T->right);

Binary Search Tree

우리가 lookup에서 binary search를 사용했기 때문에 tree의 data는 ordered되어야 한다.

-왼쪽 subtree에 더 작은 값들, 오른쪽 subtree에 더 큰 값들이어야 한다.

이렇게 ordered 된 tree를 우리는 binary search tree라고 부른다.

따라서 주어진 tree가 binary search tree인지 확인하는 함수는 아래와 같이 작성할 수 있다.

bool is_bst(tree* T){
  return is_tree(T)
       && is_ordered(T);
}

BST lookup implementation

tree structure를 생각했을 때 이 역시 recursive function이 될 수 있다.

base case: empty tree여서 key를 찾을 수 없을 때

recursive case: 1) root가 key를 가지고 있다면 더 이상 찾지 않아도 된다.

2) key가 root 보다 작다면 root의 왼쪽으로 간다

3) key가 root보다 크다면 root의 오른쪽으로 간다.

entry bst_lookup(tree*T, key k)
//@requires is_bst(T);
{
   //code for empty tree
   if (T==NULL) return NULL;
   
   //code for non-empty tree
   if (k==T->data) return T->data;
   if (k<T->data) return bst_lookup(T->left, k);
   //@assert k > T->data;
   return bst_lookup(T->right, k);
 }

하지만 < 나 > 는 오직 integers에서만 가능하다.

따라서 dictionary 에서 entries 를 저장하고 keys를 저장할 수 있어야 한다.

이에 따라 1) entry에서 key를 뽑는 함수와 2) 두가지 key를 비교하는 함수가 필요하다

이에 따라 client interface에서 client에세 entry 와 key type을 요청하고 entry에서 key를 뽑는 함수와 두가지 key를 비교하는 함수를 결정해야 한다.

그리고 이 함수들을 통해서 lookup function을 다시 작성할 수 있다.

entry bst_lookup(tree* T, key y)
//@requires is_bst(T);
//@ensures \result== NULL || key_compare(entry_key(\result, k)==0);
{
   //code for emtpy tree
   if (T==NULL) return NULL;
   
   //code for non-empty tree
   int cmp = key_compare(k, entry_key(T->data));
   if (cmp==0) return T->data;
   if (cmp<0) return bst_lookup(T->left, k);
   //@assert cmp >0;
   return bst_lookup(T->right, k);
}

우리는 앞에서도 언급했듯이 binary search tree는 왼족에서 오른쪽으로 갈수록 값이 커져야 한다.

이를 코드에서 구현도 가능하다

bool is_ordered(tree* T)
//@requires is_tree(T);
{
   //code for empty tree
   if (T==NULL) return true;
   
   //code for non-empty tree
   return (T->left == NULL || T->left->data < T->data)
       && (T->right == NULL || T->data < T->right->data)
       && is_ordered(T->left)
       && is_ordered(T->right);
}

위 코드만을 보면 아래의 tree는 true를 보여주지만 실제로는 ordered tree가 아니다.

제대로 된 ordered tree가 되기 위해서는 root를 기준으로 모든 left subtree (TL)이 root(y)보다 작아야 하고 root(y)는 모든 right subtree(TR)보다 작아야 한다.

TL < y < TR

이를 구현하기 위해서 우리는 2개의 helper function이 필요하다

- gt_tree: y > TL

-lt_tree: y < TR

bool gt_tree(key k, tree* T)
//@requires is_tree(T);
{
   // code for empty tree
   if (T==NULL) return true;
   
   //code for non-empty tree
   return key_compare(k, entry_key(T->data)) >0
       && gt_tree(k, T->left)
       && gt_tree(k, T->right);
 }

gt_tree는 O(n)의 cost가 필요하다

-k를 tree의 모든 node와 비교해야하기 때문이다.

이 helper functions를 이용해서 우리는 is_ordered function을 작성할 수 있다.

bool is_ordered(tree* T)
//@requires is_tree(T);
{
  //code for empty tree
  if (T==NULL) return true;
  
  //code for non-empty tree
  key k = entry_key(T->data);
  return is_ordered(T->left) && gt_tree(k, T->left)
      && is_ordered(T->right) && lt_tree(k, T->right);
 }

gt_tree와 lt_tree를 각 node 마다 적용하기 때문에 O(n^2)의 cost가 든다.

혹시 더 efficient하게 할 수 있을까?

어떠한 key k를 보든 항상 특정 [lo, hi] range안에 둔다면 그 range안에서만 보기 때문에 cost가 줄어든다.

bool is_ordered(tree* T)
//@requires is_tree(T);
{
  //code for empty tree
  if (T==NULL) return true;
  
  //code for non-empty tree
  key k = entry_key(T->data);
  return (lo == NULL || key_compare(entry_key(lo),k) <0)
      && (hi == NULL || key_compare(k, entry_key(hi)) <0) 
      && is_ordered(T->left,lo,T->data)
      && is_ordered(T->right,T->data,hi);
 }

O(n) 의 cost가 든다.

이에 따라 is_bst도 update해야 한다.

bool is_bst(tree* T){
  return is_bst(T) 
     && is_ordered(T, NULL, NULL);
}

BST insert implementation

새로운 entry를 insert하기 위해서는 그 entry가 들어갈 새로운 node가 만들어져야 한다.

void bst_insert(tree* T, entry e)
//@requires is_bst(T) && e!=NULL
{
  //code for empty tree
  if (T==NULL){
     tree* R = alloc(tree);
     R->data = e;
     return R; 
  }

line 8에서 T=R일 경우 새로 만들어진 node를 return할 수 없게 된다.

만약 똑같은 entry가 이미 존재한다면 overwrite한다.

tree* bst_insert(tree* T, entry e)
//@requires is_bst(T) && e!=NULL;
//@ensures is_bst(\result) && \result !=NULL;
//@ensures bst_lookup(\result, entry_key(e))==e;
{
   //code for empty tree
   if (T==NULL){
      tree* R = alloc(tree);
      R->data = e;
      return R;
   }
   
   //code for non-empty tree
   int cmp = key_compare(entry_key(e), entry_key(T->data));
   if (cmp==0) T->data = e;
   else if (cmp<0) T->left = bst_insert(T->left, e);
   else{
   	 //@assert cmp >0;
     T->right = bst_insert(T->right, e);
   }
   return T;
}

left subtree나 right subtree에 insert하고자 할때 우리는 recursive call의 결과를 다시 reattach시킨다.

또한 T가 NULL일 때 새로운 node를 만들어내는 경우를 helper function으로 구현할 수 있다.

tree* leaf(entry e)
//@requires e!=NULL;
//@ensures is_bst(\result) && \result !=NULL;
{
    tree* R = alloc(tree);
    R->data = e;
    R->left = NULL;
    R->right = NULL;
    return R;
}

tree* bst_insert(tree* T, entry e)
//@requires is_bst(T) && e!=NULL;
//@ensures is_bst(\result) && \result !=NULL;
//@ensures bst_lookup(\result, entry_key(e))==e;
{
   //code for empty tree
   if (T==NULL) return leaf(e);
   
   //code for non-empty tree
   int cmp = key_compare(entry_key(e), entry_key(T->data));
   if (cmp==0) T->data = e;
   else if (cmp<0) T->left = bst_insert(T->left, e);
   else{
   	 //@assert cmp >0;
     T->right = bst_insert(T->right, e);
   }
   return T;
}

BST dictionaries

library interface의 lookup 과 insert를 bst_lookup과 bst_insert와 비교해보면 서로 match 하지 않는다.

entry bst_lookup(tree* T, key k);	entry dict_lookup(dict_t D, key k);
tree* bst_insert(tree* T, entry e);	void dict_insert(dict_t D, entry e);

-bst_insert는 tree*를 return하지만 dict_insert 는 아무것도 return하지 않는다.

이를 해결하기 위해 우리는 header를 define할 수 있다.

struct dict_header{
  tree* root;
  int size;
}
typedef struct dict_header dict;

이를 그림으로 나타내면 아래와 같을 것이다.

specification function도 이에 따라 아래와 같이 쓸 수 있다.

bool is_dict(dict* D){
   return D!=NULL //dictionary 자체는 NULL이 될 수 없다.
      && is_bst(D->root);
}

bst dictionary lookup , insertion, bst_new, 그리고 find_min도 작성할 수 있다.

entry dict_lookup(dict* D, key k)
//@requires is_dict(D);
//@ensrues \result ==NULL || key_compare(entry_key(\result), k)==0;
{
   return bst_lookup(D->root, k);
}

void dict_insert(dict* D, entry e)
//@requires is_dict(D) && e!=NULL;
//@ensures dict_lookup(D, entry_key(e))==e;
//@ensures is_dict(D);
{
  D->root = bst_insert(D->root, e);
}

dict* dict_new()
//@ensures is_dict(\result);
{
  dict* D = alloc(dict);
  D->root = NULL;
  return D;
}

entry find_min(dict* D)
//@requires is_dict(D);
{
  if (D->root==NULL) return NULL;
  tree* T = D->root;
  while(T->left !=NULL)
    T = T->left;
  return T->data;
}

bst를 compile할때 우리는

#cc0 -d client definitions file library file application file

순서로 compile이 가능하다

이를 통해 우리는 bst dictionary의 lookup, insert 그리고 find_min 은 모두 O(log n)의 cost를 가짐을 확인할 수 있다.

'👩‍💻Developer > Data Structure and Algorithms' 카테고리의 다른 글

AVL tree란 무엇인가? (0)	2022.11.14
Function Pointer란 무엇인가? (0)	2022.11.13
Hash Dictionary란 무엇인가? (0)	2022.11.13
Generic Pointers란 무엇인가? (0)	2022.11.13
Amortized Analysis를 통해 Unbounded Arrays를 알아보자 (0)	2022.11.04

블루누들의 생각냉장고

Binary Search Trees란 무엇인가?

'👩‍💻Developer > Data Structure and Algorithms' 카테고리의 다른 글

댓글

티스토리툴바

Binary Search Trees란 무엇인가?

'👩‍💻Developer > Data Structure and Algorithms' 카테고리의 다른 글

관련글

댓글

티스토리툴바