[C]Bytecode In C

* 해당 글은 학과 수업으로 배운 내용과 코드가 포함되어 있으며 개인적 공부 목적으로 업로드하였습니다.

Compilation

C program을 execute할 때 우리는 우선 compilation이 필요하다

이때 우리는 compiler를 활용해서 source program을 machine code로 번역할 수 있다.

이는 processor가 이해하고 execute바로 할 수 있는 program이다.

Interpretation

Interpreter는 source program의 각 라인을 읽고 이를 hardware에 simulate하는 것이다.

이 또한 machine code에서 program이다.

비유를 하자면 compilation은 text전체를 번역하는 것이지만 interpretation은 실시간으로 각 문장별로 번역하는 것과 같다.

compilation에서는 executable을 통해 프로그램을 실행할 수 있고 코드가 굉장히 빨리 작동한다.

프로그램 크기가 클 수록 시간이 오래걸리고 새로운 하드웨어에서 실행할 경우 새로운 compiler가 필요한다.

interpretation하기 위해서는 executable source code뿐만 아니라 interpreter도 있어야 한다.

이 역시 새로운 하드웨어에서 실행할 경우 새로운 interpreter가 필요하고 각 source instruction별로 simulate된다.

High-level source program에서 lower level 로 compile하기 위해서는 intermediate representation이 필요하다. 이는 곧

two-staege execution이라고 한다.

이 intermediate representation 언어가 source langugae보다 훨씬 간단할 수록 이득을 본다.

이러한 intermediate language 를 one byte로 bytecode라고 나타낸다.

이전에 우리는 cc0을 활용해서 c0 program을 C로 바꿔주고 gcc로 compile할 수 있었다.

-이는 쉽고 바르고 cc0를 portable하게 만들어주기 때문이다

coin을 활용한다면 c0 program을 bytecode data structure로 compile할 수 있다.

우리는 아래와 같은 command로 c0 program을 c0vm bytecode로 compile할 수 있다.

#cc0 -b foo.c0

bytecode file을 c0 virtual machine 로 아래와 같이 execute할 수 있다.

#c0vm foo.bc0

c0 program compilation

아래의 프로그램을 bytecode로 compile해보자

이는 text file이 될것이다.

int main(){
  return (3+4) * 5/2;
}

C0 C0 FF EE  #magic number
00 17  #version 11, arch = 1 (64 bits)

00 00 # int pool count
# int pool

00 00 #string pool total size
#string pool

00 01 #function count
#function_pool

#<main>
00     #number of arguments=0
00     #number of local variables=0
00 0C #code length=12 bytes
10 03 #bipush 3 #3
10 04 #bipush 4 #4
60 #iadd #(3+4)
10 05 #bipush 5 #5
68 #imul #((3+4)*5)
10 02 #bipush 2 #2
6C #idiv #(((3+4)*5)/2)
B0 #return #

00 00 # native count
#native pool

hexadecimal bytes 가 왼쪽에 나와있고 이를 한줄로 합친것이 실제 bytecode sequence가 될것이다.

위 bytecode는 다섯개의 segment로 이루어져 있다.

메인에서 arithmetic operation을 살펴보면 아래와 같은데

(3+4) * 5/2

이를 우리는 bytecode에서 postfilx notation으로 나타낼 수 있다.

3 4 + 5 * 2 /

또한 #뒤에는 #bipush 3처럼 instruction이 나타나져 있다.

위 postfilx notation에 나오는 모든 item 3, 4...는 instruction으로 나타낼 수 있다.

각 1byte instruction을 우리는 opcode라고 부른다.

ex) bipush의 opcode는 0x10이다.

그러나 몇 instruction은 operand를 동반한다.

stack을 활용해 여태까지의 instruction을 아래와 같이 나타낼 수 있다.

0x10	bipush <b>	S -> S, x:w32
0x60	iadd	S, x:32, y:w32 -> S, x+y:w32
0x68	imul	S, x:32, y:w32 -> S, x*y:w32
0x6C	idiv	S, x:32, y:w32 -> S, x/y:w32
0xB0	return	., v -> .

*return의 경우 stack에 single value가 있다고 생각하고 이를 caller에게 돌려주는 것을 기대한다.

Bytecode에서 현재의 instruction이 execute되고 난 후 PC는 다음 instruction에 point되기 위해 PC(program counter)를 update한다.

얼마나 업데이트해야할지는 현재 instruction에서 operand 개수에 따라 결정된다.

다음으로 아래의 c0 program을 bytecode로 compile해보자

int mid(int lo, int hi){
   int mid = lo+(hi-lo)/2;
   return mid;
}

int main(){
  return mid(3,6);
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 00             # int pool count
# int pool

00 00             # string pool total size
# string pool

00 02             # function count
# function_pool

#<main>
00                # number of arguments = 0
00                # number of local variables = 0
00 08             # code length = 8 bytes
10 03    # bipush 3           # 3
10 06    # bipush 6           # 6
B8 00 01 # invokestatic 1     # mid(3, 6)
B0       # return             # 


#<mid>
02                # number of arguments = 2
03                # number of local variables = 3
00 10             # code length = 16 bytes
15 00    # vload 0            # lo
15 01    # vload 1            # hi
15 00    # vload 0            # lo
64       # isub               # (hi - lo)
10 02    # bipush 2           # 2
6C       # idiv               # ((hi - lo) / 2)
60       # iadd               # (lo + ((hi - lo) / 2))
36 02    # vstore 2           # mid = (lo + ((hi - lo) / 2));
15 02    # vload 2            # mid
B0       # return             # 

00 00             # native count
# native pool

각 함수들은 4 bytes header로 시작한다.

1 byte = # of function arguments

1 byte = # of local variables

2 bytes = # of bytes in the bytecode of the function

Local variables

local variables 는 새로운 run time data structure인 local variable array V에서 저장된다.

이 V에서 실행할 수 있는 두가지의 bytecode instruction에는 두개가 있다.

00x15	vload <i>	S -> S, v. (v=V[i])
0x36	vstore <i>	S, v ->S, (V[i] =v)

vload i는 V의 ith value를 operand stack에 push한다.

vstore i는 operand stackd에서 pop한 뒤 V의 ith value에 넣는다.

위의 bytecode에서 살펴보면

1) V[0]의 lo를 push한다.

2) hi, lo를 push하고 (hi-lo)/2를 계산한다.

3) 이를 더한다 => lo + (hi-lo)/2

4) mid에 저장한다 -> V[2]에 넣는다.

5) mid를 stack에 load한다.

6)caller에 return 한다.

이때 vstore 2 는 stack에서 mid 를 pop하고

vload2 는 stack에 mid를 다시 밀어넣는다.

다음으로 아래를 compile해보자

int next_rand(int last){
  return last*1664525 + 1013904223;
}

int main(){
  return next_rand(0xdeadbeef);
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 03             # int pool count
# int pool
00 19 66 0D
3C 6E F3 5F
DE AD BE EF

00 00             # string pool total size
# string pool

00 02             # function count
# function_pool

#<main>
00                # number of arguments = 0
00                # number of local variables = 0
00 07             # code length = 7 bytes
13 00 02 # ildc 2             # c[2] = -559038737
B8 00 01 # invokestatic 1     # next_rand(-559038737)
B0       # return             # 


#<next_rand>
01                # number of arguments = 1
01                # number of local variables = 1
00 0B             # code length = 11 bytes
15 00    # vload 0            # last
13 00 00 # ildc 0             # c[0] = 1664525
68       # imul               # (last * 1664525)
13 00 01 # ildc 1             # c[1] = 1013904223
60       # iadd               # ((last * 1664525) + 1013904223)
B0       # return             # 

00 00             # native count
# native pool

bipush의 경우 [-128, 127] 의constants만 handle할 수 있기 때문에

integer poll에서 더 큰 constants를 사용해야한다.

Integer Pool

integer pool은 c0vm bytecode의 두번째 segment로

여기에 사용할 수 있는 instruction은 ildc i이다.

00x13

ildc <c1, c2>

S -> S, x:w32. (x=int_pool[c1<<8|c2]

ildc i는 stack에 integer pool의 i번째 integer를 push한다.

Function Pool

함수들은 c0vm bytecode의 4번째 segment에 있다.

이것은 함수의 개수와 함수들 자체를 담고 있다.

-늘 main이 첫번째 함수로 담긴다.

우리는 이 function pool에서 invokestatic 를 통해 i번째 함수를 부를 수 있다.

- 부르고자 하는 함수 g는 stack에 있어야 하며

-g를 return할때 returned value가 stack에 push된다.

0xB8	invokestatic <c1, c2>	S, v1, v2....vn -> S, v. function_pool(c1<<8\|c2] => g, g(v1...vn) =v)
0xB0	return	., v->.

함수를 부르기 전에 우리는 함수 execution을 끝내고 나머지를 이어서 진행할 수 있어야 한다.

- caller stack을 저장하고

-caller local variable array를 저장하고

-program counter를 저장하고

-caller가 누구였는지 저장해야한다.

이를 위해 우리는 새로운 data structure인 call stack이 필요하다.

-call stack은 불려지는 각 함수를 위해 frame을 포함해야 한다.

c0vm의 5가지 segment를 정리해보면

1) header

-4 byte magic number를 가지고 있으며

-bytecode의 버전과 target architecture를 가지고 있다.

2) integer pool

3) string pool

4) function pool

5) native pool

- function pool과 동일하지만 library function이 담겨있다.

이러한 local organization을 반영하는 data structure를 아래와 같이 쓸수있다.

struct bc0_file{
  /*header*/
  uint32_t magic;
  uint116_t version;
  
  /*integer constant pool*/
  uint16_t int_count;
  int32_t* int_pool;  //\length(int_pool) == int_count
  
  /*string literal pool*/
  /*stores all strings consecutively with NUL terminators*/
  uint16_t string_count;
  char* string_pool;  //\length(string_pool) == string_count
  
  /*function pool*/
  uint16_t function_count;
  struc function_info * function_pool; //\length(function_pool) == function_count
  
  /*native function tables */
  uint16_t native_count;
  struct native_info* native_pool; //\length(native_pool) == native_count
 };

함수는

struct function_info {
  uint8_t num_args;
  uint8_t num_vars;
  uint16_t code_length;
  ubyte8 code;
};

native function의 경우

struct native_info{
  uint16_t num_args;
  uint16_t function_table_index;
};

위의 data structure를 살펴보면 우리는 항상 fixed된 size의 integer를 사용하고 있다.

다음 예시는 loop를 포함하고 있다.

int main(){
  int sum = 0;
  for (int i=1; i<100; i+=2)
    sum +=1;
  return sum;
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 00             # int pool count
# int pool

00 00             # string pool total size
# string pool

00 01             # function count
# function_pool

#<main>
00                # number of arguments = 0
02                # number of local variables = 2
00 26             # code length = 38 bytes
10 00    # bipush 0           # 0
36 00    # vstore 0           # sum = 0;
10 01    # bipush 1           # 1
36 01    # vstore 1           # i = 1;
# <00:loop>
15 01    # vload 1            # i
10 64    # bipush 100         # 100
A1 00 06 # if_icmplt +6       # if (i < 100) goto <01:body>
A7 00 14 # goto +20           # goto <02:exit>
# <01:body>
15 00    # vload 0            # sum
10 01    # bipush 1           # 1
60       # iadd               # 
36 00    # vstore 0           # sum += 1;
15 01    # vload 1            # i
10 02    # bipush 2           # 2
60       # iadd               # 
36 01    # vstore 1           # i += 2;
A7 FF E8 # goto -24           # goto <00:loop>
# <02:exit>
15 00    # vload 0            # sum
B0       # return             # 

00 00             # native count
# native pool

conditional branch instructions

if_cmpeq 와 같은 conditiona branch의 경우 stack의 가장 top value가 condition을 만족할 경우 bytecode의 특정 point로 이동하게 된다.

unconditional branch instruction

condition과 상관없이 항상 그 특정 point로 이동한다.

0xA1	if_cmplt<o1, o2>	S, x:w32, y:w32 -> S. (pc=pc+(o1<<8\|o2) if x<y)
0xA7	goto <o1. o2>	S ->S (pc=pc+(o1 <<8\|o2)

<o1, o2> 는 16 bit signed offset이다.

따라서 instruction에 따라 움직이지 않고 byte에 따라 움직인다.

다음 예시를 살펴보자

typedef struct list_node list;
struct list_node{
  char data;
  list* next;
};

list* prepend(list* l, char c){
   List* res = alloc(list);
   res->data =c;
   res->next=l;
   return res;
}

int main(){
  list* l = prepend(NULL, 'a');
  return 0;
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 00             # int pool count
# int pool

00 00             # string pool total size
# string pool

00 02             # function count
# function_pool

#<main>
00                # number of arguments = 0
01                # number of local variables = 1
00 0B             # code length = 11 bytes
01       # aconst_null        # NULL
10 61    # bipush 97          # 'a'
B8 00 01 # invokestatic 1     # prepend(NULL, 'a')
36 00    # vstore 0           # l = prepend(NULL, 'a');
10 00    # bipush 0           # 0
B0       # return             # 


#<prepend>
02                # number of arguments = 2
03                # number of local variables = 3
00 15             # code length = 21 bytes
BB 10    # new 16             # alloc(list)
36 02    # vstore 2           # res = alloc(list);
15 02    # vload 2            # res
62 00    # aaddf 0            # &res->data
15 01    # vload 1            # c
55       # cmstore            # res->data = c;
15 02    # vload 2            # res
62 08    # aaddf 8            # &res->next
15 00    # vload 0            # l
4F       # amstore            # res->next = l;
15 02    # vload 2            # res
B0       # return             # 

00 00             # native count
# native pool

characters는 그들의 ASCII value로 나타내진다.

operand stack에서는 32 bit integer로 나타내지며 boolean 역시 32 bit integer로 나타내진다.

Pointer

pointer의 경우 (unsigned) 8 bytes 로 나타내진다.

0x01

aconst_null

S -> S, null:*

이때 operand stack은 64 bit pointers 혹은 32 bit integers를 포함할 수 있다.

이를 통틀어 우리는 stack element 타입을 c0_value라고 나타낼 수 있다.

이에 따라 casting을 할 수 있는 함수들을 아래와 같이 나타낼 수 있다.

c0_value int2val9int32_t i);

int32_t val2int(c0_value v);

c0_value ptr2val(void* p);

void* val2ptr(c0_value v);

Memory Allocation

c0에서 alloc(tp)는 new s로 compile될 수 있다.( 이때 s는 byte )

새로운 memory address가 push되는 것이다.

0xBB

new <s>

S -> S, a*

Fields

field는 struct의 시작을 기준으로 compile되는 offset이다.

aaddf f는 address a를 stack에서 pop하고 f bytes 이후의 a address를 push한다.

0x62

aaddf <f>

S, a* ->S, (a+f)* (a!=NULL; f field offset in bytes)

Manipulating Heap Values

heap values는 stack에서 읽혀야 하고 heap에 써져야한다.

c0vm에서는 heap value의 사이즈가 무엇이냐에 따라 instruction이 달라진다.

0x2E	imload	S, a* -> S, x:w32
0x4E	imstore	S, a, x:w32 -> S (a =x, a!=NULL, store 4 bytes)

char/bool의 경우 (1 byte)

0x34	cmload	S, a* -> S, x:w32. (x = (w32)(*a), a!=NULL, load 1 byte)
0x55	cmstore	S, a, x:w32 -> S (a = x & 0x7f, a!=NULL, store 1 byte)

pointer의 경우 (8 bytes)

0x2F	amload	S, a* -> S, b* ( b = *a, a!=NULL, load address)
0x4F	amstore	S, a, b -> S (*a = b, a!=NULL, store address)

Arrays

다음으로 아래 예시를 compile해보자

int main(){
  int[] A = alloc_array(int, 100);
  for (int i = 0; i<100; i++)
    A[i] = i;
  return A[99];
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 00             # int pool count
# int pool

00 00             # string pool total size
# string pool

00 01             # function count
# function_pool

#<main>
00                # number of arguments = 0
02                # number of local variables = 2
00 2D             # code length = 45 bytes
10 64    # bipush 100         # 100
BC 04    # newarray 4         # alloc_array(int, 100)
36 00    # vstore 0           # A = alloc_array(int, 100);
10 00    # bipush 0           # 0
36 01    # vstore 1           # i = 0;
# <00:loop>
15 01    # vload 1            # i
10 64    # bipush 100         # 100
A1 00 06 # if_icmplt +6       # if (i < 100) goto <01:body>
A7 00 15 # goto +21           # goto <02:exit>
# <01:body>
15 00    # vload 0            # A
15 01    # vload 1            # i
63       # aadds              # &A[i]
15 01    # vload 1            # i
4E       # imstore            # A[i] = i;
15 01    # vload 1            # i
10 01    # bipush 1           # 1
60       # iadd               # 
36 01    # vstore 1           # i += 1;
A7 FF E7 # goto -25           # goto <00:loop>
# <02:exit>
15 00    # vload 0            # A
10 63    # bipush 99          # 99
63       # aadds              # &A[99]
2E       # imload             # A[99]
B0       # return             # 

00 00             # native count
# native pool

c0에서 alloc_array(tp, n)는 newarray s 로 compile할 수 있다.

(이때 s는 필요한 bytes이다)

0xBC

newarray<s>

S, n:w32 -> S, a*

array assessment는 aadds s로 가능하다

(이때 s 는 역시 각element가 필요한 byte이다)

그리고 해당 element의 index가 항상 stack가장 위에 존재한다.

0x63

aadds <s>

S, a*, i:w32 -> S, (elems(a)+s*i)* (a!=NULL, 0<=i<\length(a))

String

다음 예시를 살펴보자

#use <string>
#use <conio>

int main(){
  string h = "Hello";
  string hw = string_join(h, "World!\n");
  print(hw);
  return string_length(hw);
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 00             # int pool count
# int pool

00 0E             # string pool total size
# string pool
48 65 6C 6C 6F 00  # "Hello"
57 6F 72 6C 64 21 0A 00  # "World!\n"

00 01             # function count
# function_pool

#<main>
00                # number of arguments = 0
02                # number of local variables = 2
00 1B             # code length = 27 bytes
14 00 00 # aldc 0             # s[0] = "Hello"
36 00    # vstore 0           # h = "Hello";
15 00    # vload 0            # h
14 00 06 # aldc 6             # s[6] = "World!\n"
B7 00 00 # invokenative 0     # string_join(h, "World!\n")
36 01    # vstore 1           # hw = string_join(h, "World!\n");
15 01    # vload 1            # hw
B7 00 01 # invokenative 1     # print(hw)
57       # pop                # (ignore result)
15 01    # vload 1            # hw
B7 00 02 # invokenative 2     # string_length(hw)
B0       # return             # 

00 03             # native count
# native pool
00 02 00 64       # string_join
00 01 00 06       # print
00 01 00 65       # string_length

string literals의 경우 string pool에 저장되어있다.

native function을 통해 compute된 string은 heap에 저장된다.

이러한 string literals는 aldc를 통해 접근이 가능하다.

0x14

aldc <c1, c2>

S -> S, a* (a = &string_pool[c1<<8|c2]

이는 string의 1번째 character의 address를 stack에 push한다.

System library function의 경우 native pool에서 가져올 수 있다.

각 entry 에 함수가 있으며 각 entry는 4 bytes이다. (nn nn aa aa 으로 nn nn 은 arguments 개수, aa aa는 함수를 어디서 꺼내올지)

이들은 invokenative를 통해 불러올 수 있다.

0xB7

invokenative <c1, c2>

S, v1, v2...vn -> S, v

Contracts

마지막 예시를 살펴보자

int main(){
  int[] A = alloc_array(int, 7);
  //@assert(\length(A)==7);
  return 0;
}

C0 C0 FF EE       # magic number
00 17             # version 11, arch = 1 (64 bits)

00 00             # int pool count
# int pool

00 3C             # string pool total size
# string pool
28 70 72 6F 67 72 61 6D 20 73 74 61 72 74 29 00  # "(program start)"
65 78 32 2E 63 30 3A 20 33 2E 36 2D 33 2E 32 38 3A 20 40 61 73 73 65 72 74 20 61 6E 6E 6F 74 61 74 69 6F 6E 20 66 61 69 6C 65 64 00  # "ex2.c0: 3.6-3.28: @assert annotation failed"

00 01             # function count
# function_pool

#<main>
00                # number of arguments = 0
03                # number of local variables = 3
00 28             # code length = 40 bytes
14 00 00 # aldc 0             # s[0] = "(program start)"
36 00    # vstore 0           # _caller = "(program start)";
10 07    # bipush 7           # 7
BC 04    # newarray 4         # alloc_array(int, 7)
36 02    # vstore 2           # A = alloc_array(int, 7);
15 02    # vload 2            # A
BE       # arraylength        # \length(A)
10 07    # bipush 7           # 7
9F 00 06 # if_cmpeq +6        # if (\length(A) == 7) goto <00:cond_true>
A7 00 08 # goto +8            # goto <01:cond_false>
# <00:cond_true>
10 01    # bipush 1           # true
A7 00 05 # goto +5            # goto <02:cond_end>
# <01:cond_false>
10 00    # bipush 0           # false
# <02:cond_end>
14 00 10 # aldc 16            # s[16] = "ex2.c0: 3.6-3.28: @assert annotation failed"
CF       # assert             # assert (\length(A) == 7) [failure message on stack]
10 00    # bipush 0           # 0
36 01    # vstore 1           # _result = 0;
15 01    # vload 1            # _result
B0       # return             # 

00 00             # native count
# native pool

-d로 compile할때에 contracts는 conditional이 되며 모든 contracts에 동일하게 해당된다

bytecode에서는 assert에 의해 handle되며

stack은 boolean x와 string pointer a를 가지고 있다.

0xCF

assert

S, x:w32, a* -> S (c0_assertion_failure(a) if x==0)

이때 x는 contract expression 값에 의해 결정된다.

\length의 경우 c0vm에서 arraylength instruction에 의해 결정된다

0xBE

arraylength

S, a* -> S, n:w32 (n=\length(a))

'👩‍💻Developer > Language' 카테고리의 다른 글

[C] Search in Graph (0)	2022.12.16
[C] Graph In C (0)	2022.12.15
[C] C에서 숫자란? (0)	2022.12.14
[C/C++] C언어의 Memory model (1)	2022.11.29
[C] C언어에 대해 알아보자 (0)	2022.11.25

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

블루누들의 생각냉장고

[C]Bytecode In C

'👩‍💻Developer > Language' 카테고리의 다른 글

댓글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

[C]Bytecode In C

'👩‍💻Developer > Language' 카테고리의 다른 글

관련글

댓글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역