.\" README.EXT - -*- Text -*- created at: Mon Aug 7 16:45:54 JST 1995 This document explains how to make extention modules for Ruby. 1.Basic knowledge In C, variables have types and data do not have types. In contrast, Ruby variables do not have static type and data themselves have types. So, data need to be converted across the languages. Data in Ruby represented C type `VALUE'. Each VALUE data have its data-type. To retrieve an C data from the VALUE, you need to: (1) Identify VALUE's data type (2) Convert VALUE into C data Converting to wrong data type may cause serious promblems. 1.1 Data-types Ruby interpreter has data-types as below: T_NIL nil T_OBJECT ordinaly object T_CLASS class T_MODULE module T_FLOAT floating point number T_STRING string T_REGEXP regular expression T_ARRAY array T_FIXNUM Fixnum(31bit integer) T_HASH assosiative array T_STRUCT (Ruby) structure T_BIGNUM multi precision integer T_TRUE true T_FALSE false T_DATA data Otherwise, there are several other types used internally: T_ICLASS T_MATCH T_VARMAP T_SCOPE T_NODE Most of the types are represented by C structures. 1.2 Check Data Type of the VALUE The macro TYPE() defined in ruby.h shows data-type of the VALUE. TYPE() returns the constant number T_XXXX described above. To handle data-types, the code will be like: switch (TYPE(obj)) { case T_FIXNUM: /* process Fixnum */ break; case T_STRING: /* process String */ break; case T_ARRAY: /* process Array */ break; default: /* raise exception */ Fail("not valid value"); break; } There is the data-type check function. void Check_Type(VALUE value, int type) It raises an exception, if the VALUE does not have the type specified. There are faster check-macros for fixnums and nil. FIXNUM_P(obj) NIL_P(obj) 1.3 Convert VALUE into C data The data for type T_NIL, T_FALSE, T_TRUE are nil, true, false respectively. They are singletons for the data type. The T_FIXNUM data is the 31bit length fixed integer (63bit length on some machines), which can be conver to the C integer by using FIX2INT() macro. There also be NUM2INT() which converts any Ruby numbers into C integer. The NUM2INT() macro includes type check, so the exception will be raised if conversion failed. Other data types have corresponding C structures, e.g. struct RArray for T_ARRAY etc. VALUE of the type which has corresponding structure can be cast to retrieve the pointer to the struct. The casting macro RXXXX for each data type like RARRAY(obj). see "ruby.h". For example, `RSTRING(size)->len' is the way to get the size of the Ruby String object. The allocated region can be accessed by `RSTRING(str)->ptr'. For arrays, `RARRAY(ary)->len' and `RARRAY(ary)->ptr' respectively. Notice: Do not change the value of the structure directly, unless you are responsible about the result. It will be the cause of interesting bugs. 1.4 Convert C data into VALUE To convert C data to the values of Ruby: * FIXNUM left shift 1 bit, and turn on LSB. * Other pointer values cast to VALUE. You can determine whether VALUE is pointer or not, by checking LSB. Notice Ruby does not allow arbitrary pointer value to be VALUE. They should be pointers to the structures which Ruby knows. The known structures are defined in . To convert C numbers to Ruby value, use these macros. INT2FIX() for intergers within 31bits. INT2NUM() for arbitrary sized integer. INT2NUM() converts integers into Bignums, if it is out of FIXNUM range, but bit slower. 1.5 Manipulate Ruby data As I already told, it is not recommended to modify object's internal structure. To manipulate objects, use functions supplied by Ruby interpreter. Useful functions are listed below (not all): String funtions rb_str_new(char *ptr, int len) Creates a new Ruby string. rb_str_new2(char *ptr) Creates a new Ruby string from C string. This is equivalent to rb_str_new(ptr, strlen(ptr)). rb_str_cat(VALUE str, char *ptr, int len) Appends len bytes data from ptr to the Ruby string. Array functions rb_ary_new() Creates an array with no element. rb_ary_new2(int len) Creates an array with no element, with allocating internal buffer for len elements. rb_ary_new3(int n, ...) Creates an n-elements array from arguments. rb_ary_new4(int n, VALUE *elts) Creates an n-elements array from C array. rb_ary_push(VALUE ary, VALUE val) rb_ary_pop(VALUE ary) rb_ary_shift(VALUE ary) rb_ary_unshift(VALUE ary, VALUE val) rb_ary_entry(VALUE ary, int idx) Array operations. The first argument to each functions must be an array. They may dump core if other types given. 2. Extend Ruby with C 原理的にRubyで書けることはCでも書けます.RubyそのものがCで記 述されているんですから,当然といえば当然なんですけど.ここで はRubyの拡張に使うことが多いだろうと予測される機能を中心に紹 介します. 2.1 Add new features to Ruby You can add new features (classes, methods, etc.) to the Ruby interpreter. Ruby provides the API to define things below: * Classes, Modules * Methods, Singleton Methods * Constants 2.1.1 Class/module definition To define class or module, use functions below: VALUE rb_define_class(char *name, VALUE super) VALUE rb_define_module(char *name) These functions return the newly created class ot module. You may want to save this reference into the variable to use later. 2.1.2 Method/singleton method definition To define methods or singleton methods, use functions below: void rb_define_method(VALUE class, char *name, VALUE (*func)(), int argc) void rb_define_singleton_method(VALUE object, char *name, VALUE (*func)(), int argc) The `argc' represents the number of the arguments to the C function, which must be less than 17. But I believe you don't need that much. :-) If `argc' is negative, it specifies calling sequence, not number of the arguments. If argc is -1, the function will be called like: VALUE func(int argc, VALUE *argv, VALUE obj) where argc is the actual number of arguments, argv is the C array of the arguments, and obj is the receiver. if argc is -2, the arguments are passed in Ruby array. The function will be called like: VALUE func(VALUE obj, VALUE args) where obj is the receiver, and args is the Ruby array containing actual arguments. There're two more functions to define method. One is to define private method: void rb_define_private_method(VALUE class, char *name, VALUE (*func)(), int argc) The other is to define module function, which is private AND singleton method of the module. For example, sqrt is the module function defined in Math module. It can be call in the form like: Math.sqrt(4) or include Math sqrt(4) To define module function void rb_define_module_function(VALUE module, char *name, VALUE (*func)(), int argc) Oh, in addition, function-like method, which is private method defined in Kernel module, can be defined using: void rb_define_global_function(char *name, VALUE (*func)(), int argc) 2.1.3 Constant definition We have 2 functions to define constants: void rb_define_const(VALUE class, char *name, VALUE val) void rb_define_global_const(char *name, VALUE val) The former is to define constant under specified class/module. The latter is to define global constant. 2.2 Use Ruby features from C There are several ways to invoke Ruby's features from C code. 2.2.1 Evaluate Ruby Program in String Easiest way to call Ruby's function from C program is to evaluate the string as Ruby program. This function will do the job. VALUE rb_eval_string(char *str) Evaluation is done under current context, thus current local variables of the innermost method (which is defined by Ruby) can be accessed. 2.2.2 ID or Symbol You can invoke methods directly, without parsing the string. First I need to explain about symbols (which data type is ID). ID is the integer number to represent Ruby's identifiers such as variable names. It can be accessed from Ruby in the form like: :Identifier You can get the symbol value from string within C code, by using rb_intern(char *name) In addition, the symbols for one character operators (e.g +) is the code for that character. 2.2.3 Invoke Ruby method from C To invoke methods directly, you can use the function below VALUE rb_funcall(VALUE recv, ID mid, int argc, ...) This function invokes the method of the recv, which name is specified by the symbol mid. 2.2.4 Accessing the variables and constants You can access class variables, and instance variables using access functions. Also, global variables can be shared between both worlds. There's no way to access Ruby's local variables. The functions to access/modify instance variables are below: VALUE rb_ivar_get(VALUE obj, ID id) VALUE rb_ivar_set(VALUE obj, ID id, VALUE val) id must be the symbol, which can be retrieved by rb_intern(). To access the constants of the class/module: VALUE rb_const_get(VALUE obj, ID id) See 2.1.3 for defining new constant. 3. Informatin sharing between Ruby and C 3.1 Ruby constant that C can be accessed from C Following Ruby constants can be referred from C. Qtrue Qfalse Boolean values. Qfalse is false in the C also (i.e. 0). Qnil Ruby nil in C scope. 3.2 Global variables shared between C and Ruby Information can be shared between two worlds, using shared global variables. To define them, you can use functions listed below: void rb_define_variable(char *name, VALUE *var) This function defines the variable which is shared by the both world. The value of the global variable pointerd by `var', can be accessed through Ruby's global variable named `name'. You can define read-only (from Ruby, of course) variable by the function below. void rb_define_readonly_variable(char *name, VALUE *var) You can defined hooked variables. The accessor functions (getter and setter) are called on access to the hooked variables. void rb_define_hooked_variable(char *name, VALUE *var, VALUE (*getter)(), VALUE (*setter)()) If you need to supply either setter or getter, just supply 0 for the hook you don't need. If both hooks are 0, rb_define_hooked_variable() works just like rb_define_variable(). void rb_define_virtual_variable(char *name, VALUE (*getter)(), VALUE (*setter)()) This function defines the Ruby global variable without corresponding C variable. The value of the variable will be set/get only by hooks. The prototypes of the getter and setter functions are as following: (*getter)(ID id, void *data, struct global_entry* entry); (*setter)(VALUE val, ID id, void *data, struct global_entry* entry); 3.3 Encapsulate C data into Ruby object To wrapping and objectify the C pointer as Ruby object (so called DATA), use Data_Wrap_Struct(). Data_Wrap_Struct(class,mark,free,ptr) Data_Wrap_Struct() returns a created DATA object. The class argument is the class for the DATA object. The mark argument is the function to mark Ruby objects pointed by this data. The free argument is the function to free the pointer allocation. The functions, mark and free, will be called from garbage collector. You can allocate and wrap the structure in one step. Data_Make_Struct(class, type, mark, free, sval) This macro returns an allocated Data object, wrapping the pointer to the structure, which is also allocated. This macro works like: (sval = ALLOC(type), Data_Wrap_Struct(class, mark, free, sval)) Arguments, class, mark, free, works like thier counterpart of Data_Wrap_Struct(). The pointer to allocated structure will be assigned to sval, which should be the pointer to the type specified. To retrieve the C pointer from the Data object, use the macro Data_Get_Struct(). Data_Get_Struct(obj, type, sval) The pointer to the structure will be assigned to the variable sval. See example below for detail. 4.Example - Create dbm module OK, here's the example to make extension library. This is the extension to access dbm. The full source is included in ext/ directory in the Ruby's source tree. (1) make the directory % mkdir ext/dbm Make a directory for the extension library under ext directory. (2) create MANIFEST file % cd ext/dbm % touch MANIFEST There should be MANIFEST file in the directory for the extension library. Make empty file now. (3) design the library You need to design the library features, before making it. (4) write C code. 拡張ライブラリ本体となるC言語のソースを書きます.C言語のソー スがひとつの時には「モジュール名.c」を選ぶと良いでしょう.C 言語のソースが複数の場合には逆に「モジュール名.c」というファ イル名は避ける必要があります.オブジェクトファイルとモジュー ル生成時に中間的に生成される「モジュール名.o」というファイル とが衝突するからです. Rubyは拡張ライブラリをロードする時に「Init_モジュール名」と いう関数を自動的に実行します.dbmモジュールの場合「Init_dbm」 です.この関数の中でクラス,モジュール,メソッド,定数などの 定義を行います.dbm.cから一部引用します. -- Init_dbm() { /* define DBM class */ cDBM = rb_define_class("DBM", rb_cObject); /* DBM includes Enumerate module */ rb_include_module(cDBM, rb_mEnumerable); /* DBM has class method open(): arguments are received as C array */ rb_define_singleton_method(cDBM, "open", fdbm_s_open, -1); /* DBM instance method close(): no args */ rb_define_method(cDBM, "close", fdbm_close, 0); /* DBM instance method []: 1 argument */ rb_define_method(cDBM, "[]", fdbm_fetch, 1); : } -- DBMモジュールはdbmのデータと対応するオブジェクトになるはずで すから,Cの世界のdbmをRubyの世界に取り込む必要があります. dbm.cではData_Make_Structを以下のように使っています. -- struct dbmdata { int di_size; DBM *di_dbm; }; obj = Data_Make_Struct(class,struct dbmdata,0,free_dbm,dbmp); -- ここではdbmstruct構造体へのポインタをDataにカプセル化してい ます.DBM*を直接カプセル化しないのはclose()した時の処理を考 えてのことです. Dataオブジェクトからdbmstruct構造体のポインタを取り出すため に以下のマクロを使っています. -- #define GetDBM(obj, dbmp) {\ Data_Get_Struct(obj, struct dbmdata, dbmp);\ if (dbmp->di_dbm == 0) closed_dbm();\ } -- ちょっと複雑なマクロですが,要するにdbmdata構造体のポインタ の取り出しと,closeされているかどうかのチェックをまとめてい るだけです. DBMクラスにはたくさんメソッドがありますが,分類すると3種類の 引数の受け方があります.ひとつは引数の数が固定のもので,例と してはdeleteメソッドがあります.deleteメソッドを実装している fdbm_delete()はこのようになっています. -- static VALUE fdbm_delete(obj, keystr) VALUE obj, keystr; { : } -- 引数の数が固定のタイプは第1引数がself,第2引数以降がメソッド の引数となります. 引数の数が不定のものはCの配列で受けるものとRubyの配列で受け るものとがあります.dbmモジュールの中で,Cの配列で受けるもの はDBMのクラスメソッドであるopen()です.これを実装している関 数fdbm_s_open()はこうなっています. -- static VALUE fdbm_s_open(argc, argv, class) int argc; VALUE *argv; VALUE class; { : if (rb_scan_args(argc, argv, "11", &file, &vmode) == 1) { mode = 0666; /* default value */ } : } -- このタイプの関数は第1引数が与えられた引数の数,第2引数が与え られた引数の入っている配列になります.selfは第3引数として与 えられます. この配列で与えられた引数を解析するための関数がopen()でも使わ れているrb_scan_args()です.第3引数に指定したフォーマットに 従い,第4変数以降に指定した変数に値を代入してくれます.この フォーマットは,第1文字目が省略できない引数の数,第2文字目が 省略できる引数の数,第3文字目が対応する相手が無いあまりの引 数があるかどうかを示す"*"です.2文字目と3文字目は省略できま す.dbm.cの例では,フォーマットは"11"ですから,引数は最低1つ で,2つまで許されるという意味になります.省略されている時の 変数の値はnil(C言語のレベルではQnil)になります. Rubyの配列で引数を受け取るものはindexesがあります.実装はこ うです. -- static VALUE fdbm_indexes(obj, args) VALUE obj, args; { : } -- The first argument is the receiver, the second one is the Ruby array which contains the arguments to the method. ** Notice GC should know about global variables which refers Ruby's objects, but not exported to the Ruby world. You need to protect them by void rb_global_variable(VALUE *var) (5) prepare extconf.rb If there exists the file named extconf.rb, it will be executed to generate Makefile. If not, compilation scheme try to generate Makefile anyway. extconf.rbはモジュールのコンパイルに必要な条件のチェックなど を行うことが目的です.extconf.rbの中では以下のRuby関数を使う ことが出来ます. have_library(lib, func): ライブラリの存在チェック have_func(func): 関数の存在チェック have_header(header): ヘッダファイルの存在チェック create_makefile(target): Makefileの生成 以下の変数を使うことができます. $CFLAGS: コンパイル時に追加的に指定するフラグ(-Iなど) $LDFLAGS: リンク時に追加的に指定するフラグ(-Lなど) モジュールをコンパイルする条件が揃わなず,そのモジュールはコ ンパイルしない時にはcreate_makefileを呼ばなければMakefileは 生成されず,コンパイルも行われません. (6) prepare depend (optional) If the file named depend exists, Makefile will include that file to check dependency. You can make this file by invoking % gcc -MM *.c > depend It's no harm. Prepare it. (7) MANIFESTファイルにファイル名を入れる % find * -type f -print > MANIFEST % vi MANIFEST Append file names into MANIFEST. The compilation scheme requires MANIFEST only to be exist. But, you'd better take this step to distinguish required files. (8) make Rubyのディレクトリでmakeを実行するとMakefileを生成からmake, 必要によってはそのモジュールのRubyへのリンクまで自動的に実行 してくれます.extconf.rbを書き換えるなどしてMakefileの再生成 が必要な時はまたRubyディレクトリでmakeしてください. (9) debug You may need to rb_debug the module. The modules can be linked statically by adding directory name in the ext/Setup file, so that you can inspect the module by the debugger. (10) done, now you have the extension library You can do anything you want with your library. The author of Ruby will not claim any restriction about your code depending Ruby API. Feel free to use, modify, distribute or sell your program. Appendix A. Rubyのソースコードの分類 Rubyのソースはいくつかに分類することが出来ます.このうちクラ スライブラリの部分は基本的に拡張ライブラリと同じ作り方になっ ています.これらのソースは今までの説明でほとんど理解できると 思います. ruby language core class.c error.c eval.c gc.c object.c parse.y variable.c utility functions dln.c fnmatch.c glob.c regex.c st.c util.c ruby interpreter implementation dmyext.c inits.c main.c ruby.c version.c class library array.c bignum.c compar.c dir.c enum.c file.c hash.c io.c math.c numeric.c pack.c process.c random.c range.c re.c signal.c sprintf.c string.c struct.c time.c Appendix B. Ruby extension API reference C言語からRubyの機能を利用するAPIは以下の通りである. ** 型 VALUE Rubyオブジェクトを表現する型.必要に応じてキャストして用いる. 組み込み型を表現するCの型はruby.hに記述してあるRで始まる構造 体である.VALUE型をこれらにキャストするためにRで始まる構造体 名を全て大文字にした名前のマクロが用意されている. ** Variables and constants Qnil const: nil object Qtrue const: true object(default true value) Qfalse const: false object ** C pointer wrapping Data_Wrap_Struct(VALUE class, void (*mark)(), void (*free)(), void *sval) Cの任意のポインタをカプセル化したRubyオブジェクトを返す.こ のポインタがRubyからアクセスされなくなった時,freeで指定した 関数が呼ばれる.また,このポインタの指すデータが他のRubyオブ ジェクトを指している場合,markに指定する関数でマークする必要 がある. Data_Make_Struct(class, type, mark, free, sval) This macro allocates memory using malloc(), assigns it to the variable sval, and returns the DATA encapsulating the pointer to memory region. Data_Get_Struct(data, type, sval) This macro retrieves the pointer value from DATA, and assigns it to the variable sval. ** defining class/module VALUE rb_define_class(char *name, VALUE super) Defines new Ruby class as subclass of super. VALUE rb_define_class_under(VALUE module, char *name, VALUE super) Creates new Ruby class as subclass of super, under the module's namespace. VALUE rb_define_module(char *name) Defines new Ruby module. VALUE rb_define_module_under(VALUE module, char *name, VALUE super) Defines new Ruby module, under the modules's namespace. void rb_include_module(VALUE class, VALUE module) Includes module into class. If class already includes it, just ignore. void rb_extend_object(VALUE object, VALUE module) Extend the object with module's attribute. ** Defining Global Variables void rb_define_variable(char *name, VALUE *var) Defines a global variable which is shared between C and Ruby. If name contains the character which is not allowed to be part of the symbol, it can't be seen from Ruby programs. void rb_define_readonly_variable(char *name, VALUE *var) Defines a read-only global variable. Works just like rb_define_variable(), except defined variable is read-only. void rb_define_virtual_variable(char *name, VALUE (*getter)(), VALUE (*setter)()) Defines a virtual variable, whose behavior is defined by pair of C functions. The getter function is called when the variable is referred. The setter function is called when the value is set to the variable. The prototype for getter/setter functions are: VALUE getter(ID id) void setter(VALUE val, ID id) The getter function must return the value for the access. void rb_define_hooked_variable(char *name, VALUE *var, VALUE (*getter)(), VALUE (*setter)()) Defines hooked variable. It's virtual variable with C variable. The getter is called as VALUE getter(ID id, VALUE *var) returning new value. The setter is called as void setter(VALUE val, ID id, VALUE *var) GC requires to mark the C global variables which hold Ruby values. void rb_global_variable(VALUE *var) Tells GC to protect these variables. ** Constant Definition void rb_define_const(VALUE klass, char *name, VALUE val) Defines a new constant under the class/module. void rb_define_global_const(char *name, VALUE val) Defines global contant. This is just work as rb_define_const(cKernal, name, val) ** Method Definition rb_define_method(VALUE class, char *name, VALUE (*func)(), int argc) Defines a method for the class. func is the function pointer. argc is the number of arguments. if argc is -1, the function will receive 3 arguments argc, argv, and self. if argc is -2, the function will receive 2 arguments, self and args, where args is the Ruby array of the method arguments. rb_define_private_method(VALUE class, char *name, VALUE (*func)(), int argc) Defines a private method for the class. Arguments are same as rb_define_method(). rb_define_singleton_method(VALUE class, char *name, VALUE (*func)(), int argc) Defines a singleton method. Arguments are same as rb_define_method(). rb_scan_args(int atgc, VALUE *argv, char *fmt, ...) argc,argv形式で与えられた引数を分解する.fmtは必須引数の数, 付加引数の数, 残りの引数があるかを指定する文字列で, "数字数 字*"という形式である. 2 番目の数字と"*"はそれぞれ省略可能で ある.必須引数が一つもない場合は0を指定する.第3引数以降は変 数へのポインタで, 該当する要素がその変数に格納される.付加引 数に対応する引数が与えられていない場合は変数にQnilが代入され る. ** Rubyメソッド呼び出し VALUE rb_funcall(VALUE recv, ID mid, int narg, ...) Invokes the method. To retrieve mid from method name, use rb_intern(). VALUE rb_funcall2(VALUE recv, ID mid, int argc, VALUE *argv) Invokes method, passing arguments by array of values. VALUE rb_eval_string(char *str) Compiles and executes the string as Ruby program. ID rb_intern(char *name) Returns ID corresponding the name. char *rb_id2name(ID id) Returns the name corresponding ID. char *rb_class2name(VALUE class) Returns the name of the class. ** Instance Variables VALUE rb_iv_get(VALUE obj, char *name) Retrieve the value of the instance variable. If the name is not prefixed by `@', that variable shall be inaccessible from Ruby. VALUE rb_iv_set(VALUE obj, char *name, VALUE val) Sets the value of the instance variable. ** Control Structure VALUE rb_iterate(VALUE (*func1)(), void *arg1, VALUE (*func2)(), void *arg2) Calls the function func1, supplying func2 as the block. func1 will be called with the argument arg1. func2 receives the value from yield as the first argument, arg2 as the second argument. VALUE rb_yield(VALUE val) Evaluates the block with value val. VALUE rb_rescue(VALUE (*func1)(), void *arg1, VALUE (*func2)(), void *arg2) Calls the function func1, with arg1 as the argument. If exception occurs during func1, it calls func2 with arg2 as the argument. The return value of rb_rescue() is the return value from func1 if no exception occurs, from func2 otherwise. VALUE rb_ensure(VALUE (*func1)(), void *arg1, void (*func2)(), void *arg2) Calls the function func1 with arg1 as the argument, then calls func2 with arg2, whenever execution terminated. The return value from rb_ensure() is that of func1. ** Exceptions and Errors void rb_warn(char *fmt, ...) Prints warning message according to the printf-like format. void rb_warning(char *fmt, ...) Prints warning message according to the printf-like format, if $VERBOSE is true. void rb_raise(VALUE exception, char *fmt, ...) Raises an exception of class exception. The fmt is the format string just like printf(). void rb_fatal(char *fmt, ...) Raises fatal error, terminates the interpreter. No exception handling will be done for fatal error, but ensure blocks will be executed. void rb_bug(char *fmt, ...) Termintates the interpreter immediately. This function should be called under the situation caused by the bug in the interpreter. No exception handling nor ensure execution will be done. ** Initialize and Starts the Interpreter The embedding API are below (not needed for extension libraries): void ruby_init(int argc, char **argv, char **envp) Initializes the interpreter. void ruby_run() Starts execution of the interpreter. void ruby_script(char *name) Specifies the name of the script ($0). Appendix B. Functions Available in extconf.rb These functions are available in extconf.rb: have_library(lib, func) Checks whether library which contains specified function exists. Returns true if the library exists. have_func(func) Checks whether func exists. Returns true if the function exists. To check functions in the additional library, you need to check that library first using have_library(). have_header(header) Checks for the header files. Returns true if the header file exists. create_makefile(target) Generates the Makefile for the extension library. If you don't invoke this method, the compilation will not be done. /* * Local variables: * fill-column: 70 * end: */