diff options
Diffstat (limited to 'doc/language/box.md')
| -rw-r--r-- | doc/language/box.md | 357 |
1 files changed, 357 insertions, 0 deletions
diff --git a/doc/language/box.md b/doc/language/box.md new file mode 100644 index 0000000000..92514b3ec9 --- /dev/null +++ b/doc/language/box.md @@ -0,0 +1,357 @@ +# Ruby Box - Ruby's in-process separation of Classes and Modules + +Ruby Box is designed to provide separated spaces in a Ruby process, to isolate application code, libraries and monkey patches. + +## Known issues + +* Experimental warning is shown when ruby starts with `RUBY_BOX=1` (specify `-W:no-experimental` option to hide it) +* Installing native extensions may fail under `RUBY_BOX=1` because of stack level too deep in extconf.rb +* `require 'active_support/core_ext'` may fail under `RUBY_BOX=1` +* Defined methods in a box may not be referred by built-in methods written in Ruby + +## TODOs + +* Add the loaded box on iseq to check if another box tries running the iseq (add a field only when VM_CHECK_MODE?) +* Assign its own TOPLEVEL_BINDING in boxes +* Fix calling `warn` in boxes to refer `$VERBOSE` and `Warning.warn` in the box +* Make an internal data container class `Ruby::Box::Entry` invisible +* More test cases about `$LOAD_PATH` and `$LOADED_FEATURES` + +## How to use + +### Enabling Ruby Box + +First, an environment variable should be set at the ruby process bootup: `RUBY_BOX=1`. +The only valid value is `1` to enable Ruby Box. Other values (or unset `RUBY_BOX`) means disabling Ruby Box. And setting the value after Ruby program starts doesn't work. + +### Using Ruby Box + +`Ruby::Box` class is the entrypoint of Ruby Box. + +```ruby +box = Ruby::Box.new +box.require('something') # or require_relative, load +``` + +The required file (either .rb or .so/.dll/.bundle) is loaded in the box (`box` here). The required/loaded files from `something` will be loaded in the box recursively. + +```ruby +# something.rb + +X = 1 + +class Something + def self.x = X + def x = ::X +end +``` + +Classes/modules, those methods and constants defined in the box can be accessed via `box` object. + +```ruby +X = 2 +p X # 2 +p ::X # 2 +p box::Something.x # 1 +p box::X # 1 +``` + +Instance methods defined in the box also run with definitions in the box. + +```ruby +s = box::Something.new + +p s.x # 1 +``` + +## Specifications + +### Ruby Box types + +There are three box types: + +* Master box +* Root box +* User boxes + +Ruby bootstrap runs in the root box, and a + +There is the root box, just a single box in a Ruby process. All builtin classes/modules are defined and run in the root box. (See "Builtin classes and modules".) + +User boxes are to run user-written programs and libraries loaded from user programs. The user's main program (specified by the `ruby` command line argument) is executed in the "main" box, which is a user box automatically created at the end of Ruby's bootstrap. The files specified with `-r` command line option will be required in the main box. + +Calling `Ruby::Box.new` creates an "optional" box (a user, non-main box), technically equal to the main box. + +Ruby also has the master box. The master box is the "master copy" of all boxes. Boxes will be created as a copy of the master box. The master box is only for the source of box copies, and no code runs in the master box. + + +``` +[master] + | + |----[root] + | + |----[main] + | + |----[user box 1] + | + |----[user box 2] + ... +``` + +### Ruby Box class and instances + +`Ruby::Box` is a class, as a subclass of `Module`. `Ruby::Box` instances are a kind of `Module`. + +### Classes and modules defined in boxes + +The classes and modules, newly defined in a box `box`, are accessible via `box`. For example, if a class `A` is defined in `box`, it is accessible as `box::A` from outside of the box. + +In the box `box`, `A` can be referred to as `A` (and `::A`). + +### Built-in classes and modules reopened in boxes + +In boxes, builtin classes/modules are visible and can be reopened. Those classes/modules can be reopened using `class` or `module` clauses, and class/module definitions can be changed. + +The changed definitions are visible only in the box. In other boxes, builtin classes/modules and those instances work without changed definitions. + +```ruby +# in foo.rb +class String + BLANK_PATTERN = /\A\s*\z/ + def blank? + self.match?(BLANK_PATTERN) + end +end + +module Foo + def self.foo = "foo" + + def self.foo_is_blank? + foo.blank? + end +end + +Foo.foo.blank? #=> false +"foo".blank? #=> false + +# in main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::Foo.foo_is_blank? #=> false (#blank? called in box) + +"foo".blank? # NoMethodError +String::BLANK_PATTERN # NameError +``` + +The main box and `box` above are different boxes, so monkey patches in main are also invisible in `box`. + +### Builtin classes and modules + +In the box context, "builtin" classes and modules are classes and modules: + +* Accessible without any `require` calls in user scripts +* Defined before any user program start running + +Hereafter, "builtin classes and modules" will be referred to as just "builtin classes". + +Builtin classes and modules are loaded in all boxes, and run in the root box. + +### Exceptional non-built-in classes/modules + +There are some exceptional classes/modules that are enabled in default, but aren't built-in classes. Those classes/modules are: + +* `RubyGems` +* `ErrorHighlight` +* `DidYouMean` +* `SyntaxSuggest` + +Those classes/modules (part of default gems) are loaded in each boxes independently. If a user box's code calls RubyGems, it calls the RubyGems inside the box itself, instead of the root box's one. + +### Builtin classes referred via box objects + +Builtin classes in a box `box` can be referred from other boxes. For example, `box::String` is a valid reference, and `String` and `box::String` are identical (`String == box::String`, `String.object_id == box::String.object_id`). + +`box::String`-like reference returns just a `String` in the current box, so its definition is `String` in the box, not in `box`. + +```ruby +# foo.rb +class String + def self.foo = "foo" +end + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::String.foo # NoMethodError +``` + +### Class instance variables, class variables, constants + +Builtin classes can have different sets of class instance variables, class variables and constants between boxes. + +```ruby +# foo.rb +class Array + @v = "foo" + @@v = "_foo_" + V = "FOO" +end + +Array.instance_variable_get(:@v) #=> "foo" +Array.class_variable_get(:@@v) #=> "_foo_" +Array.const_get(:V) #=> "FOO" + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +Array.instance_variable_get(:@v) #=> nil +Array.class_variable_get(:@@v) # NameError +Array.const_get(:V) # NameError +``` + +### Global variables + +In boxes, changes on global variables are also isolated in the boxes. Changes on global variables in a box are visible/applied only in the box. + +```ruby +# foo.rb +$foo = "foo" +$VERBOSE = nil + +puts "This appears: '#{$foo}'" + +# main.rb +p $foo #=> nil +p $VERBOSE #=> false + +box = Ruby::Box.new +box.require_relative('foo') # "This appears: 'foo'" + +p $foo #=> nil +p $VERBOSE #=> false +``` + +### Top level constants + +Usually, top level constants are defined as constants of `Object`. In boxes, top level constants are constants of `Object` in the box. And the box object `box`'s constants are strictly equal to constants of `Object`. + +```ruby +# foo.rb +FOO = 100 + +FOO #=> 100 +Object::FOO #=> 100 + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::FOO #=> 100 + +FOO # NameError +Object::FOO # NameError +``` + +### Top level methods + +Top level methods are private instance methods of `Object`, in each box. + +```ruby +# foo.rb +def yay = "foo" + +class Foo + def self.say = yay +end + +Foo.say #=> "foo" +yay #=> "foo" + +# main.rb +box = Ruby::Box.new +box.require_relative('foo') + +box::Foo.say #=> "foo" + +yay # NoMethodError +``` + +There is no way to expose top level methods in boxes to others. +(See "Expose top level methods as a method of the box object" in "Discussions" section below) + +### Ruby Box scopes + +Ruby Box works in file scope. One `.rb` file runs in a single box. + +Once a file is loaded in a box `box`, all methods/procs defined/created in the file run in `box`. + +### Utility methods + +Several methods are available for trying/testing Ruby Box. + +* `Ruby::Box.current` returns the current box +* `Ruby::Box.enabled?` returns true/false to represent `RUBY_BOX=1` is specified or not +* `Ruby::Box.root` returns the root box +* `Ruby::Box.main` returns the main box +* `Ruby::Box#eval` evaluates a Ruby code (String) in the receiver box, just like calling `#load` with a file + +## Implementation details + +#### ISeq inline method/constant cache + +As described above in "Ruby Box scopes", an ".rb" file runs in a box. So method/constant resolution will be done in a box consistently. + +That means ISeq inline caches work well even with boxes. Otherwise, it's a bug. + +#### Method call global cache (gccct) + +`rb_funcall()` C function refers to the global cc cache table (gccct), and the cache key is calculated with the current box. + +So, `rb_funcall()` calls have a performance penalty when Ruby Box is enabled. + +#### Current box and loading box + +The current box is the box that the executing code is in. `Ruby::Box.current` returns the current box object. + +The loading box is an internally managed box to determine the box to load newly required/loaded files. For example, `box` is the loading box when `box.require("foo")` is called. + +## Discussions + +#### More builtin methods written in Ruby + +If Ruby Box is enabled by default, builtin methods can be written in Ruby because it can't be overridden by users' monkey patches. Builtin Ruby methods can be JIT-ed, and it could bring performance reward. + +#### Monkey patching methods called by builtin methods + +Builtin methods sometimes call other builtin methods. For example, `Hash#map` calls `Hash#each` to retrieve entries to be mapped. Without Ruby Box, Ruby users can overwrite `Hash#each` and expect the behavior change of `Hash#map` as a result. + +But with boxes, `Hash#map` runs in the root box. Ruby users can define `Hash#each` only in user boxes, so users cannot change `Hash#map`'s behavior in this case. To achieve it, users should override both`Hash#map` and `Hash#each` (or only `Hash#map`). + +It is a breaking change. + +Users can define methods using `Ruby::Box.root.eval(...)`, but it's clearly not ideal API. + +#### Assigning values to global variables used by builtin methods + +Similar to monkey patching methods, global variables assigned in a box is separated from the root box. Methods defined in the root box referring a global variable can't find the re-assigned one. + +#### Context of `$LOAD_PATH` and `$LOADED_FEATURES` + +Global variables `$LOAD_PATH` and `$LOADED_FEATURES` control `require` method behaviors. So those variables are determined by the loading box instead of the current box. + +This could potentially conflict with the user's expectations. We should find the solution. + +#### Expose top level methods as a method of the box object + +Currently, top level methods in boxes are not accessible from outside of the box. But there might be a use case to call other box's top level methods. + +#### Separate `cc_tbl` and `callable_m_tbl`, `cvc_tbl` for less classext CoW + +The fields of `rb_classext_t` contains several cache(-like) data, `cc_tbl`(callcache table), `callable_m_tbl`(table of resolved complemented methods) and `cvc_tbl`(class variable cache table). + +The classext CoW is triggered when the contents of `rb_classext_t` are changed, including `cc_tbl`, `callable_m_tbl`, and `cvc_tbl`. But those three tables are changed by just calling methods or referring class variables. So, currently, classext CoW is triggered much more times than the original expectation. + +If we can move those three tables outside of `rb_classext_t`, the number of copied `rb_classext_t` will be much less than the current implementation. |
