Diffstat (limited to 'doc/language/box.md')
-rw-r--r-- doc/language/box.md | 357
1 files changed, 357 insertions, 0 deletions
diff --git a/doc/language/box.md b/doc/language/box.md
new file mode 100644
index 0000000000..92514b3ec9
--- /dev/null
+++ b/doc/language/box.md
@@ -0,0 +1,357 @@
+# Ruby Box - Ruby's in-process separation of Classes and Modules
+
+Ruby Box is designed to provide separated spaces in a Ruby process, to isolate application code, libraries and monkey patches.
+
+## Known issues
+
+* An experimental warning is shown when Ruby starts with `RUBY_BOX=1` (specify the `-W:no-experimental` option to hide it)
+* Installing native extensions may fail under `RUBY_BOX=1` because of a "stack level too deep" error in `extconf.rb`
+* `require 'active_support/core_ext'` may fail under `RUBY_BOX=1`
+* Methods defined in a box may not be visible to built-in methods written in Ruby
+
+## TODOs
+
+* Add the loaded box on iseq to check if another box tries running the iseq (add a field only when VM_CHECK_MODE?)
+* Assign its own TOPLEVEL_BINDING in boxes
+* Fix calling `warn` in boxes so that it refers to `$VERBOSE` and `Warning.warn` in the box
+* Make an internal data container class `Ruby::Box::Entry` invisible
+* More test cases about `$LOAD_PATH` and `$LOADED_FEATURES`
+
+## How to use
+
+### Enabling Ruby Box
+
+To enable Ruby Box, set the environment variable `RUBY_BOX=1` when the Ruby process boots.
+`1` is the only value that enables Ruby Box; any other value, or leaving `RUBY_BOX` unset, disables it. Setting the variable after the Ruby program has started has no effect.
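+
+A program that depends on boxes may want to check at startup whether they are available. The following is a minimal sketch, assuming only the `Ruby::Box.enabled?` method listed under "Utility methods"; `box_enabled?` is a hypothetical helper name, not part of Ruby Box.
+
+```ruby
+# Hypothetical helper (not part of Ruby Box): true only when this interpreter
+# defines Ruby::Box and the process was started with RUBY_BOX=1.
+def box_enabled?
+  defined?(Ruby::Box) ? Ruby::Box.enabled? : false
+end
+
+# Assigning ENV["RUBY_BOX"] here would be too late: the variable is read at bootup.
+if box_enabled?
+  puts "Ruby Box is enabled"
+else
+  puts "Ruby Box is disabled; restart the process with RUBY_BOX=1"
+end
+```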
+
+### Using Ruby Box
+
+The `Ruby::Box` class is the entry point of Ruby Box.
+
+```ruby
+box = Ruby::Box.new
+box.require('something') # or require_relative, load
+```
+
+The required file (either .rb or .so/.dll/.bundle) is loaded in the box (`box` here). Files that `something` itself requires or loads are, recursively, loaded in the same box.
+
+```ruby
+# something.rb
+
+X = 1
+
+class Something
+  def self.x = X
+  def x = ::X
+end
+```
+
+Classes and modules defined in the box, along with their methods and constants, can be accessed via the `box` object.
+
+```ruby
+X = 2
+p X # 2
+p ::X # 2
+p box::Something.x # 1
+p box::X # 1
+```
+
+Instance methods defined in the box also run with definitions in the box.
+
+```ruby
+s = box::Something.new
+
+p s.x # 1
+```
+
+## Specifications
+
+### Ruby Box types
+
+There are three box types:
+
+* Master box
+* Root box
+* User boxes
+
+Ruby's bootstrap runs in the root box.
+
+There is exactly one root box in a Ruby process. All builtin classes/modules are defined and run in the root box. (See "Builtin classes and modules".)
+
+User boxes run user-written programs and the libraries loaded from them. The user's main program (specified by the `ruby` command line argument) is executed in the "main" box, which is a user box automatically created at the end of Ruby's bootstrap. Files specified with the `-r` command line option are required in the main box.
+
+Calling `Ruby::Box.new` creates an "optional" box (a user, non-main box), technically equivalent to the main box.
+
+Ruby also has the master box. The master box is the "master copy" of all boxes: boxes are created as copies of the master box. The master box serves only as the source of box copies, and no code runs in it.
+
+
+```
+[master]
+   |
+   |----[root]
+   |
+   |----[main]
+   |
+   |----[user box 1]
+   |
+   |----[user box 2]
+   ...
+```
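+
+Because every optional box starts as its own copy, two boxes can hold different definitions under the same name. The following is a rough sketch of that idea, assuming `Ruby::Box#eval` (listed under "Utility methods") defines top level constants in the receiver box; `independent_boxes` and `ANSWER` are hypothetical names used only for illustration.
+
+```ruby
+# Hypothetical helper: returns nil when Ruby Box is unavailable or disabled.
+def independent_boxes
+  return nil unless defined?(Ruby::Box) && Ruby::Box.enabled?
+
+  a = Ruby::Box.new
+  b = Ruby::Box.new
+
+  # The same constant name is defined separately in each box.
+  a.eval("ANSWER = 1")
+  b.eval("ANSWER = 2")
+
+  [a::ANSWER, b::ANSWER]
+end
+
+p independent_boxes
+```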
+
+### Ruby Box class and instances
+
+`Ruby::Box` is a class, a subclass of `Module`, so `Ruby::Box` instances are a kind of `Module`.
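+
+This relationship can be checked directly. The snippet below is a small sketch that only runs its checks when Ruby Box is enabled; `describe_box_class` is a hypothetical helper name.
+
+```ruby
+# Hypothetical helper: returns nil when Ruby Box is unavailable or disabled.
+def describe_box_class
+  return nil unless defined?(Ruby::Box) && Ruby::Box.enabled?
+
+  box = Ruby::Box.new
+  {
+    subclass_of_module: Ruby::Box < Module,  # Ruby::Box inherits from Module
+    instance_is_module: box.is_a?(Module),   # so each box is a kind of Module
+  }
+end
+
+p describe_box_class
+```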
+
+### Classes and modules defined in boxes
+
+The classes and modules, newly defined in a box `box`, are accessible via `box`. For example, if a class `A` is defined in `box`, it is accessible as `box::A` from outside of the box.
+
+In the box `box`, `A` can be referred to as `A` (and `::A`).
+
+### Built-in classes and modules reopened in boxes
+
+In boxes, builtin classes/modules are visible and can be reopened with `class` or `module`, and their definitions can be changed.
+
+The changed definitions are visible only in that box. In other boxes, builtin classes/modules and their instances behave as if the definitions had not been changed.
+
+```ruby
+# in foo.rb
+class String
+  BLANK_PATTERN = /\A\s*\z/
+  def blank?
+    self.match?(BLANK_PATTERN)
+  end
+end
+
+module Foo
+  def self.foo = "foo"
+
+  def self.foo_is_blank?
+    foo.blank?
+  end
+end
+
+Foo.foo.blank? #=> false
+"foo".blank? #=> false
+
+# in main.rb
+box = Ruby::Box.new
+box.require_relative('foo')
+
+box::Foo.foo_is_blank? #=> false (#blank? called in box)
+
+"foo".blank? # NoMethodError
+String::BLANK_PATTERN # NameError
+```
+
+The main box and `box` above are different boxes, so monkey patches in main are also invisible in `box`.
+
+### Builtin classes and modules
+
+In the box context, "builtin" classes and modules are those that are:
+
+* Accessible without any `require` calls in user scripts
+* Defined before any user program starts running
+
+Hereafter, "builtin classes and modules" will be referred to as just "builtin classes".
+
+Builtin classes and modules are loaded in all boxes, and run in the root box.
+
+### Exceptional non-built-in classes/modules
+
+Some classes/modules are enabled by default but are not built-in classes. These exceptions are:
+
+* `RubyGems`
+* `ErrorHighlight`
+* `DidYouMean`
+* `SyntaxSuggest`
+
+Those classes/modules (part of default gems) are loaded in each box independently. If code in a user box calls RubyGems, it calls the RubyGems loaded inside that box, not the one in the root box.
+
+### Builtin classes referred via box objects
+
+Builtin classes in a box `box` can be referred to from other boxes. For example, `box::String` is a valid reference, and `String` and `box::String` are identical (`String == box::String`, `String.object_id == box::String.object_id`).
+
+A reference like `box::String` simply returns `String` as seen from the current box, so its definitions are those of `String` in the current box, not those in `box`.
+
+```ruby
+# foo.rb
+class String
+  def self.foo = "foo"
+end
+
+# main.rb
+box = Ruby::Box.new
+box.require_relative('foo')
+
+box::String.foo # NoMethodError
+```
+
+### Class instance variables, class variables, constants
+
+Builtin classes can have different sets of class instance variables, class variables and constants between boxes.
+
+```ruby
+# foo.rb
+class Array
+  @v = "foo"
+  @@v = "_foo_"
+  V = "FOO"
+end
+
+Array.instance_variable_get(:@v) #=> "foo"
+Array.class_variable_get(:@@v) #=> "_foo_"
+Array.const_get(:V) #=> "FOO"
+
+# main.rb
+box = Ruby::Box.new
+box.require_relative('foo')
+
+Array.instance_variable_get(:@v) #=> nil
+Array.class_variable_get(:@@v) # NameError
+Array.const_get(:V) # NameError
+```
+
+### Global variables
+
+Changes to global variables are also isolated per box: a change made in a box is visible and applied only in that box.
+
+```ruby
+# foo.rb
+$foo = "foo"
+$VERBOSE = nil
+
+puts "This appears: '#{$foo}'"
+
+# main.rb
+p $foo #=> nil
+p $VERBOSE #=> false
+
+box = Ruby::Box.new
+box.require_relative('foo') # "This appears: 'foo'"
+
+p $foo #=> nil
+p $VERBOSE #=> false
+```
+
+### Top level constants
+
+Usually, top level constants are defined as constants of `Object`. In boxes, top level constants are constants of `Object` in the box. The constants of the box object `box` are exactly the constants of `Object` in that box.
+
+```ruby
+# foo.rb
+FOO = 100
+
+FOO #=> 100
+Object::FOO #=> 100
+
+# main.rb
+box = Ruby::Box.new
+box.require_relative('foo')
+
+box::FOO #=> 100
+
+FOO # NameError
+Object::FOO # NameError
+```
+
+### Top level methods
+
+Top level methods are private instance methods of `Object`, in each box.
+
+```ruby
+# foo.rb
+def yay = "foo"
+
+class Foo
+  def self.say = yay
+end
+
+Foo.say #=> "foo"
+yay #=> "foo"
+
+# main.rb
+box = Ruby::Box.new
+box.require_relative('foo')
+
+box::Foo.say #=> "foo"
+
+yay # NoMethodError
+```
+
+Currently there is no way to expose top level methods in a box to other boxes.
+(See "Expose top level methods as a method of the box object" in the "Discussions" section below.)
+
+### Ruby Box scopes
+
+Ruby Box works in file scope. One `.rb` file runs in a single box.
+
+Once a file is loaded in a box `box`, all methods/procs defined/created in the file run in `box`.
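+
+One consequence is that a proc created in a box keeps running in that box even when it is called from elsewhere. The following sketch illustrates this under one assumption: that code passed to `Ruby::Box#eval` is scoped like a file loaded in the box, as the description of `Ruby::Box#eval` under "Utility methods" suggests. `shout_via_box`, `String#shout` and `SHOUTER` are hypothetical names.
+
+```ruby
+# Hypothetical helper: returns nil when Ruby Box is unavailable or disabled.
+def shout_via_box
+  return nil unless defined?(Ruby::Box) && Ruby::Box.enabled?
+
+  box = Ruby::Box.new
+  box.eval(<<~'RUBY')
+    class String
+      def shout = upcase + "!"   # monkey patch made inside the box
+    end
+
+    SHOUTER = ->(s) { s.shout }  # proc created inside the box
+  RUBY
+
+  # Called from the caller's box, but the proc body runs in `box`,
+  # where String#shout is defined.
+  box::SHOUTER.call("hi")
+end
+
+p shout_via_box
+
+# The caller's own box never received the patch.
+p "hi".respond_to?(:shout)
+```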
+
+### Utility methods
+
+Several methods are available for trying/testing Ruby Box.
+
+* `Ruby::Box.current` returns the current box
+* `Ruby::Box.enabled?` returns true or false depending on whether `RUBY_BOX=1` was specified
+* `Ruby::Box.root` returns the root box
+* `Ruby::Box.main` returns the main box
+* `Ruby::Box#eval` evaluates Ruby code (a String) in the receiver box, just like calling `#load` with a file
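+
+The methods above can be combined for quick experiments. The following is a minimal sketch that uses only the methods in this list; `box_tour` and `GREETING` are hypothetical names, and the exact `inspect` output of box objects is not specified here.
+
+```ruby
+# Hypothetical helper: returns nil when Ruby Box is unavailable or disabled.
+def box_tour
+  return nil unless defined?(Ruby::Box) && Ruby::Box.enabled?
+
+  box = Ruby::Box.new
+  box.eval("GREETING = 'hello from the box'")  # evaluated inside `box`
+
+  {
+    current:  Ruby::Box.current,  # the box running this code
+    main:     Ruby::Box.main,     # the box of the main program
+    root:     Ruby::Box.root,     # the box where builtin classes run
+    greeting: box::GREETING,      # top level constant defined in `box`
+  }
+end
+
+p box_tour
+```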
+
+## Implementation details
+
+#### ISeq inline method/constant cache
+
+As described above in "Ruby Box scopes", a `.rb` file runs in a single box, so method/constant resolution within it is always done in that same box.
+
+That means ISeq inline caches work correctly even with boxes. If they do not, it is a bug.
+
+#### Method call global cache (gccct)
+
+The `rb_funcall()` C function consults the global cc cache table (gccct), and the cache key is calculated with the current box.
+
+So, `rb_funcall()` calls have a performance penalty when Ruby Box is enabled.
+
+#### Current box and loading box
+
+The current box is the box that the executing code is in. `Ruby::Box.current` returns the current box object.
+
+The loading box is an internally managed box that determines where newly required/loaded files are loaded. For example, `box` is the loading box when `box.require("foo")` is called.
+
+## Discussions
+
+#### More builtin methods written in Ruby
+
+If Ruby Box is enabled by default, builtin methods can be written in Ruby because they can't be overridden by users' monkey patches. Builtin methods written in Ruby can be JIT-ed, which could bring a performance benefit.
+
+#### Monkey patching methods called by builtin methods
+
+Builtin methods sometimes call other builtin methods. For example, `Hash#map` calls `Hash#each` to retrieve entries to be mapped. Without Ruby Box, Ruby users can overwrite `Hash#each` and expect the behavior change of `Hash#map` as a result.
+
+But with boxes, `Hash#map` runs in the root box. Ruby users can define `Hash#each` only in user boxes, so they cannot change `Hash#map`'s behavior this way. To achieve that, users must override both `Hash#map` and `Hash#each` (or only `Hash#map`).
+
+This is a breaking change.
+
+Users can define methods in the root box using `Ruby::Box.root.eval(...)`, but this is clearly not an ideal API.
+
+#### Assigning values to global variables used by builtin methods
+
+As with monkey patched methods, global variables assigned in a box are separated from the root box. Methods defined in the root box that refer to a global variable do not see the value re-assigned in a user box.
+
+#### Context of `$LOAD_PATH` and `$LOADED_FEATURES`
+
+The global variables `$LOAD_PATH` and `$LOADED_FEATURES` control the behavior of `require`, so their values are taken from the loading box instead of the current box.
+
+This could conflict with users' expectations. A solution still needs to be found.
+
+#### Expose top level methods as a method of the box object
+
+Currently, top level methods in a box are not accessible from outside the box. But there might be a use case for calling another box's top level methods.
+
+#### Separate `cc_tbl` and `callable_m_tbl`, `cvc_tbl` for less classext CoW
+
+The fields of `rb_classext_t` contain several cache-like tables: `cc_tbl` (call cache table), `callable_m_tbl` (table of resolved complemented methods) and `cvc_tbl` (class variable cache table).
+
+The classext CoW is triggered when the contents of `rb_classext_t` are changed, including `cc_tbl`, `callable_m_tbl`, and `cvc_tbl`. But those three tables are changed just by calling methods or referring to class variables. So, currently, classext CoW is triggered far more often than originally expected.
+
+If those three tables can be moved outside of `rb_classext_t`, far fewer `rb_classext_t` copies will be made than in the current implementation.