summaryrefslogtreecommitdiff
path: root/doc/stringio/stringio.md
blob: 8931d1c30c7d79333b6cfa405d6271fc1cad90ed (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
\Class \StringIO supports accessing a string as a stream,
similar in some ways to [class IO][io class].

You can create a \StringIO instance using:

- StringIO.new: returns a new \StringIO object containing the given string.
- StringIO.open: passes a new \StringIO object to the given block.

Like an \IO stream, a \StringIO stream has certain properties:

- **Read/write mode**: whether the stream may be read, written, appended to, etc.;
  see [Read/Write Mode][read/write mode].
- **Data mode**: text-only or binary;
  see [Data Mode][data mode].
- **Encodings**: internal and external encodings;
  see [Encodings][encodings].
- **Position**: where in the stream the next read or write is to occur;
  see [Position][position].
- **Line number**: a special, line-oriented, "position" (different from the position mentioned above);
  see [Line Number][line number].
- **Open/closed**: whether the stream is open or closed, for reading or writing.
  see [Open/Closed Streams][open/closed streams].
- **BOM**: byte mark order;
  see [Byte Order Mark][bom (byte order mark)].

## About the Examples

Examples on this page assume that \StringIO has been required:

```ruby
require 'stringio'
```

And that this constant has been defined:

```ruby
TEXT = <<EOT
First line
Second line

Fourth line
Fifth line
EOT
```

## Stream Properties

### Read/Write Mode

#### Summary

|            Mode            | Initial Clear? |   Read   |  Write   |
|:--------------------------:|:--------------:|:--------:|:--------:|
|  <tt>'r'</tt>: read-only   |       No       | Anywhere |  Error   |
|  <tt>'w'</tt>: write-only  |      Yes       |  Error   | Anywhere |
| <tt>'a'</tt>: append-only  |       No       |  Error   | End only |
| <tt>'r+'</tt>: read/write  |       No       | Anywhere | Anywhere |
| <tt>'w+'</tt>: read-write  |      Yes       | Anywhere | Anywhere |
| <tt>'a+'</tt>: read/append |       No       | Anywhere | End only |

Each section below describes a read/write mode.

Any of the modes may be given as a string or as file constants;
example:

```ruby
strio = StringIO.new('foo', 'a')
strio = StringIO.new('foo', File::WRONLY | File::APPEND)
```

#### `'r'`: Read-Only

Mode specified as one of:

- String: `'r'`.
- Constant: `File::RDONLY`.

Initial state:

```ruby
strio = StringIO.new('foobarbaz', 'r')
strio.pos    # => 0            # Beginning-of-stream.
strio.string # => "foobarbaz"  # Not cleared.
```

May be read anywhere:

```ruby
strio.gets(3) # => "foo"
strio.gets(3) # => "bar"
strio.pos = 9
strio.gets(3) # => nil
```

May not be written:

```ruby
strio.write('foo')  # Raises IOError: not opened for writing
```

#### `'w'`: Write-Only

Mode specified as one of:

- String: `'w'`.
- Constant: `File::WRONLY`.

Initial state:

```ruby
strio = StringIO.new('foo', 'w')
strio.pos    # => 0   # Beginning of stream.
strio.string # => ""  # Initially cleared.
```

May be written anywhere (even past end-of-stream):

```ruby
strio.write('foobar')
strio.string # => "foobar"
strio.rewind
strio.write('FOO')
strio.string # => "FOObar"
strio.pos = 3
strio.write('BAR')
strio.string # => "FOOBAR"
strio.pos = 9
strio.write('baz')
strio.string # => "FOOBAR\u0000\u0000\u0000baz"  # Null-padded.
```

May not be read:

```ruby
strio.read  # Raises IOError: not opened for reading
```

#### `'a'`: Append-Only

Mode specified as one of:

- String: `'a'`.
- Constant: `File::WRONLY | File::APPEND`.

Initial state:

```ruby
strio = StringIO.new('foo', 'a')
strio.pos    # => 0      # Beginning-of-stream.
strio.string # => "foo"  # Not cleared.
```

May be written only at the end; position does not affect writing:

```ruby
strio.write('bar')
strio.string # => "foobar"
strio.write('baz')
strio.string # => "foobarbaz"
strio.pos = 400
strio.write('bat')
strio.string # => "foobarbazbat"
```

May not be read:

```ruby
strio.gets  # Raises IOError: not opened for reading
```

#### `'r+'`: Read/Write

Mode specified as one of:

- String: `'r+'`.
- Constant: `File::RDRW`.

Initial state:

```ruby
strio = StringIO.new('foobar', 'r+')
strio.pos    # => 0         # Beginning-of-stream.
strio.string # => "foobar"  # Not cleared.
```

May be written anywhere (even past end-of-stream):

```ruby
strio.write('FOO')
strio.string # => "FOObar"
strio.write('BAR')
strio.string # => "FOOBAR"
strio.write('BAZ')
strio.string # => "FOOBARBAZ"
strio.pos = 12
strio.write('BAT')
strio.string # => "FOOBARBAZ\u0000\u0000\u0000BAT"  # Null padded.
```

May be read anywhere:

```ruby
strio.pos = 0
strio.gets(3) # => "FOO"
strio.pos = 6
strio.gets(3) # => "BAZ"
strio.pos = 400
strio.gets(3) # => nil
```

#### `'w+'`: Read/Write (Initially Clear)

Mode specified as one of:

- String: `'w+'`.
- Constant: `File::RDWR | File::TRUNC`.

Initial state:

```ruby
strio = StringIO.new('foo', 'w+')
strio.pos    # => 0   # Beginning-of-stream.
strio.string # => ""  # Truncated.
```

May be written anywhere (even past end-of-stream):

```ruby
strio.write('foobar')
strio.string # => "foobar"
strio.rewind
strio.write('FOO')
strio.string # => "FOObar"
strio.write('BAR')
strio.string # => "FOOBAR"
strio.write('BAZ')
strio.string # => "FOOBARBAZ"
strio.pos = 12
strio.write('BAT')
strio.string # => "FOOBARBAZ\u0000\u0000\u0000BAT"  # Null-padded.
```

May be read anywhere:

```ruby
strio.rewind
strio.gets(3) # => "FOO"
strio.gets(3) # => "BAR"
strio.pos = 12
strio.gets(3) # => "BAT"
strio.pos = 400
strio.gets(3) # => nil
```

#### `'a+'`: Read/Append

Mode specified as one of:

- String: `'a+'`.
- Constant: `File::RDWR | File::APPEND`.

Initial state:

```ruby
strio = StringIO.new('foo', 'a+')
strio.pos    # => 0      # Beginning-of-stream.
strio.string # => "foo"  # Not cleared.
```

May be written only at the end; #rewind; position does not affect writing:

```ruby
strio.write('bar')
strio.string # => "foobar"
strio.write('baz')
strio.string # => "foobarbaz"
strio.pos = 400
strio.write('bat')
strio.string # => "foobarbazbat"
```

May be read anywhere:

```ruby
strio.rewind
strio.gets(3) # => "foo"
strio.gets(3) # => "bar"
strio.pos = 9
strio.gets(3) # => "bat"
strio.pos = 400
strio.gets(3) # => nil
```

### Data Mode

To specify whether the stream is to be treated as text or as binary data,
either of the following may be suffixed to any of the string read/write modes above:

- `'t'`: Text;
  initializes the encoding as Encoding::UTF_8.
- `'b'`: Binary;
  initializes the encoding as Encoding::ASCII_8BIT.

If neither is given, the stream defaults to text data.

Examples:

```ruby
strio = StringIO.new('foo', 'rt')
strio.external_encoding # => #<Encoding:UTF-8>
data = "\u9990\u9991\u9992\u9993\u9994"
strio = StringIO.new(data, 'rb')
strio.external_encoding # => #<Encoding:BINARY (ASCII-8BIT)>
```

When the data mode is specified, the read/write mode may not be omitted:

```ruby
StringIO.new(data, 'b')  # Raises ArgumentError: invalid access mode b
```

A text stream may be changed to binary by calling instance method #binmode;
a binary stream may not be changed to text.

### Encodings

A stream has an encoding; see [Encodings][encodings document].

The initial encoding for a new or re-opened stream depends on its [data mode][data mode]:

- Text: `Encoding::UTF_8`.
- Binary: `Encoding::ASCII_8BIT`.

These instance methods are relevant:

- #external_encoding: returns the current encoding of the stream as an `Encoding` object.
- #internal_encoding: returns +nil+; a stream does not have an internal encoding.
- #set_encoding: sets the encoding for the stream.
- #set_encoding_by_bom: sets the encoding for the stream to the stream's BOM (byte order mark).

Examples:

```ruby
strio = StringIO.new('foo', 'rt')  # Text mode.
strio.external_encoding # => #<Encoding:UTF-8>
data = "\u9990\u9991\u9992\u9993\u9994"
strio = StringIO.new(data, 'rb') # Binary mode.
strio.external_encoding # => #<Encoding:BINARY (ASCII-8BIT)>
strio = StringIO.new('foo')
strio.external_encoding # => #<Encoding:UTF-8>
strio.set_encoding('US-ASCII')
strio.external_encoding # => #<Encoding:US-ASCII>
```

### Position

A stream has a _position_, and integer offset (in bytes) into the stream.
The initial position of a stream is zero.

#### Getting and Setting the Position

Each of these methods initializes (to zero) the position of a new or re-opened stream:

- ::new: returns a new stream.
- ::open: passes a new stream to the block.
- #reopen: re-initializes the stream.

Each of these methods queries, gets, or sets the position, without otherwise changing the stream:

- #eof?: returns whether the position is at end-of-stream.
- #pos: returns the position.
- #pos=: sets the position.
- #rewind: sets the position to zero.
- #seek: sets the position.

Examples:

```ruby
strio = StringIO.new('foobar')
strio.pos  # => 0
strio.pos = 3
strio.pos  # => 3
strio.eof? # => false
strio.rewind
strio.pos  # => 0
strio.seek(0, IO::SEEK_END)
strio.pos  # => 6
strio.eof? # => true
```

#### Position Before and After Reading

Except for #pread, a stream reading method (see [Basic Reading][basic reading])
begins reading at the current position.

Except for #pread, a read method advances the position past the read substring.

Examples:

```ruby
strio = StringIO.new(TEXT)
strio.string # => "First line\nSecond line\n\nFourth line\nFifth line\n"
strio.pos    # => 0
strio.getc   # => "F"
strio.pos    # => 1
strio.gets   # => "irst line\n"
strio.pos    # => 11
strio.pos = 24
strio.gets   # => "Fourth line\n"
strio.pos    # => 36

strio = StringIO.new('тест') # Four 2-byte characters.
strio.pos = 0 # At first byte of first character.
strio.read    # => "тест"
strio.pos = 1 # At second byte of first character.
strio.read    # => "\x82ест"
strio.pos = 2 # At first of second character.
strio.read    # => "ест"

strio = StringIO.new(TEXT)
strio.pos = 15
a = []
strio.each_line {|line| a.push(line) }
a         # => ["nd line\n", "\n", "Fourth line\n", "Fifth line\n"]
strio.pos # => 47  ## End-of-stream.
```

#### Position Before and After Writing

Each of these methods begins writing at the current position,
and advances the position to the end of the written substring:

- #putc: writes the given character.
- #write: writes the given objects as strings.
- [Kernel#puts][kernel#puts]: writes given objects as strings, each followed by newline.

Examples:

```ruby
strio = StringIO.new('foo')
strio.pos    # => 0
strio.putc('b')
strio.string # => "boo"
strio.pos    # => 1
strio.write('r')
strio.string # => "bro"
strio.pos    # => 2
strio.puts('ew')
strio.string # => "brew\n"
strio.pos    # => 5
strio.pos = 8
strio.write('foo')
strio.string # => "brew\n\u0000\u0000\u0000foo"
strio.pos    # => 11
```

Each of these methods writes _before_ the current position, and decrements the position
so that the written data is next to be read:

- #ungetbyte: unshifts the given byte.
- #ungetc: unshifts the given character.

Examples:

```ruby
strio = StringIO.new('foo')
strio.pos = 2
strio.ungetc('x')
strio.pos    # => 1
strio.string # => "fxo"
strio.ungetc('x')
strio.pos    # => 0
strio.string # => "xxo"
```

This method does not affect the position:

- #truncate: truncates the stream's string to the given size.

Examples:

```ruby
strio = StringIO.new('foobar')
strio.pos    # => 0
strio.truncate(3)
strio.string # => "foo"
strio.pos    # => 0
strio.pos = 500
strio.truncate(0)
strio.string # => ""
strio.pos    # => 500
```

### Line Number

A stream has a line number, which initially is zero:

- Method #lineno returns the line number.
- Method #lineno= sets the line number.

The line number can be affected by reading (but never by writing);
in general, the line number is incremented each time the record separator (default: `"\n"`) is read.

Examples:

```ruby
strio = StringIO.new(TEXT)
strio.string # => "First line\nSecond line\n\nFourth line\nFifth line\n"
strio.lineno # => 0
strio.gets   # => "First line\n"
strio.lineno # => 1
strio.getc   # => "S"
strio.lineno # => 1
strio.gets   # => "econd line\n"
strio.lineno # => 2
strio.gets   # => "\n"
strio.lineno # => 3
strio.gets   # => "Fourth line\n"
strio.lineno # => 4
```

Setting the position does not affect the line number:

```ruby
strio.pos = 0
strio.lineno # => 4
strio.gets   # => "First line\n"
strio.pos    # => 11
strio.lineno # => 5
```

And setting the line number does not affect the position:

```ruby
strio.lineno = 10
strio.pos    # => 11
strio.gets   # => "Second line\n"
strio.lineno # => 11
strio.pos    # => 23
```

### Open/Closed Streams

A new stream is open for either reading or writing, and may be open for both;
see [Read/Write Mode][read/write mode].

Each of these methods initializes the read/write mode for a new or re-opened stream:

- ::new: returns a new stream.
- ::open: passes a new stream to the block.
- #reopen: re-initializes the stream.

Other relevant methods:

- #close: closes the stream for both reading and writing.
- #close_read: closes the stream for reading.
- #close_write: closes the stream for writing.
- #closed?: returns whether the stream is closed for both reading and writing.
- #closed_read?: returns whether the stream is closed for reading.
- #closed_write?: returns whether the stream is closed for writing.

### BOM (Byte Order Mark)

The string provided for ::new, ::open, or #reopen
may contain an optional [BOM][bom] (byte order mark) at the beginning of the string;
the BOM can affect the stream's encoding.

The BOM (if provided):

- Is stored as part of the stream's string.
- Does _not_ immediately affect the encoding.
- Is _initially_ considered part of the stream.

```ruby
utf8_bom = "\xEF\xBB\xBF"
string = utf8_bom + 'foo'
string.bytes               # => [239, 187, 191, 102, 111, 111]
strio.string.bytes.take(3) # => [239, 187, 191]                  # The BOM.
strio = StringIO.new(string, 'rb')
strio.string.bytes         # => [239, 187, 191, 102, 111, 111]   # BOM is part of the stored string.
strio.external_encoding    # => #<Encoding:BINARY (ASCII-8BIT)>  # Default for a binary stream.
strio.gets                 # => "\xEF\xBB\xBFfoo"                # BOM is part of the stream.
```

You can call instance method #set_encoding_by_bom to "activate" the stored BOM;
after doing so the BOM:

- Is _still_ stored as part of the stream's string.
- _Determines_ (and may have changed) the stream's encoding.
- Is _no longer_ considered part of the stream.

```ruby
strio.set_encoding_by_bom
strio.string.bytes      # => [239, 187, 191, 102, 111, 111]  # BOM is still part of the stored string.
strio.external_encoding # => #<Encoding:UTF-8>               # The new encoding.
strio.rewind            # => 0
strio.gets              # => "foo"                           # BOM is not part of the stream.
```

## Basic Stream \IO

### Basic Reading

You can read from the stream using these instance methods:

- #getbyte: reads and returns the next byte.
- #getc: reads and returns the next character.
- #gets: reads and returns all or part of the next line.
- #read: reads and returns all or part of the remaining data in the stream.
- #readlines: reads the remaining data the stream and returns an array of its lines.
- [Kernel#readline][kernel#readline]: like #gets, but raises an exception if at end-of-stream.

You can iterate over the stream using these instance methods:

- #each_byte: reads each remaining byte, passing it to the block.
- #each_char: reads each remaining character, passing it to the block.
- #each_codepoint: reads each remaining codepoint, passing it to the block.
- #each_line: reads all or part of each remaining line, passing the read string to the block

This instance method is useful in a multi-threaded application:

- #pread: reads and returns all or part of the stream.
 
### Basic Writing

You can write to the stream, advancing the position, using these instance methods:

- #putc: writes a given character.
- #write: writes the given objects as strings.
- [Kernel#puts][kernel#puts] writes given objects as strings, each followed by newline.

You can "unshift" to the stream using these instance methods;
each  writes _before_ the current position, and decrements the position
so that the written data is next to be read.

- #ungetbyte: unshifts the given byte.
- #ungetc: unshifts the given character.

One more writing method:

- #truncate: truncates the stream's string to the given size.

## Line \IO

Reading:

- #gets: reads and returns the next line.
- [Kernel#readline][kernel#readline]: like #gets, but raises an exception if at end-of-stream.
- #readlines: reads the remaining data the stream and returns an array of its lines.
- #each_line: reads each remaining line, passing it to the block

Writing:

- [Kernel#puts][kernel#puts]: writes given objects, each followed by newline.

## Character \IO

Reading:

- #each_char: reads each remaining character, passing it to the block.
- #getc: reads and returns the next character.

Writing:

- #putc: writes the given character.
- #ungetc.: unshifts the given character.

## Byte \IO

Reading:

- #each_byte: reads each remaining byte, passing it to the block.
- #getbyte: reads and returns the next byte.

Writing:

- #ungetbyte: unshifts the given byte.

## Codepoint \IO

Reading:

- #each_codepoint: reads each remaining codepoint, passing it to the block.

[bom]:                https://en.wikipedia.org/wiki/Byte_order_mark
[encodings document]: https://docs.ruby-lang.org/en/master/language/encodings_rdoc.html
[io class]:           https://docs.ruby-lang.org/en/master/IO.html
[kernel#puts]:        https://docs.ruby-lang.org/en/master/Kernel.html#method-i-puts
[kernel#readline]:    https://docs.ruby-lang.org/en/master/Kernel.html#method-i-readline

[basic reading]:         rdoc-ref:StringIO@Basic+Reading
[basic writing]:         rdoc-ref:StringIO@Basic+Writing
[bom (byte order mark)]: rdoc-ref:StringIO@BOM+-28Byte+Order+Mark-29
[data mode]:             rdoc-ref:StringIO@Data+Mode
[encodings]:             rdoc-ref:StringIO@Encodings
[end-of-stream]:         rdoc-ref:StringIO@End-of-Stream
[line number]:           rdoc-ref:StringIO@Line+Number
[open/closed streams]:   rdoc-ref:StringIO@Open-2FClosed+Streams
[position]:              rdoc-ref:StringIO@Position
[read/write mode]:       rdoc-ref:StringIO@Read-2FWrite+Mode