== \IO Streams

This page describes:

- {Stream classes}[rdoc-ref:io_streams.rdoc@Stream+Classes].
- {Pre-existing streams}[rdoc-ref:io_streams.rdoc@Pre-Existing+Streams].
- {User-created streams}[rdoc-ref:io_streams.rdoc@User-Created+Streams].
- {Basic \IO}[rdoc-ref:io_streams.rdoc@Basic+IO], including:

  - {Position}[rdoc-ref:io_streams.rdoc@Position].
  - {Open and closed streams}[rdoc-ref:io_streams.rdoc@Open+and+Closed+Streams].
  - {End-of-stream}[rdoc-ref:io_streams.rdoc@End-of-Stream].

- {Line \IO}[rdoc-ref:io_streams.rdoc@Line+IO], including:

  - {Line separator}[rdoc-ref:io_streams.rdoc@Line+Separator].
  - {Line limit}[rdoc-ref:io_streams.rdoc@Line+Limit].
  - {Line number}[rdoc-ref:io_streams.rdoc@Line+Number].
  - {Line options}[rdoc-ref:io_streams.rdoc@Line+Options].

- {Character \IO}[rdoc-ref:io_streams.rdoc@Character+IO].
- {Byte \IO}[rdoc-ref:io_streams.rdoc@Byte+IO].
- {Codepoint \IO}[rdoc-ref:io_streams.rdoc@Codepoint+IO].

=== Stream Classes

Ruby supports processing data as \IO streams;
that is, as data that may be read, re-read, written, re-written,
and traversed via iteration.

Core classes with such support include:

- IO, and its derived class File.
- {StringIO}[rdoc-ref:StringIO]: for processing a string.
- {ARGF}[rdoc-ref:ARGF]: for processing files cited on the command line.

Except as noted, the instance methods described on this page
are available in classes \ARGF, \File, \IO, and \StringIO.
A few, also noted, are available in class \Kernel.
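
As a minimal sketch of that shared interface, the following reads the same text
through a \StringIO stream and a \File stream;
the temporary file and the variable names are assumptions made for illustration only:

```ruby
require 'stringio'
require 'tempfile'

text = "First line\nSecond line\n"

# Read the text through a StringIO stream.
strio = StringIO.new(text)
strio_lines = strio.each_line.to_a
strio.close

# Read the same text through a File stream (a temporary file, for illustration).
file_lines = Tempfile.create('io_streams') do |f|
  f.write(text)
  f.rewind
  f.each_line.to_a
end

strio_lines == file_lines # => true
```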

=== Pre-Existing Streams

Pre-existing streams that are referenced by global variables or constants include:

- $stdin: read-only instance of \IO.
- $stdout: write-only instance of \IO.
- $stderr: write-only instance of \IO.
- \ARGF: read-only instance of \ARGF.

=== User-Created Streams

You can create streams:

- \File:

  - File.new: returns a new \File object;
    the file should be closed when no longer needed.
  - File.open: passes a new \File object to the given block;
    the file is automatically closed on block exit.

- \IO:

  - IO.new: returns a new \IO object for the given integer file descriptor;
    the \IO object should be closed when no longer needed.
  - IO.open: passes a new \IO object to the given block;
    the \IO object is automatically closed on block exit.
  - IO.popen: returns a new \IO object that is connected to the $stdin
    and $stdout of a newly-launched subprocess.
  - Kernel#open: returns a new \IO object connected to a given source:
    stream, file, or subprocess;
    the \IO object should be closed when no longer needed.

- \StringIO:

  - StringIO.new: returns a new \StringIO object;
    the \StringIO object should be closed when no longer needed.
  - StringIO.open: passes a new \StringIO object to the given block;
    the \StringIO object is automatically closed on block exit.

(You cannot create an \ARGF object, but one already exists.)
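
The difference between the two styles can be sketched as follows,
assuming a scratch file created just for the example
(the file and all variable names here are hypothetical):

```ruby
require 'tempfile'

# A scratch file for illustration; the name prefix is arbitrary.
tmp = Tempfile.create('io_streams')
tmp.write("foo\n")
tmp.close
path = tmp.path

# File.new: the caller is responsible for closing the stream.
f = File.new(path)
new_content = f.read          # => "foo\n"
f.close
closed_after_new = f.closed?  # => true

# File.open with a block: the stream is closed on block exit.
block_file = nil
open_content = File.open(path) do |file|
  block_file = file
  file.read
end
open_content        # => "foo\n"
block_file.closed?  # => true

File.delete(path)
```

The block form is usually the safer choice, because the stream is closed
even if the block raises an exception.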

=== About the Examples

Many examples here use these variables:

  :include: doc/examples/files.rdoc

=== Basic \IO

You can perform basic stream \IO with these methods:

- IO#read: Returns all remaining or the next _n_ bytes read from the stream,
  for a given _n_:

    f = File.new('t.txt')
    f.read     # => "First line\nSecond line\n\nFourth line\nFifth line\n"
    f.rewind
    f.read(30) # => "First line\nSecond line\n\nFourth"
    f.read(30) # => " line\nFifth line\n"
    f.read(30) # => nil
    f.close

- IO#write: Writes one or more given strings to the stream:

    $stdout.write('Hello', ', ', 'World!', "\n") # => 14
    $stdout.write('foo', :bar, 2, "\n")

  Output:

    Hello, World!
    foobar2

==== Position

An \IO stream has a nonnegative integer _position_,
which is the byte offset at which the next read or write is to occur.
A new stream has position zero (and line number zero);
method +rewind+ resets the position (and line number) to zero.

The relevant methods:

- IO#tell (aliased as +#pos+):
  Returns the current position (in bytes) in the stream:

    f = File.new('t.txt')
    f.tell # => 0
    f.gets # => "First line\n"
    f.tell # => 11
    f.close

- IO#pos=: Sets the position of the stream (in bytes):

    f = File.new('t.txt')
    f.tell     # => 0
    f.pos = 20 # => 20
    f.tell     # => 20
    f.close

- IO#seek: Sets the position of the stream to a given integer +offset+
  (in bytes), with respect to a given constant +whence+, which is one of:

  - +:CUR+ or <tt>IO::SEEK_CUR</tt>:
    Repositions the stream to its current position plus the given +offset+:

      f = File.new('t.txt')
      f.tell            # => 0
      f.seek(20, :CUR)  # => 0
      f.tell            # => 20
      f.seek(-10, :CUR) # => 0
      f.tell            # => 10
      f.close

  - +:END+ or <tt>IO::SEEK_END</tt>:
    Repositions the stream to its end plus the given +offset+:

      f = File.new('t.txt')
      f.tell            # => 0
      f.seek(0, :END)   # => 0  # Repositions to stream end.
      f.tell            # => 47
      f.seek(-20, :END) # => 0
      f.tell            # => 27
      f.seek(-40, :END) # => 0
      f.tell            # => 7
      f.close

  - +:SET+ or <tt>IO::SEEK_SET</tt>:
    Repositions the stream to the given +offset+:

      f = File.new('t.txt')
      f.tell           # => 0
      f.seek(20, :SET) # => 0
      f.tell           # => 20
      f.seek(40, :SET) # => 0
      f.tell           # => 40
      f.close

- IO#rewind: Positions the stream to the beginning (also resetting the line number):

    f = File.new('t.txt')
    f.tell     # => 0
    f.gets     # => "First line\n"
    f.tell     # => 11
    f.rewind   # => 0
    f.tell     # => 0
    f.lineno   # => 0
    f.close

==== Open and Closed Streams

A new \IO stream may be open for reading, open for writing, or both.

A stream is automatically closed when claimed by the garbage collector.

Attempted reading or writing on a closed stream raises an exception.

- IO#close: Closes the stream for both reading and writing.
- IO#close_read: Closes the stream for reading; not in ARGF.
- IO#close_write: Closes the stream for writing; not in ARGF.
- IO#closed?: Returns whether the stream is closed.
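
A minimal sketch of these methods, using a \StringIO stream opened for both
reading and writing so that each direction can be closed separately
(the variable names are assumptions for illustration):

```ruby
require 'stringio'

# A mutable string gives a stream that is open for reading and writing.
strio = StringIO.new(String.new("First line\n"))
open_at_start = strio.closed?          # => false

# Close for reading only; the stream is still open for writing.
strio.close_read
closed_after_close_read = strio.closed? # => false

# Reading from a stream that is closed for reading raises IOError.
read_error = begin
  strio.gets
rescue IOError => e
  e.class
end
read_error # => IOError

# Closing the remaining direction closes the stream entirely.
strio.close_write
closed_at_end = strio.closed? # => true
```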

==== End-of-Stream

You can query whether a stream is positioned at its end using
method IO#eof? (also aliased as +#eof+).

You can reposition to end-of-stream by reading all stream content:

  f = File.new('t.txt')
  f.eof? # => false
  f.read # => "First line\nSecond line\n\nFourth line\nFifth line\n"
  f.eof? # => true

Or by using method IO#seek:

  f = File.new('t.txt')
  f.eof? # => false
  f.seek(0, :END)
  f.eof? # => true

=== Line \IO

You can read an \IO stream line-by-line using these methods:

- IO#each_line: Passes each line to the block:

    f = File.new('t.txt')
    f.each_line {|line| p line }

  Output:

    "First line\n"
    "Second line\n"
    "\n"
    "Fourth line\n"
    "Fifth line\n"

  The reading may begin mid-line:

    f = File.new('t.txt')
    f.pos = 27
    f.each_line {|line| p line }

  Output:

    "rth line\n"
    "Fifth line\n"

- IO#gets (also in Kernel): Returns the next line (which may begin mid-line):

    f = File.new('t.txt')
    f.gets      # => "First line\n"
    f.gets      # => "Second line\n"
    f.pos = 27
    f.gets      # => "rth line\n"
    f.readlines # => ["Fifth line\n"]
    f.gets      # => nil

- IO#readline (also in Kernel; not in StringIO):
  Like #gets, but raises an exception at end-of-stream.

- IO#readlines (also in Kernel): Returns all remaining lines in an array;
  may begin mid-line:

    f = File.new('t.txt')
    f.pos = 19
    f.readlines # => ["ine\n", "\n", "Fourth line\n", "Fifth line\n"]
    f.readlines # => []

Each of these reader methods may be called with:

- An optional line separator, +sep+.
- An optional line-size limit, +limit+.
- Both +sep+ and +limit+.

You can write to an \IO stream line-by-line using this method:

- IO#puts (also in Kernel; not in \StringIO): Writes objects to the stream:

    f = File.new('t.tmp', 'w')
    f.puts('foo', :bar, 1, 2.0, Complex(3, 0))
    f.flush
    File.read('t.tmp') # => "foo\nbar\n1\n2.0\n3+0i\n"

==== Line Separator

The default line separator is given by the global variable <tt>$/</tt>,
whose value is by default <tt>"\n"</tt>.
The line to be read next is all data from the current position
to the next line separator:

  f = File.new('t.txt')
  f.gets # => "First line\n"
  f.gets # => "Second line\n"
  f.gets # => "\n"
  f.gets # => "Fourth line\n"
  f.gets # => "Fifth line\n"
  f.close

You can specify a different line separator:

  f = File.new('t.txt')
  f.gets('l')   # => "First l"
  f.gets('li')  # => "ine\nSecond li"
  f.gets('lin') # => "ne\n\nFourth lin"
  f.gets        # => "e\n"
  f.close

There are two special line separators:

- +nil+: The entire stream is read into a single string:

    f = File.new('t.txt')
    f.gets(nil) # => "First line\nSecond line\n\nFourth line\nFifth line\n"
    f.close

- <tt>''</tt> (the empty string): The next "paragraph" is read
  (paragraphs being separated by two consecutive line separators):

    f = File.new('t.txt')
    f.gets('') # => "First line\nSecond line\n\n"
    f.gets('') # => "Fourth line\nFifth line\n"
    f.close

==== Line Limit

The line to be read may be further defined by an optional integer argument +limit+,
which specifies that the number of bytes returned may not be (much) longer
than the given +limit+;
a multi-byte character will not be split, and so a line may be slightly longer
than the given limit.

If +limit+ is not given, the line is determined only by +sep+.

  # Text with 1-byte characters.
  File.open('t.txt') {|f| f.gets(1) }  # => "F"
  File.open('t.txt') {|f| f.gets(2) }  # => "Fi"
  File.open('t.txt') {|f| f.gets(3) }  # => "Fir"
  File.open('t.txt') {|f| f.gets(4) }  # => "Firs"
  # No more than one line.
  File.open('t.txt') {|f| f.gets(10) } # => "First line"
  File.open('t.txt') {|f| f.gets(11) } # => "First line\n"
  File.open('t.txt') {|f| f.gets(12) } # => "First line\n"

  # Text with 2-byte characters, which will not be split.
  File.open('t.rus') {|f| f.gets(1).size } # => 1
  File.open('t.rus') {|f| f.gets(2).size } # => 1
  File.open('t.rus') {|f| f.gets(3).size } # => 2
  File.open('t.rus') {|f| f.gets(4).size } # => 2

==== Line Separator and Line Limit

With arguments +sep+ and +limit+ given,
combines the two behaviors:

- Returns the next line as determined by line separator +sep+.
- But returns no more bytes than are allowed by the limit.

Example:

  File.open('t.txt') {|f| f.gets('li', 20) } # => "First li"
  File.open('t.txt') {|f| f.gets('li', 2) }  # => "Fi"

==== Line Number

A readable \IO stream has a _line_ _number_,
which is the non-negative integer line number
in the stream where the next read will occur.

The line number is the number of lines read by certain line-oriented methods
(IO.foreach, IO#each_line, IO#gets, IO#readline, and IO#readlines)
according to the given (or default) line separator +sep+.

A new stream initially has line number zero (and position zero);
method +rewind+ resets the line number (and position) to zero.

\Method IO#lineno returns the line number.

Reading lines from a stream usually changes its line number:

  f = File.new('t.txt', 'r')
  f.lineno   # => 0
  f.readline # => "First line\n"
  f.lineno   # => 1
  f.readline # => "Second line\n"
  f.lineno   # => 2
  f.readline # => "\n"
  f.lineno   # => 3
  f.eof?     # => false
  f.close

Iterating over lines in a stream usually changes its line number:

  File.open('t.txt') do |f|
    f.each_line do |line|
      p "position=#{f.pos} eof?=#{f.eof?} lineno=#{f.lineno}"
    end
  end

Output:

  "position=11 eof?=false lineno=1"
  "position=23 eof?=false lineno=2"
  "position=24 eof?=false lineno=3"
  "position=36 eof?=false lineno=4"
  "position=47 eof?=true lineno=5"

==== Line Options

A number of \IO methods accept optional keyword arguments
that determine how lines in a stream are to be treated:

- +:chomp+: If +true+, line separators are omitted; default is +false+.
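
The effect of +:chomp+ can be sketched with a \StringIO stream
(the sample text and variable names are assumptions for illustration):

```ruby
require 'stringio'

text = "First line\nSecond line\n"

# Default: each line keeps its trailing line separator.
with_separators = StringIO.new(text).readlines
# => ["First line\n", "Second line\n"]

# With chomp: true, the line separator is omitted from each line.
without_separators = StringIO.new(text).readlines(chomp: true)
# => ["First line", "Second line"]

# The same option applies when reading a single line.
chomped_line = StringIO.new(text).gets(chomp: true)
# => "First line"
```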

=== Character \IO

You can process an \IO stream character-by-character using these methods:

- IO#getc: Reads and returns the next character from the stream:

    f = File.new('t.rus')
    f.getc # => "т"
    f.getc # => "е"
    f.getc # => "с"
    f.getc # => "т"
    f.getc # => nil

- IO#readchar (not in \StringIO):
  Like #getc, but raises an exception at end-of-stream:

    f.readchar # Raises EOFError.

- IO#ungetc (not in \ARGF):
  Pushes back ("unshifts") a character or integer onto the stream:

    path = 't.tmp'
    File.write(path, 'foo')
    File.open(path) do |f|
      f.ungetc('т')
      f.read # => "тfoo"
    end

- IO#putc (also in Kernel): Writes a character to the stream:

    File.open('t.tmp', 'w') do |f|
      f.putc('т')
      f.putc('е')
      f.putc('с')
      f.putc('т')
    end
    File.read('t.tmp') # => "тест"

- IO#each_char: Reads each remaining character in the stream,
  passing the character to the given block:

    File.open('t.rus') do |f|
      f.pos = 4
      f.each_char {|c| p c }
    end

  Output:

    "с"
    "т"

=== Byte \IO

You can process an \IO stream byte-by-byte using these methods:

- IO#getbyte: Returns the next 8-bit byte as an integer in range 0..255:

    File.read('t.dat')
    # => "\xFE\xFF\x99\x90\x99\x91\x99\x92\x99\x93\x99\x94"
    f = File.new('t.dat')
    f.getbyte # => 254
    f.getbyte # => 255
    f.seek(-2, :END)
    f.getbyte # => 153
    f.getbyte # => 148
    f.getbyte # => nil

- IO#readbyte (not in \StringIO):
  Like #getbyte, but raises an exception if at end-of-stream:

    f.readbyte # Raises EOFError.

- IO#ungetbyte (not in \ARGF):
  Pushes back ("unshifts") a byte onto the stream:

    f.ungetbyte(0)
    f.ungetbyte(1)
    f.read # => "\u0001\u0000"

- IO#each_byte: Reads each remaining byte in the stream,
  passing the byte to the given block:

    f.seek(-4, :END)
    f.each_byte {|b| p b }

  Output:

    153
    147
    153
    148

=== Codepoint \IO

You can process an \IO stream codepoint-by-codepoint using method
+#each_codepoint+:

  a = []
  File.open('t.rus') do |f|
    f.each_codepoint {|c| a << c }
  end
  a # => [1090, 1077, 1089, 1090]