== \IO Streams

Ruby supports processing data as \IO streams;
that is, as data that may be read, re-read, written, re-written,
and traversed via iteration.

Core classes with such support include:

- IO, and its derived class File.
- {StringIO}[rdoc-ref:StringIO]: for processing a string.
- {ARGF}[rdoc-ref:ARGF]: for processing files cited on the command line.

Pre-existing stream objects that are referenced by global variables or constants include:

- $stdin: read-only instance of \IO.
- $stdout: write-only instance of \IO.
- $stderr: write-only instance of \IO.
- \ARGF: read-only instance of \ARGF.

You can create stream objects:

- \File:

  - File.new: returns a new \File object.
  - File.open: passes a new \File object to the given block.

- \IO:

  - IO.new: returns a new \IO object for the given integer file descriptor.
  - IO.open: passes a new \IO object to the given block.
  - IO.popen: returns a new \IO object that is connected to the $stdin
    and $stdout of a newly-launched subprocess.
  - Kernel#open: returns a new \IO object connected to a given source:
    stream, file, or subprocess.

- \StringIO:

  - StringIO.new: returns a new \StringIO object.
  - StringIO.open: passes a new \StringIO object to the given block.

(You cannot create an \ARGF object, but one already exists.)
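
The creation methods listed above might be exercised as in the following
minimal sketch; the temporary directory, the file name <tt>demo.txt</tt>,
and the sample strings are assumptions made only for illustration:

```ruby
require 'stringio'
require 'tmpdir'

results = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')   # Hypothetical file name.
  File.write(path, "First line\nSecond line\n")

  # File.new returns a File object; the caller is responsible for closing it.
  f = File.new(path)
  results[:new] = f.gets
  f.close

  # File.open with a block passes the File object to the block
  # and closes the file when the block exits.
  results[:open] = File.open(path) {|file| file.read }
end

# StringIO.new returns a stream backed by a string.
strio = StringIO.new("in-memory text\n")
results[:strio] = strio.gets
strio.close

p results
```

The block form is usually the safer choice, because the stream is closed
even if the block raises an exception.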

=== About the Examples

Many examples here use these variables:

  # English text with newlines.
  text = <<~EOT
    First line
    Second line

    Fourth line
    Fifth line
  EOT

  # Russian text.
  russian = "\u{442 435 441 442}" # => "тест"

  # Binary data.
  data = "\u9990\u9991\u9992\u9993\u9994"

  # Text file.
  File.write('t.txt', text)

  # File with Russian text.
  File.write('t.rus', russian)

  # File with binary data.
  f = File.new('t.dat', 'wb:UTF-16')
  f.write(data)
  f.close

=== Position

An \IO stream has a nonnegative integer _position_,
which is the byte offset at which the next read or write is to occur;
the relevant methods:

- IO#tell (aliased as +#pos+):
  Returns the current position (in bytes) in the stream:

    f = File.new('t.txt')
    f.tell # => 0
    f.gets # => "First line\n"
    f.tell # => 11
    f.close

- IO#pos=: Sets the position of the stream (in bytes):

    f = File.new('t.txt')
    f.tell     # => 0
    f.pos = 20 # => 20
    f.tell     # => 20
    f.close

- IO#seek: Sets the position of the stream to a given integer +offset+
  (in bytes), with respect to a given constant +whence+, which is one of:

  - +:CUR+ or <tt>IO::SEEK_CUR</tt>:
    Repositions the stream to its current position plus the given +offset+:

      f = File.new('t.txt')
      f.tell            # => 0
      f.seek(20, :CUR)  # => 0
      f.tell            # => 20
      f.seek(-10, :CUR) # => 0
      f.tell            # => 10
      f.close

  - +:END+ or <tt>IO::SEEK_END</tt>:
    Repositions the stream to its end plus the given +offset+:

      f = File.new('t.txt')
      f.tell            # => 0
      f.seek(0, :END)   # => 0  # Repositions to stream end.
      f.tell            # => 47
      f.seek(-20, :END) # => 0
      f.tell            # => 27
      f.seek(-40, :END) # => 0
      f.tell            # => 7
      f.close

  - +:SET+ or <tt>IO::SEEK_SET</tt>:
    Repositions the stream to the given +offset+:

      f = File.new('t.txt')
      f.tell           # => 0
      f.seek(20, :SET) # => 0
      f.tell           # => 20
      f.seek(40, :SET) # => 0
      f.tell           # => 40
      f.close

- IO#rewind: Positions the stream to the beginning:

    f = File.new('t.txt')
    f.tell     # => 0
    f.gets     # => "First line\n"
    f.tell     # => 11
    f.rewind   # => 0
    f.tell     # => 0
    f.close
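
The position methods above can also be tried without a file on disk.
This is a minimal sketch that assumes a \StringIO stream and a two-line
sample string (both chosen only for illustration):

```ruby
require 'stringio'

# A StringIO stream is used so that the sketch needs no file on disk.
strio = StringIO.new("First line\nSecond line\n")

positions = []
positions << strio.tell        # A new stream starts at position 0.
strio.gets                     # Reads "First line\n", which is 11 bytes.
positions << strio.pos         # Position is now just past the first line.
strio.seek(-4, IO::SEEK_END)   # Moves to 4 bytes before the end.
tail = strio.read              # Reads from there to the end.
strio.rewind                   # Moves back to the beginning.
positions << strio.tell
strio.close

p positions
p tail
```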

=== Lines

Some reader methods in \IO streams are line-oriented;
such a method reads one or more lines,
which are separated by an implicit or explicit line separator.

These methods are included (except as noted) in classes Kernel, IO, File,
and {ARGF}[rdoc-ref:ARGF]:

- IO#each_line: Passes each line to the block; not in Kernel:

    f = File.new('t.txt')
    f.each_line {|line| p line }

  Output:

    "First line\n"
    "Second line\n"
    "\n"
    "Fourth line\n"
    "Fifth line\n"

  The reading may begin mid-line:

    f = File.new('t.txt')
    f.pos = 27
    f.each_line {|line| p line }

  Output:

    "rth line\n"
    "Fifth line\n"

- IO#gets: Returns the next line (which may begin mid-line):

    f = File.new('t.txt')
    f.gets      # => "First line\n"
    f.gets      # => "Second line\n"
    f.pos = 27
    f.gets      # => "rth line\n"
    f.readlines # => ["Fifth line\n"]
    f.gets      # => nil

- IO#readline: Like #gets, but raises EOFError at end-of-file;
  not in StringIO.

- IO#readlines: Returns all remaining lines in an array;
  may begin mid-line:

    f = File.new('t.txt')
    f.pos = 19
    f.readlines # => ["ine\n", "\n", "Fourth line\n", "Fifth line\n"]
    f.readlines # => []

Each of these methods may be called with:

- An optional line separator, +sep+.
- An optional line-size limit, +limit+.
- Both +sep+ and +limit+.
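
To compare the line-oriented readers side by side, here is a minimal sketch;
the temporary file <tt>lines.txt</tt> and its three-line content are
assumptions made only for illustration:

```ruby
require 'tmpdir'

seen = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'lines.txt')   # Hypothetical file name.
  File.write(path, "one\ntwo\nthree\n")
  File.open(path) do |f|
    seen[:gets] = f.gets             # The first line.
    seen[:readlines] = f.readlines   # All remaining lines.
    seen[:gets_at_eof] = f.gets      # nil at end-of-file.
    begin
      f.readline                     # Raises at end-of-file.
    rescue EOFError => e
      seen[:readline_at_eof] = e.class
    end
  end
end

p seen
```

The difference between #gets and #readline matters mostly in loops:
<tt>while line = f.gets</tt> ends quietly, whereas #readline requires
rescuing the exception.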

==== Line Separator

The default line separator is given by the global variable <tt>$/</tt>,
whose value is by default <tt>"\n"</tt>.
The line to be read next is all data from the current position
to the next line separator:

  f = File.new('t.txt')
  f.gets # => "First line\n"
  f.gets # => "Second line\n"
  f.gets # => "\n"
  f.gets # => "Fourth line\n"
  f.gets # => "Fifth line\n"
  f.close

You can specify a different line separator:

  f = File.new('t.txt')
  f.gets('l')   # => "First l"
  f.gets('li')  # => "ine\nSecond li"
  f.gets('lin') # => "ne\n\nFourth lin"
  f.gets        # => "e\n"
  f.close

There are two special line separators:

- +nil+: The entire stream is read into a single string:

    f = File.new('t.txt')
    f.gets(nil) # => "First line\nSecond line\n\nFourth line\nFifth line\n"
    f.close

- <tt>''</tt> (the empty string): The next "paragraph" is read
  (paragraphs being separated by two consecutive line separators):

    f = File.new('t.txt')
    f.gets('') # => "First line\nSecond line\n\n"
    f.gets('') # => "Fourth line\nFifth line\n"
    f.close
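
The separator variants can be collected in one place.
This is a minimal sketch that assumes \StringIO streams and sample strings
chosen only for illustration:

```ruby
require 'stringio'

text = "First line\nSecond line\n\nFourth line\nFifth line\n"

# Custom separator: each "line" ends with the given string.
fields = StringIO.new("a,b,c").each_line(',').to_a

# Empty-string separator: each "line" is a paragraph.
paragraphs = StringIO.new(text).each_line('').to_a

# nil separator: the whole stream is one "line".
whole = StringIO.new(text).gets(nil)

p fields
p paragraphs
p whole == text
```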

==== Line Limit

The line to be read may be further defined by an optional integer argument +limit+,
which specifies that the number of bytes returned may not be (much) longer
than the given +limit+;
a multi-byte character will not be split, and so a line may be slightly longer
than the given limit.

If +limit+ is not given, the line is determined only by +sep+.

  # Text with 1-byte characters.
  File.open('t.txt') {|f| f.gets(1) }  # => "F"
  File.open('t.txt') {|f| f.gets(2) }  # => "Fi"
  File.open('t.txt') {|f| f.gets(3) }  # => "Fir"
  File.open('t.txt') {|f| f.gets(4) }  # => "Firs"
  # No more than one line.
  File.open('t.txt') {|f| f.gets(10) } # => "First line"
  File.open('t.txt') {|f| f.gets(11) } # => "First line\n"
  File.open('t.txt') {|f| f.gets(12) } # => "First line\n"

  # Text with 2-byte characters, which will not be split.
  File.open('t.rus') {|f| f.gets(1).size } # => 1
  File.open('t.rus') {|f| f.gets(2).size } # => 1
  File.open('t.rus') {|f| f.gets(3).size } # => 2
  File.open('t.rus') {|f| f.gets(4).size } # => 2
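
The effect of +limit+ on multi-byte text is easier to see when character
counts and byte counts are printed together.
A minimal sketch, assuming a temporary file named <tt>limit.rus</tt>
(hypothetical) that holds four 2-byte UTF-8 characters:

```ruby
require 'tmpdir'

russian = "\u{442 435 441 442}"   # Four characters, two bytes each in UTF-8.
pieces = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'limit.rus')   # Hypothetical file name.
  File.binwrite(path, russian)
  # Reopen the file for each limit so that every read starts at position 0.
  pieces = (1..4).map do |limit|
    File.open(path, 'r:UTF-8') {|f| f.gets(limit) }
  end
end

sizes = pieces.map(&:size)           # Characters returned for each limit.
bytesizes = pieces.map(&:bytesize)   # Bytes returned for each limit.
p sizes
p bytesizes
```

With an odd +limit+, the byte count exceeds the limit by one,
because the character that straddles the limit is returned whole.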

==== Line Separator and Line Limit

With arguments +sep+ and +limit+ given,
combines the two behaviors:

- Returns the next line as determined by line separator +sep+.
- But returns no more bytes than are allowed by the limit.

Example:

  File.open('t.txt') {|f| f.gets('li', 20) } # => "First li"
  File.open('t.txt') {|f| f.gets('li', 2) }  # => "Fi"
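
The same combination applies to methods that read several lines.
A minimal sketch, assuming a \StringIO stream and a comma-separated sample
string chosen only for illustration:

```ruby
require 'stringio'

strio = StringIO.new("alpha,be,c")

# Each piece ends at the separator or after 4 bytes, whichever comes first.
pieces = strio.readlines(',', 4)
strio.close

p pieces
```

A piece that ends without the separator was cut off by the limit
(or is the last piece in the stream).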

==== Line Number

A readable \IO stream has a _line_ _number_,
which is the non-negative integer line number
in the stream where the next read will occur.

A new stream initially has line number +0+.

\Method IO#lineno returns the line number.

Reading lines from a stream usually changes its line number:

  f = File.new('t.txt', 'r')
  f.lineno   # => 0
  f.readline # => "First line\n"
  f.lineno   # => 1
  f.readline # => "Second line\n"
  f.lineno   # => 2
  f.readline # => "\n"
  f.lineno   # => 3
  f.eof?     # => false
  f.close
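
One use of the line number is to produce a numbered listing.
A minimal sketch, assuming a \StringIO stream and a four-line sample string
chosen only for illustration:

```ruby
require 'stringio'

strio = StringIO.new("First line\nSecond line\n\nFourth line\n")
start = strio.lineno   # Line number before anything has been read.

numbered = []
while (line = strio.gets)
  # After each successful read, lineno is the number of the line just read.
  numbered << format('%d: %s', strio.lineno, line.chomp)
end
final = strio.lineno   # Unchanged by the last gets, which returned nil.
strio.close

puts numbered
```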

Iterating over lines in a stream usually changes its line number:

  f = File.new('t.txt')
  f.each_line do |line|
    p "position=#{f.pos} eof?=#{f.eof?} lineno=#{f.lineno}"
  end
  f.close

Output:

  "position=11 eof?=false lineno=1"
  "position=23 eof?=false lineno=2"
  "position=24 eof?=false lineno=3"
  "position=36 eof?=false lineno=4"
  "position=47 eof?=true lineno=5"

==== Line Options

A number of \IO methods accept optional keyword arguments
that determine how lines in a stream are to be treated:

- +:chomp+: If +true+, line separators are omitted; default is +false+.
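
As a brief illustration of +chomp+, here is a minimal sketch that assumes
\StringIO streams over a two-line sample string:

```ruby
require 'stringio'

text = "First line\nSecond line\n"

# Default: each line keeps its separator.
with_sep = StringIO.new(text).readlines

# With chomp: true, the separator is omitted from each line.
without_sep = StringIO.new(text).readlines(chomp: true)
first = StringIO.new(text).gets(chomp: true)

p with_sep
p without_sep
p first
```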

=== Open and Closed \IO Streams

A new \IO stream may be open for reading, open for writing, or both.

You can close a stream using these methods:

- IO#close: Closes the stream for both reading and writing.
- IO#close_read (not available in \ARGF): Closes the stream for reading.
- IO#close_write (not available in \ARGF): Closes the stream for writing.

You can query whether a stream is closed using this method:

- IO#closed?: Returns whether the stream is closed.
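
The interaction between the closing methods and #closed? can be sketched as
follows; a read-write \StringIO stream is assumed here only so that the
example needs no file or subprocess:

```ruby
require 'stringio'

# A mutable string gives a stream that is open for both reading and writing.
strio = StringIO.new(+"data")

states = []
states << strio.closed?   # Open in both directions.

strio.close_read
states << strio.closed?   # Still open for writing, so not yet closed.

read_error = nil
begin
  strio.read              # Reading is no longer allowed.
rescue IOError => e
  read_error = e.class
end

strio.close_write
states << strio.closed?   # Closed in both directions.

p states
p read_error
```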

=== Stream End-of-File

You can query whether a stream is at end-of-file using this method:

- IO#eof? (also aliased as +#eof+):
  Returns whether the stream is at end-of-file.
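
A common use of #eof? is to control a read loop.
A minimal sketch, assuming a \StringIO stream and a two-line sample string
chosen only for illustration:

```ruby
require 'stringio'

strio = StringIO.new("one\ntwo\n")
before = strio.eof?   # Data remains, so the stream is not at end-of-file.

lines = []
lines << strio.gets until strio.eof?

after = strio.eof?    # Everything has been read.
strio.close

p lines
```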