summaryrefslogtreecommitdiff
path: root/doc/csv/recipes/parsing.rdoc
blob: 40feeef151f4a633dd0b416f9f013a1579a473e3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
== Recipes for Parsing \CSV

For other recipes, see {Recipes for CSV}[./recipes_rdoc.html].

All code snippets on this page assume that the following has been executed:
  require 'csv'

=== Contents

- {Source Formats}[#label-Source+Formats]
  - {Parsing from a String}[#label-Parsing+from+a+String]
    - {Recipe: Parse from String with Headers}[#label-Recipe-3A+Parse+from+String+with+Headers]
    - {Recipe: Parse from String Without Headers}[#label-Recipe-3A+Parse+from+String+Without+Headers]
  - {Parsing from a File}[#label-Parsing+from+a+File]
    - {Recipe: Parse from File with Headers}[#label-Recipe-3A+Parse+from+File+with+Headers]
    - {Recipe: Parse from File Without Headers}[#label-Recipe-3A+Parse+from+File+Without+Headers]
  - {Parsing from an IO Stream}[#label-Parsing+from+an+IO+Stream]
    - {Recipe: Parse from IO Stream with Headers}[#label-Recipe-3A+Parse+from+IO+Stream+with+Headers]
    - {Recipe: Parse from IO Stream Without Headers}[#label-Recipe-3A+Parse+from+IO+Stream+Without+Headers]
- {Converting Fields}[#label-Converting+Fields]
  - {Converting Fields to Objects}[#label-Converting+Fields+to+Objects]
    - {Recipe: Convert Fields to Integers}[#label-Recipe-3A+Convert+Fields+to+Integers]
    - {Recipe: Convert Fields to Floats}[#label-Recipe-3A+Convert+Fields+to+Floats]
    - {Recipe: Convert Fields to Numerics}[#label-Recipe-3A+Convert+Fields+to+Numerics]
    - {Recipe: Convert Fields to Dates}[#label-Recipe-3A+Convert+Fields+to+Dates]
    - {Recipe: Convert Fields to DateTimes}[#label-Recipe-3A+Convert+Fields+to+DateTimes]
    - {Recipe: Convert Assorted Fields to Objects}[#label-Recipe-3A+Convert+Assorted+Fields+to+Objects]
    - {Recipe: Convert Fields to Other Objects}[#label-Recipe-3A+Convert+Fields+to+Other+Objects]
  - {Recipe: Filter Field Strings}[#label-Recipe-3A+Filter+Field+Strings]
  - {Recipe: Register Field Converters}[#label-Recipe-3A+Register+Field+Converters]
  - {Using Multiple Field Converters}[#label-Using+Multiple+Field+Converters]
    - {Recipe: Specify Multiple Field Converters in Option :converters}[#label-Recipe-3A+Specify+Multiple+Field+Converters+in+Option+-3Aconverters]
    - {Recipe: Specify Multiple Field Converters in a Custom Converter List}[#label-Recipe-3A+Specify+Multiple+Field+Converters+in+a+Custom+Converter+List]
- {Converting Headers}[#label-Converting+Headers]
  - {Recipe: Convert Headers to Lowercase}[#label-Recipe-3A+Convert+Headers+to+Lowercase]
  - {Recipe: Convert Headers to Symbols}[#label-Recipe-3A+Convert+Headers+to+Symbols]
  - {Recipe: Filter Header Strings}[#label-Recipe-3A+Filter+Header+Strings]
  - {Recipe: Register Header Converters}[#label-Recipe-3A+Register+Header+Converters]
  - {Using Multiple Header Converters}[#label-Using+Multiple+Header+Converters]
    - {Recipe: Specify Multiple Header Converters in Option :header_converters}[#label-Recipe-3A+Specify+Multiple+Header+Converters+in+Option+-3Aheader_converters]
    - {Recipe: Specify Multiple Header Converters in a Custom Header Converter List}[#label-Recipe-3A+Specify+Multiple+Header+Converters+in+a+Custom+Header+Converter+List]

=== Source Formats

You can parse \CSV data from a \String, from a \File (via its path), or from an \IO stream.

==== Parsing from a \String

You can parse \CSV data from a \String, with or without headers.

===== Recipe: Parse from \String with Headers

Use class method CSV.parse with option +headers+ to read a source \String all at once
(may have memory resource implications):
  string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  CSV.parse(string, headers: true) # => #<CSV::Table mode:col_or_row row_count:4>

Use instance method CSV#each with option +headers+ to read a source \String one row at a time:
  CSV.new(string, headers: true).each do |row|
    p row
  end
Ouput:
  #<CSV::Row "Name":"foo" "Value":"0">
  #<CSV::Row "Name":"bar" "Value":"1">
  #<CSV::Row "Name":"baz" "Value":"2">

===== Recipe: Parse from \String Without Headers

Use class method CSV.parse without option +headers+ to read a source \String all at once
(may have memory resource implications):
  string = "foo,0\nbar,1\nbaz,2\n"
  CSV.parse(string) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]

Use instance method CSV#each without option +headers+ to read a source \String one row at a time:
  CSV.new(string).each do |row|
    p row
  end
Output:
  ["foo", "0"]
  ["bar", "1"]
  ["baz", "2"]

==== Parsing from a \File

You can parse \CSV data from a \File, with or without headers.

===== Recipe: Parse from \File with Headers

Use instance method CSV#read with option +headers+ to read a file all at once:
  string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  path = 't.csv'
  File.write(path, string)
  CSV.read(path, headers: true) # => #<CSV::Table mode:col_or_row row_count:4>

Use class method CSV.foreach with option +headers+ to read one row at a time:
  CSV.foreach(path, headers: true) do |row|
    p row
  end
Output:
  #<CSV::Row "Name":"foo" "Value":"0">
  #<CSV::Row "Name":"bar" "Value":"1">
  #<CSV::Row "Name":"baz" "Value":"2">

===== Recipe: Parse from \File Without Headers

Use class method CSV.read without option +headers+ to read a file all at once:
  string = "foo,0\nbar,1\nbaz,2\n"
  path = 't.csv'
  File.write(path, string)
  CSV.read(path) # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]

Use class method CSV.foreach without option +headers+ to read one row at a time:
  CSV.foreach(path) do |row|
    p row
  end
Output:
  ["foo", "0"]
  ["bar", "1"]
  ["baz", "2"]

==== Parsing from an \IO Stream

You can parse \CSV data from an \IO stream, with or without headers.

===== Recipe: Parse from \IO Stream with Headers

Use class method CSV.parse with option +headers+ to read an \IO stream all at once:
  string = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  path = 't.csv'
  File.write(path, string)
  File.open(path) do |file|
    CSV.parse(file, headers: true)
  end # => #<CSV::Table mode:col_or_row row_count:4>

Use class method CSV.foreach with option +headers+ to read one row at a time:
  File.open(path) do |file|
    CSV.foreach(file, headers: true) do |row|
      p row
    end
  end
Output:
  #<CSV::Row "Name":"foo" "Value":"0">
  #<CSV::Row "Name":"bar" "Value":"1">
  #<CSV::Row "Name":"baz" "Value":"2">

===== Recipe: Parse from \IO Stream Without Headers

Use class method CSV.parse without option +headers+ to read an \IO stream all at once:
  string = "foo,0\nbar,1\nbaz,2\n"
  path = 't.csv'
  File.write(path, string)
  File.open(path) do |file|
    CSV.parse(file)
  end # => [["foo", "0"], ["bar", "1"], ["baz", "2"]]

Use class method CSV.foreach without option +headers+ to read one row at a time:
  File.open(path) do |file|
    CSV.foreach(file) do |row|
      p row
    end
  end
Output:
  ["foo", "0"]
  ["bar", "1"]
  ["baz", "2"]

=== Converting Fields

You can use field converters to change parsed \String fields into other objects,
or to otherwise modify the \String fields.

==== Converting Fields to Objects

Use field converters to change parsed \String objects into other, more specific, objects.

There are built-in field converters for converting to objects of certain classes:
- \Float
- \Integer
- \Date
- \DateTime

Other built-in field converters include:
- <tt>:numeric</tt>: converts to \Integer and \Float.
- <tt>:all</tt>: converts to \DateTime, \Integer, \Float.

You can also define field converters to convert to objects of other classes.

===== Recipe: Convert Fields to Integers

Convert fields to \Integer objects using built-in converter <tt>:integer</tt>:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, converters: :integer)
  parsed.map {|row| row['Value'].class} # => [Integer, Integer, Integer]

===== Recipe: Convert Fields to Floats

Convert fields to \Float objects using built-in converter <tt>:float</tt>:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, converters: :float)
  parsed.map {|row| row['Value'].class} # => [Float, Float, Float]

===== Recipe: Convert Fields to Numerics

Convert fields to \Integer and \Float objects using built-in converter <tt>:numeric</tt>:
  source = "Name,Value\nfoo,0\nbar,1.1\nbaz,2.2\n"
  parsed = CSV.parse(source, headers: true, converters: :numeric)
  parsed.map {|row| row['Value'].class} # => [Integer, Float, Float]

===== Recipe: Convert Fields to Dates

Convert fields to \Date objects using built-in converter <tt>:date</tt>:
  source = "Name,Date\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2001-02-03\n"
  parsed = CSV.parse(source, headers: true, converters: :date)
  parsed.map {|row| row['Date'].class} # => [Date, Date, Date]

===== Recipe: Convert Fields to DateTimes

Convert fields to \DateTime objects using built-in converter <tt>:date_time</tt>:
  source = "Name,DateTime\nfoo,2001-02-03\nbar,2001-02-04\nbaz,2020-05-07T14:59:00-05:00\n"
  parsed = CSV.parse(source, headers: true, converters: :date_time)
  parsed.map {|row| row['DateTime'].class} # => [DateTime, DateTime, DateTime]

===== Recipe: Convert Assorted Fields to Objects

Convert assorted fields to objects using built-in converter <tt>:all</tt>:
  source = "Type,Value\nInteger,0\nFloat,1.0\nDateTime,2001-02-04\n"
  parsed = CSV.parse(source, headers: true, converters: :all)
  parsed.map {|row| row['Value'].class} # => [Integer, Float, DateTime]

===== Recipe: Convert Fields to Other Objects

Define a custom field converter to convert \String fields into other objects.
This example defines and uses a custom field converter
that converts each column-1 value to a \Rational object:
  rational_converter = proc do |field, field_context|
    field_context.index == 1 ? field.to_r : field
  end
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, converters: rational_converter)
  parsed.map {|row| row['Value'].class} # => [Rational, Rational, Rational]

==== Recipe: Filter Field Strings

Define a custom field converter to modify \String fields.
This example defines and uses a custom field converter
that strips whitespace from each field value:
  strip_converter = proc {|field| field.strip }
  source = "Name,Value\n foo , 0 \n bar , 1 \n baz , 2 \n"
  parsed = CSV.parse(source, headers: true, converters: strip_converter)
  parsed['Name'] # => ["foo", "bar", "baz"]
  parsed['Value'] # => ["0", "1", "2"]

==== Recipe: Register Field Converters

Register a custom field converter, assigning it a name;
then refer to the converter by its name:
  rational_converter = proc do |field, field_context|
    field_context.index == 1 ? field.to_r : field
  end
  CSV::Converters[:rational] = rational_converter
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, converters: :rational)
  parsed['Value'] # => [(0/1), (1/1), (2/1)]

==== Using Multiple Field Converters

You can use multiple field converters in either of these ways:
- Specify converters in option <tt>:converters</tt>.
- Specify converters in a custom converter list.

===== Recipe: Specify Multiple Field Converters in Option <tt>:converters</tt>

Apply multiple field converters by specifying them in option <tt>:conveters</tt>:
  source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n"
  parsed = CSV.parse(source, headers: true, converters: [:integer, :float])
  parsed['Value'] # => [0, 1.0, 2.0]

===== Recipe: Specify Multiple Field Converters in a Custom Converter List

Apply multiple field converters by defining and registering a custom converter list:
  strip_converter = proc {|field| field.strip }
  CSV::Converters[:strip] = strip_converter
  CSV::Converters[:my_converters] = [:integer, :float, :strip]
  source = "Name,Value\n foo , 0 \n bar , 1.0 \n baz , 2.0 \n"
  parsed = CSV.parse(source, headers: true, converters: :my_converters)
  parsed['Name'] # => ["foo", "bar", "baz"]
  parsed['Value'] # => [0, 1.0, 2.0]

=== Converting Headers

You can use header converters to modify parsed \String headers.

Built-in header converters include:
- <tt>:symbol</tt>: converts \String header to \Symbol.
- <tt>:downcase</tt>: converts \String header to lowercase.

You can also define header converters to otherwise modify header \Strings.

==== Recipe: Convert Headers to Lowercase

Convert headers to lowercase using built-in converter <tt>:downcase</tt>:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, header_converters: :downcase)
  parsed.headers # => ["name", "value"]

==== Recipe: Convert Headers to Symbols

Convert headers to downcased Symbols using built-in converter <tt>:symbol</tt>:
  source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, header_converters: :symbol)
  parsed.headers # => [:name, :value]
  parsed.headers.map {|header| header.class} # => [Symbol, Symbol]

==== Recipe: Filter Header Strings

Define a custom header converter to modify \String fields.
This example defines and uses a custom header converter
that capitalizes each header \String:
  capitalize_converter = proc {|header| header.capitalize }
  source = "NAME,VALUE\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, header_converters: capitalize_converter)
  parsed.headers # => ["Name", "Value"]

==== Recipe: Register Header Converters

Register a custom header converter, assigning it a name;
then refer to the converter by its name:
  capitalize_converter = proc {|header| header.capitalize }
  CSV::HeaderConverters[:capitalize] = capitalize_converter
  source = "NAME,VALUE\nfoo,0\nbar,1\nbaz,2\n"
  parsed = CSV.parse(source, headers: true, header_converters: :capitalize)
  parsed.headers # => ["Name", "Value"]

==== Using Multiple Header Converters

You can use multiple header converters in either of these ways:
- Specify header converters in option <tt>:header_converters</tt>.
- Specify header converters in a custom header converter list.

===== Recipe: Specify Multiple Header Converters in Option :header_converters

Apply multiple header converters by specifying them in option <tt>:header_conveters</tt>:
  source = "Name,Value\nfoo,0\nbar,1.0\nbaz,2.0\n"
  parsed = CSV.parse(source, headers: true, header_converters: [:downcase, :symbol])
  parsed.headers # => [:name, :value]

===== Recipe: Specify Multiple Header Converters in a Custom Header Converter List

Apply multiple header converters by defining and registering a custom header converter list:
  CSV::HeaderConverters[:my_header_converters] = [:symbol, :downcase]
  source = "NAME,VALUE\nfoo,0\nbar,1.0\nbaz,2.0\n"
  parsed = CSV.parse(source, headers: true, header_converters: :my_header_converters)
  parsed.headers # => [:name, :value]