summaryrefslogtreecommitdiff
path: root/sample/trick2015/ksk_2/remarks.markdown
blob: bb9b7057733bc737521418a48b0a08b48db2f0f6 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
### Remarks

The program is run with a data file from the standard input, e.g.,
```shell
  ruby entry.rb < data
```
where ``<`` can be omitted.  The data file must be in the DIMACS CNF
format (see Description for detail).  It has been confirmed to be run on
```
  ruby 1.9.3p385 (2013-02-06 revision 39114) [x86_64-darwin11.4.2]
  ruby 2.0.0p481 (2014-05-08 revision 45883) [universal.x86_64-darwin13]
  ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-linux]
```
For particular inputs, the program works differently on these environments
(see Limitation).


### Description

The program is a very small SAT solver with 194 bytes making use of a
powerful feature of Regexp matching in Ruby.  It receives a data file
from the standard input in the DIMACS CNF that is a standard format
for inputs of SAT solvers.  For example, the text in the DIMACS CNF
format,
```
c
c This is a sample input file.
c
p cnf 3 5
 1 -2  3 0
-1  2 0
-2 -3 0
 1  2 -3 0
 1  3 0
```
corresponds to a propositional formula in conjunctive normal form,

  (L1      &or; &not;L2 &or; L3) &and;
  (&not;L1 &or;      L2) &and;
  (&not;L2 &or; &not;L3) &and;
  (L1      &or; L2      &or; &not;L3) &and;
  (L1      &or; L3).

In the DIMACS CNF format, the lines starting with ``c`` are comments
that are allowed only before the line ``p cnf ...``.  The line ``p cnf
3 5`` represents that the problem is given in conjunctive normal form
with 3 variables (L1,L2,and L3) and 5 clauses.  A clause is given by a
sequence of the indices of positive literals and the negative indices
of negative literals.  Each clause is terminated by ``0``.  For the
input above, the program outputs
```
s SATISFIABLE
v 1 2 -3
```
because the formula is satisfiable by L1=true, L2=true, and L3=false.
If an unsatisfiable formula is given, the program should output
```
s UNSATISFIABLE
```
This specification is common in most exiting SAT solvers and required
for entries of [SAT competition](http://www.satcompetition.org/).

The program is very small with no other external libraries thanks to
the wealth of string manipulations in Ruby.  It is much smaller than
existing small SAT solvers like [minisat](http://minisat.se/) and
[picosat](http://fmv.jku.at/picosat/)!


### Internals

The basic idea of the program is a translation from DIMACS CNF format
into Ruby.  For example, the data file above is translated into a
``Regexp`` matching expression equivalent to
```ruby
 '---=-' =~ 
 /(-?)(-?)(-?)-*=(?=\1$|-\2$|\3$|$)(?=-\1$|\2$|$)(?=-\2$|-\3$|$)(?=\1$|\2$|-\3$|$)(?=\1$|\3$|$)(?=)/
```
that returns ``MatchData`` if the formula is satisfiable and otherwise
returns ``nil``.  The beginning of regular expression
``(-?)(-?)(-?)-*=`` matches a string ``"---="`` so that each
capturing pattern ``(-?)`` matches either ``"-"`` or `""`, which
corresponds to an assignment of true or false, respectively, for a
propositional variable.  Each clause is translated into positive
lookahead assertion like ``(?=\1$|-\2$|\3$|$)`` that matches 
``"-"`` only when ``\1`` holds ``"-"``, ``\2`` holds ``""``, or ``\3``
holds ``"-"``.  This exactly corresponds to the condition for
L1&or;&not;L2&or;L3 to be true.  The last case ``|$`` never matches
``"-"`` but it is required for making the translation simple.
The last meaningless positive lookahead assertion ``(?=)`` is added
for a similar reason.  This translation is based on
[Abigail's idea](http://perl.plover.com/NPC/NPC-3SAT.html) where a
3SAT formula is translated into a similar Perl regular expression.
The differences are the submitted Ruby program translates directly
from the DIMACS CNF format and tries to make the code shorter by using
lookahead assertion which can also make matching more faster.

Thanks to the ``x`` option for regular expression, the input above is
simply translated into
```ruby
  ?-*3+'=-'=~/#{'(-?)'*3}-*=(?=
   \1$| -\2$|  \3$| $)(?=
  -\1$|  \2$| $)(?=
  -\2$| -\3$| $)(?=
   \1$|  \2$| -\3$| $)(?=
   \1$|  \3$| $)(?=
  )/x
```
which has a structure similar to the DIMACS CNF format.

The part of formatting outputs in the program is obfuscated as an
inevitable result of 'golfing' the original program
```ruby
   if ...the matching expression above... then
     puts 's SATISFIABLE'
     puts 'v '+$~[1..-1].map.with_index{|x,i|
       if x == '-' then
         i+1
       else
         ['-',i+1].join
       end
     }.join(' ')
   else
     puts 's UNSATISFIABLE'
   end
```
In the satisfiable case, the MatchData ``$~`` obtained by the regular expression
has the form of
```
  #<MatchData "---=" 1:"-" 2:"-" 3:"">
```
which should be translated into a string ``1 2 -3``.  The golfed code simply
does it by `eval(x+?1)*i-=1` where ``x`` is matched string ``"x"`` or ``""``
and ``i`` be a negated index.


### Data files

The submission includes some input files in the DIMACS CNF format for
testing the program.

* [sample.cnf](sample.cnf) : an example shown above.

* [unsat.cnf](unsat.cnf) : an example of an unsatisfiable formula.

* [quinn.cnf](quinn.cnf) : an example from Quinn's text, 16 variables and 18 clauses
   (available from [http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html])

* [abnormal.cnf](abnormal.cnf) : an example from [the unofficial manual of the DIMACS challenge](http://www.domagoj-babic.com/uploads/ResearchProjects/Spear/dimacs-cnf.pdf)
  where a single clause may be on multiple lines.

* [uf20-01.cnf](uf20-01.cnf) : an example, with 20 variables and 91 clauses, from [SATLIB benchmark suite](http://www.cs.ubc.ca/~hoos/SATLIB/benchm.html).  The last two lines are removed from the original because they are illegal in the DIMACS CNF format (all examples in 'Uniform Random-3-SAT' of the linked page need this modification).


### Limitation

The program may not work when the number of variables exceeds 99
because ``\nnn`` in regular expression with number ``nnn`` does not
always represent backreference but octal notation of characters.  For
example,
```ruby
  /#{"(x)"*999}:\502/=~"x"*999+":x"
  /#{"(x)"*999}:\661/=~"x"*999+":x"
  /#{"(x)"*999}:\775/=~"x"*999+":x"
```
fail due to the syntax error (invalid escape), while
```ruby
  /#{"(x)"*999}:\508/=~"x"*999+":x"
  /#{"(x)"*999}:\691/=~"x"*999+":x"
  /#{"(x)"*999}:\785/=~"x"*999+":x"
```
succeed (to return 0) because 508, 691, and 785 are not in octal notation.
Since Ruby 1.9.3 incorrectly returns ``nil`` instead of terminating
with the error for
```ruby
  /#{"(x)"*999}:\201/=~"x"*999+":x"
  /#{"(x)"*999}:\325/=~"x"*999+":x"
```
the present SAT solver may unexpectedly return "UNSATISFIABLE" even
for satisfiable inputs.  This happens when the number is in octal
notation starting with either 2 or 3.

In the case of the number starting with 1, the code like the above
does work on all versions of Ruby I tried.  For example,
```ruby
  /#{"(x)"*999}:\101/=~"x"*999+":x"
  /#{"(x)"*999}:\177/=~"x"*999+":x"
```
succeed (to return 0).  Interestingly, 
```ruby
  /#{"(x)"*999}:\101/=~"x"*999+":\101"
  /#{"(x)"*999}:\177/=~"x"*999+":\177"
```
return ``nil``, while
```ruby
  /:\101/=~":\101"
  /:\177/=~":\177"
```
succeed to return 0.  The meaning of ``\1nn`` in regular expression
seems to depend on the existence of capturing expressions.

In spite of these Ruby's behaviors, we have a good news!  The present
SAT sover does not suffer from the issues because the program cannot
return solutions in practical time for inputs with variables more than
40.