Sunday, January 10, 2016

Sudoku by SQL Part 2 of 3

The Challenge

Since I came across the first sudoku I was wondering if it could by solved by SQL using queries that select just the right value for a cell.

Well, some day I tried, here is what I have got now.

Previous Part

In part one I introduced tables, views and triggers. In this part comes the logic.

Rules

Single Candidate

CREATE PROCEDURE ApplyRuleSingleCandidate (@rows SMALLINT OUTPUT) AS
BEGIN
  -- if there is only one candidate for a cell, the cell has this value
  INSERT INTO Cell (CellID, [Rule], Value)
  SELECT t.CellID, t.[Rule], t.Value
  FROM (
    SELECT c.CellID, 'Single Candidate' AS [Rule], MAX(c.Value) AS Value
    FROM Candidate c
    GROUP BY c.CellID
    HAVING COUNT(*)=1
  ) t

  SET @rows=@@ROWCOUNT

  IF EXISTS(SELECT 2 FROM Violation)
   THROW 50001, 'Invalid value in cells', 1

END;
The main point is COUNT(*)=1. There is only one candidate for a cell. As aggregation I choose MAX, as there is only one row, MIN would do the same. For traceability I use the string 'Single Candidate' as the rule. After inserting I check if any violation exists.

How can violations happen?

If we operate pure rule based, no violation should happen. But if we implement guessing we have to check if a guess leads us to a dead end.

Single Token In Group

CREATE PROCEDURE ApplyRuleSingleTokenInGroup (@rows SMALLINT OUTPUT) AS
BEGIN
  -- if, for a given value, there is only one cell in a row, column or square as a candidate for this value, it has to be there
  INSERT INTO Cell (CellID, [Rule], Value)
  SELECT CellID, [Rule], Value
  FROM (
    SELECT MAX(c.CellID) AS CellID
    , 'Single Token in ' + CASE WHEN l.GroupType='C' THEN 'Column' WHEN l.GroupType='R' then 'Row' ELSE 'Square' END AS [Rule]
    , c.Value
    , ROW_NUMBER() OVER(PARTITION BY MAX(c.CellID) ORDER BY l.GroupType) AS RowNumber
    FROM Candidate c
    INNER JOIN Location l ON c.CellID=l.CellID
    GROUP BY l.GroupType, l.GroupValue, c.Value
    HAVING COUNT(*)=1
    ) t
  WHERE RowNumber=1 -- we need this to avoid duplicate inserts due to more than one rule that applies

  SET @rows=@@ROWCOUNT

  IF EXISTS(SELECT 2 FROM Violation)
   THROW 50001, 'Invalid value in cells', 1
END;

Again, the main point ist COUNT(*)=1. But grouping is now by value and unit. Unit means row, column or square. The encapsulation in a subquery and restriction to RowNumber=1 is only to avoid duplicate key violation if a value unique in more then one unit, e.g. a row and a column.

Eliminating candidates

The two rules above are all rules that give us a value for a cell that I can think of. If you can think of another, let me know.
Further progress in solving relies on rules that eliminate candidates.

Complete Match

CREATE PROCEDURE RemoveCandidateCompleteMatch (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE c
  FROM Candidate c
  INNER JOIN CandidateMask m ON c.CellID=m.CellID
  INNER JOIN Location l on m.CellID=l.CellID
  INNER JOIN (
    SELECT l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    FROM SetAll s
    INNER JOIN CandidateMask c1 ON s.SetValue = c1.Mask
    INNER JOIN Location l1 ON c1.CellID=l1.CellID
    WHERE s.Size NOT IN (1, 9)
    GROUP BY l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    HAVING COUNT(*) = s.Size
  ) h ON h.GroupType=l.GroupType AND h.GroupValue=l.GroupValue
  WHERE m.Mask != h.SetValue
  AND c.Mask & h.SetValue != 0

  SET @rows=@@ROWCOUNT

END;

If we find a set of candidate values, lets say a set with two members, in a unit exactly two times, the values of this set cant be anywhere else in the unit.
Example: candidate set 1,3 in row 4 two times => 1 and 3 can be removed as candidates from all other cells in row 4.

Partial Match

CREATE PROCEDURE RemoveCandidatePartialSet (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE c
  FROM Candidate c
  INNER JOIN CandidateMask m on c.CellID=m.CellID
  INNER JOIN Location l on m.CellID=l.CellID
  INNER JOIN (
    SELECT l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    FROM SetAll s
    INNER JOIN CandidateMask c1 ON s.SetValue & c1.Mask = c1.Mask AND ~s.SetValue & c1.Mask = 0
    INNER JOIN Location l1 ON c1.CellID=l1.CellID
    WHERE s.Size NOT IN (1, 9)
    GROUP BY l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    HAVING COUNT(*) = s.Size
  ) h ON h.GroupType=l.GroupType AND h.GroupValue=l.GroupValue
  WHERE ~h.SetValue & m.Mask != 0 AND c.Mask & h.SetValue != 0

  SET @rows=@@ROWCOUNT
END;
 If a part of a candidate set, lets say with 3 members, are the only candidate members for 3 cells in a unit, we can remove the members of the set from all other candidate values for the unit.
Example: 1,3 is in cell 11, 1,4 is in cell 13 and 3,4 is in cell 22. Our set is 1,3,4, our square is 1, we can remove the values 1,3 and 4 from all candidates from square 1.

Same Row Or Column And Square  

CREATE PROCEDURE RemoveCandidateValueInSameRowOrColumnAndSquare (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE FROM c
  FROM Candidate c
  INNER  JOIN Location l ON c.CellID=l.CellID
  INNER  JOIN LocationWide s ON c.CellID=s.CellID
  INNER JOIN (
    SELECT c.Value, ls.[Square], lrc.GroupType, MAX(lrc.GroupValue) as GroupValue
    FROM Candidate c
    INNER JOIN LocationWide ls ON c.CellID=ls.CellID
    INNER JOIN Location lrc ON c.CellID=lrc.CellID AND lrc.GroupType IN ('C','R')
    GROUP BY c.Value, ls.[Square], lrc.GroupType
    HAVING COUNT(DISTINCT ls.[Square])=1 AND COUNT(DISTINCT lrc.GroupValue)=1
  ) a ON a.Value=c.Value AND l.GroupType=a.GroupType AND l.GroupValue=a.GroupValue
  WHERE s.[Square] != a.[Square]

  SET @rows=@@ROWCOUNT

END;

If a candidate exists within a square multiple times, but only in one row or column, the candidate value can be removed for cells in this row but in another square.
Example: candidate value 3 exists in cell 12 and in cell 13 but nowhere else in square 1. candidate value 3 can be removed for cells 14, 15, 16, 17, 18, 19.

Executing Rules

Now we can execute the rules until they do not affect any rows.
  DECLARE @cells INT
  DECLARE @affected INT
  SET @affected=0
  SET @cells=1
  WHILE @cells>0
  BEGIN
    WHILE @cells>0
    BEGIN
      EXEC ApplyRuleSingleCandidate @cells OUTPUT
      print SPACE(@@nestlevel) + 'Iteration ' + CAST(@iteration AS VARCHAR(5)) + ' ApplyRuleSingleCandidate solves ' + CAST(@cells AS VARCHAR(5)) + ' cells.'
     
      EXEC ApplyRuleSingleTokenInGroup @affected OUTPUT
      print SPACE(@@nestlevel) + 'Iteration ' + CAST(@iteration AS VARCHAR(5)) + ' ApplyRuleSingleTokenInGroup solves ' + CAST(@affected AS VARCHAR(5)) + ' cells.'
      SET @cells=@cells+@affected

      SET @iteration=@iteration+1
    END

    -- execution of rules that give values directly ends here
    -- now we apply rules that remove candidates
    EXEC RemoveCandidateCompleteMatch @cells OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@cells AS VARCHAR(10)) + ' candidates due to complete match in same group'

    EXEC RemoveCandidatePartialSet @affected OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@affected AS VARCHAR(10)) + ' candidates due to exclusive partial sets in same group'
    SET @cells=@cells+@affected

    EXEC RemoveCandidateValueInSameRowOrColumnAndSquare @affected OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@affected AS VARCHAR(10)) + ' candidates due to candidate values in a square in only one row or column'
    SET @cells=@cells+@affected

  END
 So that is all we can do right now. If you know more rules, let me know.

Guessing

But I found some sudokus that could not be solved be these rules. I was more curious about implementing guessing than implementing another rule. With guessing, I wanted to make the process as following: 
  1. Try solving by rules
  2. If that does not work out, make up a list of choices. Take the cell with most candidates, fill in the first choice. Try step 1. If it does not solve the sudoku, try the next choice.
You see, it is going to be recursive. Here is the code:
  DECLARE @guess TABLE (CellID SMALLINT, Value SMALLINT)
  INSERT INTO @guess (CellID, Value)
  SELECT c.CellID, c.Value
  FROM Candidate c
  WHERE c.CellID = (SELECT TOP 1 cm.CellID
                    FROM CandidateMask cm
                    INNER JOIN SetAll s ON s.SetValue=cm.Mask
                    ORDER BY s.Size DESC, cm.CellID )
  ORDER BY c.Value

  DECLARE @cellID SMALLINT, @value SMALLINT, @count SMALLINT
  WHILE (SELECT COUNT(*) FROM Cell)<81
  BEGIN
    SET @count=(SELECT COUNT(*) FROM @guess)
    IF (@count=0)
      THROW 50002, 'Error in guessing, all candidates used but not solved.', 1
    IF NOT EXISTS(SELECT 2 FROM Candidate)
      THROW 50002, 'Error in guessing, no candidates.', 1
    SELECT TOP 1 @cellID=g.CellID, @value=g.Value
    FROM @guess g
    ORDER BY g.CellID, g.Value
    DELETE FROM @guess
    WHERE CellID=@cellID AND Value=@value
    PRINT SPACE(@@nestlevel) + 'nestlevel ' + CAST(@@NESTLEVEL as VARCHAR(10)) +': guessing value ' + CAST(@value as CHAR(1)) + ' in cell ' + CAST(@cellID as VARCHAR(5))
    BEGIN TRY
      SAVE TRANSACTION Guessing
      INSERT INTO Cell(CellID, Value, [Rule])
      VALUES (@cellID, @value, 'Guess')
      EXEC TrySolve @iteration
    END TRY
    BEGIN CATCH
      PRINT SPACE(@@nestlevel) + 'rolling back transaction due to ' + ERROR_MESSAGE()
      ROLLBACK TRANSACTION Guessing
      IF (ERROR_NUMBER() NOT IN (50001, 50002))
        THROW
    END CATCH
  END
END;

This code together with the code above make up the procedure TrySolve, with is called recursively at EXEC TrySolve @iteration.

Next Part

In the next part I will introduce some convenience features for entering start data and executing the solver.

Saturday, January 9, 2016

Sudoku by SQL Part 3 of 3

The Challenge

Since I came across the first sudoku I was wondering if it could by solved by SQL using queries that select just the right value for a cell.

Well, some day I tried, here is what I have got now.

Previous Parts

In part one tables und views are described.
In part two I introduce rules and the solving procedure.
Here you will see some convenience features and conclusion.
And of course the source, created for Microsoft SQL Server, tested with version 2012.

Playing

For starting and solving a game there are some stored procedures for convenience. With game I mean the "start position", together with a name and description.

Creating a Game

declare @gameID int
exec dbo.CreateGame 'Name of Game', 'Some Description','
..38....4
.9..1..8.
2....73..
7....18..
.8..9..7.
..47....3
..12....9
.6..3..4.
3....52..', @gameID output
select @gameID
For the first test I used a sudoku created by my Android sudoku app, level "extreme".

Starting

exec StartGame 1
With this procedure the tables Cell, Candidate and CandidateMask are initialized with the start position of game 1.

Viewing 

You can have a look at the currently known cells with
select * from Board
That gives you for the game above:

Row
C1
C2
C3
C4
C5
C6
C7
C8
C9
1
NULL
NULL
3
8
NULL
NULL
NULL
NULL
4
2
NULL
9
NULL
NULL
1
NULL
NULL
8
NULL
3
2
NULL
NULL
NULL
NULL
7
3
NULL
NULL
4
7
NULL
NULL
NULL
NULL
1
8
NULL
NULL
5
NULL
8
NULL
NULL
9
NULL
NULL
7
NULL
6
NULL
NULL
4
7
NULL
NULL
NULL
NULL
3
7
NULL
NULL
1
2
NULL
NULL
NULL
NULL
9
8
NULL
6
NULL
NULL
3
NULL
NULL
4
NULL
9
3
NULL
NULL
NULL
NULL
5
2
NULL
NULL


If you want to see the candidates, use
select * from CandidateMaskDisplay
 That gives you:

Row
C1
C2
C3
C4
C5
C6
C7
C8
C9
1
1...56...
1...5.7..
NULL
NULL
.2..56...
.2...6..9
1...567.9
12..56..9
NULL
2
...456...
NULL
....567..
..3456...
NULL
.234.6...
....567..
NULL
.2..567..
3
NULL
1..45....
....56.8.
...456..9
...456...
NULL
NULL
1...56..9
1...56...
4
NULL
.23.5....
.2..56..9
..3456...
.2.456...
NULL
NULL
.2..56..9
.2..56...
5
1...56...
NULL
.2..56...
..3456...
NULL
.234.6...
1..456...
NULL
12..56...
6
1...56..9
12..5....
NULL
NULL
.2..56.8.
.2...6.8.
1...56..9
12..56..9
NULL
7
...45..8.
...45.7..
NULL
NULL
...4.678.
...4.6.8.
....567..
..3.56...
NULL
8
....5..89
NULL
.2..5.789
1.......9
NULL
.......89
1...5.7..
NULL
1...5.78.
9
NULL
...4..7..
......789
1..4.6..9
...4.678.
NULL
NULL
1....6...
1....678.

Solving

set nocount on
exec Solve
For better readability of output I set nocount on.
Here is the solution:
Row
C1
C2
C3
C4
C5
C6
C7
C8
C9
1
5
7
3
8
2
9
1
6
4
2
4
9
6
5
1
3
7
8
2
3
2
1
8
6
4
7
3
9
5
4
7
3
9
4
5
1
8
2
6
5
6
8
5
3
9
2
4
7
1
6
1
2
4
7
8
6
9
5
3
7
8
5
1
2
7
4
6
3
9
8
9
6
2
1
3
8
5
4
7
9
3
4
7
9
6
5
2
1
8

Output from PRINT:
  Iteration 1 ApplyRuleSingleCandidate solves 0 cells.
  Iteration 1 ApplyRuleSingleTokenInGroup solves 5 cells.
  Iteration 2 ApplyRuleSingleCandidate solves 0 cells.
  Iteration 2 ApplyRuleSingleTokenInGroup solves 1 cells.
  Iteration 3 ApplyRuleSingleCandidate solves 0 cells.
  Iteration 3 ApplyRuleSingleTokenInGroup solves 0 cells.
  Removed 0 candidates due to complete match in same group
  Removed 0 candidates due to exclusive partial sets in same group
  Removed 1 candidates due to candidate values in a square in only one row or column
  Iteration 4 ApplyRuleSingleCandidate solves 0 cells.
  Iteration 4 ApplyRuleSingleTokenInGroup solves 0 cells.
  Removed 0 candidates due to complete match in same group
  Removed 0 candidates due to exclusive partial sets in same group
  Removed 0 candidates due to candidate values in a square in only one row or column
  nestlevel 2: guessing value 1 in cell 17
   Iteration 5 ApplyRuleSingleCandidate solves 0 cells.
   Iteration 5 ApplyRuleSingleTokenInGroup solves 3 cells.
   Iteration 6 ApplyRuleSingleCandidate solves 1 cells.
   Iteration 6 ApplyRuleSingleTokenInGroup solves 5 cells.
   Iteration 7 ApplyRuleSingleCandidate solves 3 cells.
   Iteration 7 ApplyRuleSingleTokenInGroup solves 6 cells.
   Iteration 8 ApplyRuleSingleCandidate solves 3 cells.
   Iteration 8 ApplyRuleSingleTokenInGroup solves 3 cells.
   Iteration 9 ApplyRuleSingleCandidate solves 1 cells.
   Iteration 9 ApplyRuleSingleTokenInGroup solves 2 cells.
   Iteration 10 ApplyRuleSingleCandidate solves 1 cells.
   Iteration 10 ApplyRuleSingleTokenInGroup solves 3 cells.
   Iteration 11 ApplyRuleSingleCandidate solves 2 cells.
   Iteration 11 ApplyRuleSingleTokenInGroup solves 7 cells.
   Iteration 12 ApplyRuleSingleCandidate solves 4 cells.
   Iteration 12 ApplyRuleSingleTokenInGroup solves 3 cells.
   Iteration 13 ApplyRuleSingleCandidate solves 0 cells.
   Iteration 13 ApplyRuleSingleTokenInGroup solves 0 cells.
   Removed 0 candidates due to complete match in same group
   Removed 0 candidates due to exclusive partial sets in same group
   Removed 0 candidates due to candidate values in a square in only one row or column
Solved!

Harder Than Hardest

Inspired by this success, I searched the internet for the hardest sudoku. The hardest I found is solved with a guessing depth of 6. Sudoku should not need any guessing, so obviously there is at least one rule missing.
I can describe one: a set in exactly two rows and two squares forbids these values as candidates in the third square in these two rows. But I did not implement it (yet).

What I was curious about is how the solving process would react on a complete empty starting position. That is not solvable unambiguously, of course, but it is the ultimate edge case. Unfortunately trying this I run into the limit of maximum allowed nestlevel (recursiveness) of stored procedures, which is 32.

Conclusion

I like playing sudoku, but writing a sudoku solver is even more fun for me.

Throwing an exception in a trigger causes the transaction to be in an uncommitable state. That's why I check the state after inserting values into table Cell in the stored procedure, not in the trigger.

SQL Server does not really have nested transactions, but savepoints did the job.

Feel free to add a comment!