Sunday, January 10, 2016

Sudoku by SQL Part 2 of 3

The Challenge

Since I came across the first sudoku I was wondering if it could by solved by SQL using queries that select just the right value for a cell.

Well, some day I tried, here is what I have got now.

Previous Part

In part one I introduced tables, views and triggers. In this part comes the logic.

Rules

Single Candidate

CREATE PROCEDURE ApplyRuleSingleCandidate (@rows SMALLINT OUTPUT) AS
BEGIN
  -- if there is only one candidate for a cell, the cell has this value
  INSERT INTO Cell (CellID, [Rule], Value)
  SELECT t.CellID, t.[Rule], t.Value
  FROM (
    SELECT c.CellID, 'Single Candidate' AS [Rule], MAX(c.Value) AS Value
    FROM Candidate c
    GROUP BY c.CellID
    HAVING COUNT(*)=1
  ) t

  SET @rows=@@ROWCOUNT

  IF EXISTS(SELECT 2 FROM Violation)
   THROW 50001, 'Invalid value in cells', 1

END;
The main point is COUNT(*)=1. There is only one candidate for a cell. As aggregation I choose MAX, as there is only one row, MIN would do the same. For traceability I use the string 'Single Candidate' as the rule. After inserting I check if any violation exists.

How can violations happen?

If we operate pure rule based, no violation should happen. But if we implement guessing we have to check if a guess leads us to a dead end.

Single Token In Group

CREATE PROCEDURE ApplyRuleSingleTokenInGroup (@rows SMALLINT OUTPUT) AS
BEGIN
  -- if, for a given value, there is only one cell in a row, column or square as a candidate for this value, it has to be there
  INSERT INTO Cell (CellID, [Rule], Value)
  SELECT CellID, [Rule], Value
  FROM (
    SELECT MAX(c.CellID) AS CellID
    , 'Single Token in ' + CASE WHEN l.GroupType='C' THEN 'Column' WHEN l.GroupType='R' then 'Row' ELSE 'Square' END AS [Rule]
    , c.Value
    , ROW_NUMBER() OVER(PARTITION BY MAX(c.CellID) ORDER BY l.GroupType) AS RowNumber
    FROM Candidate c
    INNER JOIN Location l ON c.CellID=l.CellID
    GROUP BY l.GroupType, l.GroupValue, c.Value
    HAVING COUNT(*)=1
    ) t
  WHERE RowNumber=1 -- we need this to avoid duplicate inserts due to more than one rule that applies

  SET @rows=@@ROWCOUNT

  IF EXISTS(SELECT 2 FROM Violation)
   THROW 50001, 'Invalid value in cells', 1
END;

Again, the main point ist COUNT(*)=1. But grouping is now by value and unit. Unit means row, column or square. The encapsulation in a subquery and restriction to RowNumber=1 is only to avoid duplicate key violation if a value unique in more then one unit, e.g. a row and a column.

Eliminating candidates

The two rules above are all rules that give us a value for a cell that I can think of. If you can think of another, let me know.
Further progress in solving relies on rules that eliminate candidates.

Complete Match

CREATE PROCEDURE RemoveCandidateCompleteMatch (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE c
  FROM Candidate c
  INNER JOIN CandidateMask m ON c.CellID=m.CellID
  INNER JOIN Location l on m.CellID=l.CellID
  INNER JOIN (
    SELECT l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    FROM SetAll s
    INNER JOIN CandidateMask c1 ON s.SetValue = c1.Mask
    INNER JOIN Location l1 ON c1.CellID=l1.CellID
    WHERE s.Size NOT IN (1, 9)
    GROUP BY l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    HAVING COUNT(*) = s.Size
  ) h ON h.GroupType=l.GroupType AND h.GroupValue=l.GroupValue
  WHERE m.Mask != h.SetValue
  AND c.Mask & h.SetValue != 0

  SET @rows=@@ROWCOUNT

END;

If we find a set of candidate values, lets say a set with two members, in a unit exactly two times, the values of this set cant be anywhere else in the unit.
Example: candidate set 1,3 in row 4 two times => 1 and 3 can be removed as candidates from all other cells in row 4.

Partial Match

CREATE PROCEDURE RemoveCandidatePartialSet (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE c
  FROM Candidate c
  INNER JOIN CandidateMask m on c.CellID=m.CellID
  INNER JOIN Location l on m.CellID=l.CellID
  INNER JOIN (
    SELECT l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    FROM SetAll s
    INNER JOIN CandidateMask c1 ON s.SetValue & c1.Mask = c1.Mask AND ~s.SetValue & c1.Mask = 0
    INNER JOIN Location l1 ON c1.CellID=l1.CellID
    WHERE s.Size NOT IN (1, 9)
    GROUP BY l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    HAVING COUNT(*) = s.Size
  ) h ON h.GroupType=l.GroupType AND h.GroupValue=l.GroupValue
  WHERE ~h.SetValue & m.Mask != 0 AND c.Mask & h.SetValue != 0

  SET @rows=@@ROWCOUNT
END;
 If a part of a candidate set, lets say with 3 members, are the only candidate members for 3 cells in a unit, we can remove the members of the set from all other candidate values for the unit.
Example: 1,3 is in cell 11, 1,4 is in cell 13 and 3,4 is in cell 22. Our set is 1,3,4, our square is 1, we can remove the values 1,3 and 4 from all candidates from square 1.

Same Row Or Column And Square  

CREATE PROCEDURE RemoveCandidateValueInSameRowOrColumnAndSquare (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE FROM c
  FROM Candidate c
  INNER  JOIN Location l ON c.CellID=l.CellID
  INNER  JOIN LocationWide s ON c.CellID=s.CellID
  INNER JOIN (
    SELECT c.Value, ls.[Square], lrc.GroupType, MAX(lrc.GroupValue) as GroupValue
    FROM Candidate c
    INNER JOIN LocationWide ls ON c.CellID=ls.CellID
    INNER JOIN Location lrc ON c.CellID=lrc.CellID AND lrc.GroupType IN ('C','R')
    GROUP BY c.Value, ls.[Square], lrc.GroupType
    HAVING COUNT(DISTINCT ls.[Square])=1 AND COUNT(DISTINCT lrc.GroupValue)=1
  ) a ON a.Value=c.Value AND l.GroupType=a.GroupType AND l.GroupValue=a.GroupValue
  WHERE s.[Square] != a.[Square]

  SET @rows=@@ROWCOUNT

END;

If a candidate exists within a square multiple times, but only in one row or column, the candidate value can be removed for cells in this row but in another square.
Example: candidate value 3 exists in cell 12 and in cell 13 but nowhere else in square 1. candidate value 3 can be removed for cells 14, 15, 16, 17, 18, 19.

Executing Rules

Now we can execute the rules until they do not affect any rows.
  DECLARE @cells INT
  DECLARE @affected INT
  SET @affected=0
  SET @cells=1
  WHILE @cells>0
  BEGIN
    WHILE @cells>0
    BEGIN
      EXEC ApplyRuleSingleCandidate @cells OUTPUT
      print SPACE(@@nestlevel) + 'Iteration ' + CAST(@iteration AS VARCHAR(5)) + ' ApplyRuleSingleCandidate solves ' + CAST(@cells AS VARCHAR(5)) + ' cells.'
     
      EXEC ApplyRuleSingleTokenInGroup @affected OUTPUT
      print SPACE(@@nestlevel) + 'Iteration ' + CAST(@iteration AS VARCHAR(5)) + ' ApplyRuleSingleTokenInGroup solves ' + CAST(@affected AS VARCHAR(5)) + ' cells.'
      SET @cells=@cells+@affected

      SET @iteration=@iteration+1
    END

    -- execution of rules that give values directly ends here
    -- now we apply rules that remove candidates
    EXEC RemoveCandidateCompleteMatch @cells OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@cells AS VARCHAR(10)) + ' candidates due to complete match in same group'

    EXEC RemoveCandidatePartialSet @affected OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@affected AS VARCHAR(10)) + ' candidates due to exclusive partial sets in same group'
    SET @cells=@cells+@affected

    EXEC RemoveCandidateValueInSameRowOrColumnAndSquare @affected OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@affected AS VARCHAR(10)) + ' candidates due to candidate values in a square in only one row or column'
    SET @cells=@cells+@affected

  END
 So that is all we can do right now. If you know more rules, let me know.

Guessing

But I found some sudokus that could not be solved be these rules. I was more curious about implementing guessing than implementing another rule. With guessing, I wanted to make the process as following: 
  1. Try solving by rules
  2. If that does not work out, make up a list of choices. Take the cell with most candidates, fill in the first choice. Try step 1. If it does not solve the sudoku, try the next choice.
You see, it is going to be recursive. Here is the code:
  DECLARE @guess TABLE (CellID SMALLINT, Value SMALLINT)
  INSERT INTO @guess (CellID, Value)
  SELECT c.CellID, c.Value
  FROM Candidate c
  WHERE c.CellID = (SELECT TOP 1 cm.CellID
                    FROM CandidateMask cm
                    INNER JOIN SetAll s ON s.SetValue=cm.Mask
                    ORDER BY s.Size DESC, cm.CellID )
  ORDER BY c.Value

  DECLARE @cellID SMALLINT, @value SMALLINT, @count SMALLINT
  WHILE (SELECT COUNT(*) FROM Cell)<81
  BEGIN
    SET @count=(SELECT COUNT(*) FROM @guess)
    IF (@count=0)
      THROW 50002, 'Error in guessing, all candidates used but not solved.', 1
    IF NOT EXISTS(SELECT 2 FROM Candidate)
      THROW 50002, 'Error in guessing, no candidates.', 1
    SELECT TOP 1 @cellID=g.CellID, @value=g.Value
    FROM @guess g
    ORDER BY g.CellID, g.Value
    DELETE FROM @guess
    WHERE CellID=@cellID AND Value=@value
    PRINT SPACE(@@nestlevel) + 'nestlevel ' + CAST(@@NESTLEVEL as VARCHAR(10)) +': guessing value ' + CAST(@value as CHAR(1)) + ' in cell ' + CAST(@cellID as VARCHAR(5))
    BEGIN TRY
      SAVE TRANSACTION Guessing
      INSERT INTO Cell(CellID, Value, [Rule])
      VALUES (@cellID, @value, 'Guess')
      EXEC TrySolve @iteration
    END TRY
    BEGIN CATCH
      PRINT SPACE(@@nestlevel) + 'rolling back transaction due to ' + ERROR_MESSAGE()
      ROLLBACK TRANSACTION Guessing
      IF (ERROR_NUMBER() NOT IN (50001, 50002))
        THROW
    END CATCH
  END
END;

This code together with the code above make up the procedure TrySolve, with is called recursively at EXEC TrySolve @iteration.

Next Part

In the next part I will introduce some convenience features for entering start data and executing the solver.

No comments:

Post a Comment