I may be carbon-dating myself here, but one of the first things I implemented on my Apple IIe in the early 80s was John Conway's Game of Life. If you're not familiar with it, it's a mathematical game that simulates cellular automata with fixed rules. There's an infinite square grid containing spaces for cells. Every generation, cells can be born or they can die, according to very simple rules. Despite the simple rules, very complex structures can be created. See the wiki page for more information.
All the code here can be found on github: https://github.com/pauljchang/sandbox/blob/master/life/life.sql
The rules for the game are:
- Any empty space with exactly three neighbours gives "birth" to a new cell.
- Any cell with 1 or 0 neighbours "dies" from starvation and is removed.
- Any cell with 4 or more neighbours "dies" from overcrowding and is removed.
For fun, I've implemented this in BASIC and C++, and maybe even Java, but I thought I'd share my MSSQL implementation, as it uses some interesting concepts.
First of all, instead of using an array, we are going to create a table that contains one row for every cell that is in the field. This may seem overkill, but for very sparse fields where there are just a few cells occupying a very large area, this can be surprisingly efficient, and more efficient than a very large, two-dimensional array.
So, let's see the table:
-- -- dbo.cell -- Table to hold all cells -- -- Each cell has an entry in the dbo.cell table -- It knows its (x,y) coordinates, as well as the generation number -- The generation number is useful for diagnostics -- and indicates how "old" a cell is -- CREATE TABLE dbo.cell ( x SMALLINT NOT NULL , y SMALLINT NOT NULL , gen SMALLINT NOT NULL ); CREATE UNIQUE CLUSTERED INDEX KUX_cell ON dbo.cell (y, x); CREATE UNIQUE INDEX UX_cell_x_y ON dbo.cell (x, y) INCLUDE (gen); CREATE INDEX X_cell_gen ON dbo.cell (gen, y, x);
You might notice that I clustered the table along the y-axis, which will facilitate printing out the field for display. I also included other indices that will help later on as we step through generations of cells.
There is also a "gen" field for the generation of the cell. This is interesting information, as to the age of the cells, and also helps us distinguish between newly "born" cells and cells that were already there.
Let's also seed the field with the classic R-Pentamino shape. This happens to be one of the smallest patterns that grows into a huge shape that doesn't stabilise for many generations. Here's what we want to seed:
+---+ 1 | 00| 0 |00 | -1 | 0 | +---+
The numbers on the left are the y-axis, and I didn't think it was necessary to display the x-axis. Here's the code to seed the table:
INSERT INTO dbo.cell (x, y, gen) VALUES ( 0, 1, 0) , ( 1, 1, 0) , (-1, 0, 0) , ( 0, 0, 0) , ( 0, -1, 0) ;
In order to step through, there are two calculations we must make. One, we have to see if any new cells will be born with the existing cells on the field. And two, we have to see if any of the existing cells (not the newly born cells) will die.
To do this, I use CTEs (Common Table Expressions), which are like temporary queries or views that live only for the lifespan of a query, but can be referenced by the query.
The first CTE I create is "delta", which allows me to look at neighbouring cells or spaces in all eight directions. Just one delta CTE can be referenced multiple times, which is very convenient.
WITH delta (num) AS ( SELECT CAST(-1 AS SMALLINT) AS num UNION ALL SELECT CAST( 0 AS SMALLINT) AS num UNION ALL SELECT CAST( 1 AS SMALLINT) AS num )...
The next CTE is for generating a set of neighbouring empty spaces. This is based on the current set of cells, but looking in left, right, up, down, and diagonally for empty spaces. Here, we can see how "delta" is used. It's a pretty simple query -- return distinct neighbouring spaces that don't already have a cell.
... , empty_neighbour (x, y) AS ( SELECT DISTINCT dbo.cell.x + delta_x.num AS x , dbo.cell.y + delta_y.num AS y FROM dbo.cell CROSS JOIN delta AS delta_x CROSS JOIN delta AS delta_y WHERE NOT EXISTS ( SELECT * FROM dbo.cell AS other_cell WHERE other_cell.x = dbo.cell.x + delta_x.num AND other_cell.y = dbo.cell.y + delta_y.num ) ) ...
And another CTE, but this one counts how many neighbouring cells exist for each of those empty spaces. Again, we use "delta" to look in eight directions to count neighbouring cells.
... , neighbour_count (x, y, neighbour_count) AS ( SELECT empty_neighbour.x , empty_neighbour.y -- This expression is here to eliminate the silly NULL agregation warning -- Otherwise, we could just COUNT(other_cell.gen) , COALESCE(SUM(CASE WHEN other_cell.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS neighbour_count FROM empty_neighbour CROSS JOIN delta AS delta_x CROSS JOIN delta AS delta_y LEFT JOIN dbo.cell AS other_cell ON other_cell.x = empty_neighbour.x + delta_x.num AND other_cell.y = empty_neighbour.y + delta_y.num GROUP BY empty_neighbour.x , empty_neighbour.y ) ...
I have a funny COALESCE(SUM(CASE...)) statement because I'm trying to avoid the annoying aggregation of NULL values warning. Just think of it as counting the number of neighbours.
... INSERT INTO dbo.cell (x, y, gen) SELECT neighbour_count.x , neighbour_count.y , @gen AS gen FROM neighbour_count WHERE neighbour_count.neighbour_count = 3 ORDER BY neighbour_count.y , neighbour_count.x ; ...
So with the newly inserted cells, our field now looks like this:
+---+ 1 |100| 0 |00 | -1 |10 | +---+
The "1" cells are the ones that we just inserted, based on the rule of empty spaces with three surrounding cells. But we still need to delete cells that are overcrowded or starved. In this case, the centre cell should be removed because it has four neighbours (not including the newly "born" ones).
... , neighbour_count (x, y, neighbour_count) AS ( SELECT dbo.cell.x , dbo.cell.y -- This expression is here to eliminate the silly NULL agregation warning -- Otherwise, we could just COUNT(other_cell.gen) , COALESCE(SUM(CASE WHEN other_cell.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS neighbour_count FROM dbo.cell CROSS JOIN delta AS delta_x CROSS JOIN delta AS delta_y LEFT JOIN dbo.cell AS other_cell ON other_cell.x = dbo.cell.x + delta_x.num AND other_cell.y = dbo.cell.y + delta_y.num -- Don't count the cells we just created AND other_cell.gen < @gen -- We don't want to count the cell itself, just neighbours AND ( other_cell.x <> dbo.cell.x OR other_cell.y <> dbo.cell.y ) WHERE -- Don't count the cells we just created dbo.cell.gen < @gen GROUP BY dbo.cell.x , dbo.cell.y ) ...
...and the final delete:
... DELETE dbo.cell FROM dbo.cell INNER JOIN neighbour_count ON neighbour_count.x = dbo.cell.x AND neighbour_count.y = dbo.cell.y AND ( neighbour_count.neighbour_count <= 1 OR neighbour_count.neighbour_count >= 4 ) ; ...
Now our field looks like the following:
+---+ 1 |100| 0 |0 | -1 |10 | +---+
If we step forward a few generations, we see the pattern grow:
+----+ 2 | 2 | 1 | 10 | 0 |2 2| -1 | 10 | +----+ +----+ 2 | 32 | 1 | 103| 0 |2 2| -1 | 10 | +----+ +----+ 2 | 3 4| 1 |4 3| 0 |2 2| -1 | 10 | +----+ +-----+ 2 | 5 | 1 |45 35| 0 |2 2 | -1 | 10 | +-----+
And after 100 generations, we see it grows, and keeps growing:
+--------------------------------------------------+ 12 | 98 | 11 | 9 9 | 10 | 58 8 8 | 9 | 45 98 | 8 | 89 | 7 | 0 70 45 00 9 | 6 | 989 0 69 49 80 | 5 | 9 70 | 4 | 80 07 069 89| 3 | 950 980 08 80| 2 | 09 0 06 0 6 89 00076 | 1 | 0 0 38 8 5 0 7 90 | 0 | 0 0 6 8 9 8 | -1 | 9 9 9 98 9 0 0 | -2 | 00 0 089 | -3 | | -4 | 09 80 | -5 | 7 | -6 |07 0 71 | -7 |88 0 52 | -8 |0 | -9 | 9 0 | -10 | 0 8 | -11 | 090 | +--------------------------------------------------+
I'm reasonably happy with this code. Unlike array implementations, which slow down as the patterns grow larger and occupy more rectangular space, this implementation grows O(n), linearly with the number of cells, n.
Is there a better way to implement this in SQL? Of course, we all know that SQL is likely not the best way to implement this, but within the bounds of SQL, how else can we improve this?
This comment has been removed by the author.
ReplyDeleteThis comment has been removed by the author.
DeleteI thought I had my own version, but there's an error somewhere. I'll track it down and repost! Sorry for messing up the comments section.
ReplyDeleteLet's try this again. The big differences are 1) The way I calculate the new generation *includes* self, and produces the entire board instead of the diff. 2) This entire board is then merged to perform the diff, and rows that already exist in the dbo.Cell table are ignored during the merge.
ReplyDeleteMy display routine is having a problem past nvarchar(4000) so I'll be fixing that later and posting it! In the meantime, you can adapt either my or your script and use your print routine.
CREATE TABLE dbo.Cell (
X smallint NOT NULL,
Y smallint NOT NULL,
Generation smallint NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX CI_Cell ON dbo.Cell (Y, X);
CREATE UNIQUE INDEX UQ_Cell_X_Y ON dbo.Cell (X, Y);
INSERT dbo.Cell (X, Y, Generation) VALUES
( 0, -1, 0),
( 1, -1, 0),
(-1, 0, 0),
( 0, 0, 0),
( 0, 1, 0)
;
IF object_id('dbo.AdvanceLife', 'P') IS NULL EXEC ('CREATE PROCEDURE dbo.AdvanceLife AS RETURN;');
GO
ALTER PROCEDURE dbo.AdvanceLife
AS
MERGE
dbo.Cell C
USING
(
SELECT
P.X,
P.Y,
Generation = Max(C.Generation) + 1
FROM
dbo.Cell C
CROSS APPLY (
SELECT
C.X + X.X,
C.Y + Y.Y,
CASE WHEN X.X = 0 AND Y.Y = 0 THEN 1 ELSE 0 END
FROM
(VALUES (-1), (0), (1)) X (X)
CROSS JOIN (VALUES (-1), (0), (1)) Y (Y)
) P (X, Y, IsSelf)
GROUP BY
P.X,
P.Y
HAVING
Count(*) * 2 - Max(IsSelf) BETWEEN 5 AND 7
) N
ON C.X = N.X
AND C.Y = N.Y
WHEN NOT MATCHED BY SOURCE THEN
DELETE
WHEN NOT MATCHED BY TARGET THEN
INSERT (X, Y, Generation)
VALUES (N.X, N.Y, N.Generation)
;
GO
Hi, Erik,
DeleteThis is great! I should have thought about an "UPSERT" like this! Honestly, I haven't used code like that, other than very specific ETL code, but I'm glad you thought of it! Thanks for reading the blog and contributing!
This comment has been removed by the author.
ReplyDelete