Quick Script: Finding Columns

http://www.flickr.com/photos/mag3737/1420554128/
Locked Lock, Tom Magliery, http://www.flickr.com/photos/mag3737/1420554128/
Is this secured enough?

For any data professional, data security should be an utmost priority.  Having unsecured data lying about can cause major issues, if discovered.  I’m speaking of software security, more specifically data encryption or data masking.

Finding the instances of unsecured data in an environment can be a pain.  Last year, I had to look for unencrypted data within a non-production environment.  In this case, I was looking for Social Security or also known as Tax Identification Numbers.

Red-Gate offers a tool, SQL Search, which is completely free and completely awesome. I’ve used this tool previously on many occasions .   The problem that I had was that I wanted to look across all servers within an environment utilizing a Central Management Server.  I haven’t been able to find a way to accomplish that with SQL Search so I set out to create my own script.

There are some things to keep in mind about this script,

  1. It makes assumptions that the column is named in a certain pattern.
  2. It uses dynamic SQL.
  3. It uses a WHILE loop.

However, even with these nuances,  the script works pretty well to help find potential sensitive data that might not be encrypted across multiple servers.   Of course, I’m also assuming that encryption is intended to be handled by the database and not some other process.

DISCLAIMER:  While I’ve been able to run it across multiple servers, over several hundred databases without an issue, your mileage will vary.  Please test accordingly.   NEVER run code that you find on the internet against your Production systems without testing first.  You have been warned!

/***************************************************************
-- Author: John Morehouse
-- Date: March 2015
-- T: @SQLRUS
-- E: [email protected]
-- B: http://landingzone.jmorehouse.com/sqlrus

--THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.

--IN OTHER WORDS: USE AT YOUR OWN RISK.

--AUTHOR ASSUMES ZERO LIABILITY OR RESPONSIBILITY.

--You may alter this code for your own purposes.
--You may republish altered code as long as you give due credit.
***************************************************************/
DECLARE @dbs AS TABLE ( name sysname, processed BIT)
DECLARE @x INT = 1
DECLARE @sql NVARCHAR (2000)
DECLARE @dbName VARCHAR (50)

-- check for the table and drop if in existance
IF object_id ('tempdb..#results') IS NOT NULL
BEGIN
DROP TABLE #results
END

-- Create our new table
CREATE TABLE #results ( dbName sysname
, tableName VARCHAR (100)
, schemaName VARCHAR(100)
, columnName VARCHAR (100)
, DataType VARCHAR (100)
, MaxDataLength INT
, theRowCount INT )

-- Fetch a list of databases
INSERT INTO @dbs ( name, processed )
SELECT name, 0 FROM sys.databases
WHERE database_id > 4
AND [state] = 0 --online

-- Loop through the list of databases; use dynamic sql to look for specific columns
WHILE @x <= (SELECT COUNT( 1) FROM @dbs WHERE processed = 0 )
BEGIN
SELECT TOP 1 @dbName = name FROM @dbs WHERE processed = 0

SET @sql =
'INSERT #results (dbName, tableName, schemaName, columnName, DataType, MaxDataLength, theRowCount)
SELECT ''' + @dbName + ''',t.name,s.name,c.name,st.name, c.max_length,p.rows
FROM [' + @dbName + '].sys.columns c
INNER JOIN [' + @dbName + '].[sys].[tables] t ON c.object_id = t.object_id
INNER JOIN [' + @dbName + '].[sys].[schemas] s ON t.schema_id = s.schema_id
INNER JOIN [' + @dbName + '].[sys].[systypes] st ON c.user_type_ID = st.xusertype
INNER JOIN [' + @dbName + '].[sys].[partitions] p ON t.object_id = p.object_ID
INNER JOIN [' + @dbName + '].[sys].[allocation_units] au ON au.container_ID =
CASE
WHEN au.TYPE IN (1,3) THEN p.hobt_id
WHEN au.type IN (2) THEN p.partition_id
END
WHERE (c.name LIKE ''TIN%''
OR c.name LIKE ''TIN_TX%'') -- looking for specific column name
AND c.OBJECT_ID > 100 -- exluded system objects
AND p.index_id IN (0,1) -- Heap or Clustered Index
AND au.type = 1 --IN_ROW_DATA only'

-- used for debugging
--PRINT @sql

BEGIN TRY
EXEC sp_executesql @sql
END TRY
BEGIN CATCH
SELECT ERROR_LINE () AS 'Error Line'
, ERROR_MESSAGE () AS 'Error Message'
, ERROR_NUMBER () AS 'Error Number'
, @dbName AS 'Database'
END CATCH

UPDATE @dbs
SET processed = 1
WHERE name = @dbName
END

-- show us the goodness
SELECT dbName, tableName,schemaName,columnName,DataType,MaxDataLength,theRowCount FROM #results
GO

The script will return a number of things:

  • Table Name
  • Schema Name
  • Column Name
  • Column Data Type
  • Size of the Column
  • Number of rows

Using this script in conjunction with a Central Management Server, I’ve been able to find out where potential sensitive data might be residing.  I used this list of columns to investigate whether or not it contained sensitive data.  If it did, I would either mask it or encrypt it, depending on our company policies.

I haven’t run into any issues with running the script in Production, however, I would again advise caution.  Your mileage will vary.  Test accordingly!

Enjoy!

© 2015, John Morehouse. All rights reserved.

Hey you! Leave me a comment and start a discussion!

This site uses Akismet to reduce spam. Learn how your comment data is processed.