This post will attempt to tie everything together that we have discussed pertaining to how to de-identify Personal Health Information (“PHI”) for reporting purposes. For this final post, we will walk through a step-by-step example using SQL Server and a SQL Server Integration (“SSIS”) package that will allow us to conduct a bulk de-identification process.
Step 1 – Create the sample Database
Run the TestDb.SQL command (found in the project code here) to create the sample database. Note that you may need to change the path for the database and log files for the command to run successfully.
We want to deidentify the Last Name, Email Address, and MRN. We also want to hash the Last Name and MRN so that we can do joins across tables. Since our hash value is unique to the actual value, this will allow us to deidentify the value but still give us a unique value by which we can do our joins.
Step 2 – Install the Certificate
For our example, we’ll use the existing certificate in the project code. The readme.txt contains the password for the private key for this and the code signing certificate. Of course, you are welcome to use your own certificate and you can follow the information in the previous blog post to configure the code to use a different certificate.
Open the certificate management shell (see previous blog for step by step) and import the certificate and private key. Make certain that the certificate is trusted.
If you use the certificate from the sample, the code is already configured with the correct serial number. However, if you use your own certificate, you must change the serial number in the code as detailed previously.
Step 3 – Compile the Library and Add to the GAC
With the serial number correctly configured, compile the NCryptLibs project. However, before we use the library in our SSIS package, we must import it into the Global Assembly Cache (“GAC”). To do this, you can use the Gacutil utility or Explorer. For our example, we open two Explorer windows side by side and in one instance open the folder where the compiled library, NCryptLibs.dll was created by Visual Studio. In the other, open the C:WindowsAssembly folder (use the correct drive letter that holds your Windows folder).
Now drag the NCryptLibs.dll file from the first folder to the Assembly folder. You must drop it into the right panel in Explorer. TIP: put a second copy in the c:temp folder so you can easily navigate to it later when you add a reference to the SSIS package.
Step 4 – Create the SSIS Project
Open SQL Server Business Intelligence Development Studio (“BIDS”) and create a new Integration Services Project.
Step 5 – In the Control Flow tab, create a new Data Flow Task on the design surface:
Step 6 – Create one or more Connection Manager objects pointing to your source database and if you need a different target, create one for the target too. For our sample, we will create just one since the source table and target table are in the same database and can share the same connection.
Step 7 – Click on the Data Flow tab and drag a new OLE DB Source object to the design surface. Insure it is pointed to your source DB Connection Manager and the source table you want to encrypt.
Step 8 – Repeat for an OLE DB Destination object and confirm it is pointed to the Target location. For our example, the TargetTable.
Step 9 – Between the Source and Target, insert a Script Component of a Transformation type:
Step 10 – Connect the OLE DB Source to the Script Component so that the source columns are visible to the component.
Step 11 – We need to let the Script Component know which columns we are going to modify, and which to send along to the destination. Select the Script Component and then right click to edit and then select the Input Columns. Check all the Available Input Columns that you will use in the destination:
Step 12 – We will edit the mappings to the target in a moment, but first we need to add new columns for the new rows we will encrypt and hash. For our example, we need to create five new columns: MRNHashed, MRNEncrypted, LastNameHashed, LastNameEncrypted, and EMailEncrypted to hold the new encrypted and hashed data.
The hashed values will always be a string datatype of length 44 characters. The encrypted data will be a Byte Stream type of sufficient length to hold the encrypted data. For our fields, 500 character is sufficient.
Right-Click the Script Component and select Edit. Then highlight the Inputs and Outputs in the far left panel, and then Expand the Output 0 and select the Output Colmuns icon to enable the Add Column button. Press the Add Column button and give it a name that makes sense. Next, we’ll illustrate how to create the encrypted and hashed columns for our MRN data.
Step 13 – First, we’ll create a column to hold our hashed MRN value. We will call the column MRNHashed, with a Data Type of string, and Length of 44 characters:
Step 14 – We do the same thing for MRNEncrypted but we need to make the datatype a byte stream with a length for MRN of 500 (a rather arbitrary value):
Once we finish adding the rest of the columns, the columns and properties should look something like:
Step 15 – Connect the Script Component to the OLE DB Destination. Do not worry about the warning “x”s yet. We’ll fix those shortly:
Step 16 – Edit the OLE DB Destination item – and make sure you have configured the target table.
Step 17 – Click the Mappings items in the left panel and modify the Input Column to Destination Column map. Note that in LastName and MRN, we are putting a hashed value that is cryptographically secure but unique to that value so that we can join across tables later:
Step 18 – Click OK and the red “X” should go away for the OLE DB Destination. If not, fix any errors.
Step 19 – Now we need to write our encryption code in the Script Component. Fortunately, this only involves one line of code for each column. Right-click the Script Component and select Edit and the press the Edit Script Button to bring up a new instance of the development studio:
Step 20 – After the Script Component Editor loads, we need to add a reference to our custom encryption library. Expand the References folder in the Project Explorer. Highlight the References folder and right-click and select Add Reference and then Browse for the NCryptLibs.dll file (if you copied the file to the C:temp folder earlier, then just point to the file there).
Step 21 – The wizard will automatically drop you into the function area you need to edit. In the area that tells you to “Add your code here”, make your edits.
Step 22 – Each new Row contains the columns we need to map to. Your code will end up looking something like this:
Step 23 – Build the project and correct any reported errors. Close the Script Editor (BIDS) when done.