
Data Security and the Lack Thereof

Americans are increasingly aware of the vulnerability of their private data, with breaches now exposing records by the millions that disclose not only credit card information but even more dangerous data such as social security numbers linked to birthdates, names, and addresses. A name and address, combined with a birth date and social security number, is a virtual ticket to complete identity theft, creating the risk of massive financial destruction. There are many pieces of data that are private to an individual, but the social security number (SSN) is the most critical and is the focus of this post; the same principles apply to securing the SSN as to any other private data. How did we get to this point? Is this just the inevitable consequence of Internet access, distributed systems, and evolving technology? Or is much of it due to the irresponsibility of those that hold the most confidential aspects of your data? In this post, I make the argument that it is due to corporate irresponsibility, and that there are ways to ensure the safeguarding of data that protect not only against external threats, but also against the more powerful threats from within an organization. This post first articulates the drivers behind the situation, which fall into two main areas: wrong assumptions and lack of diligence. Next, we identify the architectural principles of software and infrastructure that mitigate the risk of a security breach and contrast them with what most organizations believe is adequate. Finally, we provide practical strategies to remediate the situation.

Wrong Assumptions

The most common wrong assumption is that security threats are primarily outside...

Home-grown PCIe SSD VM vs. Azure (Update 12/7)

More testing today with a larger database: I multiplied the number of rows by 10x, up to 1 billion rows in the test dataset. The latest results indicate I/O about 3 times slower on an Azure D13 server than on a dedicated VM with similar CPU/RAM specs running on a Gen-1 Fusion-io Duo RAID drive. The time to insert the test record set was also about 2.5 times slower. This aligns with the performance metrics outlined at http://azure.microsoft.com/blog/2014/10/06/d-series-performance-expectations/. The D series scales upward in terms of SSD capability, so the only way to approach the I/O of the dedicated VM would be to upgrade to a D14 configuration with an 800 GB SSD. Along with that come more cores and memory (16 cores and 112 GB RAM), but that should not significantly affect the I/O comparison. Here are the bottom-line backup logs.

From the dedicated VM (48 GB RAM, 6 cores @ 2.66 GHz, Dell-branded Fusion-io 640 GB MLC card):

Processed 1931504 pages for database 'TestDb', file 'TestDb' on file 1.
100 percent processed.
Processed 9 pages for database 'TestDb', file 'TestDb_log' on file 1.
BACKUP DATABASE successfully processed 1931513 pages in 26.102 seconds (578.114 MB/sec).

From the Azure D13 server (56 GB RAM, 8 cores @ 2.20 GHz, 400 GB SSD):

Processed 1931648 pages for database 'TestDb', file 'TestDb' on file 1.
100 percent processed.
Processed 9 pages for database 'TestDb', file 'TestDb_log' on file 1.
BACKUP DATABASE successfully processed 1931657 pages in 81.958 seconds (184.131 MB/sec).

I'm going to come up with some more tests to perform, such as how long it takes to provision a VM from scratch, how long to provision from an image, boot-up times, and...
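For reference, the log lines above are standard BACKUP DATABASE output. Below is a minimal sketch of the kind of statement that produces them – the target path and options here are assumptions for illustration, not the exact command used in these tests:

    -- Back up the test database and report progress; the final message line
    -- includes the MB/sec throughput figure compared above.
    BACKUP DATABASE [TestDb]
    TO DISK = N'D:\Backups\TestDb.bak'
    WITH INIT, STATS = 10;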

Using INSTEAD OF Triggers to enable updatable views

A common misconception with SQL Server views is that only indexed views can provide update capability. Actually, any view that is schema-bound can be enabled for update simply by using an "INSTEAD OF" trigger. Although an indexed view may also be set up with a trigger, it is not required. The view does not need to obey the constraints of an indexed view other than being schema-bound, since the trigger writes to the base tables. There are a few scenarios where this is useful. One is the case of tables that utilize a master-secondary design pattern. In many systems, there are central objects that manifest themselves in different forms requiring different attributes. The central objects often need to be accessed singularly for their core attributes, so splitting the objects into multiple tables makes queries that need objects of both types together awkward and forces the use of UNION queries. The other complexity involved with separating objects by type into separate tables is that they often share relationships to common child objects. To support the different interfaces for the different object types, multiple views can be configured, as sketched below. One example of this is a name and address system I worked on a long time ago for a non-profit organization. It turns out that donors may be organizations, individuals, or married couples, and different attributes are associated with each. To further complicate matters, a donor could also be a member – that is, a member could also be a donor. The member might also be married, and the organization needed to track the household unit associated with...
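Here is a minimal sketch of the pattern – the table, view, and trigger names are hypothetical and not from the original system:

    -- Master table plus a type-specific table, exposed through one view
    -- that an INSTEAD OF trigger makes updatable.
    CREATE TABLE dbo.Donor
    (
        DonorId   int IDENTITY(1,1) PRIMARY KEY,
        DonorName nvarchar(200) NOT NULL
    );

    CREATE TABLE dbo.DonorIndividual
    (
        DonorId   int PRIMARY KEY REFERENCES dbo.Donor(DonorId),
        BirthDate date NULL
    );
    GO

    CREATE VIEW dbo.vw_IndividualDonor
    WITH SCHEMABINDING
    AS
    SELECT d.DonorId, d.DonorName, i.BirthDate
    FROM dbo.Donor d
    JOIN dbo.DonorIndividual i ON i.DonorId = d.DonorId;
    GO

    CREATE TRIGGER dbo.trg_vw_IndividualDonor_Insert
    ON dbo.vw_IndividualDonor
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Write the master row first, then the type-specific row.
        INSERT dbo.Donor (DonorName)
        SELECT DonorName FROM inserted;

        -- NOTE: this simple sketch assumes single-row inserts; a production
        -- version would need a reliable way to match each inserted row to its
        -- new identity value (for example, MERGE ... OUTPUT).
        INSERT dbo.DonorIndividual (DonorId, BirthDate)
        SELECT SCOPE_IDENTITY(), BirthDate FROM inserted;
    END;
    GO

    -- Usage: the identity-backed DonorId column still requires a placeholder
    -- value in the INSERT, which the trigger ignores.
    INSERT dbo.vw_IndividualDonor (DonorId, DonorName, BirthDate)
    VALUES (0, N'Jane Donor', '1980-05-01');

The same base tables can back additional views (one per object type), each with its own INSTEAD OF trigger, so callers work with a single logical object while the storage stays split.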

Solving Sudoku with T-SQL

Note: This is an update to the earlier post to include the query that actually displays the solved puzzle (for the skeptics…). Recently I had to make a short trip and took a look at a Sudoku puzzle in the airline magazine. In the past, I've posted about automating problem solving, and I just finished a PhD in this area that outlined an automated technique for algorithm discovery through analyzing solution sequences from simulations. What intrigued me about the Sudoku scenario is that it lends itself to a Cartesian product solution approach – i.e. rows, columns, and groupings must all contain specific values. Based on that, it should be solvable with relational algebra. I modeled the data structure on the plane to see how hard it would be to arrive at a solution for a puzzle using T-SQL. Before spending too much time on this, I searched the Internet and found that somebody else had already modeled and solved it. Samuel Aina undertook this scenario back in 2010 – the link to his approach is at http://www.developerfusion.com/article/84374/solving-sudoku-with-sql/. In keeping with my generic problem model approach, I still thought it would be worthwhile to do this via a less procedural approach than Aina's. My approach to generic problem solving relies on a schema that defines the start, transition, and final states for any problem-solving scenario. In keeping with that approach, I decided to try to solve the problem based on a view that a solver procedure could enumerate through step by step. For testing purposes, I went to http://www.websudoku.com/ and pulled in...
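As a rough illustration of the relational framing, here is my own minimal sketch of a puzzle schema and a candidate query – it is not the schema from this post or from Aina's article:

    -- One row per cell; the 3x3 box number is derived from row and column.
    CREATE TABLE dbo.SudokuCell
    (
        RowNum    tinyint NOT NULL CHECK (RowNum BETWEEN 1 AND 9),
        ColNum    tinyint NOT NULL CHECK (ColNum BETWEEN 1 AND 9),
        BoxNum    AS (((RowNum - 1) / 3) * 3 + ((ColNum - 1) / 3) + 1) PERSISTED,
        CellValue tinyint NULL CHECK (CellValue BETWEEN 1 AND 9),
        CONSTRAINT PK_SudokuCell PRIMARY KEY (RowNum, ColNum)
    );

    -- Candidates for each empty cell: digits not already used in its row,
    -- column, or box – a purely set-based (anti-join) formulation.
    SELECT c.RowNum, c.ColNum, d.Digit
    FROM dbo.SudokuCell c
    CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS d(Digit)
    WHERE c.CellValue IS NULL
      AND NOT EXISTS (SELECT 1
                      FROM dbo.SudokuCell x
                      WHERE x.CellValue = d.Digit
                        AND (x.RowNum = c.RowNum
                             OR x.ColNum = c.ColNum
                             OR x.BoxNum = c.BoxNum));

Any empty cell with exactly one remaining candidate can be assigned directly; iterating over a candidate view like this is where a solver procedure would come in.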

Creating a 3-way 10 GbE Cluster without a Switch – Part 1

After finishing the PhD, I'm back in the lab to test out some new high-speed computing experiments. Recently, I retrieved one of my servers from co-location and put it back into my home-office lab, giving me 3 servers with Fusion-io cards all in the same spot. I'm trying to move around some virtual machines and update the trading optimizer (CapGen) investing databases, so I thought it would be useful to get all of the servers talking at 10 Gb/s or better. The Fusion-io cards easily overwhelm a 1 Gb connection, since even one of the Duos provides 1.5 GB/s, which is about 12 Gb/s. A few months back I managed to get 2 servers working using Mellanox ConnectX cards on QSFP connections at 20 Gb/s, but that stopped working on me with a driver update (ConnectX is not supported on Windows Server 2012 R2), so I went to a better-supported, although slower, 10 Gb/s connection. To do this, I got Mellanox ConnectX-3 EN cards for 2 of the servers and bought the add-on dual 10 GbE adapter for one of the HP DL370 G6 servers. One advantage of using the HP add-on adapter is that it doesn't require an additional slot, although you do trade off 2 of the 1 Gb connectors. This approach allows the maximum number of Fusion-io (HP IO Accelerator) cards in the server (currently 9, with 8 of them being Duos) as shown below. In this arrangement, each server has a dedicated high-speed connection to the other two servers via the dual interface, as shown in the table below, without the need for...

Generating a Type-2 Dimension from Multiple Tables with Effective Dates

In slowly changing dimensions (SCD), type-2 attributes involve ending a dimension row with the current date when the attribute value changes and creating a new row starting from the current date. While this works great for capturing history once a warehouse/cube is implemented, it does not address historical data if such tracking was already occurring. For most scenarios, the history can simply be loaded by pre-populating the SCD start/end dates rather than having the cube processing apply the type-2 rules. However, in some businesses such as health care, history may sometimes need to be "re-written". For example, a claim may not get into the system until several days after it occurs, and the claim would need to be effective for a point in time in history that has already been loaded. This challenge is highlighted in the examples below. A real-world example might be tracking which certifications were held by an individual at a given point in time. For example, a software professional may have been an Oracle OCP from 6/1/2010 to 8/31/2010, and a PMP from 7/1/2010 to 12/31/2010. To track the history of all certifications held by date ranges, 3 rows are needed:

– 6/1/2010 – 6/30/2010: OCP
– 7/1/2010 – 8/31/2010: OCP, PMP
– 9/1/2010 – 12/31/2010: PMP only

The spreadsheet illustrates the scenario where sets of data for a dimension row coming from 2 different tables have values effective for different historical periods – the dates associated with the 2 sets of values could be overlapping, equivalent, or covering (i.e. 'A' start date before 'B' start date with end date also greater than 'B'...
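As a minimal sketch of the date-segmentation idea – shown against a single hypothetical CertHistory(CertCode, StartDate, EndDate) table for brevity, though the same boundary logic extends to values coming from two source tables:

    -- Collect every range start (and the day after every range end) as a
    -- boundary, rebuild non-overlapping segments, then join each source row
    -- back on "effective for the whole segment".
    ;WITH Boundaries AS
    (
        SELECT StartDate AS BoundaryDate FROM dbo.CertHistory
        UNION
        SELECT DATEADD(DAY, 1, EndDate) FROM dbo.CertHistory
    ),
    Segments AS
    (
        SELECT BoundaryDate AS StartDate,
               DATEADD(DAY, -1,
                   LEAD(BoundaryDate) OVER (ORDER BY BoundaryDate)) AS EndDate
        FROM Boundaries
    )
    SELECT s.StartDate, s.EndDate,
           STUFF((SELECT ', ' + c.CertCode
                  FROM dbo.CertHistory c
                  WHERE c.StartDate <= s.StartDate
                    AND c.EndDate   >= s.EndDate
                  ORDER BY c.CertCode
                  FOR XML PATH('')), 1, 2, '') AS CertsHeld
    FROM Segments s
    WHERE s.EndDate IS NOT NULL
      AND EXISTS (SELECT 1 FROM dbo.CertHistory c
                  WHERE c.StartDate <= s.StartDate
                    AND c.EndDate   >= s.EndDate);

Against the OCP/PMP example above, this produces exactly the three rows listed.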

Dealing with a corrupt transaction log in SQL Server

I just went through the experience of a corrupted transaction log on a large SQL Server database. It was actually not as bad as I thought it would be. I ended up restoring from a backup and then putting the corrupt database into emergency mode to generate scripts for updated objects and to dump out data added since the last backup. Fortunately, it wasn't too hard to figure that out. As it turns out, I really don't think anything was lost in the database, since it had no live activity for several hours prior to the failure. Unless I'm misunderstanding the SQL checkpoint feature for the transaction log, a database that is at rest and has no activity is likely to have very little reliance on the transaction log in order to be current. Based on a comparison of the data sources used to load the database with what was in the tables, and an inspection of the code changes made from scripts, there appears to be virtually no loss of data or metadata. The most distressing part of this was that the transaction log was actually on a RAID-1 device. The RAID-1 was based on Windows software RAID because it utilized 2 Fusion-io (HP IO Accelerator version) drives. I had even coupled together drives from two different HP IO Accelerator Duos so that a failure at the card level, not just the module level, would be survivable. Only one of the drives failed. However, instead of simply going to a failed-redundancy state, both halves of the RAID device ended up corrupted. This is on Windows Server 2008 R2. The problem...
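For reference, here is a minimal sketch of the recovery pattern described above – the database name, logical file names, table names, and paths are placeholders, and the scripting of changed objects is not shown:

    -- The corrupt database is only read from; the restore goes to a new database.
    ALTER DATABASE [TestDb] SET EMERGENCY;    -- allows read-only access for sysadmins
    ALTER DATABASE [TestDb] SET SINGLE_USER;

    -- Restore the last good backup side by side under a new name.
    RESTORE DATABASE [TestDb_Restored]
    FROM DISK = N'D:\Backups\TestDb.bak'
    WITH MOVE 'TestDb'     TO N'D:\Data\TestDb_Restored.mdf',
         MOVE 'TestDb_log' TO N'D:\Data\TestDb_Restored.ldf';

    -- Then pull any rows added since that backup out of the emergency-mode copy
    -- (one hypothetical table shown).
    INSERT INTO [TestDb_Restored].dbo.SomeTable
    SELECT s.*
    FROM [TestDb].dbo.SomeTable s
    WHERE NOT EXISTS (SELECT 1
                      FROM [TestDb_Restored].dbo.SomeTable r
                      WHERE r.SomeKey = s.SomeKey);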

Normalizing time-series data using a table function in SQL

Recently I had to normalize quarterly, monthly, and ad-hoc reported metrics into daily rows in order to do correlative analysis. Below is the function that I came up with. The idea is to generate values that are in line with the trend. It can be used with a UNION against a table that does not contain the gap data to display a smooth transition between each actual increment. This is useful for trading strategies that integrate macro data with technical data. For example, I might want a strategy that factors in NYSE margin debt or market sentiment to determine that a hedge position might be useful, while also integrating technical indicators that suggest an appropriate entry or exit point. Most investment-strategy generation programs utilize bar data or statistical files that must match up on a daily basis, so intermediate rows must be created with values that follow the trend to fill in the gaps.

create function [dbo].[tdf_GenerateDailyGapValues]
(
    @StartDate date,
    @EndDate date,
    @StartValue money,
    @EndValue money
)
-- Generate intervening values for missing days based on a calendar
returns @GapValues table
(
    GapDate date,
    GapValue money
)
as
begin
    declare @DayCount money = datediff(DAY, @StartDate, @EndDate)
    declare @DayChange money = (@EndValue - @StartValue) / @DayCount
    declare @Counter int = 0
    declare @LastGapValue money = @StartValue

    -- Walk forward one day at a time, adding the per-day change so the
    -- generated values follow the straight-line trend between the endpoints.
    while @Counter < @DayCount - 1
    begin
        set @LastGapValue = @LastGapValue + @DayChange
        set @Counter = @Counter + 1
        insert into @GapValues (GapDate, GapValue)
        values (dateadd(day, @Counter, @StartDate), @LastGapValue)
    end
    return
end

Below is sample output from using the function:

SELECT * FROM dbo.tdf_GenerateDailyGapValues('2014-01-01', '2014-02-01', 100.00, 200.00)

GapDate       GapValue
2014-01-02    107.1428
2014-01-03    114.2856
2014-01-04    121.4284
2014-01-05    ...

Converting Hyper-V machines from Windows 2008 R2 to Windows 2012 R2

I am posting this because it wasn't obvious just from searching the knowledge base. I was recently converting some VMs running under Windows 2008 R2 to Windows 2012 R2 and was dismayed to find out that I could not import into 2012 R2 the VMs that I had exported from Windows 2008 R2. It turns out this is just a case of doing more than you need to do. All you need to do is shut down the 2008 R2 VMs, copy the files directly to the location you want them in on the Windows 2012 R2 server, and then import them. That's right – you just import them without ever exporting them. If you export them, you can't import them. This is explained in more detail at http://blogs.technet.com/b/rmilne/archive/2013/10/22/windows-hyper-v-2012-amp-8-1-hyper-v-did-not-find-virtual-machine-to-import.aspx. There are a few caveats: you need to reconnect the network adapter, and you might get prompted for the exact location of the virtual hard drives if you've moved things around from the original setup. Also, be sure to manually edit the .vhd hard drive files to update the location of the parent of any differencing disks. There is also the option to upgrade the drives to the new .VHDX format. That can be done without updating the virtual machine configuration to generation 2 (which requires Windows 2012, Windows 8, or later on the guest VM). There are significant scalability and performance advantages with the .VHDX format, especially for SSDs and newer drives that natively use 4K sectors. This is explained at http://blogs.technet.com/b/askpfeplat/archive/2013/09/09/why-you-want-to-be-using-vhdx-in-hyper-v-whenever-possible-and-why-it-s-important-to-know-your-baselines.aspx. I've started using the new .vhdx format for my legacy Windows 2008 R2 guest machines...

Moving Databases from One Instance to Another Painlessly (Attach Statement Generator)

If you've ever had a bunch of databases and needed to move them to another instance without manually doing the attach, these two scripts may be helpful. I was recently faced with this situation and have learned from experience that the time it takes to script something involving more than a few configuration actions is worth it – not just to ensure reliability for the task at hand, but because of the likelihood that I will need it again. The script is especially useful if you have lots of files. It just generates the SQL so you can edit it. You will probably want to edit it for most scenarios to specify the new file locations, but that is an editor task of global search/replace, etc. There are two scripts involved. One does the meat of the work and generates the concatenation of T-SQL file clauses, using a cursor to enumerate the filenames associated with the database. The other simply calls the attach scalar function and wraps it with the statements needed to actually create the database and specify the attach option. Below are the two scripts. They work for my purposes, but I do not warranty them.

create function dbo.util_GenerateAttach (@dbid int)
    returns nvarchar(max)
as
begin
    /*
    Robert Leithiser, Authentic Intelligence 2013-11-23 – Generates the attach statement
    to move over a database. Run this as follows to generate attach statements for all
    databases on an instance:

        select 'create database ' + d.name, dbo.util_GenerateAttach(d.dbid) + ' for attach;'
        from sys.sysdatabases d where d.sid <> 01
    */
    declare @filestring nvarchar(max) = ''
    declare x cursor for select '(filename...