When designing the data model for a Revenue Management project, the first step is always receiving and cleansing the data. Invariably, client data is messy to some extent, and one of the most common issues is NULL values. How you deal with NULLs is critically important, yet most companies, across nearly all industries, have no specific strategy for handling these data gaps. While most end users think of a NULL as a blank, it actually signifies a missing or unknown value. Because of this, NULLs can cause serious bugs and logic errors if they are not handled correctly.
To complicate things further, not only is the handling of NULLs often counterintuitive within a single database, but their handling across databases also varies.
To demonstrate some of the issues you may encounter when dealing with NULL values, let’s start with a simple sample table:
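As a minimal sketch of such a table, the snippet below builds one in an in-memory SQLite database through Python's standard `sqlite3` module. Only the `TestValuesChar` column name comes from the text; the table name `NullTest`, the `TestId` and `TestValuesNum` columns, and the inserted rows are assumptions made for illustration, and SQLite is only a stand-in for whichever database you actually use.

```python
import sqlite3

# Hypothetical stand-in for the sample table; only the TestValuesChar
# column name is taken from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NullTest ("
    " TestId INTEGER PRIMARY KEY,"
    " TestValuesChar VARCHAR(10),"
    " TestValuesNum INTEGER)"
)
conn.executemany(
    "INSERT INTO NullTest (TestValuesChar, TestValuesNum) VALUES (?, ?)",
    [("A", 10), ("B", 20), (None, None), ("A", 30)],
)

rows = conn.execute(
    "SELECT TestValuesChar, TestValuesNum FROM NullTest ORDER BY TestId"
).fetchall()
for row in rows:
    print(row)  # the driver reports SQL NULL as Python's None
```

Passing `None` as a bound parameter is how the driver stores a SQL NULL, which is different from storing an empty string.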
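Looking at the table and knowing the data we inserted, we clearly have 3 distinct values for the TestValuesChar field, right? Not according to a plain COUNT(DISTINCT TestValuesChar): aggregate functions skip NULLs, so a NULL entry is never counted as a value.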
However, the following SQL would give us the correct result:
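One possible form of such a query, sketched against SQLite with hypothetical data ('A', 'B', NULL, 'A'), counts the rows produced by `SELECT DISTINCT` instead of counting values. This is an illustration under those assumptions, not necessarily the query the article originally showed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NullTest (TestValuesChar VARCHAR(10))")
conn.executemany(
    "INSERT INTO NullTest (TestValuesChar) VALUES (?)",
    [("A",), ("B",), (None,), ("A",)],  # hypothetical sample data
)

# COUNT(DISTINCT col) skips NULL, so it sees only 'A' and 'B'.
naive = conn.execute(
    "SELECT COUNT(DISTINCT TestValuesChar) FROM NullTest"
).fetchone()[0]

# SELECT DISTINCT keeps NULL as a row of its own; COUNT(*) counts rows
# rather than values, so the NULL row is included.
with_null = conn.execute(
    "SELECT COUNT(*) FROM"
    " (SELECT DISTINCT TestValuesChar FROM NullTest) AS d"
).fetchone()[0]

print(naive, with_null)
```

The derived-table alias `d` is not needed by SQLite but is included because some engines require one.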
Does a NULL count as the minimum or the maximum value in a column? The answer is neither, because MIN and MAX simply ignore NULLs:
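A small sketch of this behavior, again assuming SQLite and a hypothetical `NullTest` table with a numeric `TestValuesNum` column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NullTest (TestValuesNum INTEGER)")
conn.executemany(
    "INSERT INTO NullTest (TestValuesNum) VALUES (?)",
    [(10,), (20,), (None,), (30,)],  # hypothetical sample data
)

# MIN and MAX skip the NULL row entirely.
min_val, max_val = conn.execute(
    "SELECT MIN(TestValuesNum), MAX(TestValuesNum) FROM NullTest"
).fetchone()
print(min_val, max_val)

# Sorting is a separate question: SQLite places NULL first in ascending
# order, while other engines may place it last by default.
ordered = [
    r[0]
    for r in conn.execute(
        "SELECT TestValuesNum FROM NullTest ORDER BY TestValuesNum"
    )
]
print(ordered)
```

Because the sort position of NULL is engine-specific, code that takes the first or last row of a sorted result as the minimum or maximum is less portable than calling MIN or MAX directly.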
Attempting to do math with NULL is another place where small differences can turn into horrible mistakes. A simple arithmetic expression that involves a NULL returns a simple answer, NULL:
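The following sketch, assuming SQLite, shows NULL propagating through arithmetic and comparisons. The guarded multiplication at the end is a hypothetical pricing example; the default of 1.0 is an assumption chosen for illustration, not a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Any arithmetic or comparison involving NULL yields NULL (Python None);
# only IS NULL gives a definite answer.
plus, times, equals, is_null = conn.execute(
    "SELECT 1 + NULL, 10 * NULL, NULL = NULL, NULL IS NULL"
).fetchone()
print(plus, times, equals, is_null)

# Hypothetical pricing guard: treat a missing multiplier as 1.0.
# Whether 1.0 is the right default is a business decision.
guarded = conn.execute(
    "SELECT 100.0 * COALESCE(?, 1.0)", (None,)
).fetchone()[0]
print(guarded)
```

Note that `NULL = NULL` is itself NULL rather than true, which is why filters must use `IS NULL` instead of `= NULL`.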
An aggregate function is more complicated, because the answer can depend on the database used. The following example shows results from SQL Server 2014 and Oracle 12.1:
Obviously, the issue remains, as each database returns a different value.
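The cross-platform comparison is not reproduced here. As a single-engine sketch, assuming SQLite and a hypothetical `NullTest` table, the snippet below shows the part of aggregate behavior that most often surprises people: aggregates skip NULL rows, so `AVG` divides by the number of non-NULL values, not by the number of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NullTest (TestValuesNum INTEGER)")
conn.executemany(
    "INSERT INTO NullTest (TestValuesNum) VALUES (?)",
    [(10,), (20,), (None,), (30,)],  # hypothetical sample data
)

total, n_rows, n_values, avg, naive_avg = conn.execute(
    "SELECT SUM(TestValuesNum),"          # NULL row skipped
    " COUNT(*),"                          # counts every row
    " COUNT(TestValuesNum),"              # counts non-NULL values only
    " AVG(TestValuesNum),"                # SUM / COUNT(TestValuesNum)
    " SUM(TestValuesNum) * 1.0 / COUNT(*)"  # divides by all rows instead
    " FROM NullTest"
).fetchone()
print(total, n_rows, n_values, avg, naive_avg)

# With no non-NULL input at all, SUM returns NULL rather than 0.
all_null_sum = conn.execute(
    "SELECT SUM(TestValuesNum) FROM NullTest WHERE TestValuesNum IS NULL"
).fetchone()[0]
print(all_null_sum)
```

Running the equivalent statements on each target platform is the only reliable way to confirm how that platform behaves.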
What happens if we concatenate strings with NULLs? Again, the answer is database platform specific:
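As one data point, the sketch below shows the behavior in SQLite, where the `||` operator returns NULL if either operand is NULL. Oracle's `||`, by contrast, treats a NULL operand as an empty string, and SQL Server's `+` operator returns NULL under its default settings. The label text and the `'n/a'` fallback are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQLite, concatenating with NULL makes the whole result NULL.
joined = conn.execute("SELECT 'Rate: ' || NULL").fetchone()[0]
print(joined)

# Wrapping the nullable operand in COALESCE makes the intent explicit
# and gives the same result on every engine.
safe = conn.execute(
    "SELECT 'Rate: ' || COALESCE(NULL, 'n/a')"
).fetchone()[0]
print(safe)
```

Making the fallback explicit with `COALESCE` avoids depending on any one engine's concatenation rules.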
One last monkey wrench for NULLs is the choice between NOT IN and NOT EXISTS. If the subquery returns even one NULL, NOT IN matches no rows at all, while NOT EXISTS is unaffected:
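A minimal sketch of the difference, assuming SQLite and two hypothetical tables, `Customers` and `Orders`, where one order row has a NULL `customer_id`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER)")
conn.execute("CREATE TABLE Orders (customer_id INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO Orders VALUES (?)", [(1,), (None,)])

# For id = 2, "2 NOT IN (1, NULL)" evaluates to NULL, not TRUE,
# so the WHERE clause filters out every row.
not_in = conn.execute(
    "SELECT id FROM Customers"
    " WHERE id NOT IN (SELECT customer_id FROM Orders)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL order
# row never matches, so customers 2 and 3 are returned.
not_exists = conn.execute(
    "SELECT id FROM Customers c"
    " WHERE NOT EXISTS"
    " (SELECT 1 FROM Orders o WHERE o.customer_id = c.id)"
    " ORDER BY id"
).fetchall()

print(not_in, not_exists)
```

If `NOT IN` must be used, filtering the subquery with `WHERE customer_id IS NOT NULL` restores the expected result.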
With these examples, it’s easy to see the potential havoc created in a Revenue Management system by incorrectly handling a NULL value. Imagine that your Pricing Engine incorrectly multiplies a price by a NULL value, returning a NULL price to be published on your website; or that a NULL value is incorrectly used in a Forecast model, skewing the Forecast dramatically. By proactively planning and designing for the problems caused by NULL values, your data architecture can avoid errors and ensure that the calculations to maximize profits and revenue growth are correct, leading to the right recommendations.
As a project data architect, it’s important to understand the implications of NULL values in your dataset. With proper data and code design, the issues they cause can be addressed or avoided altogether.