Where is your company data? Most people would answer “in a database,” and when we say the word database, people usually think of relational databases such as Oracle or SQL Server. However, a Relational Database Management System (RDBMS) is not designed to process Big Data, and making that possible requires a significant technology investment. With that said, should you migrate all of your organization’s systems to a Big Data platform in this Big Data era? Will all RDBMS systems be replaced by Big Data platforms like Hadoop? The answer is no. RDBMS and Big Data platforms differ in many ways, so it’s essential to understand those differences before making a decision and to use each one selectively as you need it, which leads you and your organization to a flexible data environment. Here are some aspects you may want to consider:
1) Data Types
The types of data retail companies use are becoming more varied and complex as social networks and smartphones have been widely adopted. There are generally three different data formats:
- Structured data is stored in fixed fields. The information comes with a high degree of organization, such as in a database or spreadsheet;
- Semi-structured data is not stored in fixed fields but contains tags or metadata that describe its elements, such as XML, HTML and system logs;
- Unstructured data is not stored in fixed fields. It contains information that does not have a predefined data model, such as an image, video, email or document.
With a Big Data platform, companies can collect, load and analyze semi-structured and unstructured data in addition to conducting analysis based on structured data in an RDBMS. However, if your organization doesn’t use semi-structured or unstructured data, it’s not necessary to introduce Big Data technology.
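To make the three formats concrete, here is a minimal Python sketch. The sample orders, field names and review text are all made up for illustration; the point is only how much structure each format gives you before you start analyzing it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every record has the same fixed fields (hypothetical sales rows).
structured = io.StringIO(
    "order_id,store,amount\n"
    "1001,Boston,25.50\n"
    "1002,Austin,14.00\n"
)
rows = list(csv.DictReader(structured))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: tags describe the content, but fields can vary per record.
semi_structured = (
    "<orders>"
    "<order id='1001'><note>gift wrap</note></order>"
    "<order id='1002'/>"
    "</orders>"
)
orders = ET.fromstring(semi_structured).findall("order")
notes = [order.findtext("note") for order in orders]  # None where the tag is absent

# Unstructured: no predefined model, so any structure has to be inferred.
review = "Loved the store in Boston, but checkout was slow."
mentions_boston = "boston" in review.lower()

print(total, notes, mentions_boston)
```

The structured rows can be summed directly, the XML needs a check for missing tags, and the free text only yields what you go looking for.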
Another aspect to consider is how quickly and how often a company expects to see results. A Big Data platform is designed for large-scale distributed processing that reads through every file in a data set, so it takes significantly longer to return results than a traditional database. Companies that analyze smaller data sets and need results in real time should stick with a traditional database system. If your company’s level of analytical maturity is sophisticated and you apply high-end modeling to your data, go with a database. It’s not only about speed; it’s also about the accuracy of analysis, which comes from data consistency.
3) Data Design
A relational database organizes data in tables. To map your data into relational tables, you need to organize the columns and tables of the database, a process known as normalization. The goal of this process is to minimize redundancy and ensure data integrity. However, it assumes that your real-world data is strongly consistent and static at all times. Unfortunately, it is not. As I mentioned in my previous post Master Data Management in a Dynamic Retail Environment, it’s a painful and expensive process to design and maintain data integrity in a relational data model. On the other hand, a Big Data platform supports an open structure for data modeling. This schema-less modeling reduces design costs because data can be stored without defining columns, types, lengths and constraints up front.
An RDBMS maintains the relationships created among its tables, which enables it to manage data efficiently and retrieve selected data effectively. However, because the tables depend on each other, partial data loss affects other data, so integrity is important and data shouldn’t be lost. Big Data technology tolerates partial data loss while processing huge amounts of data relatively quickly. RDBMS is the solution when data integrity matters.
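As a rough illustration of that trade-off, the sketch below uses Python’s built-in sqlite3 module as a stand-in for an RDBMS and a plain list of dictionaries as a stand-in for schema-less storage. The store and sale tables and all values are hypothetical, and a real Big Data platform works very differently under the hood.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Relational design: columns, types and constraints are declared up front.
conn.execute("CREATE TABLE store (id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE sale ("
    "id INTEGER PRIMARY KEY, "
    "store_id INTEGER NOT NULL REFERENCES store(id), "
    "amount REAL NOT NULL)"
)
conn.execute("INSERT INTO store (id, city) VALUES (1, 'Boston')")
conn.execute("INSERT INTO sale (store_id, amount) VALUES (1, 25.50)")

# A sale that points at a store that does not exist is rejected.
try:
    conn.execute("INSERT INTO sale (store_id, amount) VALUES (99, 10.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schema-less stand-in: records with different fields and types are accepted
# as they come, and any consistency checks are left to the analysis step.
records = [
    {"store": "Boston", "amount": 25.50},
    {"store_id": 99, "amount": "10.00", "coupon": "SPRING"},
]

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(rejected, sale_count, len(records))
```

The relational side costs more to design but refuses the inconsistent row; the schema-less side accepts everything and pushes the cleanup to whoever reads the data later.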
Companies should realize that there is no one-size-fits-all solution for data analysis. While the benefits of Big Data analytics seem enticing (deeper insights, larger datasets), some analytics won’t require a huge dataset to deliver actionable results.
Understanding your organization’s data, the analytics you want to do, the accuracy you need and the insight you want to achieve is the first step to cultivating a flexible data environment.