In part I, I discussed why we at IPC believe Qlik is far more than an analytics platform: in addition to its visualization and reporting functionality, it is a powerful data warehousing platform. This is due to its ETL (Extract, Transform, Load) layer. The ETL layer performs the same functions as a database engine, but it handles them much faster, does not require pre-aggregation or flattening, and does not strain the organization’s source systems the way an overtaxed database engine does. Once the data is extracted, it is processed into reusable data files, creating a data store that enables true self-service.
In part II, we’ll discuss data storage, data partitioning, and how modeling in Qlik supports users’ many analytics and reporting needs.
Putting what we have learned into context, our data warehouse looks like a manufacturing plant with receiving, processing, quality assurance, and shipping departments. All of these capabilities come with the Qlik platform. Our Receiving ETL uses connectors to pull data from multiple sources automatically. We organize, store, and secure the Raw Data. Production ETL refines the raw data into Finished Data Objects. Quality checking and monitoring are engineered into the ETL processes, and exceptions are logged and managed. Finally, the Pick Pack n Ship ETL securely delivers the data objects and models to the user.
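The factory flow can be sketched in a few lines of Python. This is a hypothetical, minimal illustration, not Qlik's load script or API: the connectors are plain functions standing in for real data-source connectors, and the function names (`receive`, `produce_and_check`, `ship`), the field names, and the single validation rule are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("data_factory")  # hypothetical logger name


@dataclass
class RunResult:
    finished: list = field(default_factory=list)    # Finished Data Objects (rows)
    exceptions: list = field(default_factory=list)  # rows that failed QA


def receive(sources):
    """Receiving ETL: pull raw rows from every connector and tag their source."""
    raw = []
    for name, connector in sources.items():
        for row in connector():
            raw.append({**row, "_source": name})
    return raw


def produce_and_check(raw_rows):
    """Production ETL with QA built in: refine valid rows, log and set aside bad ones."""
    result = RunResult()
    for row in raw_rows:
        if row.get("amount") is None:  # assumed rule: amount is mandatory
            result.exceptions.append(row)
            log.warning("QA exception: missing amount in %r", row)
            continue
        result.finished.append({
            "customer_id": str(row["customer_id"]).strip(),
            "amount": round(float(row["amount"]), 2),
            "source": row["_source"],
        })
    return result


def ship(result, allowed_sources):
    """Pick Pack n Ship ETL: deliver only the finished rows this user may see."""
    return [r for r in result.finished if r["source"] in allowed_sources]


# Two stand-in connectors; in practice these would be real source connectors.
sources = {
    "erp": lambda: [{"customer_id": " C1 ", "amount": "10.50"},
                    {"customer_id": "C2", "amount": None}],
    "crm": lambda: [{"customer_id": "C3", "amount": 7}],
}
result = produce_and_check(receive(sources))
delivered = ship(result, allowed_sources={"erp"})
print(delivered)  # [{'customer_id': 'C1', 'amount': 10.5, 'source': 'erp'}]
```

Keeping QA inside the production step, rather than as a separate pass, mirrors the idea that quality checks are engineered into the ETL itself: bad rows never reach the finished objects, but they are logged rather than silently dropped.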
Once you’ve created a data warehouse, whichever path you’ve chosen, you will have an enormous amount of data to store. This is another major advantage of using Qlik as your data warehouse: Qlik has the lowest cost of ownership for data storage.
Why is that? Qlik runs on commodity hardware, and its data files sit on ordinary file storage. I don’t need to stand up a RAID array or buy EMC storage the way I would for Oracle, SQL Server, and similar platforms. Instead, I can simply request, for example, 500 GB of space on the organization's existing network storage. Being able to store Qlik objects on ordinary, inexpensive file storage is a huge benefit over traditional data warehousing platforms, and the Qlik files are backed up just like any other file.
Next, we can partition the data. Relational Database Management Systems (RDBMS) offer this as “table partitioning”; Qlik does the same thing with “data object partitioning.” We partition our data along natural boundaries. For example, journal entries from the past three years can be split into three objects stored in three separate files, and customers can easily be separated by geographic region (North America, Latin America, APAC, etc.). The organization’s data is now accessible in the ways users would naturally look for it.
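Partitioning along a natural boundary amounts to grouping rows by a key and writing each group to its own file. The sketch below is an illustration in Python under stated assumptions, not Qlik's own mechanism: in a Qlik deployment the partitions would typically be QVD files written by the load script, whereas CSV files stand in here so the example runs anywhere. The helper `partition_to_files` and the sample journal entries are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def partition_to_files(rows, key, out_dir, prefix):
    """Write one file per distinct value of `key` (e.g. one file per year)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    paths = {}
    for value, group in groups.items():
        path = Path(out_dir) / f"{prefix}_{value}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths[value] = path
    return paths


# Hypothetical journal entries spanning three years.
journal = [
    {"entry_id": 1, "year": 2021, "amount": 100},
    {"entry_id": 2, "year": 2022, "amount": -40},
    {"entry_id": 3, "year": 2023, "amount": 75},
    {"entry_id": 4, "year": 2023, "amount": 20},
]
out_dir = tempfile.mkdtemp()
paths = partition_to_files(journal, "year", out_dir, "journal_entries")
print(sorted(p.name for p in paths.values()))
# ['journal_entries_2021.csv', 'journal_entries_2022.csv', 'journal_entries_2023.csv']
```

Partitioning by region works the same way: pass `"region"` as the key on a customer table.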
Like traditional warehousing platforms, we partition the data. The Qlik method is superior, however, because these objects are then available to the user community for analysis. When users go shopping for data to do self-service, they can browse the available data objects and easily identify and select the ones they want. They can pick the years or regions they need and bring that data into their analysis quickly and independently, rather than having IT repeatedly hand-code these partitions.
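From the user's side, self-service shopping means seeing which partitions exist and loading only the ones selected. Here is a minimal Python sketch, assuming a one-file-per-partition naming convention of `<object>_<value>.csv`; the helper names, file layout, and sample customers are hypothetical, and a Qlik user would make these selections through the platform rather than in code.

```python
import csv
import tempfile
from pathlib import Path


def available_partitions(directory, prefix):
    """List the partition values a user can shop for, e.g. ['APAC', 'LATAM', 'NA']."""
    return sorted(p.stem[len(prefix) + 1:] for p in Path(directory).glob(f"{prefix}_*.csv"))


def load_partitions(directory, prefix, selected):
    """Read only the selected partition files and combine their rows."""
    rows = []
    for value in selected:
        path = Path(directory) / f"{prefix}_{value}.csv"
        with path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Set up a few region partitions so the sketch is self-contained.
directory = tempfile.mkdtemp()
regions = {"NA": ["Acme", "Globex"], "LATAM": ["Initech"], "APAC": ["Umbrella"]}
for region, names in regions.items():
    with (Path(directory) / f"customers_{region}.csv").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer", "region"])
        writer.writerows([name, region] for name in names)

print(available_partitions(directory, "customers"))  # ['APAC', 'LATAM', 'NA']
chosen = load_partitions(directory, "customers", ["NA", "APAC"])
print([r["customer"] for r in chosen])  # ['Acme', 'Globex', 'Umbrella']
```

Because each partition is its own file, the LATAM file is never opened when the user selects only North America and APAC.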
Supporting a Multitude of Analytics and Reporting Needs
Lastly, once the data is organized and stored in reusable data objects, I need to create a data model. At IPC, we call this a NoSQL view: users can ask virtually any question of their data without writing code. A data model is a set of data objects, such as customer master, order detail, or shipping transactions, linked together. Once those objects are in the data model, users can build a wide range of analytics and reports from them without writing a SQL statement. Better yet, IT maintains dramatically fewer SQL stored procedures and views.
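Qlik links the objects in a data model through fields that share the same name. The Python sketch below only illustrates that idea by finding the shared field names between each pair of objects; it is not Qlik's engine, and the object and field names are hypothetical.

```python
from itertools import combinations


def link_fields(schemas):
    """Return {(object_a, object_b): [shared field names]} for each linked pair."""
    links = {}
    for a, b in combinations(schemas, 2):
        shared = sorted(set(schemas[a]) & set(schemas[b]))
        if shared:
            links[(a, b)] = shared
    return links


# Hypothetical data objects and their fields.
schemas = {
    "customer_master": ["customer_id", "customer_name", "region"],
    "order_detail": ["order_id", "customer_id", "amount"],
    "shipping_transactions": ["shipment_id", "order_id", "ship_date"],
}
for pair, fields in link_fields(schemas).items():
    print(pair, "->", fields)
# ('customer_master', 'order_detail') -> ['customer_id']
# ('order_detail', 'shipping_transactions') -> ['order_id']
```

Customer master and shipping transactions share no field, yet they are still connected through order detail, which is what lets a user move from a customer to that customer's shipments without writing a join.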
Inside the data model is a hyper-index of every column and field, so users get lightning-fast, sub-second responses regardless of the question. In addition, with Qlik I can ask questions that relational database tools simply cannot answer. This is because the simple views those tools rely on only support inner-join analysis, whereas Qlik in effect performs a full outer join of every value in every table from the beginning. For example, if I have customers who are not in the order detail file, I can still see them. I can even trace the full lineage of the data (for example, order to shipment to payment) in one data model. This is nearly impossible in SQL-based cubes, because you need to chain left joins from one table to the next, and each new join can drop records. In Qlik, users have access to the full range of data without seeing or writing a SQL statement. They simply bring in the objects they want and click through their analysis.
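The difference between inner-join analysis and keeping every value can be shown with two small tables. This Python sketch only illustrates which rows survive each approach; it does not reproduce how Qlik's engine works internally, and the table contents and helper names are hypothetical.

```python
def inner_join(left, right, key):
    """Keep only pairs of rows whose key values match."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[key] == rrow[key]]


def full_outer_join(left, right, key):
    """Keep matched pairs plus every unmatched row from either side."""
    result, matched_right = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i in hits:
            matched_right.add(i)
            result.append({**lrow, **right[i]})
        if not hits:
            result.append(dict(lrow))  # e.g. a customer with no orders
    result.extend(dict(rrow) for i, rrow in enumerate(right) if i not in matched_right)
    return result


customers = [{"customer_id": "C1", "name": "Acme"},
             {"customer_id": "C2", "name": "Globex"},
             {"customer_id": "C3", "name": "Initech"}]
orders = [{"order_id": "O1", "customer_id": "C1"},
          {"order_id": "O2", "customer_id": "C1"},
          {"order_id": "O3", "customer_id": "C9"}]  # order for an unknown customer

inner = inner_join(customers, orders, "customer_id")
outer = full_outer_join(customers, orders, "customer_id")
print(len(inner), len(outer))  # 2 5
print([r["name"] for r in outer if "order_id" not in r])  # ['Globex', 'Initech']
```

The inner join silently drops Globex and Initech (no orders) and order O3 (no matching customer); the outer version keeps all of them, which is the behavior the paragraph attributes to Qlik.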
In brief, Qlik is a data warehouse: it performs faster than traditional data warehouses, at a lower storage cost, and gives an organization superior access to its data. At IPC, we refer to this as a data factory, where we work with clients to bring in data from their multiple systems. The end goal is data that is thoughtfully engineered, systematically refined, and quality-assured, and that users throughout the organization can access easily.