About data vaults
When we talk about data warehouse modeling there are two well known techniques that are competing for being the best option for modeling the layer where all the data is stored.
- The Multidimensional Approach (Kimball) – The Kimball design is developed with the end user requirements in mind, the data must be re-factored and de-normalized using user requirements. The logical architecture is multidimensional, not relational; Kimball data warehouse is the composition of multiple subject oriented data marts which each are center on some few core fact tables.
- The Relational Approach (Inmon) – Compared to the architecture developed by Kimball, Bill Inmon’s approach is based on a more flexible relational design that is shaped by the data model. The relational design is a more abstract and flexible one, of course this comes with a cost, that the relational model is not optimal when we talk about reporting performance requirements.
Both methods have issues when dealing with changes in the systems. When loading the information, it is almost always necessary for the data to be clean and this is something undesirable
In order to minimize the impact of those issues the concept of Data Vault was introduced. Dan Linstedt, the creator of the method, describes the resulting database as follows:
“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise” – https://danlinstedt.com/about/
So, we have a data warehouse modeling technique that might fit perfectly for what we need. Next step would be to search and find out how popular this concept is and assess the size of the developer community. We might want to find out more about data vaults and in time we might even have questions for which we would expect to find answers. We also know that creating these structures and loading them can be time consuming, so we are interested to know weather there are any tools build around data vault that can ease our work.
We know for sure that for the Kimball and Inmon methodologies there is a strong community and information can be easily found. Also there are several tools that were build around these techniques, tools that can be used for process automation.
On the other hand, when talking about data vaults we know that the concept is relatively new and the community is in the growing stage. However if we have a look at it’s description we can assume that this is something that needs to be taken into consideration, this is confirmed when we look at more and more companies that choose to use the data vault for their systems (see here some examples).
What are the next steps? Are there any available tools that can be used with data vaults, tools that can automate different processes? Of course we can use for example SSIS (SQL Server Integration Services) for making the load packages but this would take a lot of time for big and complex database structures. In a quick web search you will find that things are already moving and more applications are starting to appear. Further on I will present two applications that look very promising, which in some way will raise the value of the data vault concept.
Quipu – data warehouse management
“Quipu is an open source data warehouse generation system. With Quipu you can implement a data warehouse much quicker and easier. Quipu automates the data model design and generates the load code (ETL) to fill the data warehouse from source systems.
With Quipu you can simply and quickly generate and implement a source driven Data Vault, often referred to as source or raw Data Vault. Additional business value can be achieved by implementing a business Data Vault, where source data is combined in a Data Vault implementation of a single business model. Quipu supports both source and raw Data Vault architectures. (the multidimensional and relational approach )” https://www.datawarehousemanagement.org/Quipu.aspx
From the above description we can understand that this open source application can make the loading, creating and general data vault jobs simpler. As presented on the official website the most important benefits of this application are:
- faster integrated source data availability
- complete audit trail of source data and business view of (integrated) source data for the business environment
- automated data warehouse data model design and implementation and lower total cost of ownership using open source licensing model for IT environment
Demonstration videos of the Quipu application can be found on OSQuipu’s youtube channel.
Additional to what you can find on the official website there are various blog entries by BI Future on using Quipu, installation steps, sample models (Northwind) and other templates.
A great feature of Quipu is that it allows you to change the structure of the generated data vault, also it allows you to improve the staging model by identifying business keys, marking tables as reference tables or defining additional relations between tables.
Being able to automate the model generation and ETL process for staging areas and data vaults and also letting you identify business keys and allowing you to alter the model resulted makes Quipu a great tool for data warehouse management.
RapidACE – saving you time and money
RapidACE is presented as being an application that helps you “save time and money”. Even though there isn’t a release version yet, it looks very promising and worth to be mentioned. Seams that even Dan Linstedt (creator of data vault) is involved in developing this product, in setting vision, writing code and supporting customers. Knowing this, the tool seams to have a bright future.
Some of the strong points of RapidACE are:
RapidACE has a number of products that will be available in 2011.
- RapidGen – Rapid Generation of ETL and ANSI SQL views for loading data
- RapidMerge – Rapid merging, consolidation, and reduction of all classes of data models
- RapidVis – 3D data model visualization utility
- RapidBuild – Forward Engineering of data models (generate staging, Data Vault, and star schema data models), Reverse engineering of data models (generate 3nf, Data Vault based on a Star Schema)
- RapidModel – 3D Data model builder
Besides being a tool that can be use for generating and loading data vaults, RapidACE software can also be used for creating the staging model, star schema model, master data model and exploration model. These capabilities make the software even more interesting.
You can find a more detailed product description visiting the official website.
As mentioned earlier in the post, despite the concept is in an early stage, due to the increasing need for data traceability, real time data processing, historical data information and capability to change business rules with ease, the data vault looks like a good future solution for enterprise data warehousing.