I’ve seen a lot of blogs recently talking about the data lake: what it is and what it means. My favorite has been Steve Todd's, which gives a good high-level overview of what a data lake is.

In the EMC open innovations lab (OIL) we are constantly working with internal EMC BUs, different parts of the federation (RSA, VMware, Pivotal), and customers to explore and pilot new technologies. Recently we began working with a customer who had a unique challenge. Their legacy analytic systems and databases could not help them with their manufacturing needs. They want to be able to run analytics in real time on data coming off of the assembly line while also comparing it to historical data. This is the perfect use case for a data lake!

As I prepped for this pilot I found lots of blogs that talked about the theory and the “why” of building a data lake. I also found lots of internal EMC/Pivotal docs about how to install the different pieces, but no overarching documentation on how to build a data lake. As always, this blog focuses on the how. This pilot is just kicking off, and as it proceeds I will be updating this blog with the architecture, how to deploy and configure it, load data, and run analytic applications. Should be a fun project!