Just saw a quality video on an end-to-end solution by Red Point that abstracts out the complications of Hadoop, YARN and the underlying infrastructure. See this presentation on the importance of a data lake.
My understanding was that Red Point's approach is to store all the data in raw format in Hadoop. A refined cluster of the data would also be stored, and they had a way of managing all the complex keys that link the various elements of data.
Their technology would manage the complications of shifting and storing the data. Further, when querying data they would not use any complex MapReduce code; rather, a binary would be deployed via YARN. This binary would be created via a visual programming interface.
The Red Point solution promised less developer time and faster processing/compute in the Hadoop environment.
Worth keeping an eye on this technology as it's got some promise, particularly given how complex MapReduce can be. The only other consideration is how well this technology would work with something like Spark and Scala.
Scala, from what I can see, seems to bridge the worlds of the data scientist, data analyst and programmer quite well, providing yet another paradigm in the big data world!
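To illustrate the point about MapReduce complexity, here's a minimal sketch (nothing to do with Red Point's product, just plain Scala collections) of a word count, the canonical MapReduce example. Hand-written Hadoop MapReduce for the same logic typically needs a Mapper class, a Reducer class and a driver; in Scala it reads almost like the problem statement:

```scala
// Word count with plain Scala collections. Each stage loosely mirrors
// a phase of the MapReduce model: map (tokenise), shuffle (groupBy),
// reduce (count per group).
object WordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // "map": emit one token per word
      .filter(_.nonEmpty)                   // drop empty tokens
      .groupBy(identity)                    // "shuffle": group equal words
      .map { case (word, ws) => word -> ws.size } // "reduce": count each group

  def main(args: Array[String]): Unit =
    println(count(Seq("big data", "big compute")))
}
```

The same expression, swapping the collection for a Spark RDD or Dataset, scales out across a cluster, which is what makes the Spark/Scala combination attractive to both analysts and programmers.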