Posted in Uncategorized

Red Point and Big Data

Just saw a quality video on a high-end solution by Red Point that abstracts away the complications of Hadoop, YARN and the underlying infrastructure. See this presentation on the importance of a data lake.

My understanding is that Red Point's approach is to store all the data in raw format in Hadoop. A refined cluster of the data would also be stored, and they have a way of managing all the complex keys that link the various elements of data.

Their technology would manage the complications of shifting and storing the data. Further, when querying data they would not use any complex MapReduce code; rather, a binary file would be deployed via YARN. This file would be created via a visual programming interface.

The Red Point solution promises less developer time and faster processing / compute in the Hadoop environment.

Worth keeping an eye on this technology as it's got some promise, particularly given how complex MapReduce can be. The only other consideration is how well this technology would work with something like Spark and Scala.

Scala, from what I can see, seems to bridge the worlds of the data scientist, data analyst and programmer quite well, providing yet another paradigm in the big data world!





Starting my journey in learning about Big Data

In my quest to identify the key players in the Big Data space, I investigated the Gartner Magic Quadrant for web analytics.

Figure 1. Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics

I then looked into courses that would help me learn about the field. IBM’s Big Data University provided some great tutorials, and I have already earned a few badges after passing the online learning tests.

The great thing about the course was that you could easily set up VMware to try out and play with the products and solutions that IBM provides.

The next really good industry resource that I stumbled upon was this video, Elephant Riders, where some of the key Big Data players debate the relevance of Hadoop. The key companies present on the panel were MapR, Cloudera, Hortonworks and Continuity. The video is worth watching just to see the elephant bouncing on the trampoline.

Overall this is a great start to my journey in learning and investigating more about Big Data.



A Checklist for Google Analytics Implementation

This is a quick list of post-launch checks to make sure Google Analytics is set up correctly:

Implementation of Google Analytics Tracking Code:

If the tracking code is not implemented correctly, the website will not collect data. Ideally you should have a staging environment that uses a different analytics account for testing. Make sure that on go-live this code is updated to the production account. Also ensure that the code is not added twice, as this would pass the tracking check but would double count the data.
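
A rough sketch of how this check could be automated, assuming the site uses Universal Analytics property IDs of the form UA-12345-1 and that you can fetch the page HTML yourself. The function names are my own, and the count is only a heuristic: some snippets (gtag.js, for example) legitimately mention the ID more than once, so adjust the threshold for your setup.

```python
import re
from collections import Counter

# Assumption: Universal Analytics property IDs look like "UA-12345-1".
TRACKING_ID = re.compile(r"UA-\d+-\d+")

def check_page(html, expected_id):
    """Return a list of human-readable problems; an empty list means no issue found."""
    counts = Counter(TRACKING_ID.findall(html))
    problems = []
    if counts.get(expected_id, 0) == 0:
        problems.append(f"{expected_id} not found")
    elif counts[expected_id] > 1:
        # More than one occurrence may mean the snippet was pasted twice.
        problems.append(
            f"{expected_id} appears {counts[expected_id]} times (possible double counting)"
        )
    for other in counts:
        if other != expected_id:
            # A different ID on a live page often means the staging account was left in.
            problems.append(f"unexpected ID {other} (staging account left in?)")
    return problems
```

Calling check_page(html, "UA-12345-1") on each key template (home page, product page, checkout) after launch gives a quick first pass before looking at the real-time reports.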

Ideally your website would use Google Tag Manager and all tracking code would be inserted via this method.

E-commerce Tracking Code

Make sure that, for all transactional sites, either the data layer for Google Tag Manager is provided by the website or Google e-commerce tracking has been implemented.

If the site uses an external payment gateway, cross-domain tracking needs to be enabled; otherwise the data will show incorrect attribution. This typically shows up as lots of transactions associated with direct or referral traffic.
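
One way to spot this symptom in exported data is sketched below. It assumes each transaction is a dict with "source" and "medium" keys, and that you supply the list of gateway domains yourself; both are my assumptions for illustration. Direct traffic is not always wrong, so treat a high share as a prompt to investigate rather than proof.

```python
def suspicious_share(transactions, gateway_domains):
    """Fraction of transactions credited to direct traffic or to a gateway referral."""
    if not transactions:
        return 0.0

    def is_suspicious(t):
        direct = t["source"] == "(direct)"
        gateway = t["medium"] == "referral" and t["source"] in gateway_domains
        return direct or gateway

    return sum(1 for t in transactions if is_suspicious(t)) / len(transactions)
```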

Implementation of Event Tracking Code

Ideally you would have Google Tag Manager set up. The following are key areas for event tracking:

  • Contact forms
  • Form Completion Abandonment
  • Tracking form errors
  • Outbound links
  • Newsletter subscriptions
  • Social media sharing buttons
  • Videos (for Play, Pause, Stop)

More customised Event tracking areas:

  • For chat – if a chat widget is implemented on the website as part of the customer support service, track (for instance) each time a visitor contacts customer support.
  • For (online) booking forms – for hotel websites.
  • Product Ratings – for e-commerce sites.
  • Baskets and checkouts – for e-commerce sites
  • Phone calls
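
Before wiring any of these up, it helps to write the event plan down in one place so the naming stays consistent. Google Analytics events carry a category, an action and an optional label; the sketch below models just that, with the category and action names being my own example choices.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    category: str
    action: str
    label: Optional[str] = None

def video_events(video_title):
    """The three video interactions from the list above, for one video."""
    return [Event("Video", action, video_title) for action in ("Play", "Pause", "Stop")]
```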

Setup Goals

Goals can be set up for the key conversion elements of the site. Note that once a goal is created it is not possible to delete it, and data cannot be added retrospectively: a goal only records conversions from the moment it is set up. Goals can also have a value associated with them, e.g. a newsletter conversion might be worth 100 points whilst a PDF download is worth 10 points.
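
To make the points idea concrete, here is a small worked example using the values above. The goal names and the completion counts are made up for illustration.

```python
# Values from the example above: newsletter = 100 points, PDF download = 10 points.
GOAL_VALUES = {"newsletter_signup": 100, "pdf_download": 10}

def total_goal_value(completions):
    """completions maps a goal name to the number of times it was completed."""
    return sum(GOAL_VALUES[goal] * count for goal, count in completions.items())
```

With 3 newsletter sign-ups and 12 PDF downloads, the total is 3 × 100 + 12 × 10 = 420 points.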

Implementation of funnels

Funnels allow us to visualise the goal journeys, for example the checkout process of an e-commerce site or a booking process.
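
What the funnel visualisation boils down to is the drop-off between consecutive steps. A minimal sketch, assuming you have the number of sessions that reached each step (the step names and counts in the usage note are invented):

```python
def funnel_report(steps):
    """steps is an ordered list of (name, sessions_reaching_step) pairs.

    Returns (name, drop_off) pairs, where drop_off is the fraction of
    sessions at that step that did not continue to the next one.
    """
    report = []
    for (name, count), (_, next_count) in zip(steps, steps[1:]):
        drop = 0.0 if count == 0 else (count - next_count) / count
        report.append((name, round(drop, 3)))
    return report
```

For a checkout of Basket (1000), Delivery (600), Payment (450) and Confirmation (360), this gives drop-offs of 0.4, 0.25 and 0.2, which points at the basket page as the first place to look.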


Filters

Filters provide a way of excluding data sets, e.g. internal IP traffic or a certain type of user. There should always be one Google Analytics view that collects all data unfiltered. Filters can also help manipulate data, e.g. converting the utm medium to lower case; by default Google Analytics reports values exactly as received, so upper- and lower-case variants appear as separate entries.
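
The logic of these two filters can be sketched in a few lines. This is only an illustration of what the filters do, not how Google Analytics implements them; the hit format and the office IP range (a reserved documentation range) are assumptions.

```python
import ipaddress

# Hypothetical office network; replace with your own internal ranges.
INTERNAL_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def apply_filters(hit):
    """Return the cleaned hit, or None if the hit should be excluded."""
    ip = ipaddress.ip_address(hit["ip"])
    if any(ip in network for network in INTERNAL_NETWORKS):
        return None  # exclude internal traffic
    cleaned = dict(hit)
    cleaned["medium"] = hit["medium"].lower()  # "Email" and "email" become one row
    return cleaned
```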

Site Search

If there is a search box on the website, enabling site search tracking lets us see what visitors are looking for. Site search can be activated by a simple toggle in Google Analytics.

Linking AdWords and Google Webmaster Tools with Google Analytics

Enabling this option will enrich Google Analytics with information from other Google sources. For example, we can see AdWords cost data directly in Google Analytics, and we can get search engine optimisation data from Webmaster Tools.
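
Site search tracking relies on the search term appearing as a query parameter in the results page URL. To find out which parameter your site uses, run a search and look at the URL. The sketch below pulls the term out, assuming the parameter is called q (a common choice, but check your own site).

```python
from urllib.parse import urlparse, parse_qs

def search_term(url, query_param="q"):
    """Return the normalised search term from a results URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get(query_param)
    return values[0].strip().lower() if values else None
```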

Custom Alerts

These allow us to be proactive about any issues the website is having; for example, if a particular page has a bounce rate of > x%, send us an alert.
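
The rule behind such an alert is simple. A sketch, assuming bounce rate is the share of sessions on a page that were single-page sessions, and that per-page counts are available as (sessions, bounces) pairs; the data shape is my assumption.

```python
def bounce_rate(sessions, bounces):
    """Bounce rate as a percentage; 0.0 when there were no sessions."""
    return 0.0 if sessions == 0 else 100.0 * bounces / sessions

def pages_to_alert(page_stats, threshold_pct):
    """page_stats maps a page path to a (sessions, bounces) pair."""
    return [
        page
        for page, (sessions, bounces) in page_stats.items()
        if bounce_rate(sessions, bounces) > threshold_pct
    ]
```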

Advanced Segments

Advanced segments allow us to better analyse a group of visitors with specific characteristics, e.g. all users on a mobile device, or all visitors that have been to a particular page.

Custom Reports / Shortcuts

These reports allow you to group together various metrics and provide a quick way to access this data.
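
Conceptually a segment is just a filter over sessions. A sketch of the two examples above, assuming each session is a dict with a "device" value and a list of "pages" viewed (my own data shape, for illustration only):

```python
def mobile_segment(sessions):
    """Sessions that came from a mobile device."""
    return [s for s in sessions if s["device"] == "mobile"]

def visited_page_segment(sessions, page):
    """Sessions that included a view of the given page."""
    return [s for s in sessions if page in s["pages"]]
```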


Dashboards

Dashboards are a collection of widgets that give you an overview of the reports and metrics that you care about most. The main advantage is that you get a quick view of all your key metrics in one shot. Users can also share these easily with others.