Posted in GDPR, Microsoft, Sitecore, Uncategorized

Getting Ready for GDPR

The GDPR (General Data Protection Regulation) is a set of rules reforming privacy and security regulations that takes effect on 25th May 2018. There are severe penalties for breaching the GDPR: fines can reach as high as 20 million euros or 4% of annual global turnover, whichever is higher. We can either look at this as a pain in the backside or as an opportunity.

Why is the GDPR necessary?

GDPR defines personal data as anything that can be used to directly or indirectly identify a person. This includes names, photos, email addresses, bank details, posts on social networking websites, medical information or even IP addresses.

Since the previous rules were written (the EU Data Protection Directive dates from 1995), Internet usage has become a great deal more widespread, and technological advances such as cloud storage and social media have changed the way data is processed and transferred. The rules needed updating, they needed to be uniform, and they needed to be applied more rigorously.

Business Impact:

  1. Appoint one of your directors to be accountable. This person should be suitably competent to handle the technicalities involved, and it’s worth considering where you want the accountability to fall – with IT, legal, marketing or elsewhere.
  2. Ensure you have safeguards in place: procedures to ensure data is confidential, accurate, available when necessary, backed up and encrypted.
  3. Ensure your suppliers are GDPR-compliant. Any service provider you use to process data has to comply with GDPR standards – and ensuring they do is on you.
  4. Ensure your customers, clients or website users have explicitly consented to their data being stored. This is a significant change, and most current measures are not sufficient. Your records need to prove that users have agreed to you storing their data – and failing to disagree is not enough. Crucially, users will also have a statutory right to have their data erased permanently from your records – so you’ll need the capacity to do that too.
  5. Ensure you’re explaining to users, in plain language, what data you’re holding, how long you’re holding it for, and how users can withdraw their consent. Your policy has to be simple and appropriate, as well as containing all the required information.
  6. Report breaches. Under GDPR, a personal data breach that poses a risk to individuals must be reported to the supervisory authority (in the UK, the Information Commissioner’s Office) within 72 hours of your becoming aware of it. You’ll need a robust process for detecting, reporting and responding to data breaches.
  7. Be prepared for more access requests. As people become more aware of their data privacy rights, they are likely to query the data you’re holding, and you’ll need to turn those requests around in good time.
  8. Ensure that any IT / Marketing related project has a relevant process in place to screen against the GDPR, e.g. an Agile project with GDPR user stories, or an IT project with the necessary risks highlighted and mitigated.
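
Points 4 and 5 above (provable opt-in consent, withdrawal and erasure) can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not a compliance tool: the class names, field names and the idea of storing the exact wording shown to the user are my own assumptions about what a useful consent record might look like.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class ConsentRecord:
    """Evidence that a user actively agreed to one specific use of their data."""
    user_id: str
    purpose: str              # e.g. "newsletter" (hypothetical purpose name)
    wording_shown: str        # the exact text the user agreed to
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None


class ConsentStore:
    """In-memory store; a real system would persist this durably."""

    def __init__(self) -> None:
        self._consents: Dict[Tuple[str, str], ConsentRecord] = {}
        self._personal_data: Dict[str, dict] = {}

    def record_consent(self, user_id: str, purpose: str,
                       wording_shown: str, opted_in: bool) -> ConsentRecord:
        # Failing to disagree is not enough: insist on an explicit opt-in.
        if not opted_in:
            raise ValueError("consent must be an explicit opt-in")
        record = ConsentRecord(user_id, purpose, wording_shown,
                               datetime.now(timezone.utc))
        self._consents[(user_id, purpose)] = record
        return record

    def withdraw(self, user_id: str, purpose: str) -> None:
        # Keep the record but stamp it, so the history stays provable.
        record = self._consents.get((user_id, purpose))
        if record is not None and record.withdrawn_at is None:
            record.withdrawn_at = datetime.now(timezone.utc)

    def has_consent(self, user_id: str, purpose: str) -> bool:
        record = self._consents.get((user_id, purpose))
        return record is not None and record.withdrawn_at is None

    def save_personal_data(self, user_id: str, data: dict) -> None:
        self._personal_data[user_id] = dict(data)

    def erase_user(self, user_id: str) -> bool:
        """Right to erasure: remove everything held about this user."""
        had_data = self._personal_data.pop(user_id, None) is not None
        keys = [key for key in self._consents if key[0] == user_id]
        for key in keys:
            del self._consents[key]
        return had_data or bool(keys)
```

Whether consent records themselves should survive an erasure request is a question for your legal team; the sketch simply removes them along with the rest.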

Key steps for preparation

  • Awareness: create the necessary user stories for Agile projects and make GDPR part of the requirements for all new projects. Log it as a risk in projects and make sure it’s mitigated
  • Keep track of all personal data you hold and where it came from. From a website perspective this could be:
    1. Contact Us Forms
    2. Newsletters
    3. Event signups
    4. User Registration
    5. Orders / Donations if ecommerce enabled
    6. Sharing / comments on blogs
  • Update your privacy statement to explain how these rights will be adhered to
  • Check that the following rights are addressed for individuals:
    1. the right to be informed;
    2. the right of access;
    3. the right to rectification;
    4. the right to erasure;
    5. the right to restrict processing;
    6. the right to data portability; (NEW)
    7. the right to object;
    8. the right not to be subject to automated decision-making including profiling
  • If storing data, particularly in the cloud or with external suppliers, make sure that:
    1. Data is encrypted at rest and in motion; use https when submitting details.
    2. The encryption keys are managed by the organisation and the SaaS vendor
    3. Each SaaS offering is evaluated to make sure it complies with GDPR
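
To make the "keep track of all personal data" and "encrypted at rest and in motion" steps a bit more concrete, here is a rough sketch of a data inventory in Python. The structure and every entry in it (source names, fields, storage locations) are hypothetical examples, not a prescribed format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataSource:
    """One place where the website collects personal data."""
    name: str
    fields: Tuple[str, ...]
    stored_in: str
    encrypted_at_rest: bool
    submitted_over_https: bool


def find_gaps(sources: List[DataSource]) -> List[str]:
    """List every source that misses one of the encryption checks."""
    problems = []
    for source in sources:
        if not source.encrypted_at_rest:
            problems.append(f"{source.name}: not encrypted at rest")
        if not source.submitted_over_https:
            problems.append(f"{source.name}: not submitted over https")
    return problems


# Hypothetical inventory for a small website.
inventory = [
    DataSource("Contact Us form", ("name", "email", "message"),
               "CRM (SaaS)", encrypted_at_rest=True, submitted_over_https=True),
    DataSource("Newsletter signup", ("email",),
               "Mailing list provider (SaaS)", encrypted_at_rest=False,
               submitted_over_https=True),
]

for problem in find_gaps(inventory):
    print(problem)
```
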
Posted in Uncategorized

Learning Tree Beta Course:- Apache Spark with Scala for Big Data Solutions

I was quite fortunate to be awarded a spot on this Beta Course with Learning Tree. Basically, Learning Tree test out their courses before making them live.

The course instructor Max Van Daalen was very knowledgeable and had made good use of Hadoop and Spark with his work at CERN.

The key infrastructure for the learning environment was built around CentOS. We had our own Hadoop cluster and local environments that had Scala and Spark configured.

The course certainly helped my understanding of big data. There are obviously a lot of technologies out there; however, to keep things simple from a technology perspective, most of them work on the paradigm of parallel computing. When processing large sets of data, e.g. a petabyte file, it’s in some cases not even possible to load the file on a single machine. To deal with such a file you can spread it over multiple machines and get each machine to work on its portion of the file.
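
The idea can be illustrated with a toy word count in Python. This is only a sketch of the paradigm, not how Hadoop or Spark are implemented: worker threads on one machine stand in for the machines of a cluster, and the function names are my own.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(lines: List[str], n_chunks: int) -> List[List[str]]:
    """Divide the lines into at most n_chunks roughly equal portions."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk: List[str]) -> Counter:
    """The work one 'machine' does, looking only at its own portion."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines: List[str], n_workers: int = 4) -> Counter:
    chunks = split_into_chunks(lines, n_workers)
    # Each worker processes one chunk independently of the others.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the partial results into the final answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


print(parallel_word_count(["big data big", "data lake", "big"], n_workers=2))
```

The important property is that `count_words` never needs to see the whole file, which is what lets the same approach scale out to many machines.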

What has really opened up in the last few years is the number of solutions, quite a few of them open source, that allow you to utilise this paradigm. Commodity hardware or the use of cloud computing makes purchasing / deploying a cluster of computers very cost effective.

One big change that occurs from a programming perspective is how you can take advantage of the parallel architecture and express the processing request succinctly. Functional programming, which was already around in the 1970s, seems to be the way to go. Scala, a functional programming language with some nice hybrid options built around Java, provides an elegant mechanism for programmers to express their intent.

The course covers Scala at good depth whilst also exposing us to various libraries provided by Apache Spark. Spark is a very powerful open source platform with advanced data analytics capabilities.

My favourite exercise on the course was using Spark Streaming in conjunction with the Twitter API to perform real-time monitoring of social media. We could further enhance this with technologies like Kafka that allow a cluster of machines to ingest various streams.

Whilst there was quite a bit covered in the course I think the following are my key takeaways:

  • The concept and usages of RDDs (Resilient Distributed Datasets), DataFrames and Transformations. These are generally core to how Spark works
  • The concept of HDFS, Hadoop and other infrastructure, particularly the Java Virtual Machine (JVM)
  • Scala, particularly concepts around case classes, flexible management of collections and clever functions like map and flatMap that can be used with various data structures. Some clever examples were provided, particularly using tuples
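
To show the difference between map and flatMap, here is a rough Python analogue (not the Scala from the course, and not Spark's API). The `flat_map` helper is my own stand-in for Scala's `flatMap`, and the sample lines are made up.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def flat_map(f: Callable[[A], Iterable[B]], items: Iterable[A]) -> List[B]:
    """Apply f to every item, then flatten the results into one list."""
    return list(chain.from_iterable(f(item) for item in items))


lines = ["spark with scala", "scala on the jvm"]

# map keeps one result per input, so we get a list of lists.
nested = list(map(str.split, lines))

# flat_map flattens those lists into a single list of words.
words = flat_map(str.split, lines)

# Tuples pair each word with a count of 1, ready to be summed per key.
pairs: List[Tuple[str, int]] = [(word, 1) for word in words]

totals: Dict[str, int] = {}
for word, count in pairs:
    totals[word] = totals.get(word, 0) + count

print(nested)
print(words)
print(totals)
```
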

Overall it’s a course I would recommend and certainly has helped me get a better foundation around Big Data.

Posted in Uncategorized

Red Point and Big Data

Just saw a quality video on a high-end solution by Red Point that abstracts away the complications of Hadoop, YARN and the underlying infrastructure. See this presentation on the importance of a data lake.

My understanding was that Red Point’s approach is to store all the data in raw format in Hadoop. A refined cluster of the data would also be stored and they had a way of managing all the complex keys that would link various elements of data.

Their technology would manage the complications of shifting and storing the data. Further, when querying data they would not use any complex MapReduce code; rather, a binary file would be deployed via YARN. This file would be created via a visual programming interface.

The Red Point solution promised less developer time for faster processing / compute in the Hadoop environment.

Worth keeping an eye on this technology as it’s got some promise, particularly given how complex MapReduce can be. The only other consideration is how well this technology would work with something like Spark and Scala.

Scala, from what I can see, seems to bridge the worlds of the data scientist, data analyst and programmer quite well, providing yet another paradigm in the big data world!




Posted in Uncategorized

Starting my journey in learning about Big Data

In my quest to identify the key players in the Big Data space, I investigated the Gartner Magic Quadrant for web analytics.

Figure 1. Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics (Gartner)

I then looked into courses that would help me learn about the field. IBM’s Big Data University provided some great tutorials and I have already earned a few badges after passing the online learning tests.

The great thing about the course was that you could easily set up VMware to try out and play with the products and solutions that IBM provide.

The next really good industry resource that I stumbled upon was this video, Elephant Riders, where some of the key Big Data players debate the relevance of Hadoop. The key companies present on the panel were MapR, Cloudera, Hortonworks and Continuity. The video is worth watching just to see the Elephant bouncing on the trampoline.

Overall this is a great start to my journey in learning and investigating more about Big Data.


Posted in Uncategorized

A Check-list for Google Analytics Implementation

This is a quick list of post-launch checks to make sure Google Analytics is set up correctly:

Implementation of Google Analytics Tracking Code:

If the tracking code is not implemented correctly the website will not collect data. Ideally you should have a staging environment which uses a different analytics account for testing. Make sure that on go-live this code is updated to the live account. Also ensure that the code is not added twice, as this would pass the tracking check but would double count the data.
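
A quick way to spot both problems (a staging account left in, or the snippet included twice) is to scan the page source. The Python sketch below is a heuristic only: it assumes Universal Analytics style IDs (`UA-XXXXX-Y`) initialised through `ga('create', ...)` or `gtag('config', ...)`, the function name is my own, and the IDs in the usage example are made up. It will not see tags injected at runtime by Google Tag Manager.

```python
import re
from collections import Counter
from typing import List

# Matches the initialisation call of analytics.js or gtag.js and captures the ID.
INIT_CALL = re.compile(
    r"""\b(?:ga|gtag)\(\s*['"](?:create|config)['"]\s*,\s*['"](UA-\d+-\d+)['"]"""
)


def check_tracking(html: str, expected_id: str) -> List[str]:
    """Return a list of problems found in the page source (empty if none)."""
    counts = Counter(INIT_CALL.findall(html))
    problems = []
    if counts[expected_id] == 0:
        problems.append(f"{expected_id} is never initialised on this page")
    elif counts[expected_id] > 1:
        problems.append(
            f"{expected_id} is initialised {counts[expected_id]} times "
            "(hits may be double counted)"
        )
    for other_id in sorted(set(counts) - {expected_id}):
        problems.append(f"unexpected property {other_id} (staging account left in?)")
    return problems


page = "<script>ga('create', 'UA-12345-1', 'auto'); ga('send', 'pageview');</script>"
print(check_tracking(page + page, "UA-12345-1"))
```
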

Ideally your website would use Google Tag Manager and all tracking code would be inserted via this method.

E-commerce Tracking Code

Make sure that either the data layer for Google Tag Manager is provided by the website or Google e-commerce tracking has been implemented for all transactional sites.

If the site uses an external payment gateway, cross-domain tracking needs to be enabled; otherwise the data will show incorrect attribution, typically with lots of transactions associated with direct or referral traffic.

Implementation of Event Tracking Code

Ideally you would have Google Tag Manager set up. The following are key areas to have event tracking:

  • Contact forms
  • Form completion / abandonment
  • Tracking form errors
  • Outbound links
  • Newsletter subscriptions
  • Social media sharing buttons
  • Videos (for Play, Pause, Stop)

More customised Event tracking areas:

  • For chat – where a chat widget is part of the customer support service, track (for instance) each time a visitor contacts customer support through it.
  • For (online) booking forms – for hotel websites.
  • Product Ratings – for e-commerce sites.
  • Baskets and checkouts – for e-commerce sites
  • Phone calls

Setup Goals

Goals can be set up for the key conversion elements of the site. Note that once a goal is created it is not possible to delete it, and goals do not apply retrospectively: data is only collected from the moment they are set up. Goals can also have a value associated with them, e.g. a newsletter conversion might be worth 100 points whilst a PDF download is worth 10 points.

Implementation of funnels

Funnels allow us to visualise the goal journeys, for example the checkout process of an ecommerce site or a booking process.


Filters

Filters provide a way of excluding data sets, e.g. internal IP traffic or a certain type of user. Keep one unfiltered Google Analytics view that collects all data. Filters can also help manipulate data, e.g. converting the utm medium to lower case; Google Analytics is case-sensitive by default, so Email and email would otherwise be reported as separate values.
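
The effect of these two filters can be sketched in Python. This is not how Google Analytics applies filters internally, just an illustration of the logic on a list of hits; the hit structure, the function names and the office IP range (taken from a range reserved for documentation) are all assumptions.

```python
import ipaddress
from typing import Dict, List

# Hypothetical office network whose traffic should be excluded.
INTERNAL_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_internal(ip: str) -> bool:
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)


def apply_filters(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop internal traffic and lower-case the campaign medium."""
    filtered = []
    for hit in hits:
        if is_internal(hit["ip"]):
            continue  # exclude filter: internal IP traffic
        cleaned = dict(hit)
        cleaned["medium"] = hit["medium"].lower()  # lowercase filter
        filtered.append(cleaned)
    return filtered


hits = [
    {"ip": "203.0.113.7", "medium": "Email"},   # office traffic, excluded
    {"ip": "198.51.100.4", "medium": "Email"},
    {"ip": "198.51.100.5", "medium": "email"},
]
print(apply_filters(hits))
```

After filtering, both remaining hits report the medium as `email`, so they are counted together rather than as two separate rows.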

Site Search

If there is a search box on the website, enabling tracking for it will let us see what visitors are looking for. Site search can be activated by a simple toggle in Google Analytics.

Linking AdWords and Google Webmaster Tools with Google Analytics

Enabling this option will further enhance the information with data from other Google sources. For example, we can see AdWords cost data directly in Google Analytics. Further, we can get search engine optimisation data from Webmaster Tools.

Custom Alerts

These allow us to be proactive about any issues the website is having; for example, if the bounce rate on a particular page is > x%, send us an alert.
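
The logic behind such an alert is simple enough to sketch in Python. Bounce rate here is taken as single-page sessions divided by all sessions that started on the page; the function names, page paths and numbers are made up for illustration, and the threshold plays the role of x.

```python
from typing import Dict, List, Tuple


def bounce_rate(sessions: int, bounces: int) -> float:
    """Bounce rate as a percentage of sessions."""
    if sessions == 0:
        return 0.0
    return 100.0 * bounces / sessions


def pages_to_alert(stats: Dict[str, Tuple[int, int]],
                   threshold_percent: float) -> List[str]:
    """Return the pages whose bounce rate is above the threshold."""
    return [
        page
        for page, (sessions, bounces) in stats.items()
        if bounce_rate(sessions, bounces) > threshold_percent
    ]


# Hypothetical figures: page -> (sessions, bounces)
stats = {"/": (1000, 400), "/pricing": (200, 170)}
print(pages_to_alert(stats, threshold_percent=70))
```
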

Advanced Segments

Advanced segments allow us to better analyse a group of visitors that share a specific characteristic, e.g. all users on a mobile device, or all visitors that have been to a particular page.

Custom Reports/ Shortcuts

Custom reports allow you to group together various metrics, and shortcuts provide a quick way to access this data.


Dashboards

Dashboards are a collection of widgets that give you an overview of the reports and metrics that a user cares about most. The main advantage is that you get a quick view of all your key metrics in one shot. Further, users can share these easily with others.