I come from a background of having built up and run a successful marketing agency. I am now interested in newer, more innovative approaches, particularly in the areas of Big Data, Business Intelligence, Artificial Intelligence and Statistics.
I recently had the pleasure of completing a Design Thinking course that was recommended by my manager at Capgemini. Normally I would be inclined to take more technical courses, so this was certainly different.
The key motivation was that at Capgemini employees should think outside the box and help come up with ideas that are novel for our clients. Design thinking is particularly strong for any form of human-centric problem solving: it helps accelerate innovation to create better solutions for the challenges facing business and society.
The course was based on IDEO’s design thinking approach, which provided a very comprehensive methodology whilst helping us solve a real-world case study. What was really interesting was how we could collectively come up with insights founded on observations, and experiencing the process first-hand was great. Even more encouraging, I found myself in the winning team (which made things even more fun).
Design thinking starts with people, and applies creative tools such as storytelling, prototyping and experimentation to deliver new, breakthrough innovations.
When making observations, it’s good to look at extreme cases and then, based on those fact-based observations, come up with insights that are authentic, non-obvious and revealing. This needs to be followed by framing opportunities, which become a springboard for ideas and solutions.
We came up with this kind of novel thinking during the course, and more recently we were able to make the same leap again as part of an AIE project at Capgemini, which makes this a very powerful and credible approach.
The following is a YouTube video that was recorded whilst we were having fun doing some field research for one of the UK’s most innovative brands:
It has now been over a year since I began working with Sitecore in a hands-on capacity. As you may or may not know, Sitecore has continued to appear in the top right-hand (Leaders) corner of the Gartner Magic Quadrant.
Sitecore has a huge amount of functionality and is, in my opinion, one of the most functionally rich and scalable content management platforms out there. More importantly, it comes with a lot of key features that can be leveraged for GDPR conformance.
A key feature of Sitecore is its ability to personalise content for its users at a very granular level. Features such as displaying the most relevant content make Sitecore a pleasure to use. On the other hand, I feel that many people may be under the misconception that using Sitecore’s marketing features will stop them from being GDPR compliant.
The following key points support my view that, with good configuration, it is perfectly straightforward to become compliant.
Since version 8.2, IP addresses are already anonymised by being hashed, or there is the option not to save the IP address at all. Instead, Sitecore can use a cookie on the user’s machine; obviously this would need to be covered in the website’s privacy statement.
With xDB there is a way to find, update and remove personal data that has been collected. This is great for handling requests from people exercising their right to update their information or their right to be forgotten.
EXM (Email Experience Manager) supports double opt-in and identity verification, and there is a sophisticated List Manager.
There are built-in, configurable audit logs that record when data has been added, updated or deleted, and you can choose how long these logs are kept. More importantly, they can be used as proof that data was actually deleted.
The Data Exchange Framework, used for integration with other systems, can be a powerful way to synchronise changes, including bidirectional sync.
Furthermore, Sitecore can run on GDPR-compliant infrastructure: there is the option to use mLab for the xDB database, and as this is delivered as SaaS, encryption of data at rest and in transit is provided as standard with both the Amazon and Google cloud hosting options.
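Sitecore’s own anonymisation code isn’t shown in this post, so the following is only a generic Python sketch of the two ideas in the IP address point above: masking an address (dropping its host part) and hashing it with a secret key so the raw address is never stored. The helper names `anonymise_ip` and `hash_ip`, the /24 and /48 prefixes and the server-side secret are all assumptions for illustration, not Sitecore APIs.

```python
import hashlib
import hmac
import ipaddress


def anonymise_ip(ip: str) -> str:
    """Mask the host part: keep a /24 for IPv4 or a /48 for IPv6 (assumed prefixes)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def hash_ip(ip: str, secret_salt: bytes) -> str:
    """Keyed hash (HMAC-SHA256): repeat visits still match, but the raw IP is not stored."""
    return hmac.new(secret_salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


print(anonymise_ip("192.168.1.42"))  # 192.168.1.0
```

Using a keyed hash rather than a plain SHA-256 matters here: the IPv4 address space is small enough that unkeyed hashes can be reversed by simply hashing every possible address.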
I am really getting into Sitecore and am excited about its capabilities, so I am devoting more time to learning about this amazing customer experience solution.
The GDPR (General Data Protection Regulation) is a set of rules reforming privacy and security regulation that takes effect on 25 May 2018. There are severe penalties for breaching the GDPR: fines can reach €20 million or 4% of annual global turnover, whichever is higher. We can either look at this as a pain in the backside or as an opportunity.
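As a worked example of the penalty cap: for the most serious infringements the maximum fine is the higher of €20 million and 4% of worldwide annual turnover, so the 4% figure only becomes the binding limit once turnover exceeds €500 million. A minimal Python sketch (the function name is hypothetical, and it ignores the lower tier of €10 million or 2% that applies to some other infringements):

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: float) -> float:
    """Upper-tier GDPR fine cap: the higher of EUR 20 million or 4% of annual global turnover."""
    four_percent = annual_global_turnover_eur * 4 / 100
    return max(20_000_000, four_percent)


# A company turning over EUR 1 billion faces a cap of EUR 40 million,
# while one turning over EUR 100 million is still capped at EUR 20 million.
print(max_gdpr_fine_eur(1_000_000_000))  # 40000000.0
print(max_gdpr_fine_eur(100_000_000))    # 20000000
```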
Why is the GDPR necessary?
The GDPR defines personal data as anything that can be used to directly or indirectly identify a person: names, photos, email addresses, bank details, posts on social networking websites, medical information or even IP addresses.
Since the previous data protection rules were written, Internet usage has become a great deal more widespread, and technological advances such as cloud storage and social media have changed the way data is processed and transferred. The rules needed updating, they needed to be uniform, and they needed to be applied more rigorously.
Appoint one of your directors to be accountable. This person should be suitably competent to handle the technicalities involved, and it’s worth considering where you want the accountability to fall – with IT, legal, marketing or elsewhere.
Ensure you have safeguards in place: procedures to ensure data is confidential, accurate, available when necessary, backed up and encrypted.
Ensure your suppliers are GDPR-compliant. Any service provider you use to process data has to comply with GDPR standards – and ensuring they do is on you.
Ensure your customers, clients or website users have explicitly consented to their data being stored. This is a significant change, and most current measures are not sufficient. Your records need to prove that users have agreed to you storing their data – and failing to disagree is not enough. Crucially, users will also have a statutory right to have their data erased permanently from your records – so you’ll need the capacity to do that too.
Ensure you’re explaining to users, in plain language, what data you’re holding, how long you’re holding it for, and how users can withdraw their consent. Your policy has to be simple and appropriate, as well as containing all the required information.
Report breaches. Under the GDPR, a personal data breach that is likely to put individuals’ rights at risk must be reported to the Information Commissioner’s Office within 72 hours of your becoming aware of it. You’ll need a robust process for detecting, reporting and responding to data breaches.
Be prepared for more access requests. As people become more aware of their data privacy rights, they are likely to query the data you’re holding, and you’ll need to turn those requests around in good time.
Ensure that every IT or marketing project has a process in place to screen it against the GDPR, e.g. user stories in Agile projects, or GDPR risks highlighted and mitigated in other IT projects.
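The consent requirement in the list above boils down to one rule: no recorded opt-in means no consent. A minimal in-memory Python sketch under that assumption; `ConsentRegister`, its methods and the purpose/source strings are hypothetical names, and a real system would persist these records (and an audit trail) rather than keep them in a dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ConsentRecord:
    user_id: str
    purpose: str          # e.g. "newsletter"
    source: str           # where the user explicitly opted in, e.g. a form name
    granted_at: datetime  # evidence of when consent was given


class ConsentRegister:
    """Consent exists only if an explicit opt-in was recorded."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ConsentRecord] = {}

    def record_opt_in(self, user_id: str, purpose: str, source: str) -> ConsentRecord:
        record = ConsentRecord(user_id, purpose, source, datetime.now(timezone.utc))
        self._records[(user_id, purpose)] = record
        return record

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # Silence or "failing to disagree" leaves no record, so it never counts as consent.
        return (user_id, purpose) in self._records

    def evidence(self, user_id: str, purpose: str) -> Optional[ConsentRecord]:
        return self._records.get((user_id, purpose))

    def withdraw(self, user_id: str, purpose: str) -> bool:
        # Returns True if there was a consent record to withdraw.
        return self._records.pop((user_id, purpose), None) is not None
```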
Key steps for preparation
Awareness: create the necessary user stories for Agile projects and make GDPR part of the requirements for all new projects. Log it as a risk in projects and make sure it is mitigated.
Keep track of all personal data you hold and where it came from, from a website perspective this could be:
Contact Us Forms
Orders / Donations if ecommerce enabled
Sharing / comments on blogs
Update your privacy statement to explain how individuals’ rights will be upheld
Check that the following rights are addressed for individuals:
the right to be informed;
the right of access;
the right to rectification;
the right to erasure;
the right to restrict processing;
the right to data portability; (NEW)
the right to object;
the right not to be subject to automated decision-making including profiling
If storing data particularly in the cloud or with external suppliers make sure data is:
Encrypted at rest and in transit; use HTTPS when submitting details.
The encryption keys should be managed by the organisation and the SaaS vendor
Evaluate each SaaS offering to make sure it complies with GDPR
Adobe and Microsoft have been friends for a while, although they have had their history with Flash and Silverlight.
Recently Microsoft and Adobe announced a major strategic partnership where Adobe said it would make Microsoft Azure its preferred cloud platform, and Microsoft said it would make the Adobe Marketing Cloud its preferred marketing solution for Dynamics 365 Enterprise Edition.
This makes sense, as Adobe does not have its own CRM solution, and for Microsoft it provides a powerful SaaS offering for digital marketing. The one question it does raise is: where does this leave Sitecore?
Sitecore announced its own strategic partnership with Microsoft, with support for Sitecore on Azure and a major investment in joint commerce solution development that joins Dynamics for Retail with the Sitecore Experience Platform.
Whilst Sitecore is Adobe’s main competitor, various Sitecore modules, such as PXM (Print Experience Manager), actually make use of the Adobe Creative Suite.
Furthermore, Sitecore is very much a framework rather than a SaaS offering. So my take is that Microsoft wins regardless, and the market wins too, as there are various options available.
There are also developments around a new standard data model called XDM (Experience Data Model). Given how many “Experience” modules Sitecore provides, this slots right in with the Sitecore lingo.
It will be great to see how things pan out, but one thing is for sure, as reflected by Microsoft’s share price growth of 60% in the last year: Azure is a great success, and all these partnerships will drive even more consumption of the Azure cloud.
Recently I had the opportunity to explore Microsoft’s offering for managing Big Data. It was amazing to see how easy it was to set up a Hadoop and Spark cluster using Azure HDInsight. There are numerous cluster configurations available, and pre-made distributions are also available from vendors such as Cloudera and Hortonworks.
It’s great to see Microsoft not only embracing but also actively contributing to Big Data solutions in the open source community. Apache Spark is an open-source framework for cluster computing, and one that I actively follow; I have seen regular contributions from Microsoft staff members to this project.
The key advantage of using Azure HDInsight is that a cluster of computers can easily be configured and made available within 20 to 30 minutes. The default Spark cluster also comes pre-configured with lots of applications such as Ambari, Pig, Hive, Flume and Kafka, as well as Jupyter notebooks that work with Python and Scala.
It’s worth noting that rather than the file system residing on the cluster’s own commodity hardware, HDInsight uses Azure Blob Storage. The following are some common patterns for how we might use Microsoft Power BI (a data visualisation tool for Big Data, similar to Tableau):
Option 1: (Hive to Data Visualisation tool)
Use a Hive table and query it via an ODBC driver. Note that with this approach the entire table is downloaded into Power BI. In most cases the Hive table will be derived by querying data in Azure Blob Storage using a cluster of computers.
Option 2: (Process Data-> Save to Azure Blob -> Analyse Flat File with Visualisation tool)
Another way to import data from HDInsight into Power BI is by connecting to flat files in either Blob or the Data Lake Store
In this situation use HDInsight to process your data and write the resulting curated or aggregated data into text files. Generally this will give better refresh performance as we bypass the ODBC driver.
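To make Option 2 concrete: the cluster’s job is to shrink raw records into a small curated file that Power BI can refresh quickly. The Python sketch below shows only the shape of that step, assuming hypothetical `date`, `country` and `revenue` columns; in practice the aggregation would run on HDInsight (for example in Spark) and the output would be written to Blob storage or the Data Lake Store rather than to an in-memory buffer.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping, TextIO, Tuple


def aggregate_to_csv(rows: Iterable[Mapping[str, str]], out: TextIO) -> None:
    """Sum revenue per (date, country) and write a small curated CSV for the BI tool."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in rows:
        totals[(row["date"], row["country"])] += float(row["revenue"])

    writer = csv.writer(out)
    writer.writerow(["date", "country", "total_revenue"])
    for (date, country), total in sorted(totals.items()):
        writer.writerow([date, country, f"{total:.2f}"])


buffer = io.StringIO()  # stands in for a file in Blob storage
aggregate_to_csv([{"date": "2018-01-01", "country": "UK", "revenue": "10.5"}], buffer)
print(buffer.getvalue())
```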
Option 3: DirectQuery with Spark Cluster (in Beta)
This option allows you to keep the data in Azure Blob Storage and use technology like Spark SQL to query it; only the summarised results are sent back to Power BI. This approach allows huge datasets to be analysed using a cluster of machines.
Option 4: DirectQuery with Azure SQL DB
With DirectQuery using Azure SQL Database (DB), you would process your data in your cluster but write the resulting data to tables in Azure SQL DB (or Azure SQL Data Warehouse). Power BI would then take care of the data refresh, fetching only the data it requires from the Azure SQL database.
In my view the most common implementation scenario will be using a cluster to mine the initial data and produce aggregated results. This data would then reside in flat files on Azure Blob Storage or in Azure SQL Database. This approach keeps the cost of running a cluster to a minimum, since the cluster only needs to be switched on while the data is being processed. Furthermore, tools like Jupyter notebooks can hold pre-built scripts that can easily be modified for ad hoc data processing.
Let me know if there is anything you would like me to cover further on Azure HDInsight; I also have access to some rolling Microsoft credits for prototyping sample solutions.
A few days ago I attended a meetup hosted by Outreach Digital, a diverse community of Digital Professionals in Europe.
The format for the challenge was simple:
– 1 mystery dataset
– X randomized teams
– 20 minutes to analyze the problem
– 10 minutes to create your solution
..aaaaand 2 minutes to pitch
Federico set the scene for the challenge and we were emailed a copy of the dataset. The objective of the exercise was to find a predictive model that could help identify whether a person survived the crash.
I found myself seated with a multi-disciplinary team and it quickly became apparent that we all came from a number of different backgrounds.
We took on the challenge by first hypothesising about the story and then looking to validate the hypothesis by analysing the data. It quickly became obvious that men over a certain age were unlikely to survive. Females were more likely to survive, and if they were in the top-tier classes their chances improved significantly.
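The hypothesis-first approach can be written down as a deliberately crude, hand-written rule before any tool is involved. This Python sketch is not the team’s KNIME model; the field names (`sex`, `age`) and the age cut-off are hypothetical, since the post only says “a certain age”.

```python
from typing import Any, Mapping, Sequence

AGE_CUTOFF = 18  # hypothetical: the post only says "men over a certain age"


def predict_survived(passenger: Mapping[str, Any]) -> bool:
    """Encode the team's hypothesis as a simple rule."""
    if passenger["sex"] == "female":
        return True  # females were more likely to survive
    return passenger["age"] < AGE_CUTOFF  # older males predicted not to survive


def accuracy(passengers: Sequence[Mapping[str, Any]], survived: Sequence[bool]) -> float:
    """Fraction of passengers whose predicted outcome matches the actual outcome."""
    if len(passengers) != len(survived) or not passengers:
        raise ValueError("need equally sized, non-empty inputs")
    correct = sum(predict_survived(p) == actual for p, actual in zip(passengers, survived))
    return correct / len(passengers)
```

A decision tree learner finds splits like these (and refinements such as passenger class for females) from the data rather than by hand, which is what made a tool like KNIME so useful under time pressure.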
We were also very lucky to have Francesco on the team, as he knew of a software tool called KNIME: you give it a dataset and it fits a decision tree model to that data. Obviously some manipulation is required to make the best use of the tool, and given the short time we did not bother with sampling or major cleaning of the data.
In the end we got a model that predicted the outcome with 83% accuracy. This was actually quite good; one of the other teams managed 81%. Needless to say, our team won by the narrowest of margins.
One thing that really stood out for me in the session was the importance of thinking about the story, as that provided a good foundation for any modelling. It was great to have Pantea on the team, as she really drove the storytelling.
The whole team really contributed and engaged on the task and I am sure “Dream Team” will be up for a similar challenge in the not so distant future.
So… I am now into week 5 of the Coursera course on Functional Programming Principles in Scala. It’s been hard work so far as I have had to learn a lot more about functional programming.
Week 1 was a real challenge, as I had to get my head around recursion, particularly in the coin-counting assignment, which I eventually figured out. What’s really been fascinating is that once you start “getting” Scala, you realise how powerful the language really is, and you start to appreciate its elegance and expressiveness.
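For anyone curious about the coin-counting problem, the recursive idea is that the number of ways to make an amount equals the ways that use the first coin at least once plus the ways that never use it. It is sketched here in Python rather than Scala (the course itself uses Scala), and it is the classic textbook decomposition, not the official course solution.

```python
from typing import Sequence


def count_change(money: int, coins: Sequence[int]) -> int:
    """Count the ways to make `money` from an unlimited supply of each coin denomination."""
    if money == 0:
        return 1  # one way: add no further coins
    if money < 0 or not coins:
        return 0  # overshot the amount, or no denominations left to try
    # Ways using the first coin at least once + ways never using it.
    return count_change(money - coins[0], coins) + count_change(money, coins[1:])


print(count_change(4, [1, 2]))  # 3: 1+1+1+1, 1+1+2, 2+2
```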
My initial reason for learning Scala was to understand Spark better as I see Spark as a key component for many Big Data Solutions. Spark is written in Scala and hence I felt the need to learn Scala.
Having learned the foundations of Scala, I am now debating my next steps. I have a number of choices: either getting a role where I can do some hands-on coding, or building my own software product. Ideally I would prefer the former, provided I have a good team of people to work with. I have also joined the Slack chat for Spark, which has a nice channel dedicated to Scala algorithms (Scala_viz).
Coursera is yet to launch the Spark and Scala course but when it’s on I think it will be a really good course.
In terms of the Coursera course, I would highly recommend it. It’s challenging but you really do get a lot out of it particularly if you have not programmed in a functional manner previously.
In terms of gaining experience with Scala, there seems to be a shortage of Scala developers with a few years’ experience. For newbies, however, it’s the classic chicken-and-egg situation: it’s hard to find a junior role that will provide sufficient commercial experience. The great thing about coding is that you can still build your own app or contribute to an open source project.
I was at the IBM Datapalooza Mashup day in June 2016. It was really interesting to learn how IBM go about capturing all the data and statistics at Wimbledon. The ability of Wimbledon’s team of content creators to get interesting facts out before the press is a real-world example of how data can give a strategic advantage.
By predicting well in advance which records are likely to be broken, and then getting real-time data when those records are broken, team Wimbledon gains an edge over other news organisations.
There were various demonstrations on how IBM’s Bluemix product is being utilised to manage social media through its text analytics interface.
Another really interesting workshop was about a product called Node-RED, a visual programming tool that is ideal for hackathons. It allows you to connect various REST services visually, and if you need more flexibility you can get under the hood and write your own Node.js (JavaScript) code.
I was quite fortunate to be awarded a spot on this beta course with Learning Tree. Basically, Learning Tree trial their courses before making them live.
The course instructor Max Van Daalen was very knowledgeable and had made good use of Hadoop and Spark with his work at CERN.
The key infrastructure for the learning environment was built around CentOS: we had our own Hadoop cluster, and local environments that had Scala and Spark configured.
The course certainly helped my understanding of Big Data. There are obviously a lot of technologies out there, but to keep things simple, most of them work on the paradigm of parallel computing. When processing a large dataset, e.g. a petabyte-sized file, it is in some cases not even possible to load the file on a single machine. To deal with such a file, you spread it over multiple machines and get each machine to work on its own portion of the file.
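The split-and-process idea can be sketched on a single machine. In this Python sketch a thread pool stands in for the machines in a cluster: the data is split into chunks, each worker counts matching lines in its own chunk (the map step), and the partial counts are combined (the reduce step). Counting lines containing "ERROR" is a hypothetical task, and because of Python’s global interpreter lock, threads illustrate the structure here rather than delivering a real speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(lines: Sequence[str], n_chunks: int) -> List[Sequence[str]]:
    """Divide the data into roughly equal portions, one per worker."""
    size = -(-len(lines) // n_chunks)  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_errors(chunk: Sequence[str]) -> int:
    """The work each 'machine' does on its own portion of the data."""
    return sum(1 for line in chunk if "ERROR" in line)


def parallel_error_count(lines: Sequence[str], n_workers: int = 4) -> int:
    if not lines:
        return 0
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map: each chunk is processed independently; reduce: partial counts are summed.
        return sum(pool.map(count_errors, chunks))
```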
What has really opened up in the last few years is the number of solutions, quite a few of them open source, that allow you to use this paradigm. Commodity hardware and cloud computing make purchasing or deploying a cluster of computers very cost-effective.
One big change from a programming perspective is how you take advantage of the parallel architecture and succinctly express the processing request. Functional programming, which was around in the 1970s, seems to be the way to go. Scala, a functional programming language with some nice hybrid object-oriented features that runs on the Java Virtual Machine, provides an elegant mechanism for programmers to express their intent.
The course covers Scala at good depth whilst also exposing us to various libraries provided by Apache Spark. Spark is a very powerful open source platform with advanced data analytics capabilities.
My favourite exercise on the course was using Spark Streaming in conjunction with the Twitter API to perform real-time monitoring of social media. We could further enhance this with technologies like Kafka, which allow a cluster of machines to ingest various streams.
Whilst there was quite a bit covered in the course I think the following are my key takeaways:
The concept and usage of RDDs (Resilient Distributed Datasets), DataFrames and transformations. These are core to how Spark works
The concept of HDFS, Hadoop and other infrastructure particularly the Java Virtual Machine (JVM)
Scala, particularly concepts around case classes, flexible management of collections and clever functions like map and flatMap that can be used with various data structures. Some clever examples were provided, particularly using tuples
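To illustrate the map / flatMap / tuple style mentioned above, here is the classic word count written with Python equivalents: a frozen dataclass plays the role of a Scala case class, `itertools.chain.from_iterable` plays the role of `flatMap`, and `(word, 1)` tuples are summed per key. In Spark the same pipeline would typically be expressed with `flatMap`, `map` and `reduceByKey` on an RDD; the Python version below is only a local sketch of that idea.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import chain
from typing import Iterable, List


@dataclass(frozen=True)
class WordCount:  # plays the role a Scala case class would
    word: str
    count: int


def word_counts(lines: Iterable[str]) -> List[WordCount]:
    # flatMap: each line becomes many words, flattened into one stream
    words = chain.from_iterable(line.lower().split() for line in lines)
    # map: pair every word with a count of 1
    pairs = map(lambda w: (w, 1), words)
    # reduce by key: sum the 1s for each distinct word
    totals: Counter = Counter()
    for word, one in pairs:
        totals[word] += one
    # Most frequent first, ties broken alphabetically
    return sorted((WordCount(w, c) for w, c in totals.items()),
                  key=lambda wc: (-wc.count, wc.word))
```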
Overall it’s a course I would recommend and certainly has helped me get a better foundation around Big Data.