Big Data Innovation Summit 2014 Toronto: Day 2 Highlights
Highlights from the presentations by Big Data leaders from Aviva, Canadian Imperial Bank, Royal College of Physicians and Surgeons of Canada, and University Health Network on day 2 of Big Data Innovation Summit 2014.
As many organizations now work with unmanageably large data sets, it is essential to use and maintain an analytics platform that can cope with information at this scale. This presents both a challenge and an opportunity, as organizations must identify patterns and obtain actionable results in order to gain a crucial advantage over competitors. Big Data Innovation will help businesses understand & utilize data-driven strategies and discover what disciplines will change because of the advent of data. With a vast amount of data now available, modern businesses face challenges in storage, management, analysis, visualization and security, as well as in adopting disruptive tools & technologies.
The Big Data Innovation Summit (June 4 & 5, 2014) was organized by the Innovation Enterprise in Toronto, Canada. With case studies, interactive panel sessions and deep-dive discussions, the summit offered solutions and insight from leaders operating in the Big Data space.
We provide here a summary of selected talks along with the key takeaways.
Highlights from Day 1.
Here are highlights from Day 2 (Thursday, June 5, 2014):
Masoud Charkhabi, Director and Shirin Mojarad, Data Scientist from Advanced Analytics, Canadian Imperial Bank of Commerce (CIBC) delivered an interesting talk on "Finding Focus Areas in Big Data". It was famously said that a wealth of information brings a poverty of attention. Big data can bring with it this paralyzing effect, which not only stalls progress but also undermines confidence in the value of analytics. While the sheer volume of big data overwhelms even the most scientific analytical methods, pockets exist that accommodate these methods quite well.
Masoud elaborated on a framework that helps find focus areas in big data, and argued that the problem is not information overload; it's filter failure. Citing research and surveys, the speakers stated that the market is currently bullish on Big Data.
Shirin stated that there are two approaches in Analytics, which she referred to as the "Stone Age" approach and the "Information Age" approach. She shared the following anonymous quote:
“The Stone Age was marked by the clever use of crude tools;
the Information Age has been marked by the crude use of clever tools.”
The "Stone Age" approach in Analytics uses Basic Tools + Clever People. Quite a few companies follow this approach, which works to some extent, as long as you are looking only for the "known unknowns". The "Information Age" approach, by contrast, uses Clever Tools + Basic People. By putting the majority of technical expertise in the tools, such organizations have a better distribution of departments with analytical capabilities. This approach is also better suited to Big Data, where you look for a signal in a huge amount of noise and are often looking for the "unknown unknowns".
It is very important to identify the focus areas and ask the right questions. The focus should be on the biggest and highest value opportunities, and within each opportunity, start with questions, not data. Early focus should be on areas that involve no more than first or second degree inferential analysis.
Finally, by walking through a practical example, they outlined the following suggestions for Analytics processes:
- Quantify questions
- Explore validity
- Keep track of surprises
- Focus on finding differentiators
- Explore segments
- Explore relationships
David (Dave) Perfetti, Chief Information Officer, Royal College of Physicians and Surgeons of Canada gave a thought-provoking talk on "Big Data Gone Bad, or Bad Data Gone BIG?". Over the course of his talk, Dave explored various aspects of Big Data. Challenging the paradigm and industry hype, he presented a provocative view of analytics. After providing a quick overview of the Royal College, he talked about the role of questions and users' beliefs in Big Data. The big change implied by Big Data is bound to meet resistance, as employees worry about loss of control, uncertainty, concerns about their own competence, etc.
He asked: Is Big Data really the answer, or are we just taking bad data and creating massive analytics? As a result, are we taking small problems and turning them into BIG ones? Addressing the ultimate question, "how to generate true business value?", he shared some tools, techniques and thinking models to help the audience understand various perspectives on analytics and the roles they play in decision-making.
Peter Rossos, Chief Medical Information Officer, University Health Network talked about "Big Data In Healthcare – Issues & Opportunities". For a number of reasons, healthcare lags decades behind other sectors in leveraging and adopting information and communication technologies. Big Data has the potential to contribute to health service transformation by enabling improved access, quality, safety and personalization of care while containing costs and improving operational efficiencies.
He outlined the Big Data opportunities in Healthcare as: Quality, Safety, Access, Efficiency, Cost, Outcomes (reporting and optimization), Analytics (personalized medicine, clinical decision support) and R&D. The new sources of health data include genomic data (gene sequencing data), streamed data (home monitoring, tele-health, bio-sensors) and clinical data (80% unstructured documents, images, clinical or transcribed notes). He highlighted the following major challenges:
- Collecting data (structured vs. unstructured, data quality)
- Aggregating data (interoperability, privacy, ownership)
- Analyzing data (data discovery, interpreting results)
- Consuming data (alignment of payers and providers)
Most legacy systems were not initially designed for clinical care. They were focused on visits rather than patients, and they had complicated workflows and poor interoperability. Currently, the healthcare industry is facing major challenges around standards, interoperability, customization, marketplace fragmentation, regulation and user adoption. Big challenges require big resources, and thus the power of community is needed (e.g. World Computing Grid). In conclusion, he recommended that eHealth initiatives focus on: Design (empower patients to boost self-efficacy, connect providers to reduce medical errors), Technology (leverage economies of scale, support & fund innovation), and Governance (national standards & strategy, international comparison & benchmarking).
Hashmat Rohian, Assistant Vice President, R&D, Aviva gave an interesting and insightful talk on "Why We Should All Love Graph Analytics & Stop Worrying". Today's complex data is big, variably-structured and densely connected. Hashmat explained how size, structure and connectedness have converged to change the way we work with data. Connected data is prevalent in social networking, logistics networks (for package routing), financial transaction graphs (for detecting fraud), telecommunications networks, ad optimization, recommendation engines, bio-informatics, and in many other places. He shared the new opportunities for creating end-user value that have emerged in a world of connected data, illustrating them with graph analytics examples implemented using a graph database.
Graphs are a very powerful tool for dealing with more complex data. He defined data complexity through the following equation:
Data Complexity = f(connectedness, size, structure)
The benefits of a graph database include "minutes to milliseconds" performance, fit for the domain and business responsiveness (easy to evolve). He then walked through several examples, such as a geographical routes graph, an internet networking graph, a friendship graph and an e-commerce graph (buying patterns & relationships among customers).
A graph database is one that uses graph structures with nodes, edges, and properties to represent and store data. Graph databases provide index-free adjacency: each node holds direct references to its neighbours, so following a relationship does not require a global index lookup. Examples of popular graph databases: Neo4j, FlockDB, AllegroGraph, InfiniteGraph, OrientDB, etc. Neo4j is a NoSQL graph database with a powerful traversal framework. It works with the Cypher query language over HTTP. Cypher is a human-readable language that was purpose-built for working with graph data (with inspiration from SQL syntax), and it is a primary tool for building graph applications. Finally, he suggested that Graph Theory is particularly useful when we first want to gain some insight into a new domain and understand what insights can be extracted from it.
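To make the idea of nodes, edges, properties and index-free adjacency more concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j or any other product is implemented, and it is not taken from the talk. The class names, the `FRIEND` relationship type and the sample people are all hypothetical, chosen only to illustrate a friendship graph like the one mentioned above.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids recursive
# field-by-field comparison when the graph contains cycles.
@dataclass(eq=False)
class Node:
    """A graph node with properties and direct references to its edges."""
    label: str
    properties: dict = field(default_factory=dict)
    # Outgoing edges live on the node itself (index-free adjacency):
    # following a relationship never needs a global index lookup.
    outgoing: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    """A typed, directed relationship that can carry its own properties."""
    rel_type: str
    target: Node
    properties: dict = field(default_factory=dict)


def connect(source, rel_type, target, **properties):
    """Attach a directed edge from source to target."""
    source.outgoing.append(Edge(rel_type, target, properties))


def neighbours(node, rel_type):
    """Return the nodes reachable from node over edges of one type."""
    return [edge.target for edge in node.outgoing if edge.rel_type == rel_type]


def friends_of_friends(person):
    """Two-hop traversal: friends of friends who are not already friends."""
    direct = neighbours(person, "FRIEND")
    found = []
    for friend in direct:
        for candidate in neighbours(friend, "FRIEND"):
            already_known = candidate is person or candidate in direct
            if not already_known and candidate not in found:
                found.append(candidate)
    return found


# Hypothetical sample data: Alice knows Bob, Bob knows Carol.
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
carol = Node("Person", {"name": "Carol"})
for a, b in [(alice, bob), (bob, carol)]:
    connect(a, "FRIEND", b, since=2014)
    connect(b, "FRIEND", a, since=2014)

suggestions = [n.properties["name"] for n in friends_of_friends(alice)]
print(suggestions)
```

In this sketch the cost of the two-hop traversal depends only on how many edges the visited nodes have, not on the total size of the graph. That is the intuition behind the performance claim for connected data. A real graph database adds persistence, transactions and a query language such as Cypher on top of this basic structure.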