MoDAT: Designing the Market of DATa – Workshop Report
An overview of the MoDAT workshop on "Designing the Market of DATa": key research ideas such as recommending expertise, chance discovery, "data jackets", privacy risks, and more.
By Yukio Ohsawa, May 28, 2014.
In March 2013, a Japanese TV program broadcast an academic competition among data analysts. Large-scale datasets provided by companies in social networking services (SNS), car manufacturing, and other industries were shared among the competing analysts. The competition was interesting. Even so, I was surprised to find that all the shared datasets had been closed by the time I opened the competition's Web page after watching the program. Several days later, I came to understand that the datasets had been hidden because they were “felt” to be useful for "some" (undefined) business. In other words, business people do not like to open their data when there is uncertainty about how the data may be used. The Linked Open Data movement may look prevalent worldwide, but we should recognize that data containing business opportunities are not easily provided for free.
The Market of Data is an approach to sharing data among the people who use, provide, or analyze them. Data traded as items in this market can be shared on reasonable terms because participants clarify the latent value of the data and the conditions for sharing them. For example, data may be opened for free or shared at a price set by negotiation.
Since 2013, we have been discussing and studying how to design this market in the workshop on the Market of DATa (MoDAT). The first workshop was held in 2013 with the support of the IEEE International Conference on Data Mining (ICDM-2013) in Dallas. There, a noteworthy presentation was the keynote by Hiroe Tsubaki from The Institute of Statistical Mathematics on the valuation of partly disclosed datasets for prediction. The earnest attitude of the audience showed that everyone is concerned about two facts: fully disclosed data are still rare, and participants in the market of data must therefore decide whether to buy a dataset without knowing its full content. The regular presentations also included essential papers. One example is the paper by Maruyama, Okanohara, and Hido, who proposed the Data Value Field, where people who provide or buy data interact with those who evaluate data. From this work we learn that data collection and disclosure form a complex process involving costs and risks, in which time affects the value of data.
Various kinds of data were discussed as components of the market. Dongping Fang, on behalf of five authors from the IBM Thomas J. Watson Research Center, presented a method for quantifying and recommending expertise when new skills emerge. Data about human resources are normally kept confidential because of privacy risks. However, once the market of data is open, such a method can trigger the discovery of new human talent. Linked to this, Teramoto and Nakamura from Volvo Group Trucks & Technology pointed out the similarity between the market of data and quality function deployment (QFD). QFD has been employed for the logical assembly of elements in manufacturing firms, especially car manufacturers.
Here we focused the discussion on how QFD can enhance and educate the thinking of market participants. As we have emphasized in the workshops on Chance Discovery since 2000, subjective data, i.e., text data expressing the cognition of viewers, can be used for meta-cognition toward externalizing latent values. From this perspective, Zhang and Wang from the Chinese Academy of Sciences presented a paper on IdeaGraph plus, an algorithm for aiding the perception of unnoticed events. The paper showed an extension of KeyGraph®, which has been used for chance discovery (the discovery of "chances": uncertain events that are significant for decision making).
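To make the idea of chance discovery more concrete, here is a simplified co-occurrence graph in the spirit of KeyGraph®. Frequent words that often appear together form "islands". A less frequent word that co-occurs with words from two or more islands is flagged as a candidate bridge, that is, a possible chance. This is an illustrative toy under our own assumptions, not the published KeyGraph or IdeaGraph algorithm. The function name `cooccurrence_sketch`, its parameters `n_frequent` and `min_cooc`, and the sample sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_sketch(sentences, n_frequent=4, min_cooc=2):
    """Simplified KeyGraph-style sketch; parameters are hypothetical.

    1. Keep the n_frequent words that occur in the most sentences.
    2. Link frequent words that co-occur in at least min_cooc sentences;
       the connected groups of linked words form "islands".
    3. Return the islands plus the infrequent words that co-occur with
       words from two or more islands ("bridges", candidate chances).
    """
    freq = Counter(w for s in sentences for w in set(s))
    frequent = {w for w, _ in freq.most_common(n_frequent)}

    # Count how many sentences each pair of frequent words shares.
    pair_counts = Counter()
    for s in sentences:
        pair_counts.update(combinations(sorted(set(s) & frequent), 2))

    # Union-find: merge frequent words joined by strong co-occurrence links.
    parent = {w: w for w in frequent}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), count in pair_counts.items():
        if count >= min_cooc:
            parent[find(a)] = find(b)

    island_of = {w: find(w) for w in frequent}
    islands = {}
    for w, root in island_of.items():
        islands.setdefault(root, set()).add(w)

    # For each infrequent word, collect the islands it appears alongside.
    touched = {}
    for s in sentences:
        present = set(s)
        roots = {island_of[w] for w in present & frequent}
        for w in present - frequent:
            touched.setdefault(w, set()).update(roots)

    bridges = sorted(w for w, roots in touched.items() if len(roots) >= 2)
    return list(islands.values()), bridges


# Toy tokenized sentences (hypothetical data).
sentences = [
    ["data", "market", "price"],
    ["data", "market"],
    ["car", "sensor"],
    ["car", "sensor", "road"],
    ["market", "jacket"],
    ["sensor", "jacket"],
    ["data", "market", "car", "sensor"],
]
islands, bridges = cooccurrence_sketch(sentences)
print(sorted(sorted(i) for i in islands))  # [['car', 'sensor'], ['data', 'market']]
print(bridges)  # ['jacket']
```

In this toy run, "jacket" is rare but connects the {data, market} island with the {car, sensor} island, so it is the kind of event a human analyst would be prompted to look at more closely.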
In the last session of the workshop, Innovators Marketplace® on Data Jackets (IMDJ; the relevant paper was presented by Chang Liu and myself) was introduced. IMDJ is a gamified miniature market of data in which players discuss how to share and combine data or analysis methods. Data jackets are small pieces of digest information about existing datasets whose contents are hidden, and IMDJ visualizes the relevance among them as a graph. Looking at the graph, players propose scenarios for combining and analyzing the datasets, and the other players evaluate how interesting these ideas are. The two steps of IMDJ are shown in the figures below:
Here is Step 2
As the photo shows, participants got excited about proposals for creative analysis.
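As a rough illustration of the kind of graph players look at in IMDJ, the following sketch links data jackets that share variable labels. It rests on two assumptions of our own: each jacket is reduced to a set of variable labels, and relevance is measured by Jaccard similarity against a fixed threshold. The function `jacket_graph`, the `threshold` value, and the sample jackets are hypothetical and are not taken from the IMDJ paper.

```python
from itertools import combinations


def jacket_graph(jackets, threshold=0.2):
    """Return edges (a, b, weight) between data jackets whose variable-label
    sets have Jaccard similarity >= threshold (an assumed relevance measure).
    Only the labels are compared; the underlying data stay hidden."""
    edges = []
    for (a, va), (b, vb) in combinations(sorted(jackets.items()), 2):
        union = va | vb
        if not union:
            continue  # two empty jackets: nothing to compare
        sim = len(va & vb) / len(union)
        if sim >= threshold:
            edges.append((a, b, round(sim, 3)))
    return edges


# Hypothetical data jackets: dataset name -> set of variable labels.
jackets = {
    "car_probe": {"time", "location", "speed"},
    "weather": {"time", "location", "rainfall"},
    "sns_posts": {"time", "text", "user"},
    "hr_skills": {"employee", "skill"},
}
for a, b, w in jacket_graph(jackets):
    print(a, b, w)
```

In this toy example, `car_probe` and `weather` share `time` and `location` and get the strongest link, which is the kind of connection a player might turn into a scenario for combining the two datasets. `hr_skills` has no link in this graph.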
Methods for prediction, visualization, and creation may become core techniques in MoDAT. However, our strength lies in connecting these techniques into the human processes of communication, education, and innovation. Communication among stakeholders externalizes the value of data. Visualization strengthens this effect, and so does the construction of ontologies that explain the variables in data and link heterogeneous events in data (presented by Kushiro at MoDAT2013).
In the end, we reached a conclusion: “Let us continue MoDAT!”, together with active participants including innovators and educators.
We look forward to seeing you all at MoDAT2014 in December, at IEEE ICDM 2014 in Shenzhen, China: http://www.panda.sys.t.u-tokyo.ac.jp/MoDAT/2014.html.
Yukio Ohsawa is a professor of Systems Innovation in the School of Engineering, The University of Tokyo. He initiated the research area of Chance Discovery and the related series of international conference sessions and workshops. He edited books on chance discovery, including "Chance Discovery" (2003). He also wrote "Innovators Marketplace: Using Games to Activate and Train Innovators" (Understanding Innovation, 2012). His research interests started in non-linear physics. Through his work in artificial intelligence, he initiated chance discovery and extended it into methods for innovation, applying his original chance discovery methods and borrowing ideas from the dynamics of real markets.