
Exclusive: Raul Valdes-Perez on OnlyBoth, Scientific Discovery, Advice for Winners


Our exclusive interview covers OnlyBoth and Vivisimo startups, Scientific Discovery, legendary Herbert A. Simon, venture capital, Big Data, advice for winners, and more.



By Gregory Piatetsky, @kdnuggets, Jun 5, 2014.

Raul Valdes-Perez is a leading researcher in Scientific Discovery and now a serial entrepreneur, having co-founded his second company, OnlyBoth, in March 2014.
He co-founded Vivisimo in 2000, a search software company that provided enterprise products and web-based consumer services, and was its CEO for nine years and Chairman until its acquisition in 2012 by IBM.

While at Vivisimo he was named a 2007 Ernst & Young Entrepreneur of the Year for the North Central Region and a top ten reader favorite for Entrepreneur of the Year by Inc. magazine. Earlier, he was on the CMU computer science faculty, where he received 7 NSF research grants and published 50 journal articles and book chapters. He received a PhD from CMU in 1991, where his advisor was the Nobel Laureate Herbert A. Simon.
Raul authored the book "Advice is for Winners: How to Get Advice for Better Decisions in Life and Work" in late 2012, and is on several company and non-profit boards.

I had the pleasure of meeting him at several KDD and machine learning conferences, and we always had very interesting and stimulating discussions. I recently received his email about his new company OnlyBoth, and our discussions led to this interview.

Gregory Piatetsky: Q1. You recently started a new company called OnlyBoth, which you described as "IBM Watson in reverse". OnlyBoth is able to automatically find interesting concepts in data and phrase them in English. What are some of the most interesting things it found? What do you see as promising applications for OnlyBoth?

Raul Valdes-Perez: My co-founder Andre Lessa and I started OnlyBoth in March 2014. The OnlyBoth technology discovers new insights in structured data and writes them up in perfect English.

In its application to colleges data, most of the insights are novel to me, even for colleges I know well. A favorite of mine is that Harvard has the highest diversity and inclusion ratio (0.51) of all the 60 colleges that are members of the American Association of Universities, surpassing Berkeley (0.34), Iowa State (0.34), and the rest. When you look at the definition of diversity and inclusion ratio, this is pretty interesting, and surprises me.

Other favorites are insights that reveal operating efficiencies, or their lack.

For example,
  • "U of Wisconsin-Madison has the most members of the National Academy of Sciences (39) of the 2,703 colleges with an average full-time teaching salary of at most $95,322."

 
I should reiterate that it's software that's both finding and writing these statements, not us.
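To make the shape of such a statement concrete, here is a minimal sketch of one way to find "X has the most A of the N colleges with B at most t" claims. It is an illustration only and not OnlyBoth's method, which the interview does not describe. The `College` record, the `efficiency_insights` function, and the toy rows are all hypothetical, and the data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class College:
    name: str
    academy_members: int  # attribute to maximize
    avg_salary: int       # attribute used for the "at most" cutoff


# Invented toy records for illustration; not real college data.
COLLEGES = [
    College("College A", 5, 120_000),
    College("College B", 9, 95_000),
    College("College C", 2, 90_000),
    College("College D", 7, 80_000),
    College("College E", 1, 70_000),
]


def efficiency_insights(colleges):
    """Return sentences for colleges that strictly lead their salary-capped peer group."""
    insights = []
    for c in colleges:
        # Peer group: every college whose salary does not exceed this one's.
        peers = [p for p in colleges if p.avg_salary <= c.avg_salary]
        if len(peers) < 2:
            continue  # a group of one makes a trivial claim
        best_other = max(p.academy_members for p in peers if p is not c)
        if c.academy_members > best_other:
            insights.append(
                f"{c.name} has the most academy members ({c.academy_members}) "
                f"of the {len(peers)} colleges with an average salary "
                f"of at most ${c.avg_salary:,}."
            )
    return insights


for sentence in efficiency_insights(COLLEGES):
    print(sentence)
```

The sketch uses each college's own salary as the cutoff, so a college qualifies only when no equally or less expensive peer matches it on the maximized attribute.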

GP: Q2. You started your professional career as a researcher at MIT AI Lab (1984-86) and Research Faculty at CMU (1991-2000). What were some of the most interesting problems you were working on then?

RVP: I did some work on spatio-temporal reasoning at MIT and published an AAAI paper from it. For my PhD work at CMU, I automated the discovery of reaction pathways from experimental data and background knowledge. After my PhD, I mostly worked on problems of automating scientific inference. Together with a chemist, we published a paper that argued, on the basis of results with the program (called MECHEM for Me Chemist), that the number of equally-simple pathway models consistent with the available evidence and background knowledge was larger than chemists suspected.

Sometimes these ideas led to non-science applications, such as a better text-clustering approach that was the basis for founding Vivisimo in 2000. We came at this problem completely from left field, with no prior background in text processing or clustering algorithms, so we devised a completely different approach, which worked better.

GP: Q3. Your Ph.D. advisor at CMU was the legendary Herbert A. Simon (Nobel Laureate and winner of many awards). What was he like to work with? What did you learn from him?

RVP: Professor Simon was the most insightful person by far that I've ever known. You could ask him something about anything and walk away smarter. After he passed away, I contributed an article "Personal Recollections from 15 Years of Monthly Meetings" to an MIT Press memorial volume. My article tries to share especially valuable insights that I learned from him, both about conducting research and other things.

GP: Q4. What led you to move from being a researcher to starting a company (Vivisimo)? What surprised you the most about the transition from researcher to CEO?

RVP: We had a better way to do a task (text clustering) than others and wanted a way to get the technology out there and have an impact. Startups were in the air, because it was during the dot-com boom, or actually just after the bust of April 2000, but the bust didn't discourage us. Also, whereas others start the day with "good morning", my visiting scientist and eventual co-founder would say "when are we going to start a company?".

No great surprises about transitioning to CEO. My time at CMU was very entrepreneurial in many ways, and that transferred over to Vivisimo. Of course, there are many business subjects that one knows nothing about and has to learn, but that wasn't a surprise!

I credit Pittsburgh's Jack Roseman with the best definition of an entrepreneur that I've heard: An entrepreneur is someone who sees the whole world as a resource.

GP: Q5. Tell us more about Vivisimo. What was its main initial focus? How did it evolve? What happened to Vivisimo technology after its acquisition by IBM in 2012?

RVP: The initial focus was to productize the university-developed code that we had, get NSF SBIR and state funding, develop public demonstrations of the technology, and identify the initial market to go after.

We launched Clusty.com, a clustering web search engine based on meta-search and some of our own web indexing, around 2004, both as a further tech demonstration and later as a money-maker through placement of ads. But these were always sidelines; our main focus was the enterprise market, which we concentrated on more and more. Vivisimo's products and technology are today robust IBM products, under new names, and are also part of the huge Watson initiative.

GP: Q6. You are now also involved in several VC companies. What is your role there and what do you look for in new startups?

RVP: I'm a limited partner, i.e., passive investor, in North Atlantic Capital in Maine, which invested in Vivisimo, in Riverfront Ventures in Pittsburgh, whose principals also backed Vivisimo while at another group, and in Blue Tree Allied Angels. Others evaluate the opportunities; at most, I'll make an individual decision whether to invest. So I'm not an active investor, but if I were, certainly the qualities of the individuals involved would be my #1.

GP: Q7. You were at MIT AI Lab in 1984. What are the most significant developments in the last 30 years in Machine Learning and KDD? What surprised you the most? Which areas have disappointed in their progress?

RVP: I confess that I find many PhD thesis topics now rather dull. Automation of human capabilities is what attracted me to AI. Taking a novel task and trying to automate it, or at least provide a sophisticated computational aid, is rather out of fashion these days. The criticism was always "where's the generality in that?", but I think that conceptual generality can be found, for example, in design principles that hold over a certain category of tasks, or other qualitative generalizations of greater or lesser scope.

GP: Q8. What do you expect in the next 30 years? Will we reach Singularity?

RVP: One valuable thing I learned from Prof. Simon is that human beings are poor at predicting the future. Who predicted the fall of the Soviet Union? Nobody. So I don't try.

(GP: actually, there were a few Soviet dissidents, like Andrei Amalrik, who predicted in 1970 the fall of the Soviet Union, but very few people believed them at the time)

GP: Q9. What do you think about the current "Big Data" buzz? How much is hype and how much is reality?

RVP: The phenomenon is certainly new. On one side, you have sensors creating data broadly, cheaply, and profusely that didn't exist before. On the other side, you have people creating text broadly, cheaply, and profusely (within our human limits), that didn't exist before. You can further pile on with technologies such as OnlyBoth that also create lots of text. For example, our initial colleges application creates, depending on what you count, between 5 and 20 million words of well-written, insightful sentences and paragraphs.

So the Big Data phenomenon, and the opportunities it creates, are new and genuine. In such cases, you can expect hype and bandwagon effects.

GP: Q10. What advice would you give to students considering the study of Machine Learning and Data Science?

RVP: My book Advice is for Winners claims that advice must be based on knowledge of the advisee's circumstances and goals.
Otherwise, it's not advice, but principles, or motivational examples, etc. So I don't offer advice for invisible readers. However, I can state that students of those fields who don't also acquire a good understanding of the classical AI concepts of heuristic search in problem spaces will be needlessly handicapped professionally. The writings of Newell and Simon, among others, supply that conceptual understanding.

GP: Q11. Tell us about what led you to write the book "Advice is for Winners: How to Get Advice for Better Decisions in Life and Work".

RVP: As a result of starting a company without knowing anything about business, combined with the central role of "domain knowledge" in my AI education, I came to the conclusion that people often make inferior decisions, needlessly, because they don't proactively reach out for advice from others, or are not skilled at doing so. This observation applied to both work and life.

So I undertook a study of scholarly and other writings that are relevant, reflected on my own entrepreneurial and life experiences, studied the genre of "self-improvement" books, and wrote one, that is perhaps of a different style than other self-help books. The book teaches how to leverage the world's knowledge and experience through social (not social media) means.

GP: Q12. What do you like to do when away from a computer, smartphone, and internet? What interesting book have you read recently?

RVP: When I’m away from a screen, I prefer people contact, e.g., I enjoy social conversation and banter. Also, I can’t really concentrate when in front of a screen, so I put it down in order to think hard or clearly about an issue.

I most like books that change how I see some aspect of the world. The book "The Innovative University: Changing the DNA of Higher Education from the Inside Out" by Clayton Christensen and Henry Eyring changed how I perceive U.S. universities: How they got to be how they are, and what the future holds for them. U.S. universities emulate the Harvard model of "more of everything". Harvard can ultimately afford it, but most can't. This is different than the business mentality of "let's find an ample niche where we can deliver a product or service efficiently at a good profit". Their book forecasts some disruptions coming.


