- Interview: Sastry Malladi, StubHub on Designing Big Data Architecture for the Unknown Future - Jul 28, 2014.
We discuss the Big Data architecture at StubHub, important factors in architecture design, the hybrid approach of using Big Data alongside traditional data warehouses, challenges, the importance of metadata, and more.
- Containers: The Enabler of YARN - Jul 28, 2014.
The evolution of a data-center operating system is discussed along with the underlying challenges and approaches being followed. Containers play a big role in enabling the required abstraction and deliver additional benefits.
- Upcoming Webcasts on Analytics, Big Data, Data Science – July 29 and beyond - Jul 28, 2014.
Applications in R, Data-Driven Business, The Grammar and Graphics of Data Science, Data Mining: Failure To Launch, Hadoop and the Relational Database, and more.
- Upcoming Webcasts on Analytics, Big Data, Data Science – July 22 and beyond - Jul 21, 2014.
Data Visualization, Hadoop and Hadoop 2.0, R, Apache Spark, How Can Analytics Improve Business, The Grammar and Graphics of Data Science, Data Mining: Failure To Launch and more.
- Dear CIO, what you have is NOT a Data Lake - Jul 17, 2014.
Data Lakes are often the ideal structure of a company's big data, but the reality is that data is often split into data puddles. Xurmo seeks to eliminate this by integrating Data Virtualization into the Data Lake.
- GraphLab Create: large-scale machine learning platform for graph, structured, and text data - Jul 15, 2014.
GraphLab Create 1.0 brings large-scale machine learning capabilities to enterprises, and is the first to handle graph, structured, and text data in one platform.
- Upcoming Webcasts on Analytics, Big Data, Data Science – July 15 and beyond - Jul 14, 2014.
Hadoop, Data Curation, Text Mining, Driving business value with text analytics, SQL on Hadoop, Graph Analytics on Hadoop, Apache Spark, How Can Analytics Improve Business, and more.
- Upcoming Webcasts on Analytics, Big Data, Data Science – July 8 and beyond - Jul 8, 2014.
Machine Learning with SAS and Hadoop, Data Lakes, Quality First, MongoDB, Text Mining, Driving business value with text analytics, Graph Analytics on Hadoop, and more.
- Top KDnuggets tweets, Jul 4-6: Cartoon: Facebook Data Science and happy cats; plyrmr makes R work seamlessly with Hadoop - Jul 8, 2014.
KDnuggets Cartoon examines happy kittens and the Facebook emotion manipulation data science experiment; plyrmr package for making R work seamlessly with Hadoop; Useful for #DataScience: Simple script for setting up R, Git, and Jags on Amazon EC2; How companies use R to compete.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Jun 30 and beyond - Jun 30, 2014.
CAP Theorem (key Big Data idea), Analytics and Machine Learning, SAS and Hadoop, MongoDB, Data Lakes, Data Visualization, and more.
- Top KDnuggets tweets, Jun 27-29: Google says Hadoop era is over - Jun 30, 2014.
Google says #Hadoop era is over, Google Cloud Dataflow can do much more; Machine learning, data mining, predictive analysis, and advanced analytics ~ same; Do you need a Master's Degree to become a Data Scientist?; Larry Page: "if we data mined health care data, we could save 100K lives next year".
- 100 Big Data Companies Analyzed - Jun 29, 2014.
We analyze the CRN Big Data 100 for insights into trends in the future of Big Data companies, including changes in database solutions, active regions, and what industries are undergoing the most change right now.
- CRN 50 Emerging Big Data Vendors - Jun 26, 2014.
We examine the CRN top 50 Emerging Big Data Vendors, 65% of which are located in Silicon Valley. The prototypical company is located in San Francisco and develops software for a Hadoop analytics platform. Competition will be tough!
- Upcoming Webcasts on Analytics, Big Data, Data Science – Jun 23 and beyond - Jun 23, 2014.
Reducing employee churn, Wolfram language, Rise of Machine Learning, Analytics with Hadoop, Social Media Analytics for Healthcare, and more.
- KDnuggets Analytics, Data Mining, Data Science Software Poll – Analyzed - Jun 17, 2014.
We analyze the results of the KDnuggets Software Poll, including correlations between tools, and relationships between commercial, free, and Hadoop/Big Data tools. We identify a potential capability gap. Download the anonymized data and analyze it yourself.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Jun 16 and beyond - Jun 16, 2014.
The Marriage of BI and Big Data, Unstructured Data on Hadoop, Future of Decision Making, Stopping employee churn, and more.
- YARN is All the Rage at Hadoop Summit 2014 - Jun 12, 2014.
Apache YARN, which enables much broader types of computations than MapReduce, is quickly becoming an integral part of Hadoop projects. We review best-practice considerations for a YARN cluster.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Jun 9 and beyond - Jun 9, 2014.
Data Mining FTL, Analytically Speaking with Dan Ariely, Solr, Hadoop, Cloud BI, Employee Churn, and more.
- KDnuggets 15th Annual Analytics, Data Mining, Data Science Software Poll: RapidMiner Continues To Lead - Jun 7, 2014.
With over 3,000 data miners taking part in the KDnuggets 15th Annual Software Poll, RapidMiner continues to lead. Free software is used much more outside the US, and Hadoop usage grows fastest in Asia.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Jun 2 and beyond - Jun 2, 2014.
SQL-on-Hadoop, BigML, ClearStory, Analytic Maturity with Dean Abbott and TIBCO, Just Enough Math, Analytically Speaking with Dan Ariely, Data Mining FTL, and more.
- Top KDnuggets tweets, May 30 – Jun 1: Guide to Setting Up an R-Hadoop System; 100+ Interesting Data Sets - Jun 2, 2014.
Tutorial: Step-by-Step Guide to Setting Up an R - #Hadoop System; 100+ Interesting Data Sets for Statistics (and Data Science); #BigData sets available for free - big list from Data Science Central; Twitter to release all tweets to scientists - a research boon and an ethical dilemma.
- Big Data Use Case: ZooKeeper at Rubicon Project - May 27, 2014.
What is the big idea behind ZooKeeper? A summary of an excellent Big Data use case that uses Apache ZooKeeper in a Hadoop implementation.
- Upcoming Webcasts on Analytics, Big Data, Data Science – May 26 and beyond - May 26, 2014.
Purchase history to customer projects, Hadoop, YARN, BigML, Amazon Redshift, ClearStory Data, and Analytically Speaking Featuring Dan Ariely - author of Predictably Irrational.
- Upcoming Webcasts on Analytics, Big Data, Data Science – May 19 and beyond - May 19, 2014.
Data Mining: FTL; Deep Learning with H2O; Purchase history to Customer Projects; Apache Hadoop, Hive, Kafka, Solr; Python for Big Data Analytics, and more.
- Poll Results: Data Types/Sources Analyzed - May 17, 2014.
Trends in data sources for data mining include: table data dominates, followed by time series and text; audio and JSON grow in popularity, while itemsets decline; 70% access DB engines, but only 20% access NoSQL stores; Hadoop and MongoDB are used more for text; Europe is lagging in NoSQL usage.
- Top KDnuggets Tweets, May 14-15: Easier Facebook Network Analysis; Cloudera Live, a New Way to Start with Hadoop - May 16, 2014.
Facebook Network analysis and visualization is easier with httr from R wizard; Cloudera Live offers a new way to start with #Hadoop - No downloads; Watch: Basics of Machine Learning; BigML Machine Learning platform Spring Release.
- Upcoming Webcasts on Analytics, Big Data, Data Science – May 12 and beyond - May 12, 2014.
Who Owns the Data, The New Database Frontier, Analytically Speaking, Data Mining: Failure To Launch, Deep Learning with H2O, Purchase history to Customer Projects, Hadoop and YARN, and more.
- Cartoon: Data Scientist Salary Negotiation - Apr 29, 2014.
New KDnuggets Cartoon looks at a Data Scientist salary negotiation.
- Top KDnuggets tweets, Apr 23-24: It does look similar, but …; Why people are bad at technology predictions - Apr 25, 2014.
#BigData Cartoon: "It does look similar - but this one is powered by Hadoop"; Great list: 9 Python Machine Learning Books; Why people are bad at technology predictions; Too busy recommending things to experience them.
- Upcoming Webcasts on Analytics, Big Data, Data Science – April 21 and beyond - Apr 21, 2014.
Traditional RDBMS Wisdom is All Wrong, What Is Hadoop and Where Is It Going, Trupanion and SiSense, Measuring Skill Level and Optimizing Player-Matching Algorithms in Online Games, and Analytically Speaking Featuring David Meintrup.
- Microsoft Expands Big Data Platform - Apr 21, 2014.
Microsoft expands its data platform with 3 major features: SQL Server 2014 with in-memory technology, Azure Intelligent Systems Service, and Analytics Platform System - SQL Server + Hadoop. New CEO Satya Nadella gives a low-key but impressive presentation.
- Exclusive Interview: Michael Brodie, Leading Database Researcher, Industry Leader, Thinker - Apr 21, 2014.
We discuss the most important database research advances, industry developments, role of relational, NoSQL, Graph databases, Computing Reality, and more.
- Apache Spark, the hot new trend in Big Data - Apr 18, 2014.
Spark solves problems similar to those Hadoop MapReduce addresses, but with a fast in-memory approach and a clean functional-style API. Leveraging Hadoop YARN, Alpine has made it very simple to get started with Spark.
- Upcoming Webcasts on Analytics, Big Data, Data Science – April 10 and beyond - Apr 10, 2014.
In-Database Scalable R & Python, Why Analytics Belongs in the Cloud, Data Mining: Failure To Launch, Pivotal Big Data Suite, What Is Hadoop and Where Is It Going, Optimizing Player-Matching Algorithms in Online Games, and more.
- Interactive Big Data Timeline - Apr 8, 2014.
A very interesting interactive Big Data timeline takes you from the beginning of information overload in the 1880s to Business Intelligence, the World Wide Web, Hadoop, Cloud, and more.
- Book Review: Data Just Right - Apr 7, 2014.
A detailed review of the book Data Just Right, an introduction to the technology and software at play in the current quest to define the Big Data Analytics computing paradigm.
- Upcoming Webcasts on Analytics, Big Data, Data Science – April 3 and beyond - Apr 3, 2014.
HP Vertica, Hadoop Data Warehouse with Impala, In-Database Scalable R & Python, Data Mining Failure To Launch, What Is Hadoop and Where Is It Going, and more.
- Top KDnuggets tweets, Mar 26-27: Watch “Statistics with R for newbies”; Coursera free #DataScience courses - Mar 28, 2014.
Also free ebooks on Practical Machine Learning: Innovations in Recommendations, and Apache Hive - How to access big data on Hadoop with SQL/HiveQL.
- Is Data Scientist the right career path for you? Candid advice - Mar 28, 2014.
Candid advice from an industry veteran reveals the true picture behind the much-talked-about Data Scientist "glamour" and helps people have the right expectations for a Data Science career.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Mar 27 and beyond - Mar 27, 2014.
Best Practices in Predictive Analytics, Best Decision Trees with Angoss 9, Thick Data, In-Database Scalable R and Python, Data Mining - Failure to Launch, Algorithms in Online Games, and more.