- Making Sense of Public Data – Wrangling Jeopardy – Part 2 - Oct 27, 2014.
Wrangling Jeopardy (Part 2) describes the remaining steps of the data transformation process, detailing how we used Trifacta to structure, clean, enrich and distill Jeopardy data for analysis.
- Top stories for Oct 5-11: Analyzing Ebola spread; Data science shows surveys may assess language more than attitudes - Oct 12, 2014.
Analyzing Ebola - Is it spreading at an exponential rate?; Data science shows surveys may assess language more than attitudes; Making Sense of Public Data - Wrangling Jeopardy.
- Salaries in IT – Scrape, refine, and plot case study - Oct 11, 2014.
A very good case study showing how to scrape data with import.io, refine it with OpenRefine, and plot it with Plot.ly, while also examining how salaries vary with age in Belgium.
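The "refine" step in this scrape-refine-plot workflow can be sketched in a few lines. This is a minimal stand-in for what OpenRefine does interactively, assuming scraped rows with hypothetical `age` and `salary` fields (the original case study uses import.io, OpenRefine, and Plot.ly, not this code):

```python
from statistics import mean

def refine(rows):
    """Normalize scraped salary strings and drop unusable rows
    (the kind of cleanup OpenRefine handles interactively)."""
    cleaned = []
    for row in rows:
        salary = row.get("salary", "").replace("€", "").replace(",", "").strip()
        if not salary.isdigit():
            continue  # skip malformed rows rather than guess a value
        cleaned.append({"age": int(row["age"]), "salary": int(salary)})
    return cleaned

def mean_salary_by_age(rows):
    """Aggregate refined rows into the series a salary-vs-age plot would consume."""
    by_age = {}
    for row in rows:
        by_age.setdefault(row["age"], []).append(row["salary"])
    return {age: mean(vals) for age, vals in sorted(by_age.items())}

scraped = [
    {"age": "30", "salary": "€45,000"},
    {"age": "30", "salary": "€55,000"},
    {"age": "40", "salary": "n/a"},       # malformed: dropped
    {"age": "40", "salary": "€60,000"},
]
print(mean_salary_by_age(refine(scraped)))  # -> {30: 50000, 40: 60000}
```

The aggregated dictionary is exactly the x/y series a plotting library such as Plot.ly would consume for the salary-vs-age chart.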
- Top KDnuggets tweets, Oct 6-7: Great TED talk by @KnCukier “Big Data is better data”; Top 10 One-Person Startups - Oct 8, 2014.
Great TED talk by @KnCukier "Big Data is better data"; Top 10 One-Person Startups; 7 critical elements of effective dashboards and visualizations; Making Sense of Public Data - Wrangling Jeopardy.
- Making Sense of Public Data – Wrangling Jeopardy - Oct 7, 2014.
Trifacta’s Alon Bartur & Will Davis detail their process for transforming or “wrangling” publicly available Jeopardy data found on the web for downstream analysis.
- Dataiku Data Science Studio - Aug 26, 2014.
Data Science Studio (DSS) from Dataiku is a complete data science software tool for developers and analysts,
which significantly shortens the time-consuming load-clean-train-test-deploy cycle of building predictive applications.
A community edition and a free trial are available.
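The load-clean-train-test cycle DSS aims to shorten can be illustrated with a toy pipeline. This is a pure-stdlib sketch in which a nearest-centroid classifier stands in for a real model; it illustrates the cycle only and says nothing about how DSS works internally:

```python
from statistics import mean

def clean(records):
    """Clean step: drop records with missing feature values."""
    return [r for r in records if None not in r[0]]

def train(records):
    """Train step: compute one centroid per label."""
    grouped = {}
    for features, label in records:
        grouped.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in grouped.items()}

def predict(model, features):
    """Test/deploy step: assign the label of the nearest centroid."""
    return min(model, key=lambda lbl: sum((a - b) ** 2
               for a, b in zip(features, model[lbl])))

raw = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
       ((None, 3.0), "a"),                  # cleaned out
       ((5.0, 5.0), "b"), ((4.8, 5.2), "b")]
model = train(clean(raw))
print(predict(model, (1.1, 0.9)))  # -> a
```

Each function maps onto one stage of the cycle; a tool like DSS lets analysts iterate on these stages without rewriting the plumbing between them.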
- InnovAccer: Simplifying Research and Analysis - Jun 5, 2014.
Innovaccer cleans and prepares data for analysis, saving researchers time and improving their confidence in data quality.
- Paxata automates Data Preparation for Big Data Analytics - Mar 7, 2014.
Paxata aims to shorten and automate the data-cleaning process by augmenting data from a large number of sources and by using machine learning to detect statistical similarities among the imported data.
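One simple way to score statistical similarity between two imported columns is to compare their value-frequency distributions with cosine similarity. This is only an illustrative stand-in for the kind of matching Paxata describes; their actual algorithms are not public:

```python
from collections import Counter
from math import sqrt

def column_similarity(col_a, col_b):
    """Cosine similarity of two columns' value-frequency vectors.
    A toy stand-in for ML-based column matching; not Paxata's algorithm."""
    fa, fb = Counter(col_a), Counter(col_b)
    keys = set(fa) | set(fb)
    dot = sum(fa[k] * fb[k] for k in keys)
    norm = (sqrt(sum(v * v for v in fa.values()))
            * sqrt(sum(v * v for v in fb.values())))
    return dot / norm if norm else 0.0

us_states = ["CA", "NY", "CA", "TX"]
state_codes = ["CA", "CA", "NY", "TX"]
zip_codes = ["94105", "10001", "73301", "94105"]
print(column_similarity(us_states, state_codes))  # -> 1.0 (same distribution)
print(column_similarity(us_states, zip_codes))    # -> 0.0 (no shared values)
```

A high score suggests two columns from different sources hold the same kind of data and are candidates for joining or merging during preparation.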