
BIDMach machine learning toolkit


BIDMach machine learning toolkit offers "rooflined" (optimized to the limit) compute primitives and competitive performance on learning tasks like regression, clustering, classification, and matrix factorization.



By John Canny, Berkeley, July 2014.

We are very pleased to announce the beta release of the BIDMach machine learning toolkit, version 0.9. The main page, which includes precompiled downloads for 64-bit Windows, Linux and Mac OS X, is here:

BID Data Project: bid2.berkeley.edu/bid-data-project/

BIDMach has several unique features:

Speed: BIDMach is currently the fastest tool for many common machine learning tasks, and the list is growing. When run on a single machine with a graphics processor, BIDMach is faster than any other system (on a single node or cluster) for regression, clustering, classification, and matrix factorization. Every compute primitive has been "rooflined", which means it has been optimized close to its theoretical performance limit.

Check out the benchmarks at: github.com/BIDData/BIDMach/wiki/Benchmarks
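To make the "rooflined" idea concrete, here is a minimal sketch of the standard roofline bound: a kernel's attainable throughput is capped by either the processor's peak arithmetic rate or by memory bandwidth times the kernel's arithmetic intensity (flops per byte moved), whichever is smaller. The function names and the hardware numbers below are illustrative assumptions, not BIDMach code or measurements.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Upper bound on attainable GFLOP/s under the roofline model.

    peak_gflops    -- peak arithmetic throughput of the device (GFLOP/s)
    bandwidth_gbs  -- peak memory bandwidth of the device (GB/s)
    flops_per_byte -- arithmetic intensity of the kernel (flops per byte moved)
    """
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def roofline_efficiency(measured_gflops, peak_gflops, bandwidth_gbs, flops_per_byte):
    """Fraction of the roofline bound that a measured kernel achieves."""
    return measured_gflops / roofline_bound(peak_gflops, bandwidth_gbs, flops_per_byte)


# Hypothetical device: 1000 GFLOP/s peak, 200 GB/s memory bandwidth.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 200.0

# A low-intensity (memory-bound) kernel, e.g. a sparse operation.
sparse_bound = roofline_bound(PEAK_GFLOPS, BANDWIDTH_GBS, 0.25)

# A high-intensity (compute-bound) kernel, e.g. a large dense multiply.
dense_bound = roofline_bound(PEAK_GFLOPS, BANDWIDTH_GBS, 10.0)

# A kernel measured at 45 GFLOP/s against the memory-bound limit above.
sparse_efficiency = roofline_efficiency(45.0, PEAK_GFLOPS, BANDWIDTH_GBS, 0.25)
```

Under this model, a primitive is "rooflined" when its measured throughput sits close to the bound for its arithmetic intensity, so an efficiency near 1.0 leaves little room for further tuning on that hardware.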


Scalability: BIDMach has run larger calculations on one node than most cluster systems: with a large RAID, it has run LDA (Latent Dirichlet Allocation) on a 10 TB dataset. BIDMach can also run on a cluster, and includes a new communication protocol called "Kylix" which gives nearly optimal throughput for distributed ML and graph tasks. It currently holds the record for PageRank analysis of large graphs, and was 3-6x faster than any other system on 64 nodes.

Usability: BIDMach inherits a powerful command line/batch file interpreter from the Scala language in which it is written. It has the simplicity of R, Python, etc., but with uniformly high performance. It fully taps Scala's extensible syntax, so that math written in BIDMach looks like math. BIDMach includes a simple plotting class, and we are adding "interactive models" which allow interactive tuning.

Customizability: BIDMach includes likelihood "mixins" that allow the qualities of basic models to be tailored to more specific needs. For example, topic models can be tuned to favor more coherent or more mutually independent topics.

Modularity: BIDMach favors mini-batch algorithms and includes core classes that take care of optimization, data sourcing, and model tailoring. Writing a new model typically requires writing only a small generic model class with a gradient method. The learner classes take care of running the model, whether the data is sparse or dense, whether it runs on CPU or GPU, and whether it uses single or double precision.
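The division of labor described above can be illustrated with a small, self-contained sketch in plain Python. It is not BIDMach's API (BIDMach itself is written in Scala): the class names `LinearRegressionModel` and `Learner`, the `minibatches` helper, and the synthetic data are all hypothetical, chosen only to show a model that supplies a gradient method while a generic learner owns the mini-batch loop and the update rule.

```python
import random


class LinearRegressionModel:
    """Hypothetical model class: holds weights and exposes a gradient method."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features

    def gradient(self, batch):
        """Gradient of the mean squared error over one mini-batch of (x, y) pairs."""
        grad = [0.0] * len(self.weights)
        for x, y in batch:
            prediction = sum(w * xi for w, xi in zip(self.weights, x))
            error = prediction - y
            for j, xi in enumerate(x):
                grad[j] += 2.0 * error * xi / len(batch)
        return grad


def minibatches(data, batch_size):
    """Hypothetical data source: yields consecutive slices of the dataset."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


class Learner:
    """Hypothetical generic learner: runs any model that has a gradient method."""

    def __init__(self, model, learning_rate=0.1, batch_size=8):
        self.model = model
        self.learning_rate = learning_rate
        self.batch_size = batch_size

    def train(self, data, epochs):
        for _ in range(epochs):
            for batch in minibatches(data, self.batch_size):
                grad = self.model.gradient(batch)
                # Plain stochastic gradient descent step on the model's weights.
                self.model.weights = [
                    w - self.learning_rate * g
                    for w, g in zip(self.model.weights, grad)
                ]
        return self.model


# Synthetic, noise-free data generated from y = 2*x1 - 3*x2 (illustrative only).
rng = random.Random(0)
data = []
for _ in range(64):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, 2.0 * x[0] - 3.0 * x[1]))

model = Learner(LinearRegressionModel(2), learning_rate=0.1, batch_size=8).train(data, epochs=200)
```

In this arrangement, swapping in a different model means writing another small class with a `gradient` method; the learner and the data source stay unchanged, which is the kind of separation the paragraph above describes.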
