MADlib: Big Data Machine Learning in SQL for Data Scientists

  • Open Source, commercially usable BSD license
  • Supports Postgres, Pivotal Greenplum Database, and Pivotal HAWQ
  • Powerful analytics for Big Data

Latest News

MADlib v1.6 Release Announcement

MADlib v1.6 is released and available for download.

New features include:

  1. A new unified ‘margins’ function that computes marginal effects for linear, logistic, multilogistic, and Cox proportional hazards regression.
  2. A new helper function that converts categorical variables to indicator variables using dummy encoding, so that they can be used directly in regression methods.
  3. Multi-fold performance improvements for Cox proportional hazards and ARIMA.
  4. New functionality to export linear and logistic regression models as a PMML object, using the PyXB Python library.
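
To illustrate what dummy encoding means in item 2, here is a minimal plain-Python sketch. It is not MADlib's helper function or its SQL interface; the name dummy_encode, the row layout, and the output column naming are all assumptions made for this example.

```python
def dummy_encode(rows, column):
    """Replace one categorical column with 0/1 indicator columns.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    # Sorted distinct values give a deterministic set of indicator columns.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every column except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
]
encoded = dummy_encode(rows, "color")
```

Each input row gains one indicator column per distinct value of the encoded column. When the indicators feed a regression that already has an intercept, one level is usually dropped to avoid perfect collinearity.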

Various bug fixes include:

  1. A check in K-Means to ensure that all data points have the same dimensionality and that it matches the dimensionality of any provided initial centroids.
  2. A check in multinomial regression to quit early and cleanly if the model size is greater than the maximum permissible memory.
  3. An error is now raised when a grouping column has the same name as one of the output table's columns.
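
The K-Means check in item 1 can be pictured with a short plain-Python sketch. This is an assumption about the kind of validation involved, not MADlib's code; the function name check_dimensions and the list-of-lists input format are hypothetical.

```python
def check_dimensions(points, centroids=None):
    """Return the common dimensionality, or raise ValueError on a mismatch.

    Hypothetical validation sketch; not MADlib's implementation.
    """
    if not points:
        raise ValueError("no data points supplied")
    dim = len(points[0])
    # Every data point must have the same number of coordinates.
    for index, point in enumerate(points):
        if len(point) != dim:
            raise ValueError(
                f"point {index} has {len(point)} dimensions, expected {dim}"
            )
    # Any initial centroids must live in the same space as the data.
    for index, centroid in enumerate(centroids or []):
        if len(centroid) != dim:
            raise ValueError(
                f"centroid {index} has {len(centroid)} dimensions, expected {dim}"
            )
    return dim


dim = check_dimensions([[1.0, 2.0], [3.0, 4.0]], centroids=[[0.0, 0.0]])
```

Running a check like this before clustering starts means a malformed input fails with a clear message instead of partway through an iteration.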

For a more detailed list of changes see the MADlib v1.6 Release Notes.

Access the binaries on the MADlib Download Page. As always, the MADlib user forum is open for questions.

MADlib v1.5 Release Announcement

MADlib v1.5 is released and available for download.

New features include:

  1. Support for the Pivotal Distribution of Hadoop (PHD) via HAWQ.
  2. Updated design and improved usability for Conditional Random Fields (CRFs).
  3. Performance improvements for linear and logistic prediction functions.

Various bug fixes have been made including:

  1. Elastic net prediction now uses all features rather than only the selected ones, which avoids an error when the trained model selects no feature as relevant.
  2. For the corner probability values p=0 and p=1, the quantile functions of the Bernoulli and binomial distributions now return 0 and num_of_trials (1 in the case of Bernoulli) respectively, independent of the probability of success.
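
The quantile behavior described in item 2 can be sketched in plain Python as follows. This is only an illustration of the stated convention, not MADlib's implementation; the function name binomial_quantile and its argument order are assumptions, and the interior case uses the usual definition of the quantile as the smallest k whose cumulative probability reaches p.

```python
from math import comb


def binomial_quantile(p, num_of_trials, prob_success):
    """Quantile of a binomial distribution with explicit corner cases.

    Hypothetical sketch; a Bernoulli distribution is the num_of_trials == 1 case.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Corner cases: the result does not depend on prob_success.
    if p == 0.0:
        return 0
    if p == 1.0:
        return num_of_trials
    # Interior case: smallest k with CDF(k) >= p.
    cdf = 0.0
    for k in range(num_of_trials + 1):
        cdf += (
            comb(num_of_trials, k)
            * prob_success ** k
            * (1.0 - prob_success) ** (num_of_trials - k)
        )
        if cdf >= p:
            return k
    # Guard against floating-point rounding leaving the CDF just below p.
    return num_of_trials


lower = binomial_quantile(0.0, 10, 0.3)
upper = binomial_quantile(1.0, 10, 0.3)
```

Handling p=0 and p=1 before the loop is what makes the result independent of prob_success, including the degenerate settings prob_success=0 and prob_success=1.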

For a more detailed list of changes see the MADlib v1.5 Release Notes.

Access the binaries on the MADlib Download Page.

MADlib v1.4 Release Announcement

MADlib v1.4 is released and available for download.

New features include:

  1. Improved interface for multinomial logistic regression.
  2. Robust variance and clustered variance estimators for Cox Proportional Hazards.
  3. NULL handling for various regression methods.

Deprecated functionality includes:

  1. The old mlogregr() function has been deprecated in favor of the new mlogregr_train() function.
  2. Optimizer parameters for robust variance functions have been gathered into a single parameter instead of three separate parameters. See the documentation for details.

For a more detailed list of changes see the MADlib v1.4 Release Notes.

Downloads are available on the MADlib Download Page.

As always, the MADlib user forum is open for questions.

MADlib v1.3 Release Announcement

MADlib v1.3 is now available. It adds stratification support for Cox Proportional Hazards and improves NULL handling.

Binary packages are available for CentOS/RedHat and for Mac OS X. On other platforms, MADlib can be built from source. Our Wiki provides detailed instructions for deploying MADlib on PostgreSQL and Greenplum installations. For a list of new features, bug fixes, and known issues, please refer to the Release Notes.

As always, the MADlib forum is open for questions and discussions. Try it out and send us your feedback!
