MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
The MADlib mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development.
Apr 3rd 2012: A technical report outlining MADlib's architecture and design patterns is now available. It provides a description of various statistical methods and includes performance and speedup results of a core building block. Furthermore, the paper reports on two initial efforts at incorporating academic research into MADlib.
Feb 9th 2012: MADlib v0.3 is out! Binary packages are available for CentOS/RedHat and for Mac OS X. On other platforms, MADlib can be built from source. Our Wiki provides detailed instructions for deploying MADlib to PostgreSQL and Greenplum installations. For a list of new features, bug fixes, and known issues, please refer to the Release Notes. As always, the MADlib forum is open for questions and discussions. Try it out and let us know what you think!
Sep 14th 2011: MADlib v0.2.1beta is ready. See the Release Notes for details about the numerous bug fixes and improvements.
Jul 8th 2011: MADlib v0.2beta is ready for download. For detailed information about its contents, please refer to the Release Notes. Although the bulk of the code is standard SQL, some methods use platform-specific subroutines to achieve good performance. Initial targets are PostgreSQL and Greenplum; ports to other SQL platforms are expected. After the beta release, we will invite participation from the broader community through a standard open-source contribution process.
MADlib grew out of discussions between database-engine developers, data scientists, IT architects and academics, who were interested in new approaches to scalable, sophisticated in-database analytics. These discussions were written up in a paper in VLDB 2009 that coined the term "MAD Skills" for data analysis. The MADlib software project began the following year as a collaboration between researchers at UC Berkeley and engineers and data scientists at EMC/Greenplum.
Binary packages of the latest MADlib release (v0.3):
• Mac OS X 10.6 and higher:
Greenplum 4.0, 4.1, 4.2 / PostgreSQL 8.4, 9.0, 9.1 (64-bit)
• CentOS / Red Hat 5 and higher (64-bit):
Greenplum 4.0, 4.1, 4.2 / PostgreSQL 8.4, 9.0, 9.1
Source Code:
• Snapshot of development repository (unstable):
.zip
.tar.gz
• Latest stable release (v0.3):
.zip
.tar.gz
Installation guides can be found in the MADlib Wiki.
Documentation for the latest release (v0.3):
• Users: http://doc.madlib.net
• Developers: http://devdoc.madlib.net
Pre-release documentation generated from the development repository:
• Users: http://doc.madlib.net/master
• Developers: http://devdoc.madlib.net/master
• Project Wiki:
MADlib Wiki.
• Project Roadmap:
https://github.com/madlib/madlib/wiki/MADlib-Roadmap
• Contribution Guide:
https://github.com/madlib/madlib/wiki/Contribution-Guide
• User forum:
http://groups.google.com/group/madlib-user-forum
• Developer forum:
http://groups.google.com/group/madlib-dev-forum
• Bug reporting and feature requests:
http://jira.madlib.net