Tools and Guides

The Data Science and Public Policy Team at Carnegie Mellon University develops and maintains a growing suite of open-source tools and guides that help data scientists, governments, and nonprofit organizations build impactful and equitable data science solutions to social challenges. These tools and guides draw on over a decade of experience collaborating with government and nonprofit partners on data science projects that help them better serve their communities.

Tools

Machine Learning

  • Triage: Our data science platform. Triage enables machine learning experimentation and reproducibility. It covers everything from feature generation and model training to evaluation and governance.
  • Post modeling: Our model selection solution. This tool supports analysis and exploration of feature importances, performance metrics, and model outputs to help identify the best model to deploy.
  • Aequitas: Our bias and fairness platform. Aequitas conducts bias audits on machine learning outputs and can help mitigate bias.
  • The Lorax: A simple tool that identifies individual feature importances from random forest classifiers.

Data Engineering

  • Entity Deduplication: A standard interface for deduplicating entities in large databases, with custom pre- and post-processing steps.
  • Ohio: I/O utilities for working with CSVs, PostgreSQL, and Python that optimize the use of computational resources.
  • Argcmdr: A wrapper around the argument parser that makes it easy to declare (hierarchical) command interfaces in Python.
  • Dickens: Additional Python decorators implementing the descriptor interface.

Data Science for Social Good project code

GitHub repository

Guides

  • Scoping: Our guide to scoping a project that may include AI components.
  • Data Maturity Framework: Our easy-to-use questionnaire helps identify the level of data maturity of an organization, agency, or government, and outlines the steps required to advance it.
  • Training materials: Resources developed for ML practitioners.