Tools and Guides

The Data Science and Public Policy Team (DSSG Lab) at Carnegie Mellon University develops and maintains a growing suite of open-source tools and practical guides. These resources empower data scientists, governments, and nonprofit organizations to build impactful and equitable data science solutions to address pressing social challenges.

Our tools are grounded in over a decade of hands-on experience partnering with public sector and nonprofit organizations to improve service delivery through data science.

Tools

Machine Learning

  • Triage: A data science platform for machine learning experimentation and reproducibility. Triage supports the full pipeline, from feature engineering and model training to evaluation and governance.
  • Post-modeling: A model selection tool. Post-modeling enables exploration of model performance, feature importance, and output analysis to support informed model deployment decisions.
  • Aequitas: A bias and fairness audit toolkit. Aequitas helps teams assess and mitigate bias in machine learning predictions, promoting fairness in model-driven decision-making.
  • The Lorax: A lightweight tool that surfaces individual feature importances from Random Forest classifiers for greater interpretability.
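To give a sense of what a bias audit measures, here is a minimal sketch in plain Python. It does not use the Aequitas API; the function names, the record layout, and the toy data are all assumptions made for illustration. It computes the false positive rate for each group and then the ratio of each group's rate to a chosen reference group.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return the false positive rate per group.

    Each record is a (group, y_true, y_pred) tuple with binary labels.
    This layout is hypothetical and chosen only for the example.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:  # only true negatives can yield false positives
            negatives[group] += 1
            if y_pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / n for g, n in negatives.items() if n > 0}


def disparity(rates, reference):
    """Return the ratio of each group's rate to the reference group's rate."""
    ref = rates[reference]
    return {g: r / ref for g, r in rates.items()}


# Toy data, invented for illustration only.
records = [
    ("a", 0, 0), ("a", 0, 1), ("a", 0, 0), ("a", 0, 0), ("a", 1, 1),
    ("b", 0, 1), ("b", 0, 1), ("b", 0, 0), ("b", 0, 0), ("b", 1, 0),
]
rates = false_positive_rates(records)
ratios = disparity(rates, reference="a")
print(rates, ratios)
```

A ratio far from 1.0 for a group suggests that the model's errors fall unevenly across groups and may deserve a closer look.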

Data Engineering

  • Entity Deduplication: A standard and customizable interface for deduplicating records in large databases, with support for user-defined pre- and post-processing steps.
  • Ohio: A utility that streamlines I/O operations across CSV, PostgreSQL, and Python workflows, with a focus on performance and efficiency.
  • Argcmdr: A lightweight wrapper around Python’s argparse module. Argcmdr simplifies the creation of hierarchical command-line interfaces.
  • Dickens: A collection of Python decorators that extend functionality using the descriptor interface.
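For readers unfamiliar with the descriptor interface mentioned above, the following is a minimal sketch of a decorator built on it, using only the standard language features. It is not taken from Dickens; the names cached_attribute and Report are hypothetical. The decorator computes a method's result on first access and stores it on the instance, so later lookups skip the computation.

```python
class cached_attribute:
    """Hypothetical decorator: compute a method's result once per instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self.func(instance)
        # This is a non-data descriptor (no __set__), so a value stored in
        # the instance __dict__ under the same name shadows it next time.
        instance.__dict__[self.name] = value
        return value


class Report:
    def __init__(self, rows):
        self.rows = rows
        self.computations = 0

    @cached_attribute
    def total(self):
        self.computations += 1
        return sum(self.rows)


report = Report([1, 2, 3])
first = report.total   # computed on first access
second = report.total  # served from the instance __dict__
print(first, second, report.computations)
```

The standard library offers functools.cached_property for the same purpose; the sketch spells out the mechanism only to show how a descriptor-based decorator works.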

Data Science for Social Good Project Code

Explore codebases from our Data Science for Social Good projects:

👉 GitHub repository

Guides

  • Scoping Guide: A practical resource for framing and designing data science projects that incorporate AI components.
  • Data Maturity Framework: An easy-to-use questionnaire for assessing an organization’s data maturity, with guidance on advancing to higher maturity levels.
  • Training Materials: A collection of educational materials developed for machine learning practitioners.