Tools and Guides
The Data Science and Public Policy Team at Carnegie Mellon University develops and maintains a growing suite of open-source tools and guides that help data scientists, governments, and nonprofit organizations build impactful and equitable data science solutions to social challenges. Our tools and guides draw on more than a decade of experience working with government and nonprofit partners on data science projects that help them better serve their communities.
Tools
Machine Learning
- Triage: Our data science platform. Triage enables machine learning experimentation and reproducibility. It covers everything from feature generation and model training to evaluation and governance.
- Post modeling: Our model selection solution. This tool supports analysis and exploration of feature importances, performance metrics, and model outputs to help identify the best model to deploy.
- Aequitas: Our bias and fairness platform. Aequitas audits machine learning outputs for bias and can also help mitigate it.
- The Lorax: A simple tool that identifies feature importances for individual predictions made by random forest classifiers.
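To make "bias audit" more concrete, here is a minimal sketch of one metric commonly used in such audits: each group's false positive rate divided by the false positive rate of a chosen reference group. It is written in plain Python and does not use the Aequitas API. The function names and the toy data are hypothetical and are included only for illustration.

```python
from collections import defaultdict


def false_positive_rates(labels, predictions, groups):
    """Per-group false positive rate: FP / (FP + TN), computed over true negatives."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        if y == 0:  # only true negatives can become false positives
            negatives[g] += 1
            if y_hat == 1:
                false_positives[g] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


def fpr_disparity(labels, predictions, groups, reference_group):
    """Each group's false positive rate divided by the reference group's rate."""
    rates = false_positive_rates(labels, predictions, groups)
    reference_rate = rates[reference_group]
    if reference_rate == 0:
        raise ValueError("reference group has a false positive rate of zero")
    return {g: rate / reference_rate for g, rate in rates.items()}


# Hypothetical example: binary labels, model predictions, and a group attribute.
labels = [0, 0, 1, 0, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 1, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
disparities = fpr_disparity(labels, predictions, groups, reference_group="a")
```

In this toy data, every true negative in group "b" is flagged, while only one of three in group "a" is. Group "b" therefore has a false positive rate three times that of the reference group. A real audit would look at several such metrics across many groups and apply a tolerance threshold before concluding that a disparity is unfair.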
Data Engineering
- Entity Deduplication: A standard interface for deduplicating entities in large databases, with support for custom pre- and post-processing steps.
- Ohio: An I/O utility for working with CSV files, PostgreSQL, and Python while making efficient use of computational resources.
- Argcmdr: A wrapper around Python's argparse that makes it easy to declare (hierarchical) command-line interfaces in Python.
- Dickens: Additional Python decorators implementing the descriptor interface.
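To show what a "decorator implementing the descriptor interface" looks like, here is a minimal, hypothetical `classproperty` decorator written from scratch. It is not taken from Dickens. Because the class defines `__get__`, Python calls it on attribute lookup, so the decorated function can be read like a property on the class itself as well as on its instances. The `Model` class and its `registry` attribute are illustrative assumptions.

```python
class classproperty:
    """Hypothetical read-only property computed from the class, not an instance."""

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, instance, owner=None):
        # Descriptor protocol: Python calls __get__ for both Model.attr
        # (instance is None) and Model().attr (instance is the object).
        if owner is None:
            owner = type(instance)
        return self.func(owner)


class Model:
    registry = ["logistic_regression", "random_forest"]

    @classproperty
    def n_models(cls):
        """Number of registered model types."""
        return len(cls.registry)


# Usage: the value is available without creating an instance.
count_from_class = Model.n_models
count_from_instance = Model().n_models
```

Because `__get__` receives the owner class, subclasses that override `registry` automatically get their own count. Note that this sketch only customizes reading. Assigning `Model.n_models = 5` would simply replace the descriptor on the class.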
Data Science for Social Good project code
Guides
- Scoping: Our guide to scoping projects that may include AI components.
- Data Maturity Framework: Our easy-to-use questionnaire helps identify the data maturity level of an organization, agency, or government and outlines the steps required to advance it.
- Training materials: Resources developed for ML practitioners.