Granting Access to Data
We work with data from project partners in two ways:
- Some partners prefer that we do our work on their systems. From the partner’s perspective, this may have significant benefits: the partner retains control of the data, and it is easier to deploy our work at the end of the project. Partners who choose this approach need to provide us with the computational resources necessary to run our machine-learning pipeline. For most projects, we can work well with 2-4 cores, 16-32 GB of RAM, and 500 GB of disk space. The more computational resources we get, the faster we can build good models.
We use only free and open-source software, including the following:
- Linux command-line tools
- Python (numpy, pandas, scipy, scikit-learn at a minimum)
- Postgres. We can use other database systems, but doing so will slow our work.
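As a rough aid, a partner could run a short script like the one below to see whether a machine meets the lower end of the resources and software listed above. This is only a sketch under stated assumptions: the function names are hypothetical, the thresholds are the lower bounds quoted above, the RAM check relies on Linux `sysconf` values, the disk check looks at free space in the current directory, and looking for `psql` on the `PATH` is just one possible sign that a Postgres client is installed.

```python
import importlib.util
import os
import shutil

# Lower bounds taken from the text above (2 cores, 16 GB RAM, 500 GB disk).
MIN_CORES = 2
MIN_RAM_GB = 16
MIN_DISK_GB = 500

# Import names for the listed packages (scikit-learn imports as "sklearn").
REQUIRED_PACKAGES = ["numpy", "pandas", "scipy", "sklearn"]


def resource_shortfalls(cores, ram_gb, disk_gb):
    """Return messages for resources below the minimums; None means unknown."""
    problems = []
    if cores is not None and cores < MIN_CORES:
        problems.append(f"cores: {cores} < {MIN_CORES}")
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_gb is not None and disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def detect_ram_gb():
    """Best-effort physical RAM in decimal GB; returns None where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    # Assumption: the project data would live on the current directory's disk.
    free_disk_gb = shutil.disk_usage(".").free / 1e9
    for line in resource_shortfalls(os.cpu_count(), detect_ram_gb(), free_disk_gb):
        print("Below minimum ->", line)
    for name in missing_packages(REQUIRED_PACKAGES):
        print("Missing Python package ->", name)
    # Assumption: a Postgres client install puts "psql" on the PATH.
    if shutil.which("psql") is None:
        print("psql not found on PATH")
```

The script prints nothing when every check passes. The thresholds are constants at the top so they can be raised for larger projects.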
If you have questions about any of this, please let us know.