This repo contains resources, tutorials, and general best practices that our team has developed, and continues to refine, for collaborating effectively as a group of data scientists. Our team relies heavily on the Open Data Hub (ODH) project (which we also recommend), so many of our examples use that toolbox. Still, we hope there is enough generally applicable content here to help those outside the ODH ecosystem. We also invite others to contribute their own best practices and implementation alternatives :)
- What are the Open Data Hub and Operate First?
- Access JupyterHub
- Manage data with remote storage
- Set up CI
- Use Thoth tooling to enhance development
- Best practices for contributing as a Data Scientist
- Build ML pipelines from notebooks
- Track your metrics and experiments
- Share reproducible notebook images
- Quickly deploy interactive environments and JupyterBooks
- Create interactive dashboards to visualize results
- Getting Started with Data Science
- Monitor JupyterHub environment workloads
- Serve your model with Seldon
- Create custom serving images with Seldon
- Tips for starting a new ML project from scratch
- Recommendations for structuring an E2E ML project
- Template for writing a project document
- Simplify Project Management with GitHub project boards and issues
This project is maintained as part of the Operate First and Emerging Technologies group in Red Hat’s Office of the CTO. More information can be found at https://www.operate-first.cloud/.
Have a question? Open an issue in this repo or join us on Slack in the #data-science channel. :)