Mentor: Nicholas Lind, Engagement Manager, Strategy at Deloitte Consulting
Team members: Megan Jacob, Rahul Chandra, Riteka Murugesh
Build a distributed machine learning pipeline in pandas and Dask using gigabytes of data from a large retail company. The team will learn real-world machine learning (ML), CI/CD, and data engineering skills.
Technologies Used: pandas, Dask, various modeling libraries (XGBoost, LightGBM, CatBoost, Prophet), seaborn, GitHub, Trello [the team is free to suggest alternative technologies]
Final Deliverable: a presentation in Jupyter or PowerPoint describing the team's exploration approach, modeling techniques, final results, and considerations for future work
Team experience: CodeDay Labs advanced-track team