Distributed ML Pipeline in Pandas and Dask

CodeDay Labs 2020
Mentor: Nicholas Lind, Engagement Manager, Strategy at Deloitte Consulting

Team members: Megan Jacob, Rahul Chandra, Riteka Murugesh

Build a distributed machine learning pipeline in pandas and Dask using gigabytes of data from a large retail company. The team will learn ML, CI/CD, and data engineering skills applicable to real-world work.

Technologies Used: pandas, Dask, various modeling algorithms (XGBoost, LightGBM, CatBoost, Prophet), seaborn, GitHub, Trello [note that the team is free to suggest alternative technologies]

Final Deliverable: presentation in Jupyter / PowerPoint describing your exploration approach, modeling techniques, final results, and considerations for the future

How much experience does your group have? Does the project use anything (art, music, starter kits) you didn't create?

CodeDay Labs advanced-track team
 
Participation Certificate

Members

Megan J