Thursday, November 12, 2015

Synergy Scheduler

Synergy Scheduler started as a side project to orchestrate the data pipeline for a large project called... well, Synergy :) In late 2011 Scheduler got its first UI and a GitHub repository [1]. In my view, its two most attractive features at the time were a custom DB driver and a pythonic repeating timer.

Built by an engineer for engineers, Scheduler was rough around the edges, and few outsiders were willing to embrace it. Every new deployment required revisiting anywhere from 30% to 50% of the codebase; features were rarely portable; maintenance was a nightmare.
It became clear that if Scheduler was to survive, it needed a significant overhaul to make it much simpler to operate and at least an order of magnitude easier to deploy and maintain.

Active development started in 2013, and for the next 18 months nearly every release was backward-incompatible. While this hardly made the news, the push to productize Scheduler was paying off. Below are the milestones:
  • v1.0: generalization! Gone were the days of total refactoring; now you only needed to revisit 25% of the codebase
  • v1.2: support for external configuration allowed Scheduler to be deployed as a Python .egg
  • v1.3: first version available via PyPI
  • v1.7: configuration shrank from 7 files to 2; introduction of object-to-document mapping; new tiled UI
  • v1.11: garbage collector refactoring; improved UI responsiveness
  • v1.15: Python 3 compatibility
It is an interesting exercise to trace Scheduler's evolution through the prism of its principal architectural decisions:
  • message queue: initially used for one-way communication with its subsidiaries, it ultimately became a two-way information highway that keeps Scheduler up to date
  • document-based persistence: allowed quick development iterations while keeping schema maintenance costs low
  • the notion of a timeperiod, together with distinct life cycles for jobs and tasks (a task being the unit of work in SS vocabulary): made it possible to build a lean timetable and a simple garbage collector
Looking back over years of commits, I realize how lucky I am as an engineer to have had Synergy Scheduler. It served as a test bed for many major techniques and approaches: multi-process deployment, a multi-threaded core, a message-driven communication bus, a heap-based garbage collector, job trees, an AJAX and template-driven front-end, etc.

Currently, Synergy Scheduler counts 14,400 lines of code, 1,600 lines of comments and 116 unit tests, and has two spin-offs:
  • launch.py: a toolset to deploy, launch and test Python projects [2]
  • Synergy ODM: object-to-document mapping [3]
While demanding and often difficult, this was and still is a rewarding journey. After going through all the hurdles of open-source release cycles, I came to appreciate the work that many of us take for granted, and I truly admire the depth, breadth and high standards of the open-source community.

Cheers!

[1] Synergy Scheduler at GitHub: https://github.com/mushkevych/scheduler

[2] launch.py at GitHub: https://github.com/mushkevych/launch.py

[3] Synergy ODM at GitHub: https://github.com/mushkevych/synergy_odm