I wanted to schedule an increasing number of ML / AI scripts. They run on Linux, and while I am familiar with the tried-and-true cron, I wanted something a bit better.
After an evaluation of Windmill and Prefect, I came across dagu.
I loved that it was lightweight. You can download the source, compile it, and off you go. A single binary.
Understanding how it works, however, was a bit more tedious.
dagu has a CLI, which is great. It also has a GUI that runs on port 80 (SSL does not appear to be supported).
I decided to set it up as a system service on Linux (systemd), so I crafted a unit file for it. To do this, I created a dagu user with no login shell (for security) and no home directory.
One problem I ran into is that dagu needs a home directory. I created /opt/dagu, and dagu created its own directories underneath it, including one for dags.
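Roughly, that user and directory setup looks like the following. The exact useradd flags here are a reconstruction rather than a paste from my shell history, and the nologin path varies by distro:

```bash
# system account with no login shell and no home directory (for security)
sudo useradd --system --no-create-home --shell /usr/sbin/nologin dagu

# give dagu a home of its own under /opt
sudo mkdir -p /opt/dagu
sudo chown -R dagu:dagu /opt/dagu
```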
The binary likes to live in /usr/local/bin. In addition to that, the /opt/dagu directory created above serves as its home.
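A minimal unit file for this setup could look like the sketch below. The Description, Restart policy, and target names are my own choices; the dagu user, DAGU_HOME=/opt/dagu, the /usr/local/bin path, and the start-all command are as described here.

```ini
# /etc/systemd/system/dagu.service
[Unit]
Description=dagu scheduler and web UI
After=network.target

[Service]
Type=simple
User=dagu
Group=dagu
Environment=DAGU_HOME=/opt/dagu
WorkingDirectory=/opt/dagu
ExecStart=/usr/local/bin/dagu start-all
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After placing the file at /etc/systemd/system/dagu.service, "systemctl daemon-reload" followed by "systemctl enable --now dagu" brings it up.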
If you use the CLI and pass your YAML file to it, dagu wants to run it "then and there". In my testing, at least, it ignored the schedule. Or perhaps I am wrong and it does acknowledge the schedule, but it still does an immediate run the moment you type: "dagu start my_dag -- NAME=mydag".
So there are a couple of other ways to get your YAML DAG file running on its schedule:
- Craft your YAML inside the GUI, which will ultimately save your DAG in the $DAGU_HOME/dags directory.
- Drop your YAML into the $DAGU_HOME/dags directory and restart the dagu service (remember, I set it up as a service): "systemctl restart dagu". Since the service does a start-all, it starts the scheduler as well, which runs as a separate process fork. A rough example of such a DAG file is sketched after this list.
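For reference, here is what such a DAG file can look like. This is my own sketch, not one of my production files: the file name and script paths are placeholders, and the schedule, params, steps, and depends fields reflect my reading of dagu's YAML format, so check the dagu docs before relying on them.

```yaml
# Hypothetical DAG at $DAGU_HOME/dags/my_dag.yaml -- file name and script
# paths are placeholders.
schedule: "0 2 * * *"       # cron expression; the scheduler runs this nightly at 02:00
params: NAME=mydag          # default parameter, overridable: dagu start my_dag -- NAME=other
steps:
  - name: prepare data
    command: python3 /opt/ml/prepare.py
  - name: train model
    command: python3 /opt/ml/train.py --run "$NAME"
    depends:
      - prepare data
```

Once the scheduler (launched by start-all) picks this up, the cron expression governs the runs; invoking it by hand with "dagu start my_dag -- NAME=mydag" still triggers an immediate run, as noted above.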
Once it loads in, you can get different views and perspectives of your workflow in the GUI: a pipeline view, a timeline view, and a graph perspective.