
Wednesday, July 2, 2025

Using DAGU for Pipeline Workflows

I wanted to run my ML / AI scripts - an increasing number of them. Everything runs on Linux, and I am familiar with the tried-and-true cron, but I wanted something a bit better.

After an evaluation of Windmill and Prefect, I came across dagu. 

I loved that it was lightweight. You can download the source, compile it, and off you go. A single binary.

Understanding how it works, however, was a bit more tedious. 

dagu has a CLI, which is great. It also has a GUI that runs on port 80 (SSL is not supported, as far as I can tell).

I decided to set it up as a system service on Linux (systemd), so I crafted a unit file for it. To do this, I created a dagu user with no login shell (for security) and no home directory.

One problem I ran into is that dagu needs a home directory. I created /opt/dagu, and dagu created the directories underneath it for dags.
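For reference, this is roughly what that setup looks like - a sketch using standard Linux tooling (the nologin path varies by distro):

    # Create a locked-down system account for dagu (no login shell, no home directory)
    sudo useradd --system --no-create-home --shell /usr/sbin/nologin dagu

    # Give dagu a working "home" under /opt; it creates dags/ and its other
    # directories underneath this on first start
    sudo mkdir -p /opt/dagu
    sudo chown -R dagu:dagu /opt/dagu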

The binary itself likes to live in /usr/local/bin; that, plus the /opt/dagu directory, is the whole footprint.
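With that layout, the unit file is short. Here is a sketch of what mine looks like - I am assuming the DAGU_HOME environment variable to point dagu at /opt/dagu, and start-all launches both the web UI and the scheduler:

    # /etc/systemd/system/dagu.service - a minimal sketch; adjust paths to your layout
    [Unit]
    Description=dagu DAG scheduler and web UI
    After=network.target

    [Service]
    Type=simple
    User=dagu
    Group=dagu
    Environment="DAGU_HOME=/opt/dagu"
    WorkingDirectory=/opt/dagu
    ExecStart=/usr/local/bin/dagu start-all
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Then "systemctl daemon-reload" followed by "systemctl enable --now dagu" brings it up and keeps it up across reboots.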

If you use the CLI and pass your YAML file to it, dagu wants to run it "then and there". In my testing, at least, it ignored the schedule. Or perhaps I am wrong and it does acknowledge the schedule, but it still does an immediate run the moment you type: "dagu start my_dag -- NAME=mydag".

So there are a couple of other ways to make your YAML DAG file work:

  • Craft your YAML inside the GUI, which will ultimately save your DAG in the $DAGU_HOME/dags directory.
  • Drop your YAML into the $DAGU_HOME/dags directory and restart the dagu service (remember, I set it up as a service): "systemctl restart dagu". Since the service does a start-all, it starts the scheduler as well, which is a separate process fork. A sample DAG file is sketched just below.
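
For the second approach, a DAG file is just YAML: a schedule plus a list of steps. Here is a minimal sketch - the file name, cron expression, and script paths are placeholders for my own pipeline, not anything dagu ships with:

    # /opt/dagu/dags/nightly_pipeline.yaml - placeholder names and paths
    schedule: "0 2 * * *"          # standard cron syntax: run at 02:00 every day
    steps:
      - name: fetch-data
        command: /opt/pipeline/fetch_data.sh
      - name: train-model
        command: python3 /opt/pipeline/train_model.py
        depends:
          - fetch-data             # run only after fetch-data succeeds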

Once it loads, you can get different views and perspectives of your workflow.

This is a pipeline view.

This is a timeline view.

And this is a graph perspective.


Thursday, June 26, 2025

AI / ML - Using a DAG Workflow for your Data Pipeline

I have been trying to find a lightweight mechanism to run my increasing number of scripts for my data pipeline.

I have looked at a lot of them. Most are heavy - requiring databases, message queues, and all that goes with a typical 3-4 tier application. Some run in Docker.

I tried Windmill first. I liked the GUI - very nice. The deal-killer for me was that Windmill wants to soak in and make its own copies of anything it runs. It can't just reach out and run scripts that, for example, are sitting in a directory path. It apparently can't (I could be wrong on this, but I don't think I am) do a git clone to a directory and run the content from where it sits. It wants to pull everything into its own internal database as a copy; it wants to be a development environment. Not for me, and not what I was looking for. I only want my "stuff" to be in a single spot.

I then tried Prefect. What a mess. You have to write Python to use Prefect. Right away, the SQLite eval database was locking when I did anything asynchronous. Then came stack traces, issues with the CLI, and so on. I think they are changing this code too much. Out of frustration, I killed the server and moved on.

My latest is DAGU - Open Source, straight out of GitHub. Wow - simple; it says what it does and does what it says. It does not have some of the more advanced features, but it has a nice, crisp, well-designed, responsive UI, and it runs my stuff better than cron can.

Here is a sample screenshot. I like it.


 
