Thursday, June 26, 2025

AI - ML - Using a DAG Workflow for your Data Pipeline

I have been trying to find a lightweight mechanism to run the growing number of scripts in my data pipeline.

I have looked at a lot of workflow tools. Most are heavy, requiring databases, message queues, and everything else that goes with a typical three- or four-tier application. Some run in Docker.

I tried Windmill first. I liked the GUI; it is very nice. The deal-killer for me was that Windmill wants to soak up and keep its own copies of anything it runs. It cannot simply reach out and run scripts that are, for example, sitting in a directory path. As far as I can tell (I could be wrong, but I think I am right), it cannot do a git clone into a directory and run the content from where it sits. It wants to pull everything into its own internal database as a copy; it wants to be a development environment. Not for me, and not what I was looking for. I only want my "stuff" to live in a single spot.

I then tried Prefect. What a mess. You have to write Python just to use it. Right away, its SQLite database was locking whenever I did anything asynchronous. Then came stack traces, issues with the CLI, and so on. I think they are changing this code too much. Out of frustration I killed the server and moved on.

My latest is DAGU, an open-source project on GitHub. Wow: simple, says what it does, does what it says. It does not have some of the more advanced features, but it has a crisp, well-designed, responsive UI, and it runs my scripts in a better way than cron can.
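To show why this fits the "run scripts where they already sit" requirement, here is a rough sketch of what a Dagu DAG definition looks like, based on its YAML format. The file paths, step names, and schedule below are hypothetical examples, not taken from my actual pipeline:

```yaml
# etl.yaml — a hypothetical Dagu DAG definition (paths are illustrative)
schedule: "0 2 * * *"          # cron syntax: run nightly at 2 AM
steps:
  - name: extract
    command: /home/me/pipeline/extract.sh   # scripts stay in their own directory
  - name: transform
    command: python3 /home/me/pipeline/transform.py
    depends:
      - extract                # runs only after extract succeeds
  - name: load
    command: /home/me/pipeline/load.sh
    depends:
      - transform
```

If I have the CLI right, a file like this can be launched with `dagu start etl.yaml` or left to the built-in scheduler; either way, the commands are executed in place rather than copied into an internal store, which is exactly what Windmill would not do for me.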

Here is a sample screenshot. I like it.

