Hi r/Python! I’m the developer of Flowfile and wanted to share FlowFrame, a component I built that bridges the gap between code-based and visual ETL tools.

Source code: https://github.com/Edwardvaneechoud/Flowfile/

What My Project Does

FlowFrame lets you write Polars-like Python code for data pipelines while automatically generating a visual ETL graph behind the scenes. You write familiar code, but get an interactive visualization you can debug, share, or use to explain your pipeline to non-technical colleagues.

Here’s a simple example:

```python
import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a dataset
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Filter, transform, group by and aggregate
result = (
    df.filter(col("value") > 150)
    .with_columns((col("value") * 2).alias("double_value"))
    .group_by("category")
    .agg(col("value").sum().alias("total_value"))
)

# Open the visual graph in a browser
open_graph_in_editor(result.flow_graph)
```

When you run this code, it launches a web interface showing your entire pipeline as a visual flow diagram:

![FlowFrame Example](https://github.com/Edwardvaneechoud/Flowfile/blob/main/.github/images/group_by_screenshot.png?raw=true)
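
For anyone curious how ordinary method calls can produce a graph as a side effect, here is a toy, standard-library-only sketch of the general idea. It is not FlowFrame's actual implementation: `Node`, `ToyFrame`, `from_rows`, `sum_by`, and `edges` are hypothetical names made up for illustration, and a real tool would track far more per node (schemas, settings, lazy plans). Each call returns a new frame and appends one node describing what happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One recorded pipeline step (hypothetical structure)."""
    node_id: int
    operation: str
    detail: str


class ToyFrame:
    """Toy frame that records every transformation as a graph node."""

    def __init__(self, rows, nodes=()):
        self.rows = list(rows)
        self.nodes = tuple(nodes)

    def _step(self, rows, operation, detail):
        # Return a new frame whose history has one extra node.
        node = Node(len(self.nodes) + 1, operation, detail)
        return ToyFrame(rows, self.nodes + (node,))

    @classmethod
    def from_rows(cls, rows):
        return cls(rows)._step(rows, "input", f"{len(rows)} rows")

    def filter(self, predicate, detail):
        kept = [r for r in self.rows if predicate(r)]
        return self._step(kept, "filter", detail)

    def sum_by(self, key, value):
        totals = {}
        for r in self.rows:
            totals[r[key]] = totals.get(r[key], 0) + r[value]
        rows = [{key: k, f"total_{value}": v} for k, v in totals.items()]
        return self._step(rows, "group_by", f"sum({value}) by {key}")

    def edges(self):
        # In a linear pipeline each node feeds the next one.
        ids = [n.node_id for n in self.nodes]
        return list(zip(ids, ids[1:]))


rows = [
    {"id": 1, "category": "A", "value": 100},
    {"id": 2, "category": "B", "value": 200},
    {"id": 3, "category": "A", "value": 150},
    {"id": 4, "category": "C", "value": 300},
    {"id": 5, "category": "B", "value": 250},
]

result = (
    ToyFrame.from_rows(rows)
    .filter(lambda r: r["value"] > 150, "value > 150")
    .sum_by("category", "value")
)

for node in result.nodes:
    print(node.node_id, node.operation, node.detail)
print(result.edges())
```

The list of nodes plus the edges is everything a UI would need to draw a simple linear flow diagram. Branching operations such as joins would need a node to keep references to more than one parent.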

Target Audience

FlowFrame is designed for:

- Data engineers who want to build pipelines in code but need to share and explain them to others
- Data scientists who prefer coding but need to collaborate with less technical team members
- Analytics teams who want to standardize on a single tool that works for both coders and non-coders
- Anyone working with data pipelines who wants better visibility into their transformations

It’s production-ready and can handle real-world data processing needs, but also works great for exploration, prototyping, and educational purposes.

Comparison

Compared to existing alternatives, FlowFrame takes a unique approach:

Vs. Pure Code Libraries (Pandas/Polars):

- Adds visual representation with no extra work
- Makes debugging complex transforms much easier
- Enables non-coders to understand and modify pipelines

Vs. Visual ETL Tools (Alteryx, KNIME, etc.):

- Maintains the flexibility and power of Python code
- No vendor lock-in or proprietary formats
- Easier version control through code
- Free and open-source

Vs. Notebook Solutions:

- Shows the entire pipeline as a connected flow rather than isolated cells
- Enables interactive exploration of intermediate data at any point
- Creates reusable, production-ready pipelines

Key Features

- Built on Polars for fast data processing with lazy evaluation
- Web-based UI launches directly from your Python code
- Visual ETL interface that updates as you code
- Flows can be saved, shared, and modified visually or programmatically
- Extensible architecture for custom nodes

You can install it with: `pip install Flowfile`

I’d love feedback from the community on this approach to data pipelines. What do you think about combining code and visual interfaces?

submitted by /u/Proof_Difficulty_434 to r/Python

FlowFrame: Python code that generates visual ETL pipelines