Filtering Nodes in ET1

The filtering nodes help you reduce the number of rows, drill into the exact information needed, and create a data set that adds value rather than confusing your audience.

When filtering, remember that you’re reducing the amount of data coming through the node, or you can swap the node to include instead of exclude.

Include, exclude, and ultimately shape your data.

The Filtering Nodes in ET1

1. 🔍 Any Column Filter

  • The Swiss Army Knife
    • Search across all columns at once
    • Perfect for quick data exploration
    • No setup required – just type and filter

2. 📋 Column Filter

  • The Precision Tool
    • Filter specific columns with exact matches
    • Create multiple filter conditions
    • Ideal for structured, clean data

3. 🧮 Measure Filter Node

  • The Number Cruncher
    • Filter numeric columns using conditions like:
      • Greater than/less than
      • Between ranges
      • Above/below average
    • Great for financial data, metrics, and KPIs

4. 🌀 Wild Headers

  • Include or exclude headers based on wildcard patterns
    • Easily clean wide tables
    • No-brainer approach to column filtering
    • The Column Filter is nice, but at times Wild Headers is king (see the conceptual sketch below)
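
ET1 is no-code, so you never write this yourself, but it can help to see the idea in plain JavaScript. The sketch below is an illustrative assumption of how a wildcard header filter behaves; the pattern syntax, row shape, and function name are made up for this example and are not ET1 internals:

// Conceptual sketch only: how a wildcard header filter behaves.
// The pattern syntax and row shape are illustrative assumptions, not ET1 internals.
function filterHeadersByWildcard(rows, pattern, mode = "include") {
  // Turn a simple wildcard like "sales_*" into a regular expression.
  const escaped = pattern.split("*").map(s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const regex = new RegExp("^" + escaped.join(".*") + "$", "i");

  return rows.map(row => {
    const kept = {};
    for (const [header, value] of Object.entries(row)) {
      const matches = regex.test(header);
      // "include" keeps matching headers; "exclude" drops them.
      if ((mode === "include") === matches) kept[header] = value;
    }
    return kept;
  });
}

// Example: keep only the columns whose headers start with "sales_".
const wide = [{ sales_q1: 100, sales_q2: 120, notes: "n/a", region: "West" }];
console.log(filterHeadersByWildcard(wide, "sales_*", "include"));
// -> [ { sales_q1: 100, sales_q2: 120 } ]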

Why This Beats Spreadsheet Hell

  1. Visual Feedback: See filtered results instantly
  2. Non-Destructive: Your original data stays safe
  3. Repeatable: Reuse the same filters the next time data flows through, with no manual rework
  4. Stackable: Chain multiple filters for complex queries
  5. Reversible: Remove or modify filters anytime

Filtering Pro Tips

Be willing to test filters and create branches. Then right-click the beginning of a branch to duplicate the entire downstream operation. This lets you edit filters across multiple streams of data and see the difference between your filtering approaches!

Start with “Any Column” to explore strings and the Measure Filter to explore measures, then switch to specific Column Filters as you understand your data better, and use Wild Headers for those edge cases where you have a lot of columns (but only a couple matter).

ET1 is built to easily filter (transform) your data. Remember, it’s like having a conversation with your dataset!

Return to ET1 Overview to learn more.

ET1’s Data Input Node Overview

CSV files, JSON, public CSV endpoints, or manual tables.

These help you kick-start your data pipeline.

Once your data comes into an input node, it begins to flow downstream through a custom DAG streaming engine.
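
ET1’s engine is internal, but the general idea of a DAG pushing data downstream can be sketched roughly like this (the node names, the deps/run shape, and the sample rows are assumptions made for illustration only):

// Rough sketch of the idea behind a DAG of nodes streaming data downstream.
// Illustrative only; ET1's actual engine and node API are internal and may differ.
const nodes = {
  csvInput:  { deps: [],           run: () => [{ city: "Austin", sales: 10 }, { city: "Dallas", sales: 3 }] },
  filter:    { deps: ["csvInput"], run: ([rows]) => rows.filter(r => r.sales > 5) },
  csvOutput: { deps: ["filter"],   run: ([rows]) => rows },
};

// Resolve each node once, after its upstream dependencies (memoized depth-first walk).
const results = {};
function evaluate(name) {
  if (!(name in results)) {
    const inputs = nodes[name].deps.map(evaluate);
    results[name] = nodes[name].run(inputs);
  }
  return results[name];
}

console.log(evaluate("csvOutput")); // -> [ { city: "Austin", sales: 10 } ]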

Input Node Overview

The Input nodes are essential for moving the needle in ET1; without data, we are using our feelings!

  • The CSV Input node is great for getting your comma-delimited files into ET1.
  • The JSON Input node is great for getting JSON into the app; your engineering team will be happy.
  • The GitHub CSV node lets you pull CSVs off the public internet. That’s fun. Enrich your data pipelines.
  • The Manual Table node is great for synthesizing a few rows, building a small table, or saving variables to make life easier.

The future of data inputs for ET1

We are eager to add more connections, but today we are keeping it simple by offering CSV, JSON, GitHub CSV, and manual tables.

Next? Excel input perhaps.

ETL Input Nodes – Simple as Pie

📊 CSV Input Node

  • What it does: Loads data from CSV files or text
  • Why it’s cool:
    • Drag & drop any CSV file
    • Handles messy data with smart parsing
    • Preview before committing
  • No more: Fighting with Excel imports or command-line tools

🧾 JSON Input Node

  • What it does: Imports JSON data from files or direct input
  • Why it’s cool:
    • Works with nested JSON structures
    • Automatically flattens complex objects
    • Great for API responses and config files
  • No more: Writing custom parsers for every JSON format

📝 Manual Table

  • What it does: Create data tables by hand
  • Why it’s cool:
    • Add/remove rows and columns on the fly
    • Perfect for quick mockups or small datasets
    • Edit cells like a spreadsheet
  • No more: Creating throwaway CSV files for tiny datasets

🐙 GitHub CSV Node

  • What it does: Pull CSV files directly from GitHub
    • However, GitHub isn’t required; any public CSV endpoint “should work”
  • Why it’s cool:
    • Point to any public GitHub CSV file
    • Auto-refreshes on URL change
      • A fetch button to ‘get’ it again
    • Great for GitHub collaboration
  • No more: Downloading data by hand; this node fetches it for you (a conceptual sketch follows).
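
Under the hood, pulling a public CSV endpoint amounts to fetching text and parsing it. The sketch below shows the general idea only; the URL is a placeholder and the naive comma-splitting parser is an assumption (it ignores quoted fields), not ET1’s actual parser:

// General idea behind a public CSV endpoint input: fetch the text, then parse it.
// Placeholder URL; this naive parser ignores quoted fields with embedded commas.
async function fetchCsv(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error("Request failed: " + response.status);
  const text = await response.text();

  const [headerLine, ...lines] = text.trim().split("\n");
  const headers = headerLine.split(",").map(h => h.trim());

  // Turn each remaining line into an object keyed by header name.
  return lines.map(line => {
    const cells = line.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, (cells[i] || "").trim()]));
  });
}

// Usage: point at any publicly reachable CSV (placeholder URL).
// fetchCsv("https://example.com/data.csv").then(rows => console.log(rows.slice(0, 3)));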

The Best Part?

No coding required.

No complex setup.

Just point, click, and start transforming your data like a pro data engineer.

What used to take hours (and a computer science degree) now takes seconds, and it isn’t scary…

Return to ET1 Overview to learn more.

About ET1, A Tool to Help Uncomplicate Data

It is undeniable that data can be overwhelming, and those who work with it daily are adept at navigating this complexity. ET1 is designed to assist individuals in managing this chaos, eliminating the constraints associated with outdated software that is over 20 years old.

ET1 is an easy-to-use tool. The intention is to create an ETL solution that empowers everyone to participate and isn’t intimidating.

Many ETL solutions are complex, unrefined, and see little adoption among non-technical users.

Many ETL solutions require lengthy installations, frequent fruitless online searches, community forum posts full of “partners” desperate for clients, driver installs that fail, and lengthy setups to gain <X> feature… Then your company must rely on this dated technology (20-year-old engines) to give them a competitive edge in 2025.

Most ETL software prioritizes the database community, consequently hindering overall solution efficiency. Databases primarily focus on columns, data types, and tables, relegating row-level or cell-level data to secondary importance. With our DAG engine, we’re able to treat each row as an important and essential asset to the data solution.

ET1 facilitates repeatable data cleaning within an intuitive environment. It serves as a visual data workbench, empowering users to explore, cleanse, and articulate data solutions with greater efficiency compared to costly “legacy solutions.”

Think of ET1 as a way to easily streamline data management in a user-friendly no-code environment that doesn’t require walls to climb. –Tyler

The vision

With 13 years of experience in drag-and-drop data engineering, data lakes, data warehouses, and dashboards, I have developed ET1 for users of all backgrounds. My intention is for ET1 to be intuitive and user-friendly, evoking the feeling of playing with guitar pedals to create music, rather than a complex, “engineer-only” interface that overwhelms even experienced users. I recognize that not everyone associates data cleaning with the ETL process – extract, transform, load – and I aim to make ET1 accessible and understandable for those unfamiliar with ETL terminology.

To illustrate, envision ET1 as a music platform facilitating the flow of data through inputs and outputs. Each node within ET1 functions similarly to a guitar pedal, each with its own inputs and outputs. These “pedals” are designed to be compact, require minimal configuration, and offer a streamlined, intuitive user experience.

What is ETL? Extract, Transform, Load.

Extract: Pull the data.

Transform: Filter, Sort, Join, Merge, Update, and essentially change the information.

Load: Send the data to a file, database, chart, report, or dashboard.
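
As a toy illustration of those three steps in plain JavaScript (the file names, column names, and filter rule below are made up for this example):

// Toy extract-transform-load in plain JavaScript (Node.js).
// File names, column names, and the filter rule are illustrative only.
const fs = require("fs");

// Extract: pull the data (here, a small CSV file).
const text = fs.readFileSync("sales.csv", "utf8");
const [header, ...lines] = text.trim().split("\n");
const columns = header.split(",");
const rows = lines.map(line => {
  const cells = line.split(",");
  return Object.fromEntries(columns.map((c, i) => [c, cells[i]]));
});

// Transform: filter and sort the information.
const transformed = rows
  .filter(r => Number(r.amount) > 100)
  .sort((a, b) => Number(b.amount) - Number(a.amount));

// Load: send the data somewhere useful (here, a JSON file a chart could read).
fs.writeFileSync("sales_over_100.json", JSON.stringify(transformed, null, 2));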

The Install, Setup, Operating Systems, Requirements?

No install, no setup, and any operating system will work.

Unlike most ETL applications, ET1 opens instantly. There are no drivers. Lastly, there is no lengthy implementation.

Do you have a web browser? How else did you get here? That’s the minimum requirement.

Also… With ET1 you can…

  • Start cleaning and filtering data before KNIME loads.
  • Begin your ETL journey before you understand the pricing of Alteryx.

Core Functionality

The core functionality of ET1 will change and mutate over time.

  • Data Input: Supports CSV, JSON, and GitHub data sources
  • Data Transformation: Various transformation nodes (filters, joins, aggregations, etc.)
  • Data Output: Export to CSV, JSON, and Excel formats

What’s exciting is that it’s easy for us to add new features because we are not a large platform!

ET1 Key Features

Here are a few features about ET1 that make the app exciting.

  1. Node-Based Interface:
    • Drag-and-drop nodes for different operations
    • Visual connections between nodes
    • Real-time data previews, flow data downstream
  2. Node Types:
    • Input: CSV, JSON, GitHub, Manual Table
    • Transform: Filter, Join, Union, Pivot, Sort, etc.
    • Output: CSV, JSON, Excel
  3. Advanced Features:
    • Branching and versioning of data flows
    • Node settings and configuration
    • Collapsible nodes for better organization
    • Context menus for node operations
  4. Data Processing:
    • Column operations (rename, filter, normalize)
    • String operations (concatenation, splitting)
    • Aggregations and calculations
    • Data types aren’t very interesting to solve
      • Data types are for databases
      • ET1 is not a database, it’s JavaScript
      • Data is stored in graphs
      • Data types are boring
  5. UI/UX:
    • Responsive design, everything just flows downstream
    • Interactive previews, makes it easy to see what changed
    • Not scary; we’re eager to avoid developer-style tools
    • Highlight cells in each data grid to visually inspect values across the canvas

Technical Specs

This application is not mobile friendly, and mobile will not be a major focus; with that said, you will need a computer.

There are no specific computer requirements; any machine should work. Thanks for giving this a try.

The software requires no installation, and no setup.

We currently do not store any information about users or your data. When we decide to add authorization, we will more than likely use Auth0, since we are familiar with Auth0 from our development of Canopys.

Thank you for learning about ET1.

Jump back to the ET1 Overview to continue learning about how ET1 will help your team save time and money when removing the chaos from your data.

Evolving the Perceptions of Probability

What does the CIA’s “estimation probability” have to do with data visualization and a Reddit poll?

Think of it like this: the CIA, like many government agencies, has teams who dig through research, write up reports, and pass them along to others who make the big calls. A big part of that process is putting numbers behind words, predicting how likely something is to happen, and framing it in plain language. Even the headlines they draft are shaped around those probability calls.

The Reddit poll? Just an interested group of data people who decided to re-create this same study.

Did you know the CIA releases documents on a regular basis?

The CIA has a large resource catalog and we will grab from three different sources.

Let’s explore the development and history of a ridgeline plot that shows the “Perceptions of Probability,” the curious world of data lovers, migrating data from CSV to JSON, building a visual using D3, diving into the complex history, and more.

Numbers behind the words.

The raw data in our D3 chart came from /r/samplesize responses to the following question: What [probability/number] would you assign to the phrase “[phrase]”? source.

Note: An online community created a data source that resembles the same study the CIA completed using 23 NATO officials; more on this below. Below you will see images created to resemble the original study, and the background of the data.

Within the CIA, correlations are noticed, studied, quantified, and then later released publicly.

In the 1950s, the CIA noticed something happening internally and created a study.

Before writing this article I did not realize how much content the CIA has released. Like the Studies in Intelligence collection, there is fascinating information here.

Our goal is to research the history behind ‘Perceptions of Probability,’ find and optimize the data using ETL, and improve on the solution to ensure it’s interactive and reusable. The vision is that we will be using an interactive framework like D3, which means JavaScript, HTML, and CSS.

For research, we will keep everything surface level, and link to more information for further discovery.

The CIA studied and quantified their efforts, and we will be doing the same in this journey.

Adding Features to the Perceptions of Probability Visual

Today, the visual below is the muse (created by a user on Reddit), and we are grateful they have this information available to play with on their GitHub. They did the hard part: getting visibility on this visual and gathering the data points.

This viz made the Longlist for the 2015 Kantar Information is Beautiful Awards *

When you learn about the Perceptions of Probability, you’ll see it’s often a screenshot, because the system behind the scenes creates images (the ggjoy package). That is also the usual medium online: sharing content that is static.

A screenshot isn’t dynamic; it’s static and offline. We can’t interact with a screenshot unless we recreate it, which would require the ability to understand R, install R, and run R.

This is limiting to average users, and we wonder, is it possible to remove this barrier?

If we looked at this amazing visualization as a solution we can improve and make more adoptable, how would we optimize?

What if it could run online and be interactive?

To modernize, we must optimize how end users interact with the tool, in this case a visualization, and do our best to remove the current ‘offline’ limitation. Giving it a JSON data source also modernizes it.

The R code to create the assigned probability solution above:

#Plot probability data
ggplot(probly,aes(variable,value))+
  geom_boxplot(aes(fill=variable),alpha=.5)+
  geom_jitter(aes(color=variable),size=3,alpha=.2)+
  scale_y_continuous(breaks=seq(0,1,.1), labels=scales::percent)+
  guides(fill=FALSE,color=FALSE)+
  labs(title="Perceptions of Probability",
       x="Phrase",
       y="Assigned Probability",
       caption="created by /u/zonination")+
  coord_flip()+
  z_theme()
ggsave("plot1.png", height=8, width=8, dpi=120, type="cairo-png")

The code manages the data, applies a jitter to the points, and ultimately creates a PNG file.

In our engineering of this solution, we want to create something that loads instantly, is easy to reuse, and resembles the ridgelines from this famous assigned probability study. If we do this, it gives future problem solvers another tool, and we are only one step away (10-30 lines of code) from making this solution accept a new data file.

The History of Estimative Probability

Sherman Kent’s declassified paper Words of Estimative Probability (released May 4, 2012) highlights an incident in estimation reports, “Probability of an Invasion of Yugoslavia in 1951.” A write-up on this was given to policy makers, and they read the likelihood as lower than the analysts had intended.

How long had this been going on? How often do policy makers and analysts not share the same understanding of a given situation? How often does this impact us negatively? Many questions come to mind.

There was possibly not enough emphasis on the text, or there was no scoring system in place to explain the seriousness of an attack. Even with the report suggesting there was serious urgency, nothing happened. After some days passed, someone in a conversation asked, “What did you mean by ‘serious possibility’? What odds did you have in mind?”

Sherman Kent, the first director of CIA’s Office of National Estimates, was one of the first to recognize problems of communication caused by imprecise statements of uncertainty. Unfortunately, several decades after Kent was first jolted by how policymakers interpreted the term “serious possibility” in a national estimate, this miscommunication between analysts and policymakers, and between analysts, is still a common occurrence.

Through his studies he created the following chart, which is later used in another visualization and enables a viewer to see how this study is similar to the study created here. It is used in a scatter plot below this screenshot.

What is Estimative Probability?

Words of estimative probability are terms used by intelligence analysts in the production of analytic reports to convey the likelihood of a future event occurring.

Outside of the intelligence world, human behavior is expected to be somewhat similar, which says a lot about headlines in today’s news and content aggregators. One can assume journalists live by these numbers.

Text has the nature to be ambiguous.

When text is ambiguous, I like to lean on data visualization.

To further the research, “23 NATO military officers accustomed to reading intelligence reports [gathered]. They were given a number of sentences such as: “It is highly unlikely that..” All the sentences were the same except that the verbal expressions of probability changed. The officers were asked what percentage probability they would attribute to each statement if they read it in an intelligence report. Each dot in the table represents one officer’s probability assignment.” This quote is from the Psychology of Intelligence Analysis.pdf, Richards J. Heuer, Jr.

The above chart was then overlaid on this scatter plot of the 23 NATO officers assigning values to the text, essentially estimating the likelihood that an event will occur.

Survey scores of 23 NATO officers who have a responsibility to read this kind of text. They scored the text based on the likelihood the situation/event would take place (Page 155 * )

Modernizing the Perceptions of Probability

Over time, people see data and want to create art. My artwork will be a tool that can be shared online, is interactive, and opens the door to a different audience.

Based on empirical observations from data visualization consulting engagements, you can expect getting access to data to take more time, and for the data to be dirty. Luckily this data was readily available and only required some formatting.

The data was found here on GitHub, and it is a good sample for what we are trying to create. In its current state, the data is not yet prepared for a D3 chart; this ridgeline plot will require JSON.

Let’s convert CSV to JSON using the following Python:

import pandas as pd
import json
from io import StringIO

csv_data = """Almost Certainly,Highly Likely,Very Good Chance,Probable,Likely,Probably,We Believe,Better Than Even,About Even,We Doubt,Improbable,Unlikely,Probably Not,Little Chance,Almost No Chance,Highly Unlikely,Chances Are Slight
95,80,85,75,66,75,66,55,50,40,20,30,15,20,5,25,25
95,75,75,51,75,51,51,51,50,20,49,25,49,5,5,10,5
95,85,85,70,75,70,80,60,50,30,10,25,25,20,1,5,15
95,85,85,70,75,70,80,60,50,30,10,25,25,20,1,5,15
98,95,80,70,70,75,65,60,50,10,50,5,20,5,1,2,10
95,99,85,90,75,75,80,65,50,7,15,8,15,5,1,3,20
85,95,65,80,40,45,80,60,45,45,35,20,40,20,10,20,30

"""  # paste your full CSV here

# Load CSV
df = pd.read_csv(StringIO(csv_data))

# Melt to long format
df_long = df.melt(var_name="name", value_name="y")
df_long["x"] = df_long.groupby("name").cumcount() * 10  # create x from row index

# Group by category for D3
output = []
for name, group in df_long.groupby("name"):
    values = group[["x", "y"]].to_dict(orient="records")
    output.append({"name": name, "values": values})

# Save JSON
with open("joyplot_data.json", "w") as f:
    json.dump(output, f, indent=2)

print("✅ Data prepared for joyplot and saved to joyplot_data.json")
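
For reference, the resulting joyplot_data.json is a list with one object per phrase; x comes from the row index (0, 10, 20, …) and y holds the assigned probability. Abridged to one phrase and its first three points, the shape looks like this:

[
  {
    "name": "About Even",
    "values": [
      { "x": 0, "y": 50 },
      { "x": 10, "y": 50 },
      { "x": 20, "y": 50 }
    ]
  }
]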

With the data clean, we are a few steps closer to building a visual.

Using code from a ridgeline plot example, I created this density generator so the ridgelines show where responses cluster. This enables us to look at dense data and plot it across the axis.

// Improved KDE-based density generator for joyplots
function createDensityData(ridge) {
    // Extract the raw probability values for this phrase
    const values = ridge.values.map(d => d.y);

    // Sample points along the probability axis (0–100)
    const x = d3.scaleLinear().domain([0, 100]).ticks(100);

    // Bandwidth controls the "smoothness" of the density
    const bandwidth = 4.5; 

    // Gaussian-shaped kernel function
    function kernel(u) {
        return Math.exp(-1 * u * u) / Math.sqrt(2 * Math.PI);
    }

    // Kernel density estimator
    function kde(kernel, X, sample, bandwidth) {
        return X.map(x => {
            let sum = 0;
            for (let i = 0; i < sample.length; i++) {
                sum += kernel((x - sample[i]) / bandwidth);
            }
            return { x: x, y: sum / (sample.length * bandwidth) };
        });
    }

    return kde(kernel, x, values, bandwidth);
}

This ridgeline now closely resembles the initial CIA tooling as rebuilt by the GitHub user.

We have successfully created a way to generate density and ridgelines in a space that can be fully interactive.

Transparency is a setting, so here’s the lower setting.
Here’s a different transparency setting: .attr('fill-opacity', 0.7)

Not every attempt was a success: here’s an index-based version. Code below. This method simply creates a bell shape around the most dense area, which does enable a ridgeline plot.

// Create proper density data from the probability assignments
function createDensityData(ridge) {
    // The data represents probability assignments; we create a density distribution
    // around the mean probability value for each phrase

    // Calculate mean probability for this phrase
    const meanProb = d3.mean(ridge.values, d => d.y);
    const stdDev = 15; // Reasonable standard deviation for probability perceptions

    // Generate density curve points (10-point resolution)
    const densityPoints = [];
    for (let x = 10; x <= 100; x += 10) {
        // Bell-shaped (normal-style) density around the mean
        const density = Math.exp(-3 * Math.pow((x - meanProb) / stdDev, 2));
        densityPoints.push({ x: x, y: density });
    }

    return densityPoints;
}

There’s a bit of fun you can have with the smoothing of the curve on the area and line. However, I opted for the first approach listed above because it gave more granularity and allowed the chart to sync up more with the R version.

This bell-shaped density generator could be nice for digging into the weeds and trimming potential density around the sides. In my opinion it didn’t tell the full story, but I wanted to report back, as this extra area where we adjust the curve was fun to toy with, and even breaking the visual was pleasant.

// Create smooth area
const area = d3.area()
    .x(d => xScale(d.x))
    .y0(ridgeHeight)
    .y1(d => ridgeHeight - yScale(d.y))
    .curve(d3.curveCardinal.tension(.1));

const line = d3.line()
    .x(d => xScale(d.x))
    .y(d => ridgeHeight - yScale(d.y))
    .curve(d3.curveCardinal.tension(.1));

Thanks for visiting. Stay tuned and we will be releasing these ridgelines. Updates to follow.

This solution was created while battle testing our ridgeline plot tooling on Ch4rts. Tyler Garrett completed the research.

Event-Driven Microservices with Persistent Logs

Imagine a digital ecosystem where applications respond to business events instantly, where data is always consistent and traceable, and where scaling horizontally is the norm, not the exception. At Dev3lop LLC, we thrive at the intersection of agility, analytics, and engineering innovation. Event-driven microservices, underpinned by persistent logs, have revolutionized how leading organizations achieve these goals, turning bottlenecks into breakthroughs. In this article, we’ll dissect how this paradigm empowers modern enterprises to act on insights in real time, increase system resilience, and future-proof their architecture—all while serving as a launch pad for business growth and innovation.

The Strategic Advantage of Event-Driven Microservices

In the dynamic landscape of digital transformation, microservices have emerged as the architectural backbone for organizations seeking rapid innovation. However, traditional request-driven approaches often cause brittle integrations and data silos, restricting scalability and agility. Enter the event-driven microservices model; here, systems react asynchronously to events—such as a new customer signup or an inventory update—resulting in a more decoupled and scalable ecosystem.

Persistent logs are the silent heroes in these architectures. They not only preserve every business event like a journal but also unlock the potential for robust analytics and auditing. Leveraging event logs facilitates data integrity with advanced SQL server consulting services, allowing you to address business requirements around traceability and compliance. When your systems are event-driven and log-reliant, you future-proof your IT and data teams, empowering them to integrate novel services, replay events for debugging, and support ever-evolving analytics needs. This is not just about technology, but fundamentally reimagining how your organization creates and captures value through real-time insights.

Driving Data Consistency and Analytical Power with Persistent Logs

Persistent logs are more than a backbone for microservices—they are central to unlocking total data lineage, version control, and high-fidelity analytics. By storing every change as an immutable sequence of events, persistent logs make it possible to reconstruct current and historical system states at any point in time. This capability is critical for organizations seeking to implement robust slowly changing dimension (SCD) implementations in modern data platforms, and empowers analytics teams to perform forensic investigations or retroactive reporting without disruption.

Perhaps more strategically, persistent logs allow for data versioning at the infrastructure level—an essential ingredient for organizations exploring comprehensive data version control as a competitive advantage. Imagine launching a new service and safely replaying events to populate its state, or resolving issues by reviewing a granular, timestamped audit trail. When combined with semantic versioning, as discussed in this deep dive on schema and API evolution, persistent logs create a living, resilient record that enables true agility. This is the engine that drives reliable data workflows and breakthrough analytics.

Architectural Patterns and Implementation Considerations

Implementing event-driven microservices with persistent logs isn’t just a technical choice—it’s a strategic roadmap. Architectural patterns like event sourcing and Command Query Responsibility Segregation (CQRS) use logs as the source of truth, decoupling the write and read models for greater flexibility and scalability. Selecting the right log technology—be it Apache Kafka, Azure Event Hubs, or bespoke database approaches—depends on your needs for consistency, throughput, and integration with enterprise systems.
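
As a minimal sketch of the event-sourcing idea, here an in-memory append-only array stands in for a persistent log such as Kafka or Event Hubs, and a read model is rebuilt by replaying events; the event names and the inventory projection are assumptions chosen purely for illustration:

// Minimal event-sourcing sketch: an append-only log is the source of truth,
// and read models (the CQRS query side) are rebuilt by replaying it.
// In production the log would live in Kafka, Event Hubs, or a database.
const log = [];

function append(type, payload) {
  log.push({ offset: log.length, timestamp: Date.now(), type, payload });
}

// A read model built by replaying every event from offset 0.
function projectInventory(events) {
  const stock = {};
  for (const e of events) {
    if (e.type === "ItemReceived") stock[e.payload.sku] = (stock[e.payload.sku] || 0) + e.payload.qty;
    if (e.type === "ItemShipped")  stock[e.payload.sku] = (stock[e.payload.sku] || 0) - e.payload.qty;
  }
  return stock;
}

append("ItemReceived", { sku: "A-100", qty: 25 });
append("ItemShipped",  { sku: "A-100", qty: 10 });

// Replay the full log for current state, or a prefix for historical state.
console.log(projectInventory(log));             // -> { "A-100": 15 }
console.log(projectInventory(log.slice(0, 1))); // -> { "A-100": 25 }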

Choosing the best approach should factor in your existing ecosystem and integration requirements. Organizations comparing open source and commercial ETL solutions should also consider how ingestion pipelines and microservices will interact with these persistent logs. Thoughtful attention must be paid to data type handling—overlooked integer overflow issues can cripple analytics. That’s why working with a consultancy experienced in both grassroots and enterprise-grade deployment is critical. The right partner accelerates your transition, builds resilient patterns, and ensures your event-driven future is both robust and innovative.

Unleashing Business Growth and Innovation with Event-Driven Analytics

Event-driven microservices aren’t just about system performance—they’re a catalyst for business transformation. By unlocking granular, real-time data, persistent logs fuel data-driven decision making and create new possibilities for customer experience optimization. With the ability to correlate, enrich, and analyze data streams as they happen, organizations can harness the power of advanced analytics to drive strategic growth and outpace the competition.

When designed thoughtfully, event-driven architectures with persistent logs allow organizations to create feedback loops, respond instantly to emerging trends, and test innovations with minimal risk. As these systems evolve, the insights derived—not just from the data, but from how business events are recorded and acted upon—become invaluable assets. This is not just a technical evolution; it’s a new standard for agility and competitive advantage across industries.

Tags: event-driven architecture, microservices, persistent logs, data analytics, data version control, business innovation

Thank you for your support, follow DEV3LOPCOM, LLC on LinkedIn and YouTube.