ET1’s GitHub CSV Node is designed to help end users extract data from CSV URLs hosted in public GitHub repositories.
A public repository on GitHub is not a place to put your private or company information. However, public reference data, like City/State lists for the USA, is a commonly used resource in map-data-related solutions.
Unlike the CSV Input Node, which reads data stored locally on your computer, the GitHub CSV Node reads data hosted online.
Any CSV file online is accessible for extraction with this node, as long as it’s a well-formed CSV file.
Quick Start
Paste a link to a CSV. We use GitHub because it’s a familiar repository for engineers and organizations.
If you have a link to a CSV in a public GitHub repository, you’re ready to get started.
If you need an example, follow the tutorial below.
Finding GitHub CSVs for your ETL process
Start with www.google.com. A good search idea:
<Insert the data source you need> + “Github CSV” would be the best bet.
However, GitHub isn’t a requirement.
Example: I found state/city data by googling “City State Github CSV”.
Then you’re taken to a page like the screenshot below.
Click Raw.
Clicking Raw takes you to a raw.githubusercontent.com URL. Here is where you’re going to grab the URL.
This URL will be pasted into your ET1 GitHub CSV Node. The screenshot below shows what the CSV looks like on GitHub and what the URL you need to capture looks like. A URL ending in “.csv” is ideal.
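If you only have the regular github.com link, the raw URL follows a predictable pattern. Here’s a tiny sketch of the conversion; the user/repo path is a made-up example:

// Convert a github.com "blob" URL to its raw.githubusercontent.com equivalent.
// Hypothetical helper for illustration; ET1 only needs the final raw URL pasted in.
function toRawGithubUrl(blobUrl) {
  return blobUrl
    .replace("https://github.com/", "https://raw.githubusercontent.com/")
    .replace("/blob/", "/");
}

console.log(toRawGithubUrl("https://github.com/user/repo/blob/main/data/cities.csv"));
// https://raw.githubusercontent.com/user/repo/main/data/cities.csv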
Using the GitHub CSV Node
Extracting data with the GitHub CSV Node only requires a link; use the one above.
Follow the instructions above to find the URL
Paste the URL, and the GitHub CSV Node will fetch the data automatically
Fetch button: refreshes the data on demand
This node should work for any public-facing CSV endpoint
We hope, anyway; not all CSV files are created equally
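For the curious, fetching a public CSV endpoint boils down to something like this sketch. This is not ET1’s actual code, and the naive comma splitting will break on quoted commas; it’s just the general pattern:

// Minimal sketch of fetching and parsing a public CSV URL.
// Naive comma splitting; real CSV parsing must handle quoted fields.
async function fetchCsv(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Fetch failed: ${response.status}`);
  const text = await response.text();
  const [headerLine, ...lines] = text.trim().split("\n");
  const headers = headerLine.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}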
In this example we are pulling data with the GitHub CSV Node and writing it to a file with the CSV Output Node.
A closer view.
Thanks for learning more about ET1’s GitHub CSV Node
We appreciate you using ET1’s GitHub CSV Node, and if you have any questions, please contact us.
The CSV Input Node, what a classic: flat files living on your computer can be consumed and their data extracted here in ET1. CSV is a common file type for data gurus.
CSV stands for Comma-Separated Values: a plain-text file format for storing tabular data with values separated by commas.
If you have a CSV file, you’re in the right place.
Click the drag-n-drop area to open a file browser and choose your CSV file
Choose to remove rows from the top or bottom
Why would you want to remove rows from the top or bottom? This happens often when dealing with CSV files that come padded with information that isn’t relevant to your solution.
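Conceptually, the removal is a simple slice over the parsed rows (a sketch, not ET1’s internals):

// Conceptual sketch of top/bottom row removal on an array of parsed rows.
function removeRows(rows, fromTop, fromBottom) {
  return rows.slice(fromTop, rows.length - fromBottom);
}

const sampleRows = ["report header", "a,1", "b,2", "report footer"];
console.log(removeRows(sampleRows, 1, 1)); // ["a,1", "b,2"]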
Some CSV files are not structured correctly and are simply not clean enough for ET1, which means these files will not open: the data grid will not populate with data, and the CSV Input Node will be ready for you to try another CSV.
Check your CSV files for strange or weird structures and remediate them before bringing them into ET1.
CSV Input Node Bug Log
To remain transparent about our code-driven solution, we will also do our best to maintain a running log of bugs found, and the fixes applied, for each node.
Tyler Garrett
While testing the CSV Input Node, I noticed that From Top: and From Bottom: were working separately but not together. Also, you could not undo a removal once applied. The node seemed permanently broken, requiring it to be restarted entirely.
Further insights:
The CSV Input node was mutating node.data in processCSVData() by slicing rows off the current node.data.
When you changed “Remove Rows: From Top/Bottom” back to 0, the code reapplied removal on already-trimmed data, so the removed rows were never restored.
Relevant code:
node/settings/csvInput.js → processCSVData(node) was computing from node.data rather than an immutable baseline.
CSV Input Bug Fix
Preserve the raw unmodified dataset when the file is loaded.
Always compute the processed dataset from node.originalData in processCSVData() so the operation is idempotent and reversible.
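Here’s a minimal sketch of the fix. Only node.originalData and processCSVData() come from the notes above; the rest of the node shape is assumed for illustration:

// Sketch of the fix: keep an immutable baseline and always derive node.data from it.
function loadCSVFile(node, parsedRows) {
  node.originalData = parsedRows; // raw dataset, never mutated
  processCSVData(node);
}

function processCSVData(node) {
  const { fromTop = 0, fromBottom = 0 } = node.settings; // assumed settings shape
  const source = node.originalData; // always start from the baseline...
  node.data = source.slice(fromTop, source.length - fromBottom);
  // ...so setting From Top/From Bottom back to 0 restores every row.
}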
Thanks for learning more about ET1’s CSV Input Node
We appreciate you using ET1’s CSV Input Node, and if you have any questions, please contact us.
When extracting data from a JSON file, try the JSON Input Node.
JSON (JavaScript Object Notation) is a common data source.
With ET1’s JSON Input Node you can quickly open your JSON files and begin to transform the data, merging it with other data like local CSV Input Node data or online GitHub CSV data.
In ET1, data is presented in a normalized data grid view, because most users want to see and understand their data as a grid. Under the hood, however, the data is JSON: ET1 is built on JavaScript and a streaming DAG (graph) engine, which enables ET1 to offer features you’ve never seen before in ETL software!
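As a rough sketch of that grid normalization (illustrative only, not ET1’s actual engine), the grid’s columns can be derived from the union of keys across records:

// Sketch: derive grid columns from the union of keys across JSON records.
function toGrid(records) {
  const columns = [...new Set(records.flatMap((r) => Object.keys(r)))];
  const rows = records.map((r) => columns.map((c) => r[c] ?? ""));
  return { columns, rows };
}

console.log(toGrid([
  { name: "Impossible", mean: "0.5" },
  { name: "Almost No Chance", const: "CATS GO MOO" },
]));
// columns: ["name", "mean", "const"]; missing values become ""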
Using the JSON Input Node
Find the JSON Input Node with your Hands, or in the hamburger menu in the top right of ET1.
Once your JSON Input Node is on the canvas:
Drag and drop the JSON file onto the node, or
Click the drag-and-drop area and a “browse to file” tool will open
Find the JSON file and open it
If the JSON is structured correctly, the node works.
Otherwise, the node does not work.
Example of JSON format that will work:
[
  {
    "name": "Impossible",
    "mean": "0.5",
    "const": "CATS GO MOO"
  },
  {
    "name": "Almost No Chance",
    "mean": "2.0",
    "const": "CATS GO MOO"
  }
]
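If you want to sanity-check a file before loading it, here’s a hypothetical pre-check that assumes the node expects an array of flat objects, like the example above:

// Hypothetical pre-check: an array of flat (non-nested) objects should load cleanly.
function looksLoadable(text) {
  try {
    const parsed = JSON.parse(text);
    return Array.isArray(parsed) && parsed.every(
      (row) => row !== null && typeof row === "object" && !Array.isArray(row)
    );
  } catch {
    return false; // not valid JSON at all
  }
}

console.log(looksLoadable('[{"name": "Impossible", "mean": "0.5"}]')); // true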
Now that you’re familiar, let’s see the JSON Input Node in action!
Thanks for learning more about ET1, and if you have any questions, please contact us.
The Trim/Normalize Node is built to help you quickly clean your data pipelines and, like the Column Renamer, built to make data pipeline maintenance simple, not complicated, and, more than anything, easy to repeat.
AT TIMES WE NEED CAPITAL LETTERS! Perhaps you-have-a-lot-of-this-happening (special characters you don’t ne3ed).
then there are times we aren’t trying to scream, and perhaps lowercase is a requirement for user names or emails. okay, you’re in a good place. case sensitivity is here too. AlongWithTrimmingWhiteSpace.
ET1’s Trim/Normalize Node helps people quickly clean their data.
You can select more than one column to clean, or just choose 1 column to normalize.
The Trim/Normalize Node was created to help people quickly clean data pipelines and improve data quality across your data environment (a data environment might be a grouping of individual solutions that look and feel similar).
Cleaning dirty unstructured text for sentiment analysis, parsing HTML, or optimizing pipelines for data visualization – this node helps transition your pipelines into what some consider a piece of their overarching data governance.
Using the Trim/Normalize Node in ET1
Using this node is easy and intuitive. Checkboxes, drop downs, and nothing crazy.
Connect a data stream to your node, adjust the settings, and keep solving.
Connect data
Choose column(s)
Decide to trim ends – space(s) on the left and right only
Decide to remove whitespace – any and all space(s)
Remove special characters – any non-alphanumeric characters, including spaces
Choose the casing – upper or lower
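Taken together, those options boil down to a few string operations. Here’s a minimal sketch; the option names (trimEnds, removeWhitespace, removeSpecial, casing) are assumptions for illustration, not ET1’s actual settings:

// Sketch of the Trim/Normalize options as plain string operations.
function normalizeValue(value, options) {
  let out = String(value);
  if (options.trimEnds) out = out.trim(); // spaces on the left and right only
  if (options.removeWhitespace) out = out.replace(/\s+/g, ""); // any and all spaces
  if (options.removeSpecial) out = out.replace(/[^a-zA-Z0-9]/g, ""); // non-alphanumerics, including spaces
  if (options.casing === "upper") out = out.toUpperCase();
  if (options.casing === "lower") out = out.toLowerCase();
  return out;
}

console.log(normalizeValue("you-have-a-lot-of-this-happening", { removeSpecial: true }));
// "youhavealotofthishappening"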
Real-world use case: the Trim/Normalize Node
In this example we are receiving a file from an end user who needs help capitalizing the entire Address column.
Someone sends us this CSV. We open it with the CSV Input Node in ET1, and then we want to trim/normalize.
Supplier_ID,Supplier_Name,Address,Email
SUP001,Supplier X,123 Main Street|Suite 100|Anytown|CA 90210,supplierx@example.com
SUP002,Supplier Y,456 Oak Avenue|Building B|Sometown|NY 10001,suppliery@example.com
SUP003,Supplier Z,789 Pine Road|Floor 3|Othercity|TX 75001,supplierz@example.com
We are going to add trim ends, in case future data has padded spaces (thinking ahead), and swap the case to upper to follow internal best practices.
Uppercase for Address fits this user’s current data strategy. Their reasoning: some data inputs are not automatically uppercased when the software writes to the database, and the software engineers don’t have time to optimize that part of the software.
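Applied to the sample data, those settings behave like this (reusing the normalizeValue sketch from above; illustrative only):

// Reusing the normalizeValue sketch on the sample Address column.
const address = " 123 Main Street|Suite 100|Anytown|CA 90210 ";
console.log(normalizeValue(address, { trimEnds: true, casing: "upper" }));
// "123 MAIN STREET|SUITE 100|ANYTOWN|CA 90210"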
Thanks for learning more about ET1, and if you have any questions, please contact us.
On your magic quest to join data? We call it the Joiner Node.
A simple joining solution that helps people join data at a row level.
In ET1, the Joiner is focused on “keeping it simple” and aims to automatically infer your joins.
ET1 assumes.
Inferring a join means it assumes you prepared the data beforehand, like Id = Id.
Without preparing the data stream first, the assumptions may fail. Use this to your advantage, and save time by letting ET1’s Joiner Node infer the correct column for you.
Hint: make it easier by preparing your column headers with the Column Renamer Node before using the Joiner Node. This will help you save time while using ET1.
If your headers are clean, the node will automatically infer keys for you. What that means is it will try to find a join without your help. However, you may need to help it do the right thing if the headers do not match.
Connect table1; this will be the table on the “left”, and we call its join column the left key
Connect table2; this will be the table on the “right”, and we call its join column the right key
Pick the join type: inner or left*
*A right join is possible by swapping which table you connect to the Joiner Node first. Connection order is considered, and by adjusting what connects to this node first, you’re able to right join: you’re simply using the left join plus the understanding of what you just read.
Type: the style of join. Today, we only have inner and left join.
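For intuition, here’s roughly what a left join does at the row level. It’s a sketch with an assumed key name, and as described above, swapping the inputs gives you a right join:

// Row-level left join sketch; the key name is assumed for illustration.
function leftJoin(leftRows, rightRows, key) {
  return leftRows.map((left) => {
    const match = rightRows.find((right) => right[key] === left[key]);
    return { ...left, ...(match ?? {}) }; // unmatched left rows are kept as-is
  });
}

const purchases = [{ Product: "Widget", Ordered: 5 }, { Product: "Gadget", Ordered: 2 }];
const inventory = [{ Product: "Widget", Inventory: 10 }];
console.log(leftJoin(purchases, inventory, "Product"));
// Widget gains Inventory: 10; Gadget is kept without a match.
// A "right join" here is just leftJoin(inventory, purchases, "Product").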
The Joiner Node is the tool for joining data at a row level. It removes complexity when joining data, and these row-level relationships are likely the ‘key’ we need in order to use ET1’s Joiner Node.
Goal: join our data to see if we need more inventory.
Problem: the data is broken into many different tables.
Use case: purchase data and inventory data can be joined. Let’s break it down.
While analyzing this request, I found the data has duplicate entries in the Product column. Product is the relationship between the tables. However, we need the tables to be grouped, or we will be creating a many-to-many join.
Here’s how our inventory data will look after we group it by Product, summing Quantity, and rename that header to Inventory.
To create a KPI, you need to choose the column and how to aggregate.
Setting up your KPIs in ET1
Open an Aggregate Node, stream data into this node, and open the settings.
We need to create a Sum of Quantity.
We need to set the measure column to Quantity and the operation to sum!
Recap: the Aggregate Node’s operation is set to sum and the measure column is set to Quantity, and this creates a single KPI value for the Quantity column.
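At the row level, that KPI is just a sum over one column. Here’s a sketch with assumed names, not ET1’s engine:

// Sketch: a "sum" aggregation over one measure column yields a single KPI value.
function aggregate(rows, measureColumn, operation) {
  const values = rows.map((r) => Number(r[measureColumn]) || 0);
  if (operation === "sum") return values.reduce((a, b) => a + b, 0);
  throw new Error(`Unsupported operation: ${operation}`);
}

const inventoryRows = [{ Quantity: 10 }, { Quantity: 4 }, { Quantity: 6 }];
console.log(aggregate(inventoryRows, "Quantity", "sum")); // 20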