

Increase Website Speeds: Convert DALL·E Images from PNG to JPEG or WEBP


In today's article we will teach you how to trim your DALL·E images and, more importantly, increase load speeds on your websites, applications, and dashboards.

Often when you add an image to software, you're thinking about load speed, because it dictates the overall user experience. This is both a code problem and an image problem.

This article covers the technical challenge by offering a script to manage your PNG files, and opens a door into how image optimization can make a big difference!

If you haven't noticed, we have rebranded and started trying out DALL·E images. We are seeking to improve our website's user experience and trying a little bit of branding.

We have fallen in love with the output, but we consistently have to clean the OpenAI logo from it. We always remove the bottom 16 pixels and lower the image quality before using the image.

Imagine waiting a full minute for a website to load. That's what we want to avoid: loading everything in under one second is not only ideal but expected by end users.

by Tyler Garrett

When adding large images to the internet, there's often a need to lower their quality so that websites and applications load faster and users enjoy a better experience. The script below automatically manages the quality of your image (set to 75 by default); you can change this by updating the quality variable.

To remove the labeling created by DALL·E's workflow, we can apply a quick Python solution.

Below, you'll find two scripts: the first converts PNG to JPEG and trims the image, and the second Python script helps you white-label your DALL·E image and convert PNG to WEBP!

We hope this allows you a quicker path to using DALL·E designs in your future.

To begin, you’ll need a directory of images and your computer turned on.

import os
from PIL import Image

# Set the directory containing the image files
directory = "C:/Users/ityle/Downloads/Edit"

# Set the output quality (0-100)
quality = 75

# Set the pixel trim size
trim = 16

# Get a list of the files in the directory
files = os.listdir(directory)

# Iterate through the files
for file in files:
  # Check if the file is a PNG
  if file.endswith(".png"):
    # Open the image file
    im = Image.open(os.path.join(directory, file))

    # Convert to RGB (JPEG does not support transparency)
    im = im.convert("RGB")

    # Crop the bottom 16 pixels off the image
    width, height = im.size
    im = im.crop((0, 0, width, height-trim))

    # Lower the image quality
    im.save(os.path.join(directory, "modified_" + file.replace(".png", ".jpg")), "JPEG", quality=quality)

You will need to edit the directory path to make sure you're aiming at the correct folder. The script prepends "modified_" to the filename of each converted image and lowers the quality to shrink file sizes from roughly 2 MB to 100 KB.

Removing the DALL·E logo is now a quick process, and you're back to using these amazing graphics in no time.

Moving from PNG to WEBP with a DALL·E Image

While we enjoy the previous script, we found the output files ranged from 100 KB to 140 KB, which can still be somewhat slow for internet loading speeds.

Below, find code to help you convert PNG to WEBP, Google's image compression format that is sweeping the web.

import os
from PIL import Image

# Set the directory containing the image files
directory = "C:/Users/ityle/xyz"

# Set the pixel trim sizes
trim = 16 # bottom trim exactly sized for dalle logo
trim_top = 300  # New trim for the top

# Get a list of the files in the directory
files = os.listdir(directory)

# Start with quality 100 and decrease to 1
start_quality = 100
end_quality = 1

# Store file paths, sizes, and quality settings
file_info = []

# Iterate through the files
for file in files:
    # Check if the file is a PNG
    if file.endswith(".png"):
        print(f"Processing {file}...")

        # Open the image file
        im = Image.open(os.path.join(directory, file))

        # Trim the top part of the image
        width, height = im.size
        im = im.crop((0, trim_top, width, height - trim))

        # Loop through quality settings
        for quality in range(start_quality, end_quality - 1, -1):
            # Save the image with the current quality setting
            webp_filename = os.path.join(directory, f"{quality}_q_" + file.replace(".png", ".webp"))
            im.save(webp_filename, "WebP", quality=quality)

            # Get the file size
            file_size = os.path.getsize(webp_filename)

            # Store file path, size, and quality
            file_info.append((webp_filename, file_size, quality))

            # Print information
            print(f"Quality: {quality}, File: {webp_filename}, Size: {file_size} bytes")

# Find the file closest to X KB
closest_file = min(file_info, key=lambda x: abs(x[1] - 15000))

# Delete all other generated WebP files
for webp_file, _, _ in file_info:
    if webp_file != closest_file[0]:
        os.remove(webp_file)
        print(f"Deleted {webp_file}")

print(f"Closest file to 15KB: {closest_file[0]}, Size: {closest_file[1]} bytes, Quality: {closest_file[2]}")

In this script we add a feature to trim both the top and bottom; we recommend trimming the image vertically to improve load speeds even further. We transitioned to this Python script because it saves on image sizes and improved our overall design workflow.
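As an aside, if generating one file per quality level feels wasteful, a binary search over the quality setting can hit a size target in far fewer encodes, assuming file size grows roughly monotonically with quality. This is a sketch, not part of the original script; the save_size callable below is a stand-in for "save at this quality and measure the bytes":

```python
def find_quality(save_size, target_bytes, lo=1, hi=100):
    """Binary-search the quality whose output size is closest to target_bytes.

    Assumes save_size(quality) returns the encoded size in bytes and that
    size grows (roughly) with quality.
    """
    best = (None, None)  # (quality, size)
    while lo <= hi:
        mid = (lo + hi) // 2
        size = save_size(mid)
        if best[0] is None or abs(size - target_bytes) < abs(best[1] - target_bytes):
            best = (mid, size)
        if size < target_bytes:
            lo = mid + 1
        elif size > target_bytes:
            hi = mid - 1
        else:
            break
    return best

# Stand-in encoder: pretend size scales linearly with quality (300 bytes per step).
quality, size = find_quality(lambda q: q * 300, 15000)
print(quality, size)  # 50 15000
```

With real images you would plug in a closure that calls im.save(..., quality=mid) and reads os.path.getsize, cutting roughly 100 encodes down to about 7.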

Now, our website loads faster than ever before. Most importantly, the First Contentful Paint happens in under 1 second, and that is a good metric for a website! Websites that load fast tend to keep end users longer.

If you have any questions about the python script, we recommend you contact our data engineering consulting team!

Using Python for Named Entity Recognition (NER), A NLP Subtask


Named Entity Recognition (NER) is a subtask within natural language processing (NLP) with the objective of recognizing and organizing named entities in text.

Think of a person's name, a company name, or a place. The ne_chunk() function in the nltk.chunk module represents a technique for executing named entity recognition in Python, making use of the Natural Language Toolkit (NLTK) library.

The ne_chunk() function processes a list of POS-tagged tokens as input and produces a tree of named entities as output. It represents the tree as a nested list of tuples, where each tuple signifies a named entity and includes the entity's label along with the list of words that make up the entity.
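To illustrate that nested shape, here is a traversal over a simplified stand-in structure built from plain Python lists and tuples. This is a hypothetical illustration of the shape described above, not NLTK's actual Tree class:

```python
# Simplified stand-in for ne_chunk() output: top level is a list whose items
# are either plain (word, POS) tuples or a (label, words) tuple for a
# recognized entity. Illustration only, not NLTK's real Tree class.
chunked = [
    ("PERSON", [("Tim", "NNP"), ("Cook", "NNP")]),
    ("is", "VBZ"),
    ("visiting", "VBG"),
    ("GPE", [("New", "NNP"), ("York", "NNP")]),
]

def extract_entities(tree):
    """Pull out (label, text) pairs for every named entity in the structure."""
    entities = []
    for label, payload in tree:
        if isinstance(payload, list):  # an entity subtree, not a plain token
            text = " ".join(word for word, pos in payload)
            entities.append((label, text))
    return entities

print(extract_entities(chunked))
# [('PERSON', 'Tim Cook'), ('GPE', 'New York')]
```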

For instance, the label for a geopolitical entity is "GPE," so the named entity "New York" would be represented by the tuple ("GPE", "New York").

Named entity recognition in the ne_chunk() function relies on a rule-based approach, signifying the utilization of a set of hand-crafted rules for the identification and categorization of named entities. These rules consider the POS tags assigned to the words in the text and the contextual information surrounding them. For instance, if a proper noun (NNP) appears following the word “of,” it is likely to represent an organization’s name.

A video related to Named Entity Recognition on YouTube.

One of the main advantages of the ne_chunk() function is its simplicity and ease of use. It requires minimal setup, so you can add it to your NLP pipelines with ease.

However, the rule-based approach also has limitations. The accuracy of named entity recognition depends on the quality and coverage of the rules and is affected by text variations such as synonyms or aliases. Furthermore, the ne_chunk() function can only identify a limited set of named entities and lacks support for fine-grained entity types, such as job titles or product names.

Another limitation of ne_chunk() pertains to its indifference to the context in which named entities appear, an aspect crucial for disambiguating entities and comprehending their significance.

In spite of these limitations, the ne_chunk() function can still prove valuable for fundamental named entity recognition tasks, encompassing the extraction of names of individuals, organizations, and locations from unstructured text. Moreover, it can serve as a preliminary step for developing more advanced NER systems or for tasks where stringent accuracy requirements are not essential.

Overall, the ne_chunk() function offers a straightforward and user-friendly approach to conducting named entity recognition in Python using NLTK. It necessitates minimal configuration and integrates effortlessly into existing NLP pipelines.

To incorporate named entity recognition (NER) into the existing code, you can employ the ne_chunk() function from the nltk.chunk module, which accepts a list of POS-tagged tokens as input and yields a tree of named entities.

Example of how to use the ne_chunk() function

# Import the ne_chunk function from the nltk.chunk module
from nltk import ne_chunk

# Perform named entity recognition on the filtered tokens
named_entities = ne_chunk(filtered_tokens)

# Print the named entities
print(named_entities)

This script uses the ne_chunk() function to perform named entity recognition on the filtered tokens, which are the tokens that have been filtered by POS tags. The function returns a tree of named entities, which you can print to see the recognized entities.

You can also call the function as nltk.ne_chunk(); it takes the POS-tagged tokens and returns the tree of named entities, using a maximum-entropy classifier to classify the words.

# Perform named entity recognition on the filtered tokens
named_entities = nltk.ne_chunk(filtered_tokens)

Python Code to Begin Part-of-Speech Tagging Using a Web Scraped Website


Part-of-speech tagging, also known as POS tagging or grammatical tagging, is a method of annotating words in a text with their corresponding grammatical categories, such as noun, verb, adjective, or adverb. This process is important for natural language processing (NLP) tasks such as text classification, machine translation, and information retrieval.

There are two main approaches to POS tagging: rule-based and statistical. Rule-based tagging uses a set of hand-written rules to assign POS tags to words, while statistical tagging uses machine learning algorithms to learn the POS tag of a word based on its context.

Statistical POS tagging is more accurate and widely used because it can take into account the context in which a word is used and learn from a large corpus of annotated text. The most common machine learning algorithm used for POS tagging is the Hidden Markov Model (HMM), which uses a set of states and transition probabilities to predict the POS tag of a word.
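To make the HMM idea concrete, here is a toy Viterbi decoder over hand-set states and probabilities. The numbers are illustrative assumptions for this sketch; a real tagger estimates them from a large annotated corpus:

```python
# Toy HMM: two tags, hand-set probabilities (illustrative only).
states = ["NN", "VB"]
start_p = {"NN": 0.6, "VB": 0.4}
trans_p = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
emit_p = {
    "NN": {"dogs": 0.4, "run": 0.1, "fast": 0.5},
    "VB": {"dogs": 0.1, "run": 0.7, "fast": 0.2},
}

def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # V[i][s] = best probability of any tag path ending in state s at word i
    V = [{s: start_p[s] * emit_p[s].get(words[0], 1e-8) for s in states}]
    path = {s: [s] for s in states}
    for word in words[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s].get(word, 1e-8), p)
                for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best_state = max(V[-1], key=V[-1].get)
    return path[best_state]

print(viterbi(["dogs", "run"]))  # ['NN', 'VB']
```

The decoder keeps, for each tag, only the best path reaching it so far, which is exactly the dynamic-programming trick that makes HMM tagging tractable.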

One of the most popular POS tagging tools is the Natural Language Toolkit (NLTK) library in Python, which provides a set of functions for tokenizing, POS tagging, and parsing text. NLTK also includes a pre-trained POS tagger based on the Penn Treebank POS tag set, which is a widely used standard for POS tagging.

In addition to NLTK, other popular POS tagging tools include the Stanford POS Tagger, the OpenNLP POS Tagger, and the spaCy library.

POS tagging is an important step in many NLP tasks, and it is used as a pre-processing step for other NLP tasks such as named entity recognition, sentiment analysis, and text summarization. It is a crucial step in understanding the meaning of text, as the POS tags provide important information about the syntactic structure of a sentence.

In conclusion, part-of-speech tagging is a technique that assigns grammatical categories to words in a text, which is important for natural language processing tasks. The statistical approach is more accurate and more widely used, and several libraries and tools are available to perform POS tagging. It serves as a pre-processing step for other NLP tasks and is crucial to understanding the meaning of text.

Using NLTK for the First Time

Here’s a quick walkthrough to allow you to begin POS tagging.

First, you’ll want to install NLTK completely.

NLTK is an open source software. The source code is distributed under the terms of the Apache License Version 2.0. The documentation is distributed under the terms of the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States license. The corpora are distributed under various licenses, as documented in their respective README files.

Quote from: https://github.com/nltk/nltk/wiki/FAQ

If you have PyCharm or another Python IDE available, begin by opening the terminal and running:

pip install nltk

Next you want to use their downloader.

Here's the Python to run next. It will open the NLTK downloader on your computer.

import nltk
nltk.download()

The following window will open.

Go ahead and download everything.

Here is an example of a Python script that uses the Natural Language Toolkit (NLTK) library to perform part-of-speech tagging on the text scraped from a website:

Find the code from the YouTube video above on GitHub, explained line by line below.

import requests
from bs4 import BeautifulSoup
import nltk

# Work-around for mod security, simulates you being a real user

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0',
}

# Scrape the website's HTML
url = "https://dev3lop.com"
page = requests.get(url,  headers=headers)
soup = BeautifulSoup(page.content, "html.parser")

# Extract the text from the website
text = soup.get_text()

# Tokenize the text
tokens = nltk.word_tokenize(text)

# Perform part-of-speech tagging on the tokens
tagged_tokens = nltk.pos_tag(tokens)

# Print the tagged tokens
print(tagged_tokens)

This script uses the requests library to scrape the HTML of the website specified in the url variable. It then uses the BeautifulSoup library to extract the text from the HTML. The text is tokenized using the word_tokenize() function from NLTK, and then part-of-speech tagging is performed on the tokens using the pos_tag() function. The resulting list of tagged tokens is then printed to the console.

Filtering out common words

If you're digging deeper, you may want to see how "NN" (nouns), "VB" (verbs), and "JJ" (adjectives) are used.

We can quickly filter out the POS tags that are not useful for our analysis, such as punctuation marks or common function words like “is” or “the”. For example, you can use a list comprehension to filter out the POS tags that are not in a certain list of POS tags that you are interested in analyzing:

# List of POS tags to include in the analysis
include_pos = ["NN", "VB", "JJ"]

# Filter the tagged tokens to include only the specified POS tags
filtered_tokens = [(token, pos) for token, pos in tagged_tokens if pos in include_pos]

# Print the filtered tokens
print(filtered_tokens)

Counting occurrences

from collections import Counter

# Count filtered tokens
token_counts = Counter(filtered_tokens)

# Print counts
print(token_counts)

The final output will be a Counter mapping each (token, POS) pair to how many times it appears.

Now that you're done counting occurrences, you can inspect the printed token_counts and notice this method also sorted the information from largest to smallest. We hope this lesson on part-of-speech tagging using a web-scraped website is a solution you're able to take into consideration when generating your next Python data pipeline!
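That largest-to-smallest ordering comes from Counter itself, whose printed form delegates to most_common(). A small self-contained illustration (with made-up tokens rather than the scraped data):

```python
from collections import Counter

# Counter's repr lists entries via most_common(), so the printed output is
# already ordered from most to least frequent.
token_counts = Counter([("data", "NN"), ("python", "NN"), ("data", "NN")])

print(token_counts.most_common())
# [(('data', 'NN'), 2), (('python', 'NN'), 1)]

print(token_counts.most_common(1))  # just the single most frequent entry
```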

If you need assistance creating these tools, you can count on our data engineering consulting services to help elevate your python engineering needs!

Collaboration Across the Company: Driving Reliability, Performance, Scalability, and Observability in Your Database System


Partnering with teams across the company to drive reliability, performance, scalability, and observability of the database system is essential for ensuring the smooth operation of the system. In this article, we will discuss the benefits of partnering with other teams and the steps that you can take to do this effectively.

  1. Benefits of partnering with other teams

Partnering with other teams across the company can bring a number of benefits for your database system. For example, working with the development team can help you ensure that the system is designed to meet the needs of the business, while working with the operations team can help you ensure that the system is well-maintained and that issues are resolved quickly. Additionally, working with teams such as security and compliance can ensure that the system is secure and compliant with relevant regulations.

  2. Identifying the teams you need to partner with

The first step in partnering with other teams is to identify the teams that you need to partner with. This will depend on the specific requirements of your system, but some common teams that you may need to partner with include:

  • Development teams: These teams are responsible for designing and building the system.
  • Operations teams: These teams are responsible for maintaining and running the system.
  • Security and compliance teams: These teams are responsible for ensuring that the system is secure and compliant with relevant regulations.
  • Business teams: These teams are responsible for ensuring that the system meets the needs of the business.
  3. Building relationships with the teams

Once you have identified the teams that you need to partner with, the next step is to build relationships with them. This will involve working closely with the teams, getting to know the team members, and building trust. Additionally, it’s important to establish a clear set of goals and expectations, as well as a plan for how you will work together.

  4. Communicating effectively

Effective communication is key to partnering with other teams. This will involve setting up regular meetings and check-ins, as well as establishing clear lines of communication. Additionally, it’s important to ensure that everyone is aware of the status of the system and any issues that may arise.

  5. Continuously monitoring and improving

Finally, it’s important to continuously monitor and improve the partnerships that you have established. This will involve analyzing the performance of the partnerships and looking for areas where improvements can be made. Additionally, it’s important to keep the lines of communication open and to ensure that everyone is aware of the status of the system and any issues that may arise.

In conclusion, partnering with teams across the company to drive reliability, performance, scalability, and observability of the database system is essential for ensuring the smooth operation of the system. By identifying the teams that you need to partner with, building relationships with them, communicating effectively, and continuously monitoring and improving the partnerships, you can ensure that your database system is able to meet the needs of the business, and that issues are resolved quickly and efficiently.

Creating an Efficient System for Addressing High-Priority Issues: Building a Tooling Chain


Building a tooling chain to help diagnose operational issues and address high-priority issues as they arise is crucial for ensuring the smooth operation of any system. In this article, we will discuss the steps that you can take to build a tooling chain that can help you quickly identify and resolve issues as they arise.

  1. Identifying the tools you need

The first step in building a tooling chain is to identify the tools that you will need. This will depend on the specific requirements of your system, but some common tools that are used for diagnosing operational issues include:

  • Monitoring tools: These tools can be used to track the performance of your system and to identify any issues that may be occurring.
  • Logging tools: These tools can be used to collect and analyze log data from your system, which can be used to identify and troubleshoot issues.
  • Performance analysis tools: These tools can be used to analyze the performance of your system, which can be used to identify bottlenecks and other issues.
  2. Integrating the tools

Once you have identified the tools that you will need, the next step is to integrate them into a cohesive tooling chain. This will involve setting up the tools so that they can work together and share data, as well as configuring them so that they can be used effectively.

  3. Building an alerting system

An important part of building a tooling chain is building an alerting system. This will involve setting up the tools so that they can send alerts when specific conditions are met. For example, you may set up an alert to be sent when the system’s CPU usage exceeds a certain threshold.
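The CPU-threshold alert just described can be sketched in a few lines. Both get_cpu_usage and notify below are hypothetical stand-ins for your monitoring tool's metric reader and alert channel (in practice you might use psutil or an agent's API):

```python
CPU_ALERT_THRESHOLD = 80.0  # percent; tune to your system's normal load

def check_cpu(get_cpu_usage, notify):
    """Fire an alert when CPU usage crosses the threshold."""
    usage = get_cpu_usage()
    if usage > CPU_ALERT_THRESHOLD:
        notify(f"High CPU usage: {usage:.1f}% (threshold {CPU_ALERT_THRESHOLD}%)")
        return True
    return False

# Example run with canned readings standing in for a real metric source:
alerts = []
check_cpu(lambda: 92.5, alerts.append)  # over threshold: alert fires
check_cpu(lambda: 41.0, alerts.append)  # under threshold: no alert
print(alerts)  # ['High CPU usage: 92.5% (threshold 80.0%)']
```

A real alerting system would add debouncing (don't re-alert every check) and route notify to email, Slack, or a pager, but the condition-then-notify shape stays the same.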

  4. Establishing a triage process

Once you have built your tooling chain, it’s important to establish a triage process. This will involve setting up a process for identifying, prioritizing, and resolving issues as they arise. This will typically involve creating a set of procedures for identifying and resolving issues, as well as creating a team that is responsible for managing the triage process.

  5. Continuously monitoring and improving

Finally, it’s important to continuously monitor and improve your tooling chain. This will involve analyzing the performance of the tools and the triage process, and looking for areas where improvements can be made. Additionally, it’s important to keep the tools up to date and to ensure that they are configured correctly.

In conclusion, building a tooling chain to help diagnose operational issues and address high-priority issues as they arise is crucial for ensuring the smooth operation of any system. By identifying the tools that you will need, integrating them into a cohesive tooling chain, building an alerting system, establishing a triage process, and continuously monitoring and improving your tooling chain, you can ensure that your system is able to quickly identify and resolve issues as they arise.

Streamlining Your Database Management: Best Practices for Design, Improvement, and Automation


Designing, improving, and automating processes like database provision, schema migration, and capacity planning can be a challenging task, but with the right approach, it can be made much simpler. In this article, we will explore some best practices and tools that can help you design, improve, and automate these processes.

  1. Designing processes

The first step in designing processes is to understand the requirements of the system. This includes understanding the data that will be stored, the number of users, and the expected load on the system. Once you have a good understanding of the requirements, you can start designing the processes.

It’s important to keep in mind that the processes should be designed to be as simple and efficient as possible. This means that they should be easy to understand and maintain, and they should be designed to minimize the number of steps required to complete a task.

  2. Improving processes

Once the processes have been designed, it’s important to continuously monitor and improve them. This can be done by analyzing the performance of the system and looking for areas where improvements can be made. Common areas for improvement include reducing the number of steps required to complete a task, optimizing the performance of the system, and reducing the amount of manual work required.

  3. Automating processes

Automating processes can significantly improve the efficiency and reliability of your system. This can be done by using tools like configuration management tools, which can be used to automate the provisioning and configuration of your system. Additionally, you can use tools like database migration tools, which can be used to automate the process of migrating data between different database systems.
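The schema-migration idea can be sketched in a few lines of standard-library Python. This is a minimal illustration of run-once, in-order migrations using sqlite3, not a substitute for dedicated tools such as Alembic or Flyway:

```python
import sqlite3

# Each migration runs exactly once, in order; applied versions are recorded
# in a schema_version table so re-running the migrator is safe.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: already-applied versions are skipped
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns)  # ['id', 'name', 'email']
```

Appending a new (version, sql) pair to MIGRATIONS is all it takes to roll the schema forward on every environment.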

  4. Capacity Planning

Capacity planning is an important step in ensuring that your system is able to handle the expected load. This involves determining the amount of resources required to support the system, and then scaling the system accordingly. This can be done by monitoring the performance of the system, and then making adjustments as needed.
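The sizing exercise above can be sketched as simple arithmetic. Every number below is an illustrative assumption, not a measured value; the point is the shape of the estimate:

```python
# Back-of-the-envelope storage estimate for capacity planning.
rows_per_day = 500_000       # assumed write volume
bytes_per_row = 200          # assumed average row size, including indexes
retention_days = 365         # how long data is kept
growth_headroom = 1.5        # 50% safety margin for growth

storage_bytes = rows_per_day * bytes_per_row * retention_days * growth_headroom
storage_gb = storage_bytes / 1024**3
print(f"Estimated storage need: {storage_gb:.1f} GiB")
```

Repeating the same calculation for IOPS and memory, then comparing against what the current hardware provides, tells you when and how far to scale.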

In conclusion, designing, improving, and automating processes like database provision, schema migration, and capacity planning can be a challenging task, but with the right approach, it can be made much simpler. By understanding the requirements of the system, designing simple and efficient processes, continuously monitoring and improving the processes, and automating the processes, you can ensure that your system is able to handle the expected load and provide a high level of performance.