Effortlessly Fetch Public Holiday Data with DuckDB and ERPL-Web Extension

Learn how to effortlessly fetch and analyze public holiday data using DuckDB and the ERPL-Web extension, streamlining your data management process with simple and customizable SQL queries.
Holidays
ERPL WEB
DuckDb
Author
Affiliation

Simon Müller

Published

July 13, 2024

Summary

DuckDB is a high-performance SQL OLAP database system known for its efficiency in handling analytical queries. However, accessing and analyzing public holiday data across different countries and date ranges can be challenging. By leveraging the ERPL-Web extension, you can seamlessly fetch and process this data within DuckDB.

We demonstrate how to retrieve public holiday data for Germany, showcasing the ease and power of this approach. This example highlights how DuckDB and ERPL-Web can streamline your data workflows by connecting your SQL environment with external data sources efficiently.

Introduction

Managing public holiday data can be cumbersome, especially when dealing with multiple countries and varying date ranges. However, with the power of DuckDB and the erpl-web extension, you can easily retrieve and analyze holiday data with just a few lines of SQL. In this post, we will walk you through how to create a function that fetches public holiday data using DuckDB’s erpl-web extension, making your data handling process more efficient and streamlined.

Why DuckDB?

DuckDB is a high-performance, in-process SQL OLAP database management system. It is designed to support analytical query workloads and can be embedded into applications, making it an ideal choice for handling data-intensive tasks. With its robust SQL capabilities and seamless integration with various extensions, DuckDB stands out as a versatile tool for data professionals.

Introducing the ERPL-Web Extension

The erpl-web extension enhances DuckDB’s functionality by enabling web requests directly within SQL queries. This is particularly useful for fetching data from APIs, such as the Open Holidays API, without leaving your SQL environment. By leveraging this extension, you can pull in external data, process it, and analyze it all within DuckDB.

Fetching Public Holiday Data

Let’s dive into how you can create a function to fetch public holiday data for a specified date range and country using the erpl-web extension.

Step-by-Step Guide

  1. Install the ERPL-Web Extension: Ensure that you have the erpl-web extension installed and loaded in your DuckDB environment. Here you find the tutorial for installing this extension.
INSTALL 'erpl_web' FROM 'http://get.erpl.io';
LOAD 'erpl_web'; 
  1. Create the SQL Function: Use the provided SQL function to fetch and process public holiday data.

Here’s the SQL function:

WITH german_holidays AS (
    SELECT
        UNNEST(FROM_JSON(CAST(content AS JSON), '["JSON"]'), recursive := true) AS content
    FROM http_get(
        'https://openholidaysapi.org/PublicHolidays?countryIsoCode=DE&languageIsoCode=DE&validFrom=2021-01-01&validTo=2023-06-30'
    )
)
SELECT
    (content::JSON->>'startDate')::DATE AS date,
    (content::JSON->>'nationwide')::BOOLEAN AS nationwide,
    CAST(UNNEST(FROM_JSON(content::JSON->'name', '["JSON"]'), recursive := true)::JSON->>'text' AS VARCHAR) AS name,
    UNNEST(FROM_JSON(content::JSON->'subdivisions', '["JSON"]'), recursive := true)::JSON->>'code' AS subdivisions
FROM german_holidays
WHERE (content::JSON->>'nationwide')::BOOLEAN;

Customizing the Query

To customize the query for different countries and date ranges, modify the parameters in the URL within the http_get function. For example:

WITH german_holidays AS (
    SELECT
        UNNEST(FROM_JSON(CAST(content AS JSON), '["JSON"]'), recursive := true) AS content
    FROM http_get(
        'https://openholidaysapi.org/PublicHolidays?countryIsoCode=US&languageIsoCode=EN&validFrom=2022-01-01&validTo=2022-12-31'
    )
)
SELECT
    (content::JSON->>'startDate')::DATE AS date,
    (content::JSON->>'nationwide')::BOOLEAN AS nationwide,
    CAST(UNNEST(FROM_JSON(content::JSON->'name', '["JSON"]'), recursive := true)::JSON->>'text' AS VARCHAR) AS name,
    UNNEST(FROM_JSON(content::JSON->'subdivisions', '["JSON"]'), recursive := true)::JSON->>'code' AS subdivisions
FROM german_holidays
WHERE (content::JSON->>'nationwide')::BOOLEAN;

Benefits

  • Efficiency: Streamlines the process of fetching and processing public holiday data.

  • Flexibility: Easily adjust the date range and country parameters to suit your needs.

  • Integration: Seamlessly integrates with other DuckDB functionalities, allowing for comprehensive data analysis.

Conclusion

With DuckDB and the erpl-web extension, managing public holiday data becomes a breeze. This powerful combination allows you to efficiently fetch, process, and analyze data, making your workflow more streamlined and effective. Whether you’re a developer, data analyst, or just someone who loves working with data, DuckDB with the erpl-web extension offers a robust solution for your data needs.

Try it out today and see how DuckDB can transform your data handling processes!