Excel-Parquet Integration: Mastering Data Analysis with DuckDB

Unlock the Power of Excel for Parquet Data Analysis through DuckDB Connectivity.
Excel
Parquet
DuckDB
Author
Affiliation

Simon Müller

Published

January 15, 2024

Summary

  • The article guides on integrating Excel with Parquet files using DuckDB, highlighting the efficiency of DuckDB for large data sets and how it surpasses Excel’s normal data handling limits.
  • It includes step-by-step instructions for installing the DuckDB ODBC driver, configuring Excel, setting up the ODBC connection, and suggests using Power Query Desktop for data transformation.

Introduction

DuckDB, an open-source database engine optimized for OLAP, offers a lightweight, low-dependency solution for large-scale data analysis, reminiscent of Microsoft Access. Excelling in handling Parquet files with robust SQL support, DuckDB extends Excel’s data processing capabilities, enabling analysis of datasets beyond Excel’s usual limits. This blog post demonstrates the straightforward process of connecting Excel with Parquet files using DuckDB, unlocking new possibilities in data handling and analysis.

ODBC Installation

To begin querying Parquet files with DuckDB, you must first install the DuckDB ODBC driver. The specific release you’ll need for this setup is DuckDB Release 0.9.2. The installation process is as follows:

  1. Download the Driver: Visit the DuckDB official releases page, e.g. on Github (DuckDB Release 0.9.2), and locate current release. Ensure you download the ODBC driver specific to this version.

  2. Run the Installer: Once downloaded, run the installer file. This process usually requires administrative privileges, so you might need to have administrator access on your machine.

  3. Follow the Setup Instructions: The installer typically guides you through a series of prompts. Follow these instructions carefully to ensure the driver is correctly installed.

  4. Configure the ODBC Data Source: After installation, open the ODBC Data Source Administrator on your computer. Here, you will add a new data source using the DuckDB driver. Fill in the required fields, such as the name of the data source and the path to your DuckDB database file.

  5. Testing the Connection: Once configured, it’s a good practice to test the connection to ensure everything is set up correctly.

  6. Integration with Excel: After successful installation and configuration, you can then use Excel to connect to DuckDB through this ODBC driver. Set up a connection in Excel by selecting the DuckDB data source you’ve just configured.

Remember, always check the system requirements and compatibility of the ODBC driver with your operating system and Excel version to ensure smooth operation.

Configuration of Excel

To connect Excel with DuckDB using the ODBC driver, follow these steps:

  1. Open Excel: Launch Microsoft Excel on your computer.

  2. Go to Data Tab: In Excel, navigate to the ‘Data’ tab on the Ribbon interface.

  3. Get Data: Click on ‘Get Data’. In the dropdown menu, choose ‘From Other Sources’, and then select ‘From ODBC’.

  4. Choose the DuckDB DSN: A dialog box will appear listing available Data Source Names (DSNs). Select the DSN corresponding to the DuckDB ODBC driver you installed earlier. This DSN was set up when you configured the ODBC Data Source during the driver installation.

Excel Data Connector

Excel Data Connector

Configuration of the ODBC Connection

  1. Configure ODBC: You need to set `database=:memory:`. Write down your SQL-query in the SQL block linking to the parquet file you may want to analyze in Excel. Multiple Parquet files could be joined using the WITH statement and/or additional filters may be added to this query according to your needs.

ODBC Connection using DuckDB

ODBC Connection using DuckDB

Power Query Desktop for Data Transformation / Parametrization

Data Transformation: After defining the ODBC connection a data transformation step in Power Query Desktop could be applied. But this is out of the cope of this post.

Transformations using Power Query

Transformations using Power Query

Remember, the exact steps may slightly vary depending on the version of Excel you are using. Additionally, ensure that the DuckDB ODBC driver is properly installed and configured before attempting to connect through Excel.

Happy Analyzing!