Load an ERP Table

ERPL makes it easy to find, explain and query ERP tables directly from DuckDB. In the following document we show you how.

Introduction

SAP ERP essentially keeps all of its data in a relational schema. This means that all the data is stored in tables and the relationships between the tables are defined by foreign keys. This is a very common approach in the database world and is also used by many other ERP systems. The main difference between SAP ERP and other ERP systems is that SAP ERP is highly customizable. This means that the tables and their relationships can be changed by the customer. In this document we show you

  • How to show you available ERP tables, their columns and data types
  • How to get texts and other metadata for the columns
  • How to query ERP tables directly from DuckDB

Get yourself prepared

Required authority objects (RFC)

To query a table, far reaching access has to be granted to the user issuing the query. For more information, refer our documentation which designated authority objects (RFC) must be created.

First and foremost you need to install DuckDB and the ERPL extension. DuckDB as various different installation options, please follow the instructions on the DuckDB website to get yourself prepared. For our scenario down below we recommend to install the Python or R bindings as well. You can check if this worked well with the following commands:

import duckdb
cursor = duckdb.connect()
print(cursor.execute('SELECT 42').fetchall())
library("DBI")
con = dbConnect(duckdb::duckdb(), ":memory:")
dbWriteTable(con, "iris", iris)
dbGetQuery(con, 'SELECT "Species", MIN("Sepal.Width") FROM iris GROUP BY "Species"')

For the ERPL extension please follow the instructions in the installation section of our documentation to get yourself prepared.

List all available ERP tables

A typical SAP ERP system contains thousands of tables. To find the table you are looking for, you can use the sap_show_tables function. This function returns a list of all available tables in the SAP ERP system. The function has the following signature:

SELECT * FROM sap_show_tables()

This will return a (long) list of tables. To find a specific table, you can supply a search string to the function. The search string is matched against the table name and the table description. For example, to find all tables starting with FLIGHT you can use the following query:

SELECT * FROM sap_show_tables('*SPFL*')

The star operator * is a wildcard and matches any number of characters. Supplying no search string is equvivalent to *. The result of the query will look like this:

┌──────────────────┬────────────────────────────────────┬─────────┐
│    table_name    │                text                │  class  │
│     varchar      │              varchar               │ varchar │
├──────────────────┼────────────────────────────────────┼─────────┤
│ ACMHEXSV_SPFLI   │ Generated Table for View           │ VIEW    │
│ DEMO_SCARR_SPFLI │ Generated Table for View           │ VIEW    │
│ DEMO_SPFLI       │ Generated Table for View           │ VIEW    │
│ SADL_V_SPFLI     │ Generated Table for View           │ VIEW    │
│ SPFL_HISTORY_SQL │ Generated Table for View           │ VIEW    │
│ SPFL_HIST_TS_SQL │ Generated Table for View           │ VIEW    │
│ DAAG_SPFLI       │ Flight Schedule                    │ TRANSP  │
│ WDR_TEST_SPFLI3  │ Flight Schedule                    │ TRANSP  │
│ SPFLI            │ Flight schedule                    │ TRANSP  │
│ ACMHEXDB_SPFLI   │ ACMHEX: Flight Schedules (DBTable) │ TRANSP  │
│ VSADLSB_AU_SPFLI │ Generated Table for View           │ VIEW    │
├──────────────────┴────────────────────────────────────┴─────────┤
│ 11 rows                                               3 columns │
└─────────────────────────────────────────────────────────────────┘

For more details on the sap_show_tables function, please refer to the RFC reference.

If you are not familiar with the internal naming scheme of SAP tables you can also use the sap_show_tables function to find tables by their description text. For example, to find all tables containing the words Flight Schedule you can use the following query:

SELECT * FROM sap_show_tables(TEXT='*Flight Schedule*')

Which will return a pretty similar result shown already above.

┌─────────────────┬────────────────────────────────────┬─────────┐
│   table_name    │                text                │  class  │
│     varchar     │              varchar               │ varchar │
├─────────────────┼────────────────────────────────────┼─────────┤
│ ACMHEXDB_SPFLI  │ ACMHEX: Flight Schedules (DBTable) │ TRANSP  │
│ DAAG_SPFLI      │ Flight Schedule                    │ TRANSP  │
│ WDR_TEST_SPFLI3 │ Flight Schedule                    │ TRANSP  │
└─────────────────┴────────────────────────────────────┴─────────┘

Describe a single ERP table

Now that you are found the table it is time to get some more information about it. There two possible pieces of information we make available to you:

  • The columns (in SAP terminology they are called fields) of the table.
  • The references of this table to other tables.

All of this information is saved in SAP in the so called data dictionary. The data dictionary is a central place where all the metadata of the tables is stored. To get access to this information from SAPGUI one could use the DDICT transaction code.

Get the columns of a table

For this you can use the sap_describe_fields function. This function returns a list of all columns of the table, their data types and some other metadata. The function has the following signature:

SELECT * FROM sap_describe_fields('SPFLI')

This will return a list of all columns of the table SPFLI. The result of the query will look like this:

┌─────────┬─────────┬───────────┬────────────────────────────────────┬──────────┬─────────┬──────────┬─────────────┬───────────┬───────────┬──────────┐
│   pos   │ is_key  │   field   │                text                │ sap_type │ length  │ decimals │ check_table │ ref_table │ ref_field │ language │
│ varchar │ varchar │  varchar  │              varchar               │ varchar  │ varchar │ varchar  │   varchar   │  varchar  │  varchar  │ varchar  │
├─────────┼─────────┼───────────┼────────────────────────────────────┼──────────┼─────────┼──────────┼─────────────┼───────────┼───────────┼──────────┤
│ 0001    │ X       │ MANDT     │ Client                             │ CLNT     │ 000003  │ 000000   │ T000        │           │           │ E        │
│ 0002    │ X       │ CARRID    │ Airline Code                       │ CHAR     │ 000003  │ 000000   │ SCARR       │           │           │ E        │
│ 0003    │ X       │ CONNID    │ Flight Connection Number           │ NUMC     │ 000004  │ 000000   │             │           │           │ E        │
│ 0004    │         │ COUNTRYFR │ Country Key                        │ CHAR     │ 000003  │ 000000   │ SGEOCITY    │           │           │ E        │
│ 0005    │         │ CITYFROM  │ Departure city                     │ CHAR     │ 000020  │ 000000   │ SGEOCITY    │           │           │ E        │
│ 0006    │         │ AIRPFROM  │ Departure airport                  │ CHAR     │ 000003  │ 000000   │ SAIRPORT    │           │           │ E        │
│ 0007    │         │ COUNTRYTO │ Country Key                        │ CHAR     │ 000003  │ 000000   │ SGEOCITY    │           │           │ E        │
│ 0008    │         │ CITYTO    │ Arrival city                       │ CHAR     │ 000020  │ 000000   │ SGEOCITY    │           │           │ E        │
│ 0009    │         │ AIRPTO    │ Destination airport                │ CHAR     │ 000003  │ 000000   │ SAIRPORT    │           │           │ E        │
│ 0010    │         │ FLTIME    │ Flight time                        │ INT4     │ 000010  │ 000000   │             │           │           │ E        │
│ 0011    │         │ DEPTIME   │ Departure time                     │ TIMS     │ 000006  │ 000000   │             │           │           │ E        │
│ 0012    │         │ ARRTIME   │ Arrival time                       │ TIMS     │ 000006  │ 000000   │             │           │           │ E        │
│ 0013    │         │ DISTANCE  │ Distance                           │ QUAN     │ 000009  │ 000004   │             │ SPFLI     │ DISTID    │ E        │
│ 0014    │         │ DISTID    │ Mass unit of distance (kms, miles) │ UNIT     │ 000003  │ 000000   │             │           │           │ E        │
│ 0015    │         │ FLTYPE    │ Flight type                        │ CHAR     │ 000001  │ 000000   │             │           │           │ E        │
│ 0016    │         │ PERIOD    │ Arrival n day(s) later             │ INT1     │ 000003  │ 000000   │             │           │           │ E        │
├─────────┴─────────┴───────────┴────────────────────────────────────┴──────────┴─────────┴──────────┴─────────────┴───────────┴───────────┴──────────┤
│ 16 rows                                                                                                                                  11 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

As the result is moderately long we don’t provide any filters as part of the function and rely on DuckDB to filter the result. For more details on the sap_describe_fields function, please refer to the RFC reference.

Query an ERP table

The core functionality of ERPL table access is of course the ability to query ERP tables directly from DuckDB and fetch the results. For this you can use the sap_read_table function. This function returns a list of all columns of the table, their data types and some other metadata. This function directly calls the SAP RFC sap_read_table and returns the result as a table. However it tries to handle most of the shortcomings of this function for you.

For more details on the sap_read_table function, please refer to the RFC reference.

The function has the following signature:

SELECT * FROM sap_read_table('SPFLI')

This will fetch all rows without any filter from the table SPFLI. However for most cases instead of the basic you want to use the extended version of the function. This typicalls referes to:

  • Selecting only a subset of columns
  • Adding a where condition to filter out rows already on the SAP side
  • Limiting the number of rows returned

For all of the above mentioned cases you have typically two options:

  • Either you specify the restrictions directly as a parameter of sap_read_table function. This makes it more explicit what is happening.
  • Or you rely on so called predicate pushdown of DuckDB, which will use normal SQL syntax, but tries to execute as much as possible on the SAP side. Technically, this means nothing more, than filling the parametes of the sap_read_table function with the information from the SQL query.

How to select specific columns?

To select specific columns you can use the COLUMNS parameter. This parameter takes a list of columns as a string. For example, to select only the columns MANDT, CARRID, CITYFROM and CITYTO you can use the following query:

SELECT * FROM sap_read_table('SPFLI', COLUMNS=['MANDT', 'CARRID', 'CITYFROM', 'CITYTO'])

which is equivalent to:

SELECT MANDT, CARRID, CITYFROM, CITYTO FROM sap_read_table('SPFLI')

Internally ERPL uses a DuckDB mechanism called projection pushdown to rewrite the second query to a function call similar to the first one. This means that only the selected columns are loaded from the SAP table.

How to add a where condition?

To restrict rows you can use the WHERE parameter. This parameter takes a list of conditions as a string. For example, to select only the rows where the CARRID is LH you can use the following query:

SELECT * FROM sap_read_table('SPFLI', FILTER='CARRID EQ ''LH''')

which is equivalent to:

SELECT * FROM sap_read_table('SPFLI') WHERE CARRID = 'LH'

to learn more about the exact syntax of the FILTER parameter please refer to the documentation of the RFC_READ_TABLE function.

Again the two queries are equivalent due the application of the filter pushdown mechanism of DuckDB. This means that SQL WHERE clauses are translated into the FILTER parameter of the sap_read_table function.

How to limit the number of rows?

To limit the number of rows you can use the MAX_ROWS parameter. This parameter takes a number as an integer. For example, to select only the first 5 rows you can use the following query:

SELECT * FROM sap_read_table('SPFLI', MAX_ROWS=5)

which is equivalent to:

SELECT * SAP_READ_TABLE('KNA1') LIMIT 5

Controlling the degree of parallelism

One of the unique features of ERPL’s sap_read_table is that it loads row-blocks of each column in parallel. This is done to speed up the loading of the data. The degree of parallelism can be controlled by the THREADS parameter. By default the parallel loading is turned off. Be careful with this parameter, as it can put significant load on the SAP system. (And SAP may be expensive, however is not a data crunching system. :-P)

We load the big table in Python / Jupyter to give you an impression of the difference:

%time con.sql("SELECT count(*) as N FROM sap_read_table('DD03L', THREADS=10)").show()

This leads to the following output:

┌─────────┐
│    N    │
│  int64  │
├─────────┤
│ 1819533 │
└─────────┘

CPU times: user 12 s, sys: 49.2 ms, total: 12 s
Wall time: 51.2 s

And in comparison with threads disabled:

%time con.sql("SELECT count(*) as N FROM sap_read_table('DD03L', THREADS=10)").show()
┌─────────┐
│    N    │
│  int64  │
├─────────┤
│ 1819533 │
└─────────┘

CPU times: user 12 s, sys: 49.2 ms, total: 12 s
Wall time: 51.2 s

Example: Join two SAP tables with a CSV file

The actual SQL query joins the three tables and performs the following operations: - Retrieves flight details from SFLIGHT using ERPL’s sap_read_table, aliasing it as f. - Again using ERPL’s sap_read_table we join SPFLI (aliased as s) on MANDT, CARRID, and CONNID to get the flight’s city of origin and destination. - Incorporates two instances of an external weather data CSV file (loaded via https from GitHub ), w_from and w_to, matching on flight date and respective cities’ country and name for departure and arrival. - Rounds the temperature data to one decimal place for readability. - Orders the results by CARRIER_ID, CONNECTION_ID, and FLIGHT_DATE. - Limits the output to the first 25 rows for a concise view.

The output of this query will provide a comprehensive view of the flights, including their departure and arrival cities, and the corresponding temperatures, thus offering valuable insights for flight operations analysis.

To make this query work, you first have to install and load the DuckDB httpfs extension. This extension allows you to load data from the internet directly into DuckDB. For installation instructions see the DuckDB documentation.

The query is as follows:

SELECT 
  f.CARRID,
  f.CONNID,
  f.FLDATE,
  s.CITYFROM as CITY_FROM,
  ROUND(w_from.TEMPERATURE, 1) as TEMP_FROM,
  s.CITYTO as CITY_TO,
  ROUND(w_to.TEMPERATURE, 1) as TEMP_TO,
  FROM sap_read_table('SFLIGHT') AS f
  JOIN sap_read_table('SPFLI') AS s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
  JOIN 'WEATHER.csv' AS w_from
      ON (f.FLDATE = w_from.FLDATE AND s.COUNTRYFR = w_from.COUNTRY AND s.CITYFROM = w_from.CITY)
  JOIN 'https://raw.githubusercontent.com/DataZooDE/erpl/master/examples/WEATHER.csv' AS w_to
      ON (f.FLDATE = w_to.FLDATE AND s.COUNTRYTO = w_to.COUNTRY AND s.CITYTO = w_to.CITY)
  ORDER BY 1, 2, 3
  LIMIT 25

This leads to a result combining SAP and data from GitHub, cool … isn’t it?

┌─────────┬─────────┬────────────┬───────────┬───────────┬───────────────┬─────────┐
│ CARRID  │ CONNID  │   FLDATE   │ CITY_FROM │ TEMP_FROM │    CITY_TO    │ TEMP_TO │
│ varchar │ varchar │    date    │  varchar  │  double   │    varchar    │ double  │
├─────────┼─────────┼────────────┼───────────┼───────────┼───────────────┼─────────┤
│ AA      │ 0017    │ 2016-11-15 │ NEW YORK  │      18.4 │ SAN FRANCISCO │    26.3 │
│ AA      │ 0017    │ 2017-02-03 │ NEW YORK  │      20.0 │ SAN FRANCISCO │    18.1 │
│ AA      │ 0017    │ 2017-04-24 │ NEW YORK  │      23.5 │ SAN FRANCISCO │    10.3 │
│ AA      │ 0017    │ 2017-07-13 │ NEW YORK  │      24.7 │ SAN FRANCISCO │     9.4 │
│ AA      │ 0017    │ 2017-10-01 │ NEW YORK  │      15.8 │ SAN FRANCISCO │    12.5 │
│ AA      │ 0017    │ 2017-12-20 │ NEW YORK  │      16.0 │ SAN FRANCISCO │    23.4 │
│ AZ      │ 0555    │ 2016-11-15 │ ROME      │      18.6 │ FRANKFURT     │    22.0 │
│ AZ      │ 0555    │ 2017-02-03 │ ROME      │      33.0 │ FRANKFURT     │    24.7 │
│ AZ      │ 0555    │ 2017-04-24 │ ROME      │      15.7 │ FRANKFURT     │    29.9 │
│ AZ      │ 0555    │ 2017-07-13 │ ROME      │      20.6 │ FRANKFURT     │    24.6 │
│ ·       │  ·      │     ·      │  ·        │        ·  │  ·            │      ·  │
│ ·       │  ·      │     ·      │  ·        │        ·  │  ·            │      ·  │
│ ·       │  ·      │     ·      │  ·        │        ·  │  ·            │      ·  │
│ AZ      │ 0789    │ 2017-07-13 │ TOKYO     │      23.9 │ ROME          │    20.6 │
│ AZ      │ 0789    │ 2017-10-01 │ TOKYO     │      18.2 │ ROME          │    28.6 │
│ AZ      │ 0789    │ 2017-12-20 │ TOKYO     │      22.6 │ ROME          │    29.7 │
│ DL      │ 0106    │ 2016-11-13 │ NEW YORK  │      18.5 │ FRANKFURT     │    24.6 │
│ DL      │ 0106    │ 2017-02-01 │ NEW YORK  │      13.4 │ FRANKFURT     │    21.9 │
│ DL      │ 0106    │ 2017-04-22 │ NEW YORK  │      24.1 │ FRANKFURT     │    13.1 │
│ DL      │ 0106    │ 2017-07-11 │ NEW YORK  │       9.0 │ FRANKFURT     │    20.0 │
│ DL      │ 0106    │ 2017-09-29 │ NEW YORK  │      14.4 │ FRANKFURT     │    18.1 │
│ DL      │ 0106    │ 2017-12-18 │ NEW YORK  │      21.9 │ FRANKFURT     │    15.4 │
│ JL      │ 0407    │ 2016-11-17 │ TOKYO     │       6.2 │ FRANKFURT     │    24.5 │
├─────────┴─────────┴────────────┴───────────┴───────────┴───────────────┴─────────┤
│ 25 rows (20 shown)                                                     7 columns │
└──────────────────────────────────────────────────────────────────────────────────┘