SAP Data in Your Python Analytics Workflows

Python Connector for SAP ERP Data Connectivity using duckdb. Integrate SAP ERP with popular Python tools like Pandas, SQLAlchemy, and Dash.

Extract, transfer, and load (ETL) SAP Data in Python using RFC

The ERPL DuckDB Extension for SAP revolutionizes the development of ETL applications and pipelines in Python, specifically designed for SAP data.

Harnessing the power of Python’s comprehensive module ecosystem, the ERPL DuckDB Extension facilitates quick deployment and smoother system integration. Our DuckDB Connector for SAP is your key to crafting high-performance applications and pipelines focused on extracting, transforming, and loading SAP data. This guide illustrates the process of connecting to SAP using the ERPL DuckDB Extension in Python, coupled with pandas, to manage SAP data effectively and efficiently.

Our SAP Connector is engineered for advanced, optimized data processing, guaranteeing superior performance with real-time SAP data in Python. It adeptly navigates complex SQL queries to SAP, strategically directing supported SQL operations like filters and aggregations straight to SAP. For operations not inherently supported by SAP, such as certain SQL functions and JOIN operations, our connector leverages the embedded SQL engine to process these on the client side, ensuring comprehensive and efficient data handling.

Query SAP table and join with local data

A first example demonstrates how to join two SAP tables with an external table. We’ll be using the ABAP Flight Reference Scenario, specifically joining the SFLIGHT and SPFLI tables which contain flight and flight schedule details respectively, with an external table WEATHER that holds weather information. We will extract flight information and associated temperatures at departure and arrival cities.

This example can also be found in the example (SAP Flight Demo Python) folder of our ERP extension.

Import DuckDB & load ERPL extension

This section offers a step-by-step guide on integrating DuckDB with the ERPL extension for Python-based SAP data analytics. It includes instructions on importing DuckDB, configuring connections, and details on installing and loading the ERPL extension. An example script illustrates the process, emphasizing the activation of the ERPL Trampoline Extension for efficient SAP data handling. Additionally, the section links to the ERPL installation guide, providing detailed instructions for setting up the required environment.

For detailed installation instructions, please refer to the ERPL Installation Guide.

import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.sql("SET custom_extension_repository = 'http://get.erpl.io';")
con.install_extension("erpl")
con.load_extension("erpl")
con.sql("""
SET sap_ashost = 'localhost';
SET sap_sysnr = '00';
SET sap_user = 'DEVELOPER';
SET sap_password = 'ABAPtr1909';
SET sap_client = '001';
SET sap_lang = 'EN';
""")

You should get the following message:

-- Loading ERPL Trampoline Extension. --
(The purpose of the extension is to extract dependencies and load the ERPL implementation)
Saving ERPL SAP dependencies to '/home/jr/.duckdb/extensions/v0.9.2/linux_amd64' and loading them ... done
ERPL extension extracted and saved to /home/jr/.duckdb/extensions/v0.9.2/linux_amd64.
ERPL implementation extension installed from /home/jr/.duckdb/extensions/v0.9.2/linux_amd64/erpl_impl.duckdb_extension.
ERPL implementation extension loaded. For instructions how to use it visit https://erpl.io
con.sql("PRAGMA sap_rfc_ping")
┌─────────┐
│   msg   │
varchar
├─────────┤
│ PONG    │
└─────────┘

If the loading of the extension was successful, we can find the exportet functions in the list of duckdb_functions()

con.sql("SELECT * FROM duckdb_functions() WHERE function_name LIKE '%sap%';")
┌───────────────┬─────────────┬──────────────────────┬───────────────┬───┬──────────┬──────────────┬─────────┐
│ database_name │ schema_name │    function_name     │ function_type │ … │ internal │ function_oid │ example │
varcharvarcharvarcharvarchar    │   │ boolean  │    int64     │ varchar
├───────────────┼─────────────┼──────────────────────┼───────────────┼───┼──────────┼──────────────┼─────────┤
system        │ main        │ sap_read_table       │ table         │ … │ true1415NULL
system        │ main        │ sap_describe_fields  │ table         │ … │ true1413NULL
system        │ main        │ sap_show_tables      │ table         │ … │ true1411NULL
system        │ main        │ sap_rfc_describe_f…  │ table         │ … │ true1409NULL
system        │ main        │ sap_rfc_search_fun…  │ table         │ … │ true1407NULL
system        │ main        │ sap_rfc_search_group │ table         │ … │ true1405NULL
system        │ main        │ sap_rfc_invoke       │ table         │ … │ true1403NULL
system        │ main        │ sap_rfc_ping         │ pragma        │ … │ true1401NULL
system        │ main        │ sap_rfc_function_d…  │ pragma        │ … │ true1417NULL
system        │ main        │ sap_rfc_set_trace_…  │ pragma        │ … │ true1419NULL
system        │ main        │ sap_rfc_set_trace_…  │ pragma        │ … │ true1421NULL
system        │ main        │ sap_rfc_set_maximu…  │ pragma        │ … │ true1423NULL
system        │ main        │ sap_rfc_set_maximu…  │ pragma        │ … │ true1425NULL
system        │ main        │ sap_rfc_set_ini_path │ pragma        │ … │ true1427NULL
system        │ main        │ sap_rfc_reload_ini…  │ pragma        │ … │ true1429NULL
├───────────────┴─────────────┴──────────────────────┴───────────────┴───┴──────────┴──────────────┴─────────┤
15 rows                                                                               14 columns (7 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Explore the schema of the relevant tables

The ERPL extension provides the method sap_describe_fields to explore the data dictionary schema of the respective table. For exploring local data we also can use the DESCRIBE command to get the fields of e.g. a CSV-file.

con.sql("SELECT * FROM sap_describe_fields('SFLIGHT');")
┌─────────┬─────────┬────────────┬──────────────────────┬───┬─────────────┬───────────┬───────────┬──────────┐
│   pos   │ is_key  │   field    │         text         │ … │ check_table │ ref_table │ ref_field │ language │
varcharvarcharvarcharvarchar        │   │   varcharvarcharvarcharvarchar
├─────────┼─────────┼────────────┼──────────────────────┼───┼─────────────┼───────────┼───────────┼──────────┤
0001    │ X       │ MANDT      │ Client               │ … │ T000        │           │           │ E        │
0002    │ X       │ CARRID     │ Airline Code         │ … │ SCARR       │           │           │ E        │
0003    │ X       │ CONNID     │ Flight Connection …  │ … │ SPFLI       │           │           │ E        │
0004    │ X       │ FLDATE     │ Flight date          │ … │             │           │           │ E        │
0005    │         │ PRICE      │ Airfare              │ … │             │ SFLIGHT   │ CURRENCY  │ E        │
0006    │         │ CURRENCY   │ Local currency of …  │ … │ SCURX       │           │           │ E        │
0007    │         │ PLANETYPE  │ Aircraft Type        │ … │ SAPLANE     │           │           │ E        │
0008    │         │ SEATSMAX   │ Maximum Capacity i…  │ … │             │           │           │ E        │
0009    │         │ SEATSOCC   │ Occupied Seats in …  │ … │             │           │           │ E        │
0010    │         │ PAYMENTSUM │ Total of current b…  │ … │             │ SFLIGHT   │ CURRENCY  │ E        │
0011    │         │ SEATSMAX_B │ Maximum Capacity i…  │ … │             │           │           │ E        │
0012    │         │ SEATSOCC_B │ Occupied Seats in …  │ … │             │           │           │ E        │
0013    │         │ SEATSMAX_F │ Maximum Capacity i…  │ … │             │           │           │ E        │
0014    │         │ SEATSOCC_F │ Occupied Seats in …  │ … │             │           │           │ E        │
├─────────┴─────────┴────────────┴──────────────────────┴───┴─────────────┴───────────┴───────────┴──────────┤
14 rows                                                                               11 columns (8 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
con.sql("SELECT * FROM sap_describe_fields('SPFLI');")
┌─────────┬─────────┬───────────┬──────────────────────┬───┬──────────┬─────────────┬───────────┬───────────┬──────────┐
│   pos   │ is_key  │   field   │         text         │ … │ decimals │ check_table │ ref_table │ ref_field │ language │
varcharvarcharvarcharvarchar        │   │ varcharvarcharvarcharvarcharvarchar
├─────────┼─────────┼───────────┼──────────────────────┼───┼──────────┼─────────────┼───────────┼───────────┼──────────┤
0001    │ X       │ MANDT     │ Client               │ … │ 000000   │ T000        │           │           │ E        │
0002    │ X       │ CARRID    │ Airline Code         │ … │ 000000   │ SCARR       │           │           │ E        │
0003    │ X       │ CONNID    │ Flight Connection …  │ … │ 000000   │             │           │           │ E        │
0004    │         │ COUNTRYFR │ Country Key          │ … │ 000000   │ SGEOCITY    │           │           │ E        │
0005    │         │ CITYFROM  │ Departure city       │ … │ 000000   │ SGEOCITY    │           │           │ E        │
0006    │         │ AIRPFROM  │ Departure airport    │ … │ 000000   │ SAIRPORT    │           │           │ E        │
0007    │         │ COUNTRYTO │ Country Key          │ … │ 000000   │ SGEOCITY    │           │           │ E        │
0008    │         │ CITYTO    │ Arrival city         │ … │ 000000   │ SGEOCITY    │           │           │ E        │
0009    │         │ AIRPTO    │ Destination airport  │ … │ 000000   │ SAIRPORT    │           │           │ E        │
0010    │         │ FLTIME    │ Flight time          │ … │ 000000   │             │           │           │ E        │
0011    │         │ DEPTIME   │ Departure time       │ … │ 000000   │             │           │           │ E        │
0012    │         │ ARRTIME   │ Arrival time         │ … │ 000000   │             │           │           │ E        │
0013    │         │ DISTANCE  │ Distance             │ … │ 000004   │             │ SPFLI     │ DISTID    │ E        │
0014    │         │ DISTID    │ Mass unit of dista…  │ … │ 000000   │             │           │           │ E        │
0015    │         │ FLTYPE    │ Flight type          │ … │ 000000   │             │           │           │ E        │
0016    │         │ PERIOD    │ Arrival n day(s) l…  │ … │ 000000   │             │           │           │ E        │
├─────────┴─────────┴───────────┴──────────────────────┴───┴──────────┴─────────────┴───────────┴───────────┴──────────┤
16 rows                                                                                         11 columns (9 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
con.sql("DESCRIBE SELECT * FROM 'WEATHER.csv'")
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  nullkeydefault │  extra  │
varcharvarcharvarcharvarcharvarcharvarchar
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ FLDATE      │ DATE        │ YES     │ NULLNULLNULL
│ COUNTRY     │ VARCHAR     │ YES     │ NULLNULLNULL
│ CITY        │ VARCHAR     │ YES     │ NULLNULLNULL
│ TEMPERATURE │ DOUBLE      │ YES     │ NULLNULLNULL
│ CONDITION   │ VARCHAR     │ YES     │ NULLNULLNULL
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

Join tables query

The actual SQL query joins the three tables and performs the following operations:

  • Retrieves flight details from SFLIGHT using ERPL’s sap_read_table, aliasing it as f.
  • Again using ERPL’s sap_read_table we join SPFLI (aliased as s) on MANDT, CARRID, and CONNID to get the flight’s city of origin and destination.
  • Incorporates two instances of an external weather data CSV file, w_from and w_to, matching on flight date and respective cities’ country and name for departure and arrival.
  • Rounds the temperature data to one decimal place for readability.
  • Orders the results by CARRIER_ID, CONNECTION_ID, and FLIGHT_DATE.
  • Limits the output to the first 25 rows for a concise view.

The output of this query will provide a comprehensive view of the flights, including their departure and arrival cities, and the corresponding temperatures, thus offering valuable insights for flight operations analysis.

con.sql("""
SELECT 
  f.CARRID,
  f.CONNID,
  f.FLDATE,
  s.CITYFROM as CITY_FROM,
  ROUND(w_from.TEMPERATURE, 1) as TEMP_FROM,
  s.CITYTO as CITY_TO,
  ROUND(w_to.TEMPERATURE, 1) as TEMP_TO,
  FROM sap_read_table('SFLIGHT') AS f
  JOIN sap_read_table('SPFLI') AS s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
  JOIN 'WEATHER.csv' AS w_from
      ON (f.FLDATE = w_from.FLDATE AND s.COUNTRYFR = w_from.COUNTRY AND s.CITYFROM = w_from.CITY)
  JOIN 'WEATHER.csv' AS w_to
      ON (f.FLDATE = w_to.FLDATE AND s.COUNTRYTO = w_to.COUNTRY AND s.CITYTO = w_to.CITY)
  ORDER BY 1, 2, 3
  LIMIT 25
""")
┌─────────┬─────────┬────────────┬───────────┬───────────┬──────────┬─────────┐
│ CARRID  │ CONNID  │   FLDATE   │ CITY_FROM │ TEMP_FROM │ CITY_TO  │ TEMP_TO │
varcharvarchardatevarchardoublevarchardouble
├─────────┼─────────┼────────────┼───────────┼───────────┼──────────┼─────────┤
│ LH      │ 04002016-11-18 │ FRANKFURT │      11.7NEW YORK │    21.5
│ LH      │ 04002017-02-06 │ FRANKFURT │      18.5NEW YORK │    13.8
│ LH      │ 04002017-04-27 │ FRANKFURT │      18.2NEW YORK │    19.9
│ LH      │ 04002017-04-28 │ FRANKFURT │      22.3NEW YORK │    18.7
│ LH      │ 04002017-05-28 │ FRANKFURT │      21.5NEW YORK │    23.1
│ LH      │ 04002017-07-16 │ FRANKFURT │      20.7NEW YORK │    28.1
│ LH      │ 04002017-10-04 │ FRANKFURT │      13.2NEW YORK │    27.1
│ LH      │ 04002017-12-23 │ FRANKFURT │      27.6NEW YORK │    19.4
└─────────┴─────────┴────────────┴───────────┴───────────┴──────────┴─────────┘

Call BAPIS directly

In a second example, we call BAPIs of the flight scenario directly via RFC.

Explore the API of a BAPI

First we search relevant BAPIS stemming from the flights example.

con.sql("SELECT * FROM sap_rfc_search_function(FUNCNAME='BAPI_FLIGHT*') ORDER BY 1")
```

````sql
┌───────────────────────────────┬────────────────────┬─────────┬─────────┬─────────────────────────────┐
│           FUNCNAME            │     GROUPNAME      │  APPL   │  HOST   │            STEXT            │
varcharvarcharvarcharvarcharvarchar
├───────────────────────────────┼────────────────────┼─────────┼─────────┼─────────────────────────────┤
│ BAPI_FLIGHT_CHECKAVAILIBILITY │ SAPBC_BAPI_SFLIGHT │         │         │ Check flight availability   │
│ BAPI_FLIGHT_GETDETAIL         │ SAPBC_BAPI_SFLIGHT │         │         │ Find details about a flight │
│ BAPI_FLIGHT_GETLIST           │ SAPBC_BAPI_SFLIGHT │         │         │ Find list of flights        │
│ BAPI_FLIGHT_SAVEREPLICA       │ SAPBC_BAPI_SFLIGHT │         │         │ Save replicated flight data
└───────────────────────────────┴────────────────────┴─────────┴─────────┴─────────────────────────────┘

With sap_rfc_invoke (as the function name implies) we can invoke functions and get return values.

con.sql("SELECT * FROM sap_rfc_invoke('BAPI_FLIGHT_GETLIST', path='/FLIGHT_LIST')")
┌───────────┬───────────────────┬───────────┬────────────┬───┬────────────┬───────────────┬─────────┬──────────┐
│ AIRLINEID │      AIRLINE      │ CONNECTID │ FLIGHTDATE │ … │  ARRDATE   │     PRICE     │  CURR   │ CURR_ISO │
varcharvarcharvarchardate    │   │    datedecimal(12,4) │ varcharvarchar
├───────────┼───────────────────┼───────────┼────────────┼───┼────────────┼───────────────┼─────────┼──────────┤
│ LH        │ Lufthansa         │ 04002016-11-18 │ … │ 2016-11-18666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-02-06 │ … │ 2017-02-06666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-04-27 │ … │ 2017-04-27666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-04-28 │ … │ 2017-04-28666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-05-28 │ … │ 2017-05-28666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-07-16 │ … │ 2017-07-16666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-10-04 │ … │ 2017-10-04666.0000 │ EUR     │ EUR      │
│ LH        │ Lufthansa         │ 04002017-12-23 │ … │ 2017-12-23666.0000 │ EUR     │ EUR      │
│ AA        │ American Airlines │ 00172016-11-15 │ … │ 2016-11-15422.9400 │ USD     │ USD      │
│ AA        │ American Airlines │ 00172017-02-03 │ … │ 2017-02-03422.9400 │ USD     │ USD      │
│ ·         │       ·           │  ·        │     ·      │ · │     ·      │          ·    │  ·      │  ·       │
│ ·         │       ·           │  ·        │     ·      │ · │     ·      │          ·    │  ·      │  ·       │
│ ·         │       ·           │  ·        │     ·      │ · │     ·      │          ·    │  ·      │  ·       │
│ JL        │ Japan Airlines    │ 04072017-04-26 │ … │ 2017-04-26106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04072017-07-15 │ … │ 2017-07-15106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04072017-10-03 │ … │ 2017-10-03106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04072017-12-22 │ … │ 2017-12-22106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04082016-11-18 │ … │ 2016-11-19106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04082017-02-06 │ … │ 2017-02-07106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04082017-04-27 │ … │ 2017-04-28106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04082017-07-16 │ … │ 2017-07-17106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04082017-10-04 │ … │ 2017-10-05106136.0000 │ JPY     │ JPY      │
│ JL        │ Japan Airlines    │ 04082017-12-23 │ … │ 2017-12-24106136.0000 │ JPY     │ JPY      │
├───────────┴───────────────────┴───────────┴────────────┴───┴────────────┴───────────────┴─────────┴──────────┤
94 rows (20 shown)                                                                      14 columns (8 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

With the method sap_rfc_describe_function('$BAPI_NAME') one can explore the API of a RFC function. The method returns four columns:

  • NAME: Containing the full name of the BAPI, typically this is the same as the argument of the function.
  • IMPORT: Contains a list with description of all input types. Have a look especially at the required flag. This parameters have to be provided.
  • EXPORT: Also list with export parameters.
  • CHANGING: Are so called in/out parameters, which can be input as well as output.
  • TABLES: This are parameters in form of tables (which are lists of structs in DuckDB). Tables can also have in/out direction.

In the previous example we used the path parameter of the invoke function to select the FLIGHT_LIST table.

from IPython.display import JSON

desc = con.sql("SELECT * FROM sap_rfc_describe_function('BAPI_FLIGHT_GETLIST')").df()
#display(desc.iloc[0])
display(JSON(desc["IMPORT"][0], root="IMPORT"))
display(JSON(desc["TABLES"][0], root="TABLES"))

con.sql("SELECT * FROM sap_rfc_describe_function('BAPI_FLIGHT_GETDETAIL')").fetchone()[1]
[{'NAME': 'AIRLINEID',
  'TYPE': 'RFCTYPE_CHAR',
  'DIRECTION': 'RFC_IMPORT',
  'LENGTH': 3,
  'DECIMALS': 0,
  'DEFAULTVALUE': '',
  'PARAMETERTEXT': 'Airline Code',
  'OPTIONAL': False},
 {'NAME': 'CONNECTIONID',
  'TYPE': 'RFCTYPE_NUM',
  'DIRECTION': 'RFC_IMPORT',
  'LENGTH': 4,
  'DECIMALS': 0,
  'DEFAULTVALUE': '',
  'PARAMETERTEXT': 'Flight connection code',
  'OPTIONAL': False},
 {'NAME': 'FLIGHTDATE',
  'TYPE': 'RFCTYPE_DATE',
  'DIRECTION': 'RFC_IMPORT',
  'LENGTH': 8,
  'DECIMALS': 0,
  'DEFAULTVALUE': '',
  'PARAMETERTEXT': 'Departure date',
  'OPTIONAL': False}]
res = con.sql("SELECT * FROM sap_rfc_invoke('BAPI_FLIGHT_GETDETAIL', {'AIRLINEID': 'LH', 'CONNECTIONID': '0400', 'FLIGHTDATE': '2016-11-18'::DATE })").df()
res.to_dict(orient="records")[0]

{'ADDITIONAL_INFO': {'FLIGHTTIME': 444,
  'DISTANCE': Decimal('6162.0000'),
  'UNIT': 'KM',
  'UNIT_ISO': 'KMT',
  'PLANETYPE': 'A340-600',
  'FLIGHTTYPE': ''},
 'AVAILIBILITY': {'ECONOMAX': 330,
  'ECONOFREE': 10,
  'BUSINMAX': 30,
  'BUSINFREE': 0,
  'FIRSTMAX': 20,
  'FIRSTFREE': 0},
 'FLIGHT_DATA': {'AIRLINEID': 'LH',
  'AIRLINE': 'Lufthansa',
  'CONNECTID': '0400',
  'FLIGHTDATE': datetime.date(2016, 11, 18),
  'AIRPORTFR': 'FRA',
  'CITYFROM': 'FRANKFURT',
  'AIRPORTTO': 'JFK',
  'CITYTO': 'NEW YORK',
  'DEPTIME': datetime.time(10, 10),
  'ARRTIME': datetime.time(11, 34),
  'ARRDATE': datetime.date(2016, 11, 18),
  'PRICE': Decimal('666.0000'),
  'CURR': 'EUR',
  'CURR_ISO': 'EUR'},
 'EXTENSION_IN': [],
 'EXTENSION_OUT': [],
 'RETURN': [{'TYPE': 'S',
   'ID': 'BC_IBF',
   'NUMBER': '000',
   'MESSAGE': 'Method was executed',
   'LOG_NO': '',
   'LOG_MSG_NO': '000000',
   'MESSAGE_V1': '',
   'MESSAGE_V2': '',
   'MESSAGE_V3': '',
   'MESSAGE_V4': '',
   'PARAMETER': '',
   'ROW': 0,
   'FIELD': '',
   'SYSTEM': 'A4HCLNT001'}]}