Replicate mass data from SAP to DuckDB via ODP

ERPL implements a client for the SAP ODP services making it possible to replicate mass data from SAP to DuckDB. This works for ERP, S/4HANA, BW and ODS views systems.

Introduction

Operational Data Provisioning (ODP) is a SAP framework especially designed to suppert data analytics and data replication. It is a framework that allows you to replicate mass data from SAP to a target system. The target system can be a SAP system or a non-SAP system. The cool thing about ODP that it is on the one hand deeply integrated into the SAP system and on the other hand very efficient.

ODP Overview (Source: https://community.sap.com/t5/technology-blogs-by-members/operational-data-provisioning-simplified/ba-p/13483334)

ODP Overview (Source: https://community.sap.com/t5/technology-blogs-by-members/operational-data-provisioning-simplified/ba-p/13483334)

Our ERPL package acts as a ODP RFC client, much in the same way a SAP BW system would. This enables you to query and/or replicate the following SAP objects:

SAP Source Objects ODP Context Description
- ERP Extractors and Data Sources SAPI DataSource Extractors in ECC
- CDS Views (including S/4 Views) ABAP_CDS ODP interface to ABAP Core Data Services
- DSO / aDSO
- CompositeProvider
- InfoObjects
- Query as InfoProvider
BW ODP enabled objects in BW/4HANA
- CompositeProvider
- InfoCubes
- MultiProviders
- InfoSets
BW ODP enabled objects in BW 7.5
- SAP Tables
- Cluster tables
- Pool tables
[SLT~$ALIAS$] SAP Landscape Transformation Replication Server (SLT) CDC
- Analysis Views
- Calculation Views
- Associated Attribute Views
HANA ODP via Netweaver to HANA Information Views

For a good introduction to the topic have a look at the SAP Community Blog or the SAP Wiki: Operational Data Provisioning (ODP) and Delta Queue (ODQ).

Work in progress

This section is still work in progress. We are working hard to provide you with the best possible experience. If you have any questions, please do not hesitate to contact us.

Get yourself prepared

First and foremost you need to install DuckDB and the ERPL extension. DuckDB as various different installation options, please follow the instructions on the DuckDB website to get yourself prepared. For our scenario down below we recommend to install the Python or R bindings as well. You can check if this worked well with the following commands:

import duckdb
cursor = duckdb.connect()
print(cursor.execute('SELECT 42').fetchall())
library("DBI")
con = dbConnect(duckdb::duckdb(), ":memory:")
dbWriteTable(con, "iris", iris)
dbGetQuery(con, 'SELECT "Species", MIN("Sepal.Width") FROM iris GROUP BY "Species"')

For the ERPL extension please follow the instructions in the installation section of our documentation to get yourself prepared.

List available ODP contexts

One first step in the discovery process is to list the available ODP contexts. This can be done with the following command:

SELECT * FROM sap_odp_show_contexts()

which in our ABAP trial system returns the following:

┌────────────────┬──────────────────────────────────┬─────────┐
│ technical_name │               text               │ release │
│    varchar     │             varchar              │ varchar │
├────────────────┼──────────────────────────────────┼─────────┤
│ ABAP_CDS       │ ABAP Core Data Services          │ SBC754  │
│ BW             │ SAP NetWeaver Business Warehouse │ SBW754  │
│ HANA           │ HANA Information Views           │         │
│ SAPI           │ DataSources/Extractors           │ SDE754  │
└────────────────┴──────────────────────────────────┴─────────┘

List available ODP objects

After having found a suitable ODP context, you can list the available ODP objects with the following command:

SELECT * FROM sap_odp_show('ABAP_CDS') WHERE technical_name LIKE '%EPM%' AND SEMANTICS NOT IN ('T');

which in our ABAP trial system returns the following:

┌────────────────────┬─────────────────────────────────────────────┬───────────┬────────────────────────┐
│   technical_name   │                    text                     │ semantics │     semantics_text     │
│      varchar       │                   varchar                   │  varchar  │        varchar         │
├────────────────────┼─────────────────────────────────────────────┼───────────┼────────────────────────┤
│ REPMDEP_VALEND_V$F │ REPM_EMP_DEPT_VALIDEND                      │ F         │ Transaction Data/Facts │
│ REPM_DEPHV2$H      │ Department Hierarchy View                   │ H         │ Hierarchy              │
│ REPM_DEPHV3$H      │ Department Hierarchy                        │ H         │ Hierarchy              │
│ REPM_DEPTD$P       │ Department Details                          │ P         │ Master Data/Attributes │
│ REPM_DEPTHV1$E     │ Department Hierarchy View                   │ E         │ E                      │
│ REPM_DV1$P         │ Department View                             │ P         │ Master Data/Attributes │
│ REPM_DV2$P         │ Department View                             │ P         │ Master Data/Attributes │
│ REPM_DV3$P         │ Department View                             │ P         │ Master Data/Attributes │
│ REPM_EMP$P         │ Employee View                               │ P         │ Master Data/Attributes │
│ REPM_EMPVALDATE$F  │ Employee Validity Dates                     │ F         │ Transaction Data/Facts │
│        ·           │            ·                                │ ·         │           ·            │
│        ·           │            ·                                │ ·         │           ·            │
│        ·           │            ·                                │ ·         │           ·            │
│ SEPM_ISOBISTTS$P   │ EPM Demo: Sales Order Billing Status        │ P         │ Master Data/Attributes │
│ SEPM_ISODLSTTS$P   │ EPM Demo: Sales Order Delivery Status       │ P         │ Master Data/Attributes │
│ SEPM_ISOI$P        │ EPM Demo: Sales Order Item                  │ P         │ Master Data/Attributes │
│ SEPM_ISOISL$P      │ EPM Demo: Sales Order Item Schedule Line    │ P         │ Master Data/Attributes │
│ SEPM_ISOLCSTTS$P   │ EPM Demo: Sales Order Life Cycle Status     │ P         │ Master Data/Attributes │
│ SEPM_ISOOASTTS$P   │ EPM Demo: Sales Order Overall Status        │ P         │ Master Data/Attributes │
│ SEPM_ISOPMETHOD$P  │ EPM Demo: Sales Order Payment Method Values │ P         │ Master Data/Attributes │
│ SEPM_ISOPTERMS$P   │ EPM Demo: Sales Order Payment Terms         │ P         │ Master Data/Attributes │
│ SEPM_ISUPP$P       │ EPM Demo: Business Partner                  │ P         │ Master Data/Attributes │
│ SEPM_IUOM$P        │ EPM Demo: Unit of Measure                   │ P         │ Master Data/Attributes │
├────────────────────┴─────────────────────────────────────────────┴───────────┴────────────────────────┤
│ 79 rows (20 shown)                                                                          4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘

There is also the SEARCH parameter that allows you to search for objects by their name or description. It also supports * as a wildcard.

Describe a specific ODP object

After having found a suitable ODP object, you can describe it with the following command. This is really a cental command to understand the structure of the object and to prepare the data extraction. The result is a pretty complex structure:

SELECT * FROM sap_odp_describe('ABAP_CDS', 'REPM_EMP$P')

The result is a columns of the query are the following:

  • technical_name: The technical name of the object
  • semantics: The semantics of the object, e.g. P for master data/attributes or F for transaction data/facts
  • semantics_text: The semantics of the object as text, e.g. Master Data/Attributes or Transaction Data/Facts
  • supports_full: Whether the object supports full extraction
  • supports_delta: Whether the object supports delta extraction
  • supports_real_time: Whether the object supports real-time extraction
  • parent_node: The parent node of the object, this is interested to find related views or queries
  • fields: This is a structure that contains the fields / columns of the object.
  • texts: This structure contains the texts of the fields / columns. This is interesting to understand the meaning of the fields.
  • delta_modes: In this structure you can find the delta modes that are supported by the object.

An example of the scalar columns look the following:

┌────────────────┬───────────┬────────────────────────┬───────────────┬───────────────┬────────────────┬────────────────────┐
│ technical_name │ semantics │     semantics_text     │  parent_node  │ supports_full │ supports_delta │ supports_real_time │
│    varchar     │  varchar  │        varchar         │    varchar    │    boolean    │    boolean     │      boolean       │
├────────────────┼───────────┼────────────────────────┼───────────────┼───────────────┼────────────────┼────────────────────┤
│ REPM_EMP       │ P         │ Master Data/Attributes │ BW-BCT-NW-NWD │ true          │ false          │ false              │
└────────────────┴───────────┴────────────────────────┴───────────────┴───────────────┴────────────────┴────────────────────┘

The operation unnest(fields, recursive:=true) leads to the following output, showing the columns of the object:

┌────────────────┬─────────────────────────────────────────────────┬───────────┬────────┬──────────┬─────────┬───────────┬────────────────┬─────────────────┐
│ technical_name │                      text                       │ abap_type │ length │ decimals │   key   │ mandatory │ delta_relevant │ reference_field │
│    varchar     │                     varchar                     │  varchar  │ int64  │  int64   │ boolean │  boolean  │    boolean     │     varchar     │
├────────────────┼─────────────────────────────────────────────────┼───────────┼────────┼──────────┼─────────┼───────────┼────────────────┼─────────────────┤
│ EMPLOYEEID     │ EPM: Employee ID                                │ CHAR      │     10 │        0 │ true    │ false     │ false          │                 │
│ EMPLOYEEUUID   │ EPM: Generic Node Key                           │ RAW       │     16 │        0 │ false   │ false     │ false          │                 │
│ NAME           │ Employee Name                                   │ CHAR      │     83 │        0 │ false   │ false     │ false          │                 │
│ ODQ_CHANGEMODE │ Change Mode for a Data Record in the Delta      │ CHAR      │      1 │        0 │ false   │ false     │ false          │                 │
│ ODQ_ENTITYCNTR │ Number of Data Units (Data Records for Example) │ DEC       │     19 │        0 │ false   │ false     │ false          │                 │
└────────────────┴─────────────────────────────────────────────────┴───────────┴────────┴──────────┴─────────┴───────────┴────────────────┴─────────────────┘

If you want to get the texts, a similar unnest(texts, recursive:=true) operation can be used. The result is the following, where the empty technical name is the text of the object itself and the other technical names are the texts of the fields / columns of the object:

┌────────────────┬──────────┬─────────────────────┬─────────────┬──────────────────────────────────┐
│ technical_name │ language │     text_short      │ text_medium │            text_long             │
│    varchar     │ varchar  │       varchar       │   varchar   │             varchar              │
├────────────────┼──────────┼─────────────────────┼─────────────┼──────────────────────────────────┤
│                │ E        │                     │             │ Employee View                    │
│                │ D        │                     │             │ Mitarbeitersicht                 │
│ EMPLOYEEID     │ D        │ ID des Mitarbeiters │             │ EPM: Mitarbeiter-ID              │
│ EMPLOYEEUUID   │ D        │ Knotenschlüssel     │             │ EPM: Generischer Knotenschlüssel │
│ NAME           │ D        │                     │             │ Mitarbeitername                  │
│ EMPLOYEEID     │ E        │ Employee ID         │             │ EPM: Employee ID                 │
│ EMPLOYEEUUID   │ E        │ Node Key            │             │ EPM: Generic Node Key            │
│ NAME           │ E        │                     │             │ Employee Name                    │
└────────────────┴──────────┴─────────────────────┴─────────────┴──────────────────────────────────┘

Get a preview of a table

Sometimes it is useful to get a preview of the data that is available in the object. To peek in to the first rows (at max 1000) into the dataset one can use the following command:

SELECT *  FROM sap_odp_preview('ABAP_CDS', 'REPM_EMP$P', MAX_ROWS=10)
Note

Work in progress

Do a full load

Note

Work in progress

Subscribe and do a delta load

Note

Work in progress

Show exsting subscriptions and cursors

One can list the existing subscriptions and cursors with the following command:

SELECT * FROM sap_odp_show_subscriptions(ERPL_ONLY=false)

The ERPL_ONLY parameter can be used to filter the subscriptions and cursors that are created by ERPL. If you set it to false, you will see all subscriptions in your ODP context. The schema of the result looks the following:

┌────────────┬─────────────────┬─────────────────┬─────────────────┐
│ queue_name │ subscriber_type │ subscriber_name │ subscriber_proc │
│  varchar   │     varchar     │     varchar     │     varchar     │
├──────────────────────────────────────────────────────────────────┤
│                              0 rows                              │
└──────────────────────────────────────────────────────────────────┘

Similarly you can use sap_odp_show_cursors to list the existing cursors. A cursor is the pointer to the latest delta package. The following command lists the cursors that are created by ERPL for the subscriber SL:

con.sql("SELECT * FROM sap_odp_show_cursors(ERPL_ONLY=false, SUBSCRIBER_NAME='SL')")

The schema of the result looks the following:

┌────────────┬─────────────────┬───────────────┬───────────────┬───────────┬────────────────────┬──────────────┐
│ queue_name │ subscriber_proc │ subscriber_id │    pointer    │ is_closed │ is_delta_extension │ request_date │
│  varchar   │     varchar     │    varchar    │ decimal(23,9) │  boolean  │      boolean       │  timestamp   │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│                                                    0 rows                                                    │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘