HexaCluster Logo

Partitioning by Reference – Oracle vs PostgreSQL

Sometimes, when you are working on a migration to PostgreSQL, you can encounter features that do not exist in PostgreSQL. This especially happens when we do not have any extension to emulate the feature you are looking at. At HexaCluster, we often face these kind of situations. When it is possible to create an extension to provide the feature in PostgreSQL, we create it and everyone can benefit of this implementation. For example some extensions we have developed are:

However, this is not always possible for every feature. In this article, we are discussing one of such cases, related to the support for partitioning by reference when you are migrating from Oracle to PostgreSQL. This kind of partitioning is available in Oracle since version 11g, in addition to the common partitioning types that are also available in PostgreSQL such as RANGE, LIST and HASH.

Partitioning by reference - Oracle vs PostgreSQL

What is partitioning by reference?

Reference partitioning allows you to create a partition based on a column that is not in the table being partitioned, but rather is a foreign key reference to a different table. This can be useful for partitioning related tables in related ways, even if they do not share the same columns.

Let us see the following example:

CREATE TABLE orders (
        order_id           NUMBER(12),
        order_date         DATE,
        customer_id        NUMBER(6),
        order_status       NUMBER(2),
        order_total        NUMBER(8,2),
        CONSTRAINT orders_pk PRIMARY KEY(order_id)
)
PARTITION BY RANGE(order_date) (
        PARTITION Q1_2024 VALUES LESS THAN (TO_DATE('01-JAN-2024','DD-MON-YYYY')),
        PARTITION Q2_2024 VALUES LESS THAN (TO_DATE('01-FEB-2024','DD-MON-YYYY')),
        PARTITION Q3_2024 VALUES LESS THAN (TO_DATE('01-MAR-2024','DD-MON-YYYY')),
        PARTITION Q4_2024 VALUES LESS THAN (TO_DATE('01-APR-2024','DD-MON-YYYY'))
);

CREATE TABLE order_items (
        order_id           NUMBER(12) NOT NULL,
        line_item_id       NUMBER(3)  NOT NULL,
        product_id         NUMBER(6)  NOT NULL,
        unit_price         NUMBER(8,2),
        quantity           NUMBER(8),
        CONSTRAINT order_items_fk FOREIGN KEY(order_id) REFERENCES orders(order_id)
)
PARTITION BY REFERENCE(order_items_fk);



In this example, the ORDERS table includes column : ORDER_DATE to track the date on which an order was issued. The line items for orders may be stored in the ORDERS_ITEMS table too, but from a normalization perspective, it would not be appropriate to store the ORDER_DATE at the line-item level. However, if the ORDER_DATE column is not in the ORDER_ITEMS table, how can you partition ORDER_ITEMS by the order date ? This is where the Partitioning by Reference comes into the picture.

The ORDERS table is range-partitioned by the ORDER_DATE column on a monthly basis in this example. The ORDER_ITEMS table does not have a DATE datatype column but it has a foreign key on the ORDER_NUMBER column back to the ORDERS table.

By using the partitioning by reference method, the ORDER_ITEMS table will be partitioned by the same column that was used to partition the ORDERS table, even if that column (ORDER_DATE) is not in the ORDER_ITEMS table. The two tables will be partitioned in sync.


PostgreSQL solutions for partitioning by reference

Partitioning by reference doesn’t exist in PostgreSQL. Now, what are the possible workarounds ?

In this case there are 3 solutions available for PostgreSQL:

  • Do not partition the child table
  • Duplicating the partition key column from the parent table into the child table and apply the same partitioning.
  • Hash partitioning of the child table on the column hosting the foreign key.

Let us now discuss about each of these methods and understand the pros and cons.

Free Oracle to PostgreSQL Migration Assessment
Interested in migrating from Oracle or SQL Server to PostgreSQL?
Sign up for our Free Migration Assessment!

- No cost evaluation
- Smooth transition to PostgreSQL.
Please enable JavaScript in your browser to complete this form.
Type of Database Migration

Method 1 : No partitioning

When you are migrating from Oracle to PostgreSQL using Ora2Pg, the default behavior of Ora2Pg during table and partition export is : Not partitioning the child table. This can be considered as one of the approaches but not an always recommended approach.

While migrating using Ora2Pg, the configuration file setting must be "PARTITION_BY_REFERENCE none".

The resulting code is the following:

CREATE TABLE orders (
        order_id bigint NOT NULL,
        order_date timestamp(0),
        customer_id integer,
        order_status smallint,
        order_total double precision
) PARTITION BY RANGE (order_date) ;
ALTER TABLE orders ADD PRIMARY KEY (order_id,order_date);

CREATE TABLE order_items (
        order_id bigint NOT NULL,
        line_item_id smallint NOT NULL,
        product_id integer NOT NULL,
        unit_price double precision,
        quantity integer
);
ALTER TABLE order_items ADD CONSTRAINT order_items_fk FOREIGN KEY (order_id) REFERENCES orders(order_id);



The problem with this solution is the loss of performance especially when the number of records in ORDER_ITEMS is very high. In this case, we will lose the true benefits of Partitioning in PostgreSQL such as simplified archival and retention techniques, and other partitioning benefits.

Method 2 : Duplication of column into the table to be partitioned.

The second solution consists of duplicating the column of the referenced table into the table to be partitioned by reference. This column can be used as the partition key by applying the same type of partitioning as the referenced table.

The resulting code for Parent Tables appears as follows:

CREATE TABLE orders (
        order_id bigint NOT NULL,
        order_date timestamp(0),
        customer_id integer,
        order_status smallint,
        order_total double precision
) PARTITION BY RANGE (order_date) ;

ALTER TABLE orders ADD PRIMARY KEY (order_id,order_date);

CREATE TABLE order_items (
        order_id bigint NOT NULL,
        line_item_id smallint NOT NULL,
        product_id integer NOT NULL,
        unit_price double precision,
        quantity integer,
        order_date timestamp(0) NOT NULL -- *** column duplicated ***
) PARTITION BY RANGE (order_date) ;

ALTER TABLE order_items ADD CONSTRAINT order_items_fk FOREIGN KEY (order_id, order_date) REFERENCES orders(order_id, order_date);



Now, to create the Partitions for the Parent tables considered in our example, following statements can be used:

CREATE TABLE orders_part1 PARTITION OF orders
FOR VALUES FROM (MINVALUE) TO (' 2023-01-01 00:00:00');
CREATE TABLE orders_part2 PARTITION OF orders
FOR VALUES FROM (' 2023-01-01 00:00:00') TO (' 2023-02-01 00:00:00');
CREATE TABLE orders_part3 PARTITION OF orders
FOR VALUES FROM (' 2023-02-01 00:00:00') TO (' 2023-03-01 00:00:00');
CREATE TABLE orders_part4 PARTITION OF orders
FOR VALUES FROM (' 2023-03-01 00:00:00') TO (' 2023-04-01 00:00:00');

CREATE TABLE order_items_part1 PARTITION OF order_items
FOR VALUES FROM (MINVALUE) TO (' 2023-01-01 00:00:00');
CREATE TABLE order_items_part2 PARTITION OF order_items
FOR VALUES FROM (' 2023-01-01 00:00:00') TO (' 2023-02-01 00:00:00');
CREATE TABLE order_items_part3 PARTITION OF order_items
FOR VALUES FROM (' 2023-02-01 00:00:00') TO (' 2023-03-01 00:00:00');
CREATE TABLE order_items_part4 PARTITION OF order_items
FOR VALUES FROM (' 2023-03-01 00:00:00') TO (' 2023-04-01 00:00:00');



This is what could closely correspond to the partitioning carried out under Oracle and is the most efficient for performance.

On the other hand, more disk space is used due to the duplication of the partition key. Note that any addition or deletion of a "Partition" on the referenced table must also be applied to the table thus partitioned, there is no automation available to achieve the same.

It will also be necessary to apply a filter on orders.order_date = order_items.order_date so that the partition key is used on the child table. Otherwise all partitions of the ORDER_ITEMS table will be affected and the queries might start showing a poor performance. So it requires modifying the queries to make sure to touch the appropriate partitions only.

test=# EXPLAIN (COSTS off) SELECT a.*,b.* FROM orders a 
JOIN order_items b 
ON (a.order_id = b.order_id AND a.order_date = b.order_date)
WHERE a.order_date = '2023-01-15 00:00:00';
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Hash Join
   Hash Cond: (b.order_id = a.order_id)
   ->  Seq Scan on order_items_part2 b
         Filter: (order_date = '2023-01-15 00:00:00'::timestamp without time zone)
   ->  Hash
         ->  Bitmap Heap Scan on orders_part2 a
               Recheck Cond: (order_date = '2023-01-15 00:00:00'::timestamp without time zone)
               ->  Bitmap Index Scan on orders_part2_pkey
                     Index Cond: (order_date = '2023-01-15 00:00:00'::timestamp without time zone)
(9 lines)



While migrating using Ora2Pg to migrate your Oracle databases to PostgreSQL, the configuration file setting must be "PARTITION_BY_REFERENCE duplicate", to let Ora2Pg apply this solution automatically.

Method 3 : Hash partitioning

The third solution consists of creating a HASH type partitioning on the table to distribute the records in a certain number of partitions and thus limit the effect on performance loss of the first solution.

Considering the above example of ORDER_ITEMS table, we shall create the child table using following syntax for Hash Partitioning :

CREATE TABLE order_items (
        order_id bigint NOT NULL,
        line_item_id smallint NOT NULL,
        product_id integer NOT NULL,
        unit_price double precision,
        quantity integer
) PARTITION BY HASH (order_id) ;



Now, the partitions for this child table can be created using following syntax :

CREATE TABLE order_items_part1 PARTITION OF order_items
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE order_items_part2 PARTITION OF order_items
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE order_items_part3 PARTITION OF order_items
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE order_items_part4 PARTITION OF order_items
FOR VALUES WITH (MODULUS 4, REMAINDER 3);



This method is not very useful because:

  1. We lose the foreign key because we don’t have the order_date column in the child table and the foreign key must refer to the two columns of the parent table. A primary key for a Partitioned table is a combination of the original primary key and the partition key.
  2. We need to use the child table hash partition key in the filter too.
  3. Increase in the number of HASH partitions leads to a broader distribution of the data.
test=# EXPLAIN (COSTS off) SELECT a.*,b.* 
FROM orders a 
JOIN order_items b 
ON (a.order_id = b.order_id) 
WHERE a.order_date = '2023-01-15 00:00:00' AND a.order_id=150;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Nested Loop
   ->  Index Scan using orders_part2_pkey on orders_part2 a
         Index Cond: ((order_id = 150) AND (order_date = '2023-01-15 00:00:00'::timestamp without time zone))
   ->  Seq Scan on order_items_part3 b
         Filter: (order_id = 150)
(5 lines)


 Conclusion

Our team at HexaCluster, is happy that you will enjoy this migration facility that no other migration tool can offer. You could save your time on migrating Oracle partition by reference using Ora2Pg, as version 24.2 will do the job for you based on the workaround you choose using the configuration settings discussed above. Remember to use the configuration directive PARTITION_BY_REFERENCE in the file : ora2pg.conf.

Seeking specialized PostgreSQL support or Database Migrations assistance ? Get experts advice for effective data management, query tuning, database migrations and database optimization. Click here for personalized expertise to elevate your database performance and meet your unique business needs.

Subscribe to our Newsletters and Stay tuned for more interesting topics.

 Contact Us Today!

Please enable JavaScript in your browser to complete this form.
Machine Learning Services
PostgreSQL Consulting Services
Migration Services

Author

  • Gilles Darold

    Gilles Darold is the CTO of HexaCluster. Gilles is one of the Major PostgreSQL Contributors and the creator of Ora2Pg, pgBadger, and many more popular PostgreSQL tools and extensions. His leadership has enabled HexaCluster in contributing to 50 plus popular PostgreSQL extensions and also create multiple PostgreSQL tools and extensions. Gilles is an expert in all the popular programming languages and is always passionate about contributing to PostgreSQL.

Add A Comment

We're glad you have chosen to leave a comment. Please keep in mind that all comments are moderated according to our privacy policy, and all links are nofollow. Do NOT use keywords in the name field. Let's have a personal and meaningful conversation.

Our PostgreSQL Services

Our Machine Learning Services

Share article

HexaCluster Logo

Get in Touch

© 2023 Company. All Rights Reserved. Privacy Policy.