Using Partitioned Outer Join to Fill Gaps in Sparse Data

In this tutorial you learn how to use the new SQL join syntax in the Oracle Database 10g SQL to fill gaps in sparse data.

Approximately 30 minutes

This tutorial covers the following topics:

Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.

Oracle Database 10g Partitioned Outer Join Clause Overview

Data is usually stored in sparse form. That is, if no value exists for a given time, no row exists in the fact table. However, time series calculations can be performed most easily when data is dense along the time dimension. This is because dense data fills a consistent number of rows for each period, which in turn makes it simple to use the analytic windowing functions with physical offsets. Refer to Chapter 21: Data Warehousing Guide for more information.

To overcome the problem of sparsity, you can use a partitioned outer join to fill the gaps in a time series. Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query. The Oracle database logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause. The result of a partitioned outer join is a UNION of the outer joins of each of the groups in the logically partitioned table with the table on the other side of the join.

Note that you can use this type of join to fill the gaps in any dimension, not just the time dimension. In this tutorial, you will focus on the time dimension because it is the dimension most frequently used as a basis for comparisons.

1.

Perform the Installing Oracle Database 10g on Windows tutorial.

2.

Download and unzip outer_j.zip into your working directory (i.e. c:\wkdir)

The syntax for partitioned outer join extends the ANSI SQL JOIN clause with the phrase PARTITION BY followed by an expression list. The expressions in the list specify the group to which the outer join is applied. Following are the two forms of syntax typically used for partitioned outer join:

SELECT select_expression
FROM   table_reference
  PARTITION BY (expr [, expr ]... )
  RIGHT OUTER JOIN table_reference
SELECT select_expression
FROM   table_reference
  LEFT OUTER JOIN table_reference
  PARTITION BY {expr [,expr ]...)

Note that FULL OUTER JOIN is not supported with a partitioned outer join. Refer to the Oracle Database 10g SQL Reference for further information regarding syntax and restrictions.

A typical situation with a sparse dimension is shown in the following example, which computes the weekly sales and year-to-date sales for the product "Bounce" for weeks 20–30 in 2000 and 2001:

SELECT
SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year,
t.Calendar_Week_Number Week,
SUM(Amount_Sold) Sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND
s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND
t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number;
PRODUCT_NAME          YEAR       WEEK      SALES
--------------- ---------- ---------- ----------
Bounce 2000 20 801
Bounce 2000 21 4062.24
Bounce 2000 22 2043.16
Bounce 2000 23 2731.14
Bounce 2000 24 4419.36
Bounce 2000 27 2297.29
Bounce 2000 28 1443.13
Bounce 2000 29 1927.38
Bounce 2000 30 1927.38
Bounce 2001 20 1483.3
Bounce 2001 21 4184.49
Bounce 2001 22 2609.19
Bounce 2001 23 1416.95
Bounce 2001 24 3149.62
Bounce 2001 25 2645.98
Bounce 2001 27 2125.12
Bounce 2001 29 2467.92
Bounce 2001 30 2620.17
18 rows selected.

In this example, you would expect 22 rows of data (11 weeks each from 2 years) if the data were dense. However you see only 18 rows because weeks 25 and 26 are missing in 2000, and weeks 26 and 28 are missing in 2001.

Gaps in time series make calculations, such as year-over-year comparisons, difficult to compute. When there are no gaps, you can compare data by referring from one row to another row a fixed distance away using the analytic functions LEAD() and LAG(). For example, if you retrieve month-level data and want to refer to data from 12 months ago, it is convenient to access data 12 rows before the current value. You cannot reliably use the LEAD() and LAG() functions when the number of rows per period (or whatever other dimension is used as the divider) is inconsistent.

How can you fill in the gaps in the preceding example with a partitioned outer join?

You can take the sparse data of our query above and do a partitioned outer join with a dense set of time data. In the query shown below, the original query is aliased as v and the data retrieved from the times table is aliased as t. Here you see 22 rows because there are no gaps in the series. The four added rows each have 0 as their sales value set to 0 by using the NVL() function..

SELECT Product_Name, t.Year, t.Week, NVL(Sales,0) dense_sales
FROM
(SELECT
SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year,
t.Calendar_Week_Number Week,
SUM(Amount_Sold) Sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND
s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND
t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number
) v
PARTITION BY (v.Product_Name)
RIGHT OUTER JOIN
(SELECT DISTINCT
Calendar_Week_Number Week,
Calendar_Year Year
FROM Times
WHERE Calendar_Year in (2000, 2001)
AND Calendar_Week_Number BETWEEN 20 AND 30
) t

ON (v.week = t.week AND v.Year = t.Year)
ORDER BY t.year, t.week;
PRODUCT_NAME          YEAR       WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce 2000 20 801
Bounce 2000 21 4062.24
Bounce 2000 22 2043.16
Bounce 2000 23 2731.14
Bounce 2000 24 4419.36
Bounce 2000 25 0
Bounce 2000 26 0
Bounce 2000 27 2297.29
Bounce 2000 28 1443.13
Bounce 2000 29 1927.38
Bounce 2000 30 1927.38
Bounce 2001 20 1483.3
Bounce 2001 21 4184.49
Bounce 2001 22 2609.19
Bounce 2001 23 1416.95
Bounce 2001 24 3149.62
Bounce 2001 25 2645.98
Bounce 2001 26 0
Bounce 2001 27 2125.12
Bounce 2001 28 0
Bounce 2001 29 2467.92
Bounce 2001 30 2620.17
22 rows selected.

Note that in the query above a WHERE condition for weeks between 20 and 30 is placed in the inline view for the time dimension. This step reduces the number of rows handled by the outer join, which saves processing time.

How do you combine this technique with analytic SQL functions to obtain cumulative sales for the desired weeks?

1.

Start a SQL *Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application Development > SQL Plus.

(Note: This tutorial assumes you have an c:\wkdir folder. If you do not, you will need to create one and unzip the contents of outer_j.zip into this folder. While executing the scripts, paths are specified)

Move your mouse over this icon to see the image

2.

Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.

Move your mouse over this icon to see the image

3.

From your SQL * Plus session, execute the following script.

@c:\wkdir\fg.sql

The fg.sql script contains the following:

SELECT Product_Name, t.Year, t.Week, Sales, Weekly_ytd_sales
 FROM
  (SELECT
    SUBSTR(p.Prod_Name,1,15) Product_Name,
    t.Calendar_Year Year,
    t.Calendar_Week_Number Week,
    NVL(SUM(Amount_Sold),0) Sales,
    SUM(SUM(Amount_Sold)) OVER
      (PARTITION BY p.Prod_Name, t.Calendar_Year
      ORDER BY t.Calendar_Week_Number) Weekly_ytd_sales
    FROM Sales s, Times t, Products p
    WHERE s.Time_id = t.Time_id AND
      s.Prod_id = p.Prod_id AND
      p.Prod_name IN ('Bounce') AND
      t.Calendar_Year IN (2000,2001)  AND
      t.Calendar_Week_Number BETWEEN  20 AND 30
    GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number
   )  v
  PARTITION BY  (v.Product_Name)
  RIGHT OUTER JOIN
   (SELECT DISTINCT
      Calendar_Week_Number Week,
      Calendar_Year Year
    FROM Times
      WHERE Calendar_Year in (2000, 2001)
   ) t
   ON (v.week = t.week AND v.Year = t.Year)
WHERE t.Week BETWEEN 20 AND 30
ORDER BY 1, 2, 3;

Move your mouse over this icon to see the image

In this query, the weekly year-to-date sales are calculated alongside the weekly sales. The NULL values that the partitioned outer join inserts in making the time series dense are handled in the usual way: the SUM function treats them as 0s.

 

There are queries in which a partitioned outer join will return rows with NULL values, but you may want those rows to hold the most recent non-NULL value in the series. That is, if you want to have NULLs replaced with the first non-NULL value you see as you scan upward in a column.

Inventory tables, which track quantity of units available for various products, are a common case that needs such output. Inventory tables are sparse: like sales tables, they need to only store a row for a product when there is an event. For a sales table the event is a sale, and for the inventory table, the event is a change in quantity available for a product. If you make the inventory's time dimension dense, you want to see a quantity value for each day. The value to output is the most recent non-NULL value. Note that this differs from the prior example with cumulative sales. In that query, the cumulative sum calculation treats NULLs as 0s, so it presents correct values. That approach cannot work with inventory and similar tables because the value to place in rows with NULLs is not a sum.

Here an example is presented of partitioned outer join with an inventory table. It replaces NULLs with the nearest non-NULL value.

1.

First, create a small inventory table with two products, each product having entries for two days. The "bottle" product has 10 units in stock on April 1 and 8 units on April 6. The "can" product has 15 units in stock on April 1 and 11 units on April 4. Execute the following SQL*Plus script:

@c:\wkdir\ci.sql

The ci.sql script contains the following:

CREATE TABLE inventory (
         time_id DATE,
         product VARCHAR2(10),
         quant NUMBER);
INSERT INTO inventory VALUES
         (TO_DATE('01/04/01', 'DD/MM/YY'), 'bottle', 10);
INSERT INTO inventory VALUES
         (TO_DATE('06/04/01', 'DD/MM/YY'), 'bottle', 8);
INSERT INTO inventory VALUES
         (TO_DATE('01/04/01', 'DD/MM/YY'), 'can', 15);
INSERT INTO inventory VALUES
         (TO_DATE('04/04/01', 'DD/MM/YY'), 'can', 11);

Move your mouse over this icon to see the image

 

2.

Now you will use a partitioned outer join to see the quantity available for each product on each day of the range April 1 through April 7. If you use a partitioned outer join to query this table without considering the rows with NULL values, the results are misleading. Execute the following SQL*Plus script:

@c:\wkdir\nn.sql

The nn.sql script contains the following:

SELECT times.time_id, product, quant
FROM inventory
PARTITION BY (product)
  RIGHT OUTER JOIN times
  ON (times.time_id = inventory.time_id)
WHERE times.time_id BETWEEN TO_DATE('01/04/01', 'DD/MM/YY')
  AND TO_DATE('07/04/01', 'DD/MM/YY')
ORDER BY  2,1;
Move your mouse over this icon to see the image

The results above are not what you wanted—you know that the quantities available for bottle and can in the NULL-value rows were simply the most recent non-NULL value. For example, on April 2 through 5 for bottle, you want to see the quantity 10.

 

3.

To show the desired results, you want to take advantage of a new keyword added to the FIRST_VALUE and LAST_VALUE functions in Oracle Database 10g. You can specify IGNORE NULLS in the argument list of either of these functions and they will return the closest non-NULL value. Execute the following SQL*Plus script:

@c:\wkdir\nn2.sql

The nn2.sql script contains the following:

WITH v1 AS
(SELECT time_id
 FROM times
 WHERE times.time_id BETWEEN
  TO_DATE('01/04/01', 'DD/MM/YY')
  AND TO_DATE('07/04/01', 'DD/MM/YY'))
 SELECT product, time_id, quant quantity,
     LAST_VALUE(quant IGNORE NULLS)
  OVER (PARTITION BY product ORDER BY time_id)
    repeated_quantity
 FROM
  (SELECT product, v1.time_id, quant
   FROM inventory PARTITION BY (product)
     RIGHT OUTER JOIN v1
     ON (v1.time_id = inventory.time_id))
ORDER BY 1, 2;

Move your mouse over this icon to see the image

 

In the next task, you use the outer join feature to compare values across time periods. Specifically, you will calculate a year-over-year sales comparison at the week level. The query will return on the same row, for each product, the year-to-date sales for each week of 2001 with that of 2000.

1.

To improve readability of the query and focus on the partitioned outer join, use a WITH clause to start the query. Execute the following SQL*Plus script:

@c:\wkdir\pp.sql

The pp.sql script contains the following:

WITH v AS
  (SELECT
     p.Prod_Name Product_Name,
     t.Calendar_Year Year,
     t.Calendar_Week_Number Week,
     SUM(Amount_Sold) Sales
   FROM Sales s, Times t, Products p
   WHERE s.Time_id = t.Time_id AND
         s.Prod_id = p.Prod_id AND
         p.Prod_name in ('Y Box') AND
         t.Calendar_Year in (2000,2001) AND
         t.Calendar_Week_Number BETWEEN 30 AND 40
   GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number
  )
SELECT substr(Product_Name,1,12) Prod, 
  Year, 
  Week, 
  Sales, 
  Weekly_ytd_sales,
  Weekly_ytd_sales_prior_year
FROM
  (SELECT --Start of year_over_year sales
     Product_Name, Year, Week, Sales, Weekly_ytd_sales,
     LAG(Weekly_ytd_sales, 1) OVER
    (PARTITION BY Product_Name, Week ORDER BY Year)
     Weekly_ytd_sales_prior_year
   FROM
     (SELECT --Start of dense_sales
        v.Product_Name Product_Name,
        t.Year Year,
        t.Week Week,
        NVL(v.Sales,0) Sales,
        SUM(NVL(v.Sales,0)) OVER
       (PARTITION BY v.Product_Name, t.Year
        ORDER BY t.week) weekly_ytd_sales
      FROM v 
        PARTITION BY (v.Product_Name) 
      RIGHT OUTER JOIN
       (SELECT DISTINCT
          Calendar_Week_Number Week,
          Calendar_Year Year
        FROM Times
        WHERE Calendar_Year IN (2000, 2001)
       ) t
   ON (v.week = t.week AND v.Year = t.Year)
      ) dense_sales
      ) year_over_year_sales
WHERE Year = 2001 AND
      Week BETWEEN 30 AND 40 
ORDER BY 1, 2, 3;

Move your mouse over this icon to see the image

In the FROM clause of the in-line view DENSE_SALES, a partitioned outer join of aggregate view v and time view t is used to fill gaps in the sales data along the time dimension. The output of the partitioned outer join is then processed by the analytic function SUM ... OVER to compute the weekly year-to-date sales (the "weekly_ytd_sales" column). Thus, the view DENSE_SALES computes the year-to-date sales data for each week, including those missing in the aggregate view s.

The in-line view YEAR_OVER_YEAR_SALES then computes the year ago weekly year-to-date sales using the LAG function. The LAG function labeled "weekly_ytd_sales_prior_year" specifies a PARTITION BY clause that pairs rows for the same week of years 2000 and 2001 into a single partition. An offset of 1 is passed to the LAG function to get the weekly year-to-date sales for the prior year.

The outermost query block selects data from YEAR_OVER_YEAR_SALES with the condition yr = 2001, and thus the query returns, for each product, its weekly year-to-date sales in the specified weeks of years 2001 and 2000.

 

Whereas the previous example showed you a way to create comparisons for a single time level, it is even more useful to handle multiple time levels in a single query. For example, you can compare sales versus the prior period at the year, quarter, month, and day levels.

For the next task, you create a query that performs a year-over-year comparison of year-to-date sales for all levels of our time hierarchy.

Several steps are needed to perform this task. The goal is a single query with comparisons at the day, week, month, quarter, and year levels. You will use a materialized view MV_PROD_TIME that holds a hierarchical cube of sales aggregated across TIMES and PRODUCTS. Along with the materialized view, you will create a view on top of it. Also, you create a view of the time dimension to use as an edge of the cube. The time edge is a partition outer joined to the sparse data in the materialized view.

For more information regarding hierarchical cubes, see the chapter titled "SQL for Aggregation in Data Warehouses" in the Data Warehousing Reference Guide.

1.

Create the materialized view. Note that the query is limited to only two products to keep processing time short. Execute the following SQL*Plus script:

@c:\wkdir\cm1.sql

The cm1.sql script contains the following:

CREATE MATERIALIZED VIEW mv_prod_time
REFRESH COMPLETE ON DEMAND
AS
SELECT
  (CASE
   WHEN ((GROUPING(calendar_year)=0 )
     AND (GROUPING(calendar_quarter_desc)=1 ))
   THEN (TO_CHAR(calendar_year) || '_0')
   WHEN ((GROUPING(calendar_quarter_desc)=0 )
     AND (GROUPING(calendar_month_desc)=1 ))
   THEN (TO_CHAR(calendar_quarter_desc) || '_1')
   WHEN ((GROUPING(calendar_month_desc)=0 )
     AND (GROUPING(t.time_id)=1 ))
   THEN (TO_CHAR(calendar_month_desc) || '_2')
   ELSE (TO_CHAR(t.time_id) || '_3') END) Hierarchical_Time,
  calendar_year year,
  calendar_quarter_desc quarter,
  calendar_month_desc month,
  t.time_id day,
  prod_category cat,
  prod_subcategory subcat,
  p.prod_id prod,
  GROUPING_ID(prod_category, prod_subcategory, p.prod_id,
    calendar_year, calendar_quarter_desc,
    calendar_month_desc,t.time_id) gid,
  GROUPING_ID(prod_category, prod_subcategory, p.prod_id)  gid_p,
  GROUPING_ID(calendar_year, calendar_quarter_desc,
    calendar_month_desc, t.time_id) gid_t,
  SUM(amount_sold) s_sold,
  COUNT(amount_sold) c_sold,
  COUNT(*) cnt
FROM SALES s, TIMES t, PRODUCTS p
WHERE s.time_id = t.time_id AND
  p.prod_name in ('Bounce', 'Y Box')   AND
  s.prod_id = p.prod_id
GROUP BY
  ROLLUP(calendar_year, calendar_quarter_desc,
         calendar_month_desc, t.time_id),
  ROLLUP(prod_category, prod_subcategory, p.prod_id);

Move your mouse over this icon to see the image

Because the materialized view is limited to two products, it has just over 2200 rows. Note that the Hierarchical_Time column contains string representations of time from all levels of the time hierarchy. The CASE expression used for the Hierarchical_Time column appends a marker (_0, _1, ...) to each date string to denote the time level of the value. _0 represents the year level, _1 is quarters, _2 is months, and _3 is day. Note that the GROUP BY clause is a concatenated ROLLUP that specifies the roll-up hierarchy for the time and product dimensions. The GROUP BY clause determines the hierarchical cube contents.

 

2.

Create a view CUBE_PROD_TIME with the same definition as the materialized view MV_PROD_TIME. Execute the following SQL*Plus script:

@c:\wkdir\cv1.sql

The cv1.sql script contains the following:

CREATE OR REPLACE VIEW cube_prod_time
  AS
  SELECT
    (CASE
     WHEN ((GROUPING(calendar_year)=0 )
       AND (GROUPING(calendar_quarter_desc)=1 ))
     THEN (TO_CHAR(calendar_year) || '_0')
     WHEN ((GROUPING(calendar_quarter_desc)=0 )
       AND (GROUPING(calendar_month_desc)=1 ))
     THEN (TO_CHAR(calendar_quarter_desc) || '_1')
     WHEN ((GROUPING(calendar_month_desc)=0 )
       AND (GROUPING(t.time_id)=1 ))
     THEN (TO_CHAR(calendar_month_desc) || '_2')
     ELSE (TO_CHAR(t.time_id) || '_3') END) Hierarchical_Time,
     calendar_year year,
     calendar_quarter_desc quarter,
     calendar_month_desc month,
     t.time_id day,
     prod_category cat,
     prod_subcategory subcat,
     p.prod_id prod,
  GROUPING_ID(prod_category, prod_subcategory, p.prod_id,
    calendar_year, calendar_quarter_desc, calendar_month_desc,
    t.time_id) gid,
  GROUPING_ID(prod_category, prod_subcategory, p.prod_id) gid_p, 
  GROUPING_ID(calendar_year, calendar_quarter_desc,
    calendar_month_desc, t.time_id) gid_t,
    SUM(amount_sold) s_sold,
    COUNT(amount_sold) c_sold,
    COUNT(*) cnt
  FROM SALES s, TIMES t, PRODUCTS p
  WHERE s.time_id = t.time_id AND
    p.prod_name IN ('Bounce', 'Y Box') AND
    s.prod_id = p.prod_id
  GROUP BY 
  ROLLUP(calendar_year, calendar_quarter_desc, 
         calendar_month_desc, t.time_id),
  ROLLUP(prod_category, prod_subcategory, p.prod_id);

Move your mouse over this icon to see the image

 

3.

You create a view EDGE_TIME which is a complete set of date values. EDGE_TIME is the source for filling time gaps with a partitioned outer join. The HIERARCHICAL_TIME column in EDGE_TIME will be used in a partitioned join with the HIERARCHICAL_TIME column in the CUBE_PROD_TIME view. Execute the following SQL*Plus script:

@c:\wkdir\cv2.sql

The cv2.sql script contains the following:

CREATE OR REPLACE VIEW edge_time
AS
SELECT
  (CASE
   WHEN ((GROUPING(calendar_year)=0 )
     AND (GROUPING(calendar_quarter_desc)=1 ))
   THEN (TO_CHAR(calendar_year) || '_0')
   WHEN ((GROUPING(calendar_quarter_desc)=0 )
     AND (GROUPING(calendar_month_desc)=1 ))
   THEN (TO_CHAR(calendar_quarter_desc) || '_1')
   WHEN ((GROUPING(calendar_month_desc)=0 )
     AND (GROUPING(time_id)=1 ))
   THEN (TO_CHAR(calendar_month_desc) || '_2')
   ELSE (TO_CHAR(time_id) || '_3') END) Hierarchical_Time,
   calendar_year yr,
   calendar_quarter_number qtr_num,
   calendar_quarter_desc qtr,
   calendar_month_number mon_num,
   calendar_month_desc mon,
   time_id - TRUNC(time_id, 'YEAR') + 1 day_num,
   time_id day,
   GROUPING_ID(calendar_year, calendar_quarter_desc,
     calendar_month_desc, time_id) gid_t
FROM TIMES
GROUP BY ROLLUP
(calendar_year,
(calendar_quarter_desc, calendar_quarter_number),
(calendar_month_desc, calendar_month_number), time_id);

Move your mouse over this icon to see the image

 

4.

You now have the required elements for the comparison query. You can obtain period-to-period comparison calculations at all time levels. It requires applying analytic functions to a hierarchical cube with dense data along the time dimension. Some of the calculations you can achieve for each time level are:

  • Sum of sales for prior period at all levels of time
  • Variance in sales over prior period
  • Sum of sales in the same period a year ago at all levels of time
  • Variance in sales over the same period last year

The following example performs all four of these calculations. It uses a partitioned outer join of the CUBE_PROD_TIME and EDGE_TIME views to create an in-line view of dense data called DENSE_CUBE_PROD_TIME. The query then uses the LAG function in the same way as the prior single-level example. The outer WHERE clause specifies time at three levels: the days of August 2001, the entire month, and the entire third quarter of 2001. Note that the last two rows of the results contain the month-level and quarter-level aggregations.

Execute the following SQL*Plus script:

@c:\wkdir\mt.sql

The mt.sql script contains the following:

SELECT
  substr(prod,1,4) prod, substr(Hierarchical_Time,1,12) ht, 
  sales,
  sales_prior_period,
  sales - sales_prior_period variance_prior_period,
  sales_same_period_prior_year,
  sales - sales_same_period_prior_year variance_same_period_p_year
FROM
  (SELECT cat, subcat, prod, gid_p, gid_t, Hierarchical_Time,
     yr, qtr, mon, day, sales,
     LAG(sales, 1) OVER (PARTITION BY gid_p, cat, subcat, prod,
     gid_t ORDER BY yr, qtr, mon, day) 
     sales_prior_period,
     LAG(sales, 1) OVER (PARTITION BY gid_p, cat, subcat, prod,
     gid_t, qtr_num, mon_num, day_num ORDER BY yr)
     sales_same_period_prior_year
   FROM
    (SELECT c.gid, c.cat, c.subcat, c.prod, c.gid_p, 
     t.gid_t, t.yr, t.qtr, t.qtr_num, t.mon, t.mon_num,
     t.day, t.day_num, t.Hierarchical_Time, NVL(s_sold,0) sales
     FROM cube_prod_time c 
     PARTITION BY (gid_p, cat, subcat, prod)
     RIGHT OUTER JOIN edge_time t
    ON ( c.gid_t = t.gid_t AND c.Hierarchical_Time = t.Hierarchical_Time)
    ) dense_cube_prod_time 
    )          -- side by side current,prior and prior year sales
WHERE prod IN (139) AND gid_p=0 AND -- 1 product and product level data
  ( (mon IN ('2001-08' ) AND gid_t IN (0, 1) ) OR -- day and month data
  ( qtr IN ('2001-03' ) AND gid_t IN (3) ) ) -- quarter level data 
ORDER BY day;

Move your mouse over this icon to see the image

Move your mouse over this icon to see the image

In many OLAP tasks, it is helpful to define custom members in a dimension. For instance, you might define a specialized time period for analyses. You can use a partitioned outer join to temporarily add a member to a dimension. Note that the new SQL MODEL clause introduced in Oracle Database 10g is suitable for creating more complex scenarios involving new members in dimensions. See the tutorial titled "SQL for Modeling" in the Data Warehousing Reference Guide for more information on this topic.

In this exercise, you define a new member for the TIME dimension. You create a 13th member of the month level in the TIME dimension. This 13th month is defined as the summation of the sales for each product in the first month of each quarter of year 2001. You build this solution using the views and tables created in the prior example.

1.

Create a view with the new member added to the appropriate dimension. The view uses a UNION ALL operation to add the new member. To query using the custom member, use a CASE expression and a partitioned outer join. Execute the following SQL*Plus script:

@c:\wkdir\cv3.sql

The cv3.sql script contains the following:

CREATE OR REPLACE VIEW time_c  AS
   (SELECT *
      FROM edge_time
   UNION ALL
   SELECT '2001-13_2', 2001, 5, '2001-05', 13, '2001-13', null, null,
     8 -- <gid_of_mon>
     FROM DUAL);

In the statement shown, the TIME_C view is defined by performing a UNION ALL of the EDGE_TIME view (defined in the prior example) and the user-defined 13th month. UNION ALL specifies the attributes for a 13th month member by doing a SELECT from the DUAL table. Note that the grouping id, column gid_t, is set to 8, and the quarter number is set to 5.

Move your mouse over this icon to see the image

 

2.

The in-line view of the query shown below performs a partitioned outer join of CUBE_PROD_TIME with TIME_C. This step creates sales data for the 13th month at each level of product aggregation. In the main query, the analytic function SUM is used with a CASE expression to compute the 13th month, which is defined as the summation of the first month's sales of each quarter. Execute the following SQL*Plus script:

@c:\wkdir\cv4.sql

The cv4.sql script contains the following:

SELECT * from
(
SELECT substr(cat,1,12) cat, substr(subcat,1,12)  subcat,
       substr(prod,1,9) prod,  mon, mon_num,
       SUM(CASE WHEN mon_num IN (1, 4, 7, 10)
            THEN s_sold
            ELSE NULL
            END)
       OVER (PARTITION BY gid_p, prod, subcat, cat, yr) sales_month_13
FROM
   (SELECT c.gid, c.prod, c.subcat, c.cat,  gid_p,
        t.gid_t , t.day, t.mon, t.mon_num,
        t.qtr, t.yr, NVL(s_sold,0) s_sold
    FROM cube_prod_time c
    PARTITION BY   (gid_p, prod, subcat, cat)
    RIGHT OUTER JOIN time_c  t ON
      (c.gid_t = t.gid_t AND c.Hierarchical_Time = t.Hierarchical_Time)
   )
)
WHERE  mon_num=13;

Move your mouse over this icon to see the image

The SUM function used in generating these results had a CASE statement to limit the data to months 1, 4, 7, and 10 within each year. Because of the tiny data set, with just two products, the roll-up values of the results are necessarily repetitions of lower-level aggregations. For a more realistic set of roll-up values, you can include more products from the "Game Console" and "Y Box Games" subcategories in the underlying materialized view.

 

In this tutorial, you learned how to:

Fill gaps in data using analytic SQL functions
Replace NULLS with the nearest non-NULL value
Perform a period-to-period comparison
Create a custom member in a dimension

Back to Topic List