Nov 17 2010

Product aggregate function

Category: SQLRadoslav Golian @ 4:52 am

Few months ago I needed an product aggregate function. It’s strange that Oracle does not have such functionality. I think it’s pretty trivial.
Here is it is:

CREATE OR REPLACE TYPE product_impl AS OBJECT
(
  product NUMBER,

  STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT product_impl) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateIterate(SELF  IN OUT product_impl,
                                       VALUE IN NUMBER) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT product_impl,
                                     ctx2 IN product_impl) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateTerminate(SELF        IN OUT product_impl,
                                         returnvalue OUT NUMBER,
                                         flags       IN NUMBER) RETURN NUMBER
)
/

CREATE OR REPLACE TYPE BODY product_impl IS
  STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT product_impl) RETURN NUMBER IS
  BEGIN
    ctx := product_impl(1);
    RETURN ODCIConst.Success;
  END ODCIAggregateInitialize;

  MEMBER FUNCTION ODCIAggregateIterate(SELF  IN OUT product_impl,
                                       VALUE IN NUMBER) RETURN NUMBER IS
  BEGIN
    IF VALUE IS NOT NULL THEN
      SELF.product := SELF.product * VALUE;
    END IF;
    RETURN ODCIConst.Success;
  END ODCIAggregateIterate;

  MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT product_impl,
                                     ctx2 IN product_impl) RETURN NUMBER IS
  BEGIN
    SELF.product := SELF.product * ctx2.product;
    RETURN ODCIConst.Success;
  END ODCIAggregateMerge;

  MEMBER FUNCTION ODCIAggregateTerminate(SELF        IN OUT product_impl,
                                         returnvalue OUT NUMBER,
                                         flags       IN NUMBER) RETURN NUMBER IS
  BEGIN
    returnvalue := SELF.product;
    RETURN ODCIConst.Success;
  END ODCIAggregateTerminate;
END;
/

CREATE OR REPLACE FUNCTION product(x IN NUMBER) RETURN NUMBER
PARALLEL_ENABLE
AGGREGATE USING product_impl;
/

DROP TABLE prod_test;

CREATE TABLE prod_test(val number);

INSERT INTO prod_test VALUES (1);

INSERT INTO prod_test VALUES (2); 

INSERT INTO prod_test VALUES (3); 

INSERT INTO prod_test VALUES (4);

COMMIT;

SELECT product(val) FROM prod_test;

PRODUCT(VAL)
------------
          24

I have never mention it, but a user-defined aggregate function may be used also as an analytic function. This is an example for n! computation:

SELECT
  level,
  product(level) over(order by level) n_factorial
FROM dual
CONNECT BY LEVEL < 100;

     LEVEL N_FACTORIAL
---------- -----------
         1           1
         2           2
         3           6
         4          24
         5         120
         6         720
         7        5040
         8       40320
         9      362880
        10     3628800

There is another way how to compute a product without any additional aggregate function. This is possible by using a 2 simple logarithm rules. Let’s use natural logarithms:

  1. ln(x*y) = ln(x)+ln(y)
  2. x = exp(ln(x))

So to compute product we need only the SUM aggregate/analytic function LN function and EXP function:

x1*x2*…*xn =(rule 2) exp(ln(x1*x2*…*xn)) = (rule 1) exp(ln(x1)+ln(x2)+…+ln(xn))

SELECT
  n,
  exp(n_log_sum) n_factorial
FROM (SELECT
            LEVEL n,
            SUM(ln(LEVEL)) over(ORDER BY LEVEL) n_log_sum
          FROM dual
          CONNECT BY LEVEL <= 10);

         N N_FACTORIAL
---------- -----------
         1           1
         2           2
         3           6
         4          24
         5         120
         6         720
         7        5040
         8       40320
         9      362880
        10     3628800

A word of caution here: You should test this logarithmic approach! I didn’t do it. It may be less precise, because you are computing natural logarithms and also you are adding this real numbers together. And also you should test the performance of this.
Maybe I’ll do it in some of my next post.

UPDATE [13.1.2011]
Of course, logarithm is not defined for non-positive values, so see this article how to implement product which is aware of this.

Tags: ,


Nov 17 2010

Bitcount aggregate function

Category: Oracle,SQLRadoslav Golian @ 2:26 am

Finally, a blog post after ages :) . Recently I received a new feedback on my two years old article on bitwise or aggregate function, where a visitor is looking for a bitcount aggregate function. This was a nice exercise for me. After a little of googling I have found a website with plenty of bitwise operations implementations. I’ve chosen an easy Keringhan’s approach. Here is the function:


CREATE OR REPLACE TYPE bitcount_impl AS OBJECT
(
  bitcount NUMBER,

  STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT bitcount_impl) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateIterate(SELF  IN OUT bitcount_impl,
                                       VALUE IN NUMBER) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT bitcount_impl,
                                     ctx2 IN bitcount_impl) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateTerminate(SELF        IN OUT bitcount_impl,
                                         returnvalue OUT NUMBER,
                                         flags       IN NUMBER) RETURN NUMBER
)
/

CREATE OR REPLACE TYPE BODY bitcount_impl IS
  STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT bitcount_impl) RETURN NUMBER IS
  BEGIN
    ctx := bitcount_impl(0);
    RETURN ODCIConst.Success;
  END ODCIAggregateInitialize;

  MEMBER FUNCTION ODCIAggregateIterate(SELF  IN OUT bitcount_impl,
                                       VALUE IN NUMBER) RETURN NUMBER IS
    val number := value;
  BEGIN
    WHILE val > 0 LOOP
      val := bitand(val, val-1);
      SELF.bitcount := SELF.bitcount + 1;
    END LOOP;
    RETURN ODCIConst.Success;
  END ODCIAggregateIterate;

  MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT bitcount_impl,
                                     ctx2 IN bitcount_impl) RETURN NUMBER IS
  BEGIN
    SELF.bitcount := SELF.bitcount + ctx2.bitcount;
    RETURN ODCIConst.Success;
  END ODCIAggregateMerge;

  MEMBER FUNCTION ODCIAggregateTerminate(SELF        IN OUT bitcount_impl,
                                         returnvalue OUT NUMBER,
                                         flags       IN NUMBER) RETURN NUMBER IS
  BEGIN
    returnvalue := SELF.bitcount;
    RETURN ODCIConst.Success;
  END ODCIAggregateTerminate;
END;
/

CREATE OR REPLACE FUNCTION bitcountagg(x IN NUMBER) RETURN NUMBER
PARALLEL_ENABLE
AGGREGATE USING bitcount_impl;
/

DROP TABLE bitcount_test;

CREATE TABLE bitcount_test(val number);

INSERT INTO bitcount_test VALUES (5);  -- 0101

INSERT INTO bitcount_test VALUES (1);  -- 0001

INSERT INTO bitcount_test VALUES (9);  -- 1001

INSERT INTO bitcount_test VALUES (12); -- 1100

COMMIT;

SELECT bitcountagg(val) FROM bitcount_test;

BITCOUNTAGG(VAL)
----------------
               7

Tags: , ,


Jan 01 2009

Parsing CSV string

Category: PL/SQL,SQL,TuningRadoslav Golian @ 10:17 pm

Parsing a CSV string is something very trivial, but often many people choose an inefficient approach usually based on this pseudo-code:

loop
  if the CSV list is empty then break
  copy the first value to some variable
  remove the first value from the CSV list
end loop

There’s lots of unnecessary work hidden in the last step of this algorithm. Removing of the first value from the list implicates copying of the entire tail of the list. This could be very inefficient especially when we are parsing large CSV list. Over and over again, I saw developers take this approach not only in PL/SQL, but also in the other languages. Even Tom did it :) . I have to confess, I did it too, many many years ago :) .

Correct algorithm should be based on moving a pointer (offset) along the list, it should look like this:

set the offset to the start of the CSV list
loop
  find the next comma starting the search at the actual offset
  if no comma is found then break
  copy the value between the offset and the comma position to some variable
  update offset to the value of (the comma position + 1)
end loop;

As you can see, there’s no unnecessary copying, just moving the offset.

Let’s test these two implementations with DBMS_PROFILER package. If you don’t have DBMS_PROFILER package installed on your database, then you can take a look at this article.

Firstly, we’ll create a PL/SQL procedure based on the first algorithm:

CREATE OR REPLACE PROCEDURE parse_csv_wrong(i_csv_list IN VARCHAR2) IS
  l_comma_pos  PLS_INTEGER;
  l_csv_list   VARCHAR2(32767);
  l_dummy_num  NUMBER;
BEGIN
  l_csv_list := i_csv_list || ',';

  LOOP
    EXIT WHEN l_csv_list IS NULL;

    l_comma_pos := instr(l_csv_list, ',');
    l_dummy_num := to_number(substr(l_csv_list, 1, l_comma_pos - 1));

    l_csv_list := substr(l_csv_list, l_comma_pos + 1);
  END LOOP;
END parse_csv_wrong;
/

Line 6: We can add a sentinel comma to simplify code.

Line 9: Exit when the list is empty.

Line 11: Find the first comma.

Line 12: Extract the first value.

Line 14: Remove the first value.

Secondly, we’ll create a PL/SQL procedure that does not use copying, but advancing the offset:

CREATE OR REPLACE PROCEDURE parse_csv_right(i_csv_list IN VARCHAR2) IS
 l_offset     PLS_INTEGER;
 l_comma_pos  PLS_INTEGER;
 l_csv_list   VARCHAR2(32767);
 l_dummy_num  NUMBER;
BEGIN
  l_csv_list := i_csv_list || ',';
  l_offset := 1;

  LOOP
    l_comma_pos := instr(l_csv_list, ',', l_offset);
    EXIT WHEN l_comma_pos = 0;

    l_dummy_num := to_number(substr(l_csv_list, l_offset, l_comma_pos - l_offset));
    l_offset := l_comma_pos + 1;
  END LOOP;
END parse_csv_right;
/

Line 7: We can add a sentinel comma to simplify code.

Line 8: Set offset to the start of the list.

Line 11: Find the next comma, starting the search from current offset.

Line 12: Exit when comma is not found.

Line 14: Extract the current value.

Line 15: Advance offset.

At the end, we’ll create an anonymous PL/SQL block, which populates CSV list with 6002 values, starts profiler and calls both procedures passing created CSV list as a value for the input parameter.

DECLARE
  l_csv_list   VARCHAR2(32767);
BEGIN
  l_csv_list := '';
  for i in 1000..7000 loop
    l_csv_list := l_csv_list || i || ',';
  end loop;
  l_csv_list := l_csv_list || 7001;

  dbms_profiler.start_profiler(run_comment=>'csv parsing test');

  parse_csv_wrong(l_csv_list);
  parse_csv_right(l_csv_list);

  dbms_profiler.stop_profiler;
END;
/

I ran the test 5 times and I used this select to identify runids.

SELECT runid,
           to_char(run_date, 'DD.MM.YYYY HH24:MI:SS') run_date,
           run_comment,
           run_total_time
FROM    plsql_profiler_runs
ORDER BY
           runid;

Now, let’s take a look at the results:

SELECT u.runid,
       u.unit_type,
       u.unit_name,
       sum(d.total_time) / 1000 microsec
FROM   plsql_profiler_units u
       INNER JOIN plsql_profiler_data d ON (u.runid = d.runid AND u.unit_number = d.unit_number)
WHERE  u.runid between 53 and 57
AND    u.unit_name LIKE 'PARSE_CSV%'
GROUP BY
       u.runid,
       u.unit_type,
       u.unit_name
ORDER BY
       u.runid,
       u.unit_name
 RUNID UNIT_TYPE            UNIT_NAME                          MICROSEC
------ -------------------- -------------------------------- ----------
    53 PROCEDURE            PARSE_CSV_RIGHT                       35790
    53 PROCEDURE            PARSE_CSV_WRONG                      212523
    54 PROCEDURE            PARSE_CSV_RIGHT                       36408
    54 PROCEDURE            PARSE_CSV_WRONG                      214583
    55 PROCEDURE            PARSE_CSV_RIGHT                       35547
    55 PROCEDURE            PARSE_CSV_WRONG                      213146
    56 PROCEDURE            PARSE_CSV_RIGHT                       35832
    56 PROCEDURE            PARSE_CSV_WRONG                      214468
    57 PROCEDURE            PARSE_CSV_RIGHT                       35585
    57 PROCEDURE            PARSE_CSV_WRONG                      213242

10 rows selected.

What a difference! Procedure PARSE_CSV_RIGHT is 6 times faster then procedure PARSE_CSV_WRONG. We can easily identify lines, where procedures spend most of the time.  I’ll show this for runid 53.

SELECT u.unit_name,
 d.total_occur,
 d.total_time / 1000 microsec,
 substr(s.text, 1, 60) plsql_code
FROM   plsql_profiler_units u
 INNER JOIN plsql_profiler_data d ON (u.runid = d.runid AND u.unit_number = d.unit_number)
 INNER JOIN all_source s ON (s.owner = u.unit_owner AND s.type = u.unit_type AND s.name = u.unit_name AND s.line = d.line#)
WHERE  u.runid = 53
AND    u.unit_name LIKE 'PARSE_CSV%'
ORDER BY
 u.unit_number, d.line#;
UNIT_NAME        TOTAL_OCCUR   MICROSEC PLSQL_CODE
---------------- ----------- ---------- -------------------------------------------------------------
PARSE_CSV_WRONG            0          4 PROCEDURE parse_csv_wrong(i_csv_list IN VARCHAR2) IS
PARSE_CSV_WRONG            1         85   l_csv_list := i_csv_list || ',';
PARSE_CSV_WRONG         6003       6851     EXIT WHEN l_csv_list IS NULL;
PARSE_CSV_WRONG         6002      10532     l_comma_pos := instr(l_csv_list, ',');
PARSE_CSV_WRONG         6002      11269     l_dummy_num := to_number(substr(l_csv_list, 1, l_comma_p
PARSE_CSV_WRONG         6002     183778     l_csv_list := substr(l_csv_list, l_comma_pos + 1);
PARSE_CSV_WRONG            0          4 END parse_csv_wrong;
PARSE_CSV_RIGHT            0          5 PROCEDURE parse_csv_right(i_csv_list IN VARCHAR2) IS
PARSE_CSV_RIGHT            1         63   l_csv_list := i_csv_list || ',';
PARSE_CSV_RIGHT            1          1   l_offset := 1;
PARSE_CSV_RIGHT         6003       9806     l_comma_pos := instr(l_csv_list, ',', l_offset);
PARSE_CSV_RIGHT         6003       7243     EXIT WHEN l_comma_pos = 0;
PARSE_CSV_RIGHT         6002      11144     l_dummy_num := to_number(substr(l_csv_list, l_offset, l_
PARSE_CSV_RIGHT         6002       7523     l_offset := l_comma_pos + 1;
PARSE_CSV_RIGHT            1          5 END parse_csv_right;

15 rows selected.

In the procedure PARSE_CSV_WRONG, most amount of the time was spent by removing the first value. In the procedure PARSE_CSV_RIGHT, the time was spent by extracting the values from the list.

Tags: ,


May 31 2008

Bitwise OR Aggregate Function

Category: Oracle,SQLRadoslav Golian @ 5:46 pm

A few months ago I was working on an application which used bits to store some flags in one number. In this approach the bit value in the number indicates whether a flag is on (1) or off (0). In my case a flag was tied with a column of a front-end table. If the flag was off then the column didn’t appear on the front-end.

Example:

flag     : flag3 flag2 flag1 flag0
state    :   off   on   off    on
bit value:     0    1     0     1

01012 (binary) = 510 (decadic), so the number 5 will be stored in the database.

In this case, the columns tied with the flags flag0 and flag2 will appear on the front-end and the columns tied with the flags flag1 and flag3 won’t. We can use the BITAND function to find out whether the flag is set or not. The flag flagi in the number X is set iff BITAND(X, 2i) = 1.

I had to merge some tables vertically, in that application. The appearance of a column in the merged table depended on the result of the bitwise OR operations. If the result of the operation table1_flagi OR table2_flagi … OR tablen_flagi was equal to 1, then the column columni appeared in the merged table. I had to find all columns visible in the merged table.

There is no BITOR function in the Oracle database for the integer types. But it can be easily implemented: BITOR(N1, N2) = N1 + N2 – BITAND(N1, N2). This is common approach to compute the bitwise OR.

The idea is simple: In exactly one (arbitrary) number, we have to unset those bits, which are set in both numbers and then simply use the addition operation.

Example:

   1001    1000    1001
OR 0101  + 0101  + 0100
   ----    ----    ----
   1101    1101    1101

We unset the last bit in exactly one number, because it is the only one common bit for both numbers:

BITAND(1001, 0101) = 1

The next step to be done is implementation of bitwise OR aggregate function using user-defined aggregates interface:

CREATE OR REPLACE TYPE bitor_impl AS OBJECT
(
  bitor NUMBER,

  STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT bitor_impl) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateIterate(SELF  IN OUT bitor_impl,
                                       VALUE IN NUMBER) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT bitor_impl,
                                     ctx2 IN bitor_impl) RETURN NUMBER,

  MEMBER FUNCTION ODCIAggregateTerminate(SELF        IN OUT bitor_impl,
                                         returnvalue OUT NUMBER,
                                         flags       IN NUMBER) RETURN NUMBER
)
/

CREATE OR REPLACE TYPE BODY bitor_impl IS
  STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT bitor_impl) RETURN NUMBER IS
  BEGIN
    ctx := bitor_impl(0);
    RETURN ODCIConst.Success;
  END ODCIAggregateInitialize;

  MEMBER FUNCTION ODCIAggregateIterate(SELF  IN OUT bitor_impl,
                                       VALUE IN NUMBER) RETURN NUMBER IS
  BEGIN
    SELF.bitor := SELF.bitor + VALUE - bitand(SELF.bitor, VALUE);
    RETURN ODCIConst.Success;
  END ODCIAggregateIterate;

  MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT bitor_impl,
                                     ctx2 IN bitor_impl) RETURN NUMBER IS
  BEGIN
    SELF.bitor := SELF.bitor + ctx2.bitor - bitand(SELF.bitor, ctx2.bitor);
    RETURN ODCIConst.Success;
  END ODCIAggregateMerge;

  MEMBER FUNCTION ODCIAggregateTerminate(SELF        IN OUT bitor_impl,
                                         returnvalue OUT NUMBER,
                                         flags       IN NUMBER) RETURN NUMBER IS
  BEGIN
    returnvalue := SELF.bitor;
    RETURN ODCIConst.Success;
  END ODCIAggregateTerminate;
END;
/

The last step is definition of the bitwise OR aggregate function. This definition is tied with the object bitor_impl, that implements the aggregate function. I implemented the ODCIAggregateMerge method in this object, therefore I can allow parallel execution by using clause PARALLEL_ENABLE.

CREATE OR REPLACE FUNCTION bitoragg(x IN NUMBER) RETURN NUMBER
PARALLEL_ENABLE
AGGREGATE USING bitor_impl;
/

The aggregate function in action:

SQL> DROP TABLE bitor_test;
Table dropped
SQL> CREATE TABLE bitor_test(table_name varchar2(31), flags number);
Table created
SQL> INSERT INTO bitor_test VALUES ('table1', 5);  -- 0101
1 row inserted
SQL> INSERT INTO bitor_test VALUES ('table2', 1);  -- 0001
1 row inserted
SQL> INSERT INTO bitor_test VALUES ('table3', 9);  -- 1001
1 row inserted
SQL> INSERT INTO bitor_test VALUES ('table4', 12); -- 1100
1 row inserted
SQL> COMMIT;
Commit complete
SQL> SELECT bitoragg(flags) FROM bitor_test;
BITORAGG(FLAGS)
---------------
13
SQL>

Simple calculation shows us the correctness of this result:

510 OR 110 OR 910 OR 1210 = 01012 OR 00012 OR 10012 OR 11002 = 11012 = 1310

Tags: