Problem solve Get help with specific problems with your technologies, process and projects.

Everything you need to know about the Oracle hash function

Our expert covers the entire Oracle hash functionality from 'HASHKEYS' and 'HASH IS' to the Random Oracle Model.

Dear Sir,

I am a Basis consultant and since you are certified in the OCP database I wanted to ask you a database-related question. Please tell me, in detail, what is a hash function? Also, send me your email address so that I can contact you whenever I am in trouble.


Here is the detailed answer to your question:

This method is available only when using cost-based optimization. Hash Join builds a hash table from the rows in the driving table and uses the same hash formula on each row of the driven table to find matches.


HASHKEYS create a hash cluster and specify the number of hash values for a hash cluster. In a hash cluster, Oracle stores together rows that have the same HASHKEYS value. The hash value for a row is the value returned by the cluster's hash function.

Oracle rounds up the HASHKEYS value to the nearest prime number to obtain the actual number of hash values. The minimum value for this parameter is two. If you omit both the INDEX clause and the HASHKEYS parameter, Oracle creates an indexed cluster by default.

When you create a hash cluster, Oracle immediately allocates space for the cluster based on the values of the SIZE and HASHKEYS parameters. For more information on how Oracle allocates space for clusters:


SINGLE TABLE specifies that the cluster is a type of hash cluster containing only one table. This clause can provide faster access to rows than would result if the table were not part of a cluster.

Restriction: Only one table can be present in the cluster at a time. However, you can drop the table and create a different table in the same cluster.


HASH IS specifies an expression to be used as the hash function for the hash cluster. The expression:

  • Must evaluate to a positive value
  • Must contain at least one column with referenced columns of any datatype as long as the entire expression evaluates to a number of scale 0. For example: NUM_COLUMN * length (VARCHAR2_COLUMN)
  • Cannot reference user-defined PL/SQL functions
  • Cannot evaluate to a constant
  • Cannot contain a subquery
  • Cannot contain columns qualified with a schema or object name (other than the cluster name)

If you omit the HASH IS clause, Oracle uses an internal hash function for the hash cluster.

The cluster key of a hash column can have one or more columns of any datatype. Hash clusters with composite cluster keys or cluster keys made up of non-integer columns must use the internal hash function.


The conversion of a column's primary key value to a database page number on which the row will be stored. Retrieval operations that specify the key column value use the same hashing algorithm and can locate the row directly. Hashing provides fast retrieval for data that contains a unique key value.

Hash function

A formula that is applied to each value of a table column or a combination of several columns, called the index key, to get the address of the area in which the row should be stored. When locating data, the database uses the hash function again to get the data's location.

The Random Oracle Model -- What Does It Mean?

The following is a typical example of the kind of thing that can be proven in the Random Oracle Model:

"Full domain hash is secure against an adaptive chosen message attack with respect to existential forgeries in the Random Oracle Model, provided that it infeasible to invert a trapdoor one-way permutation, f."

The proof is actually a reduction: given a hypothetical forging algorithm for full domain hash, which has some specified probability of success, we construct an inverting algorithm, which inverts the function f on randomly chosen points, again with some specified probability of success. If we believe that it is infeasible to invert f, then we have a contradiction, and we conclude that the hypothesized forging algorithm cannot exist.

So far, this is all pretty standard, and not too difficult to follow. However, the proof is taking place in the "Random Oracle Model" which means that some additional assumptions are being made, and the proof of security is valid only if these assumptions are valid.

The Random Oracle Model is usually described by saying that the hash function (which is used in full domain hash in our example) is modeled as a random function. Of course, when full domain hash is actually used in practice, a particular hash function must be specified, and so the assumption used in the Random Oracle Model is not valid.

Indeed, there is a particular protocol which has been proved secure in the Random Oracle Model, but which becomes insecure whenever the hash function used in the protocol is specified (this is a result of Canetti, Goldreich and Halevi). As a result, the Random Oracle Model is somewhat controversial. Shoup has gone so far as to characterize proofs in the Random Oracle Model as "heuristic proofs". (It is not clear what the phrase "heuristic proof" means, but it is undoubtedly not good.)

The correct way to interpret a proof of security for a protocol P in the Random Oracle Model is to view it as a proof of security against certain types of attacks on the protocol P. More precisely, the proof shows that the protocol P is secure against what might be termed "hash-generic" attacks. This means that any attack which treats the hash function as a random function will not be successful -- regardless of whether the hash function is actually a random function. In other words, it is better to think of a proof in the Random Oracle Model as a proof in which we make an assumption about the attacking algorithm rather than an assumption about the hash function -- which, as we already mentioned, cannot be valid.

I suggest the term "hash-generic" attack to suggest an analogy with algorithms in the generic group model. Suppose we consider the discrete logarithm problem as a typical example. An algorithm to solve the DLP in a generic group is allowed to access group elements only via a random encoding algorithm, which encodes elements of the cyclic group (Z_n,+) injectively as random bit-strings, say. A "group-generic" algorithm is not allowed to make any assumptions about encodings of group elements other than those elements that it has queried through a group oracle. This is analogous to a "hash-generic" algorithm that is only allowed to obtain values of a hash function via a random hash oracle; the algorithm is not permitted to make any assumptions about values of the hash function at any points other than the ones at which it queried the oracle.

It was proven by Shoup that the DLP in (Z_n,+) cannot be solved by a group-generic algorithm in time less than the square root of n. This does not prove anything about the difficulty of solving the DLP in any specific group by a non-generic algorithm. For example, the index calculus algorithm can solve the DLP in (Z_p,*) faster than a generic algorithm can. In the case of appropriate elliptic curve groups, the only known algorithms for the DLP are generic algorithms. Shoup's result on the complexity of generic algorithms tells us that any improvement to the complexity of solving the DLP in elliptic curves must come from non-generic algorithms. The complicated nature of the encoding used in elliptic curves (with respect to the group operation) can be taken as evidence (not a proof, of course) that the DLP in elliptic curves is hard.

A security proof in the Random Oracle Model must be interpreted in a similar light. As mentioned above, a security proof establishes that a protocol is secure against hash-generic algorithms. It is of course possible that an algorithm can break the protocol for some particular hash functions (or even for all possible hash functions) by somehow taking advantage of how the hash function is computed. The proof in the Random Oracle Model can be taken as evidence of security when the random oracle is replaced by a particular hash function. This was the "thesis" proposed by Bellare and Rogaway when they introduced the Random Oracle Model, and it should be stressed that no practical protocol proven secure in the Random Oracle Model has been broken when used with a "good" hash function, such as SHA-1 (the protocol used in the Canetti-Goldreich-Halevi result was not a natural protocol for a "reasonable" cryptographic application; rather, it was designed explicitly for the purposes of their proof).

You can contact me at [email protected]

SAP BASIS/ BI Consultant
[email protected]

Dig Deeper on SAP trends, strategy and ERP market share