Soundex function Oracle

Here again let’s learn about one of the very interesting functions of Oracle PL/SQL called Soundex. From the name itself your brains might have started churning. Is it something related to sound? What it has to do with PL/SQL? How can I use it? Oh! don’t worry, Let’s find out…

What is SOUNDEX function?

Soundex function returns character strings that sounds alike in English. In other words the input string and returned strings are phonetically equivalent to each other. For example, “pick”, “peek” and “pic” are spelled differently but are phonetically similar (sounds similar).

Why to use SOUNDEX?

Suppose your table contains various misspelled entries of Delhi city as “Delhi”, “Deli”, “Dehli” and “Delhy”. Your task is to retrieve all records with similar sounding names and correct them. Soundex function can be very useful to you here.

Read why Soundex does not work for numbers

Syntax

Example

Algorithm of Soundex function according to Oracle

  1. Retain the first letter of the string
  2. Remove all other occurrences of the following letters: a, e, h, i, o, u, w, y (or change it to zero ‘0’)
  3. Assign digits to the remaining letters (after the first) as follows:
    b, f, p, v = 1
    c, g, j, k, q, s, x, z = 2
    d, t = 3
    l = 4
    m, n = 5
    r = 6
  4. If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except for any intervening h and w, then omit all but the first.
  5. Replace first digit with the letter (as in Step 1)
  6. If string is less than 4 letters pad ‘0’ on the right. If it is more than 4 letters return only the first four positions.

Explanation of Soundex Algorithm by example

Suppose we have 3 phonetically similar strings as below:

String 1 = Pick
String 2 = Peek
String 3 = Pic

Let us see how soundex algorithm works for these strings:

  • Retain the first letter of the string

Retained letter of string 1: P
Retained letter of string 2: P
Retained letter of string 3: P

  • Remove all other occurrences of the following letters: a, e, h, i, o, u, w, y (or change it to zero ‘0’)

Result of string 1: Pck
Result of string 2: Pk
Result of string 3: Pc

  • Assign digits to the remaining letters (after the first) as follows:

Result of string 1: 122
Result of string 2: 12
Result of string 3: 12

  • If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except for any intervening h and w, then omit all but the first.

Result of string 1: 12 (As c and k are adjacent to each other and have same digit we can remove all but one digit)
Result of string 2: 12
Result of string 3: 12

  • Replace first digit with the letter that was retained in Step 1

Result of string 1: P2
Result of string 2: P2
Result of string 3: P2

  • If string is less than 4 letters pad ‘0’ on the right. If it is more than 4 letters return only the first four positions.

Result of string 1: P200
Result of string 2: P200
Result of string 3: P200

So, the result of all three strings are identical and using Soundex for any one of them will return all the three strings.

For more information you can read official Oracle documentation of Soundex function.

Did you liked the above post? Leave the comments below. Your suggestions and feedback are always welcome.

Subscribe here to stay updated on latest posts of your favourite website OracleMine.com.

Leave a Reply

Your email address will not be published. Required fields are marked *