Question: French accent-insensitive SQL search

Hi

I am using a Sybase 11.2 database.

I need to search a varchar(255) column whose text can be multiple words, some of which may contain accented French characters.

The user will type unaccented characters into a search field on the web front end, but the search must also match accented characters; i.e., it must be accent-insensitive.

The search SQL uses LIKE in the WHERE clause. I've experimented with SOUNDEX but can only get it to work in certain circumstances, e.g. when the user types the first word of a sentence and the accent is in that first word. If the accent is in the second or third word, it doesn't seem to work.

Anyone got any suggestions?

Paul

Answer: French accent-insensitive SQL search

The setting that matters here is the collating sequence (sort order), which is chosen at installation time. The default is binary, which means the server is case-sensitive and orders strings in sort and index operations by their character codes, i.e. in the order they appear in the ASCII table. A number of alternative collating sequences are available, including some that are both case- and accent-insensitive.

There is a T-SQL function called COMPARE that lets you compare two strings using one of the alternative collating sequences. The problem with COMPARE is that it does a straight comparison; it does not support a LIKE-style match. There is another function, SORTKEY (which COMPARE uses under the covers), that applies a collating sequence to a string on the fly. That is exactly what you want; however, it becomes a real performance hog for data sets larger than about 1000 rows.
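To make that cost trade-off concrete, here is a minimal Python sketch of the on-the-fly approach. It is not Sybase code: as an assumption, Unicode NFD decomposition plus dropping combining marks (from Python's standard `unicodedata` module) stands in for an accent-insensitive collating sequence, and the hypothetical `rows` list stands in for the table. The point it illustrates is that every row is folded again on every search, which is why applying a collation function inside the WHERE clause scales poorly.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a lowercase, accent-free copy of text.

    NFD splits a character such as 'é' into 'e' plus a combining acute accent;
    dropping the combining marks leaves only the base letters.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search_on_the_fly(rows, user_input):
    """LIKE '%term%'-style search that folds every row at query time.

    Analogous to calling a collation function on the column in the WHERE
    clause: the folding work is repeated for every row on every search.
    """
    term = fold(user_input)
    return [row for row in rows if term in fold(row)]


# Hypothetical sample data standing in for the varchar(255) column.
rows = [
    "Crème brûlée à la maison",
    "Le café de la gare",
    "Tarte aux pommes",
]

print(search_on_the_fly(rows, "creme brulee"))
print(search_on_the_fly(rows, "CAFE"))
```

Note that NFD does not decompose ligatures such as "œ" (as in "cœur"), so a real French folding routine would need an extra mapping for those.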

So, your two practical choices are:
 1) Adopt a new collating sequence for the entire server, such as Latin-1 noaccent (id=54). From then on, your comparisons will ignore accents completely. You can use some trickery with COMPARE or SORTKEY if you ever do need to differentiate between accented and unaccented strings, which is probably a fairly rare occurrence.
 2) Build a shadow column or helper table and use the SORTKEY function to strip the accents at insert time, so that searches run against pre-computed values instead of converting every row on every query.
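Option 2 can be sketched as follows. This is a hedged illustration, not a Sybase recipe: it uses Python's built-in `sqlite3` module as a stand-in database, and it fills the shadow column with an application-side accent fold (NFD decomposition, combining marks dropped) rather than SORTKEY. The table `dishes` and the columns `title` and `title_folded` are hypothetical names. The user's input is folded the same way and matched with LIKE against the shadow column, so the folding cost is paid once per row at insert time.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Lowercase, accent-free copy of text (NFD, then drop combining marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def like_contains(term: str) -> str:
    """Build a '%term%' LIKE pattern, escaping any % or _ the user typed."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


conn = sqlite3.connect(":memory:")
# Hypothetical table: 'title' keeps the original text; 'title_folded' is the
# shadow column holding the accent-stripped copy used only for searching.
conn.execute("CREATE TABLE dishes (title TEXT, title_folded TEXT)")


def insert_dish(title: str) -> None:
    # The folding work happens once, here, rather than on every search.
    conn.execute(
        "INSERT INTO dishes (title, title_folded) VALUES (?, ?)",
        (title, fold(title)),
    )


def search(user_input: str) -> list:
    # Fold the user's input the same way, then LIKE against the shadow column.
    cur = conn.execute(
        "SELECT title FROM dishes WHERE title_folded LIKE ? ESCAPE '\\' "
        "ORDER BY title",
        (like_contains(fold(user_input)),),
    )
    return [row[0] for row in cur]


for t in ["Crème brûlée à la maison", "Le café de la gare", "Tarte aux pommes"]:
    insert_dish(t)

print(search("creme brulee"))
print(search("cafe de la"))
```

The original `title` column is still what the application displays; only the WHERE clause touches the shadow column.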

By far the easiest option, and probably the one you want, is to change the collating sequence for the ASE server. It is as transparent a change as you could hope for. The caveats matter only if you use replication or have other ASE servers or applications in your organization; the rule of thumb is to keep all of the servers on the same collating sequence.

The procedures for changing the sort order and/or character set are detailed in the System Administrators Guide, Chapter 7, "Reconfiguring the character set, sort order, or message language". The guide is freely available for download as a PDF if you do not have local access to the ASE documentation set.

Regards,
Bill
Random Solutions  