PostgreSQL case and accent insentitive SQL queries

3 minute read

A few days ago we were implementing a Spring Boot application which stored its data into a PostgreSQL database. As a requirement we needed to search into the database with LIKE operator in our SQL queries and at the same time perform case and accent insensitive searches. Since we are Greek, this is pretty typical for us because our letters have diacritics (“tonos”). So a query like the following

select *
from blablatable
where name like '%χρη%' ;

should return results for:

  • Χρήστος
  • ΧΡΗΣΤΟΣ
  • χΡήστος

The aforementioned SQL query works perfectly normal in a MySQL or MariaDB database with collation utf8_general_ci. However in PostgreSQL this is not the case. So after some searching and reading StackOverflow #11005036, we figured out how to make case and accent insensitive search SQL queries.

Fire up a PostgreSQL Docker container

We chose to use PostgreSQL inside a Docker container, so we fire up one:

docker run --name postgres \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=strongpassword \
  -e POSTGRES_USER=stronguser \
  -e POSTGRES_DB=airplanes \
  -d postgres:13.0

Connect to PostgreSQL using a Docker container

We used a temporary container to connect to our newly created database. You can skip this step and use another client such as pgAdmin or DBeaver.

docker run -it --rm postgres:13.0 psql -h host.docker.internal -d airplanes --user stronguser
Password for user stronguser:
<enter thy password>

Mind the host.docker.internal. This host name is used only in Docker for Windows to mark the host computer where Docker runs.

Create an extension in PostgreSQL

In order to use case and accent insensitive search SQL queries, we need to use a PostgreSQL extension. So, execute in your favourite PostgreSQL client:

CREATE EXTENSION unaccent;

Create a table and insert some data

Then we need to create a simple table in the default schema public, which will have some text data with accents:

create sequence public.BOBOS_SEQUENCE start 1 increment 1;

create table public.bobos (
    ID int8 not null DEFAULT  nextval('BOBOS_SEQUENCE'),
    name varchar(255),
    primary key (ID)
);

After that we populate it with some test data:

INSERT INTO public.bobos (name) VALUES('Τάκης, the best handmade gyros of Triandria');
INSERT INTO public.bobos (name) VALUES('Χρήστος');
INSERT INTO public.bobos (name) VALUES('ΧΡΗΣΤΟΣ');
INSERT INTO public.bobos (name) VALUES('χΡήστος');

SQL case and accent insensitive queries

Finally in order to perform case and accent insensitive SQL queries , execute the following statement:

select * 
from public.bobos 
where unaccent(LOWER(NAME)) like unaccent(LOWER('%χρη%')) ;

The results will be like the following:

id|name   |
--|-------|
 2|Χρήστος|
 3|ΧΡΗΣΤΟΣ|
 4|χΡήστος|

As you might have noticed, we need to use both unaccent() extension and LOWER() function in order to meet the desired result.

I hope you found this article useful and you do not waste any more time searching in the web. Have a nice day and love thy neighbour!

Comments