How to change PostgreSQL database encoding to UTF8
The default encoding of the template databases in PostgreSQL is set to SQL_ASCII. If this encoding has not been changed, then the new databases will be created using this template and hence will have the same encoding SQL_ASCII. Overall this should not be a big problem unless Unicode data is required to be saved in the database. In that case you will receive an error such as :
Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
In this blog, we will go through all steps in order to change the encoding of the database to UTF8. First of all one needs to know that encoding of already created database cannot be altered. Hence we need to drop and create a new one.
- Checking the current server encoding.
postgres@yourserver:~$ psql psql (10.7 (Ubuntu 10.7-0ubuntu0.18.04.1)) Type "help" for help. postgres=# SHOW SERVER_ENCODING; server_encoding ----------------- SQL_ASCII (1 row) postgres=#\c your_database; You are now connected to database "your_database" as user "postgres". your_database=# SHOW SERVER_ENCODING; server_encoding ----------------- SQL_ASCII (1 row)
In this case, as you can see, both the template database (the first result) and your_database is based on SQL_ASCII encoding.
There are two ways to proceed from here but before that its necessary to understand the process of CREATE DATABASE command. This command by default creates a copy of database from template1 which is the default template for creating any new database. This template has some preset settings already and ready for creating new databases. There is also another template which is free from all preset settings called template0 [ref.https://www.postgresql.org/docs/12/manage-ag-templatedbs.html].
Coming back to ways of creating new database, one can either create new database using template0 and applying encoding UTF8 to it. Or updating the template1 encoding to UTF8, thereby any future databases will also be based on UTF8 as well. From here on, we will follow the second solution which is better in my opinion.
- Create backup of your_database.
postgres@yourserver:~$ pg_dump your_database > dump.sql
OPTIONAL: On pg_dump, sequences by default are set to AS integer. This might cause issues for some applications (such as Django). To fix this, use
sed 's/AS integer//' dump.sql > altered_dump.sql
- Drop your database.
postgres@yourserver:~$ dropdb your_database
- Update the template1 encoding to UTF8.
postgres@yourserver:~$ psql psql (10.7 (Ubuntu 10.7-0ubuntu0.18.04.1)) Type "help" for help. postgres=# UPDATE pg_database SET datistemplate = FALSE WHERE datname = 'template1'; postgres=# DROP DATABASE template1; postgres=# CREATE DATABASE template1 WITH TEMPLATE = template0 ENCODING = 'UTF8'; postgres=# UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template1'; postgres=# \c template1; You are now connected to database "template1" as user "postgres". template1=# VACUUM FREEZE;
- Create a new database having the same name as before. This time it will use the newly updated template1 for creating the database.
postgres@yourserver:~$ createdb -E utf8 your_database
- Restore your data using your backup file (either the dump.sql or altered_dump.sql file)
postgres@yourserver:~$ psql your_database < dump.sql
That's it. Now you may check the encoding again for the new database by following the step 1 again .