
Unix: Check and convert file encoding charset

This tutorial shows how to quickly check and convert file encoding charsets on Unix-based operating systems, such as Linux distros and macOS.

Check your file encoding

To check the current encoding of a file, use the command below, replacing <filename> with the name of your file. The uppercase -I flag is the macOS form; with GNU file on Linux, use lowercase -i instead.

file -I <filename>

Example:

file -I test.csv
test.csv: text/plain; charset=iso-8859-1
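
The charset reported by file is a guess based on the bytes it inspects. As a complementary check, the sketch below asks iconv to decode a file as UTF-8 and looks at the exit status. It assumes iconv is installed; the function name is_valid_utf8 and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Illustrative sample files: the word "café" in two encodings.
printf 'caf\351\n' > sample_latin1.txt      # \351 is the single byte 0xE9 (ISO-8859-1)
printf 'caf\303\251\n' > sample_utf8.txt    # \303\251 are the bytes 0xC3 0xA9 (UTF-8)

for f in sample_latin1.txt sample_utf8.txt; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

This only works as a test for UTF-8. A single-byte charset such as ISO-8859-1 assigns a meaning to every byte value, so decoding as ISO-8859-1 succeeds for almost any input and proves nothing.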

Convert your file encoding

Now that you know the encoding of your file, you can convert it to a new file with the desired encoding. To do so, run the command below, replacing the parameters <source_encoding>, <destination_encoding>, <source_file>, and <destination_file>.

iconv -f <source_encoding> -t <destination_encoding> <source_file> > <destination_file>

Example:

iconv -f iso-8859-1 -t utf-8 test.csv > new_test.csv

The output file new_test.csv has the contents of the old file but with the desired encoding:

file -I new_test.csv
new_test.csv: text/plain; charset=utf-8
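
Because the shell creates the destination file before iconv runs, a failed conversion can leave a partial file behind, and redirecting to the source file itself would truncate it before it is read. A minimal sketch of a more defensive wrapper follows, assuming a POSIX shell; convert_encoding and the file names are hypothetical and not part of iconv.

```shell
#!/bin/sh
# Hypothetical helper: convert SRC from encoding FROM to encoding TO.
# DEST is only created if iconv reports success.
convert_encoding() {
    from=$1
    to=$2
    src=$3
    dest=$4
    tmp="$dest.tmp.$$"              # temporary file next to the destination
    if iconv -f "$from" -t "$to" "$src" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"                # discard the partial output
        echo "conversion of $src failed" >&2
        return 1
    fi
}

# Illustrative input: "café" encoded in ISO-8859-1 (\351 is byte 0xE9).
printf 'caf\351\n' > test_latin1.csv
convert_encoding iso-8859-1 utf-8 test_latin1.csv test_utf8.csv

# Dump the bytes in hex: the single byte e9 should now appear as c3 a9.
od -An -tx1 test_utf8.csv
```

Writing to a temporary file first also makes it safe to pass the same name as source and destination, since the source is fully read before it is replaced.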

Useful tip

US-ASCII is a subset of UTF-8: any file that contains only ASCII characters is already valid UTF-8, byte for byte. If you convert such a file from US-ASCII to UTF-8, the output is identical to the input and file will still report it as US-ASCII, since there is nothing to convert.
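
A quick way to see this for yourself is to convert an ASCII-only file and compare the result byte by byte. The file names below are illustrative.

```shell
#!/bin/sh
# Illustrative ASCII-only input file.
printf 'id,name\n1,Alice\n' > ascii_only.csv

iconv -f US-ASCII -t UTF-8 ascii_only.csv > ascii_as_utf8.csv

# cmp -s is silent and exits 0 only when both files have identical bytes.
if cmp -s ascii_only.csv ascii_as_utf8.csv; then
    echo "identical bytes: nothing was converted"
else
    echo "files differ"
fi
```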