After running into a problem while checking out old files from a CVS repository, I found out how to display the byte values of certain characters and how to convert them as well.
Most older filenames were encoded with the character set ISO-8859-1 (Latin-1, Western European), ISO-8859-2 (Latin-2, Central European) or ISO-8859-15 (Latin-9, Western European plus the Euro sign). Most new systems work with UTF-8.
An example illustrates this better:
Here is the filename I got from the old CVS repository:
ls *.jpg
Architektur�bersicht_Gesamt�berblick.jpg
If I look at the bytes the filename actually contains:
ls *.jpg | hexdump -cb
0000000 A r c h i t e k t u r � b e r s
0000000 101 162 143 150 151 164 145 153 164 165 162 374 142 145 162 163
0000010 i c h t _ G e s a m t � b e r b
0000010 151 143 150 164 137 107 145 163 141 155 164 374 142 145 162 142
0000020 l i c k . j p g \n
0000020 154 151 143 153 056 152 160 147 012
0000029
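The numeric rows above come from the -b option of hexdump, which prints each byte in octal, so 374 is an octal number. As a small sketch (the variable names byte_hex and byte_dec are only illustrative), the shell's printf can show the same byte in hexadecimal and decimal:

```shell
# A leading 0 makes printf read the argument as an octal constant.
byte_hex=$(printf '%x' 0374)    # octal 374 as hexadecimal
byte_dec=$(printf '%d' 0374)    # octal 374 as decimal
echo "octal 374 = hex $byte_hex = decimal $byte_dec"
# prints: octal 374 = hex fc = decimal 252
```

If you prefer to read the dump directly in hexadecimal, hexdump -C prints the bytes in hex next to their ASCII view.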
We can see that the funny characters ‘�’ have the octal value ‘374’ (hex FC; the -b option of hexdump prints octal, not hex), which is the German ‘ü’ encoded in ISO-8859-1. To display it on a system that uses a UTF-8 locale, you can pass it through the character set converter ‘recode’. Here is what I get:
ls *.jpg | recode ISO-8859-1..UTF-8
Architekturübersicht_Gesamtüberblick.jpg
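The pipe above only converts the text of the listing; the name stored on disk stays in ISO-8859-1. If you want to rename the files themselves, something along these lines might work. This is only a sketch: it uses iconv instead of recode, the function name latin1_names_to_utf8 is made up for this example, and it assumes every *.jpg name in the directory really is ISO-8859-1 (running it on names that are already UTF-8 would double-encode them).

```shell
# Hypothetical helper: rename ISO-8859-1 encoded *.jpg names to UTF-8.
# The body runs in a subshell, so the caller's working directory is unchanged.
latin1_names_to_utf8() (
    cd "${1:-.}" || exit 1
    for old in *.jpg; do
        # Skip the literal pattern when no file matches
        [ -e "$old" ] || continue
        # Convert the name; skip the file if iconv fails
        new=$(printf '%s' "$old" | iconv -f ISO-8859-1 -t UTF-8) || continue
        # Pure ASCII names come out unchanged and need no rename
        [ "$old" = "$new" ] || mv -- "$old" "$new"
    done
)

# Usage: latin1_names_to_utf8 /path/to/checkout
```

Try it on a copy of the directory first, since the rename is not reversible by running it again.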
Note: On a Debian Linux system, the ‘recode’ tool can be installed with the command:
apt-get install recode