How to Convert EUC-KR Source Code to UTF-8

I found some old STM32 firmware code whose files were encoded in EUC-KR, a legacy Korean character encoding. Modern tools expect UTF-8, so I needed to convert every file to UTF-8.

The Problem

The project is firmware for an STM32F10x chip.
It has 358 source files (.c, .cpp, .h).
Most files were in EUC-KR or ISO-8859-1.
I wanted to convert everything to UTF-8.

Solution: A Simple Script

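I wrote a bash script built around the iconv command. iconv converts text from one encoding to another and exits with a non-zero status when the input contains bytes that are not valid in the source encoding, so a failed conversion can be detected.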
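
To make that concrete, here is a minimal sketch of what iconv does with a single string. The sample file and its bytes are my own illustration, not part of the project: the four bytes C7 D1 B1 DB spell the word 한글 in EUC-KR. The second call feeds iconv a byte (FF) that is not valid EUC-KR, to show that the failure is reported through the exit status.

```bash
#!/bin/bash
# Illustration only: the sample file and its contents are made up for this demo.
sample=$(mktemp)

# Octal escapes for the EUC-KR bytes C7 D1 B1 DB ("한글").
printf '\307\321\261\333\n' > "$sample"
converted=$(iconv -f EUC-KR -t UTF-8 "$sample")
echo "valid input   -> $converted"

# 0xFF (octal 377) is not a valid EUC-KR lead byte, so iconv should fail.
printf 'abc\377\n' > "$sample"
if iconv -f EUC-KR -t UTF-8 "$sample" > /dev/null 2>&1; then
    bad_status=0
else
    bad_status=$?
fi
echo "invalid input -> exit status $bad_status"

rm -f "$sample"
```

The conversion script relies on exactly this exit status to decide whether to keep the converted file.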

The Conversion Script

#!/bin/bash

# Change source files from EUC-KR to UTF-8

BASE_DIR="/path/to/your/project"
SUCCESS_COUNT=0
FAIL_COUNT=0
SKIP_COUNT=0

echo "=== Starting UTF-8 conversion ==="
echo "Working directory: $BASE_DIR"

# Find all source files.
# The loop reads from process substitution rather than a pipe, so it runs in
# the current shell and the counters keep their values after the loop.
while IFS= read -r file; do

    # Check current encoding
    ENCODING=$(file -b --mime-encoding "$file")

    # Skip if already UTF-8 (pure ASCII is valid UTF-8 too)
    if [[ "$ENCODING" == "utf-8" || "$ENCODING" == "us-ascii" ]]; then
        echo "[SKIP] $file (already $ENCODING)"
        SKIP_COUNT=$((SKIP_COUNT + 1))
        continue
    fi

    # Convert to UTF-8. Every remaining file is assumed to be EUC-KR,
    # whatever encoding name the file command reported for it.
    if iconv -f EUC-KR -t UTF-8 "$file" > "$file.utf8.tmp" 2>/dev/null; then
        # Success: replace original file
        mv "$file.utf8.tmp" "$file"
        echo "[OK] $file ($ENCODING -> UTF-8)"
        SUCCESS_COUNT=$((SUCCESS_COUNT + 1))
    else
        # Failed: remove temp file
        rm -f "$file.utf8.tmp"
        echo "[FAIL] $file (conversion failed)"
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi
done < <(find "$BASE_DIR" -type f \( \
    -name "*.c" -o -name "*.cpp" -o -name "*.cc" -o \
    -name "*.h" -o -name "*.hpp" -o \
    -name "*.txt" -o -name "*.md" \
\))

echo ""
echo "=== Conversion done ==="
echo "Success: $SUCCESS_COUNT"
echo "Failed: $FAIL_COUNT"
echo "Skipped: $SKIP_COUNT"
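
One detail matters if you adapt the loop: in bash, a while loop that receives its input through a pipe runs in a subshell, so variables changed inside it are gone when the loop ends. Feeding the loop with process substitution keeps it in the current shell. Here is a minimal sketch of the difference, with printf standing in for find; the variable names are only for illustration.

```bash
#!/bin/bash
# Illustration only: printf stands in for find, and the counter names are made up.

# Variant 1: loop on the right side of a pipe -> runs in a subshell.
piped_count=0
printf 'a.c\nb.c\nc.h\n' | while IFS= read -r name; do
    piped_count=$((piped_count + 1))
done
echo "pipe:                 $piped_count"

# Variant 2: loop fed by process substitution -> runs in the current shell.
subst_count=0
while IFS= read -r name; do
    subst_count=$((subst_count + 1))
done < <(printf 'a.c\nb.c\nc.h\n')
echo "process substitution: $subst_count"
```

With bash's default settings the first counter is still 0 after its loop, while the second one holds the number of lines read.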

How to Use

Save the script

vi /tmp/convert_to_utf8.sh
# Paste the script above
# Change BASE_DIR to your project path

Make it executable

chmod +x /tmp/convert_to_utf8.sh

Run it (save log)

/tmp/convert_to_utf8.sh 2>&1 | tee /tmp/convert_utf8.log

Check results

# Count statistics
grep -c "^\[OK\]" /tmp/convert_utf8.log      # success
grep -c "^\[SKIP\]" /tmp/convert_utf8.log    # skipped
grep -c "^\[FAIL\]" /tmp/convert_utf8.log    # failed

# See failed files
grep "^\[FAIL\]" /tmp/convert_utf8.log

Verify encoding

# Check encoding after conversion
file -b --mime-encoding your_source_file.c

# Output: utf-8
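
Checking one file at a time gets slow for a whole project. As a sketch, the loop below lists every file that is still not valid UTF-8 by asking iconv to read it as UTF-8 and throwing the output away. The demo directory and file names are assumptions for illustration; point the find call at your project instead.

```bash
#!/bin/bash
# Illustration only: a throwaway directory with one UTF-8 and one EUC-KR file.
demo_dir=$(mktemp -d)
printf '// \355\225\234\352\270\200\n' > "$demo_dir/good.c"   # "한글" as UTF-8
printf '// \307\321\261\333\n'         > "$demo_dir/bad.c"    # "한글" as EUC-KR

# iconv exits non-zero when the input is not valid in the -f encoding,
# so a failed UTF-8 -> UTF-8 pass means the file is not valid UTF-8.
not_utf8=()
while IFS= read -r f; do
    if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        not_utf8+=("$f")
    fi
done < <(find "$demo_dir" -type f \( -name "*.c" -o -name "*.h" \))

echo "Files that are not valid UTF-8: ${#not_utf8[@]}"
printf '  %s\n' "${not_utf8[@]}"

rm -rf "$demo_dir"
```

This check does not depend on the file command, so it also works as a second opinion on files that file labels with an unexpected encoding.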

Why This Script is Safe

1. Safe Conversion

Each file is first converted into a temporary file (.utf8.tmp).
The original is replaced only if the conversion succeeds.
If the conversion fails, the temporary file is removed and the original is left untouched.
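
The three points above can be sketched as a small function. The function name convert_in_place and the sample file are hypothetical, used only to show that a failed conversion leaves the original bytes alone.

```bash
#!/bin/bash
# Hypothetical helper: convert one file in place, keeping the original on failure.
convert_in_place() {
    local src=$1 tmp="$1.utf8.tmp"
    if iconv -f EUC-KR -t UTF-8 "$src" > "$tmp" 2>/dev/null; then
        mv "$tmp" "$src"      # success: swap in the converted copy
    else
        rm -f "$tmp"          # failure: drop the partial output
        return 1
    fi
}

# Illustration only: a file containing 0xFF, which is not valid EUC-KR.
work_dir=$(mktemp -d)
printf 'int x; /* \377 */\n' > "$work_dir/broken.c"
cp "$work_dir/broken.c" "$work_dir/broken.c.orig"

if convert_in_place "$work_dir/broken.c"; then
    result="converted"
else
    result="kept original"
fi
echo "broken.c: $result"

# cmp -s exits 0 when both files are byte-for-byte identical.
if cmp -s "$work_dir/broken.c" "$work_dir/broken.c.orig"; then
    unchanged=yes
else
    unchanged=no
fi
echo "original unchanged: $unchanged"

rm -rf "$work_dir"
```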

2. No Duplicate Work

Uses the file command to check each file's current encoding.
Skips files that are already UTF-8 or plain ASCII.

3. Clear Logs

Prints a status line ([OK], [SKIP], or [FAIL]) for each file.
Prints the totals at the end.

4. Many File Types

C/C++ source: .c, .cpp, .cc, .h, .hpp
Documents: .txt, .md

Important Tips

Make a backup first: Save your project before converting

tar -czf project_backup_$(date +%Y%m%d).tar.gz /path/to/project
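
Before running the conversion, it is worth confirming that the archive can actually be read. Here is a minimal sketch that uses a throwaway directory in place of your project (the paths and file names are assumptions for the demo). tar -tzf lists the archive contents without extracting anything.

```bash
#!/bin/bash
# Illustration only: a temporary directory stands in for the real project.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/project/src"
printf 'int main(void) { return 0; }\n' > "$work_dir/project/src/main.c"

backup="$work_dir/project_backup_$(date +%Y%m%d).tar.gz"

# -C makes the archive paths relative to the parent directory.
tar -czf "$backup" -C "$work_dir" project

# List the archive and count the .c files it contains.
archived_files=$(tar -tzf "$backup" | grep -c '\.c$')
echo "Archive: $backup"
echo "C files in archive: $archived_files"

rm -rf "$work_dir"
```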

If using Git: Commit or stash your pending work first, then convert and commit the encoding change on its own

git status  # check status before
/tmp/convert_to_utf8.sh
git diff    # see what changed
git add .
git commit -m "Convert source files to UTF-8"

Text files only: The script picks files by extension and does not check their content, so only list extensions that hold text, not binary data

Other Uses

You can change the file types to convert:

# Python projects
-name "*.py" -o -name "*.pyx" -o -name "*.pyi"

# Java projects
-name "*.java" -o -name "*.xml" -o -name "*.properties"

# Web projects
-name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.jsx"
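
If you switch project types often, it can be handy to keep the extensions in one array and build the find expression from it. This is only a sketch; the EXTENSIONS array, the find_args name, and the demo files are my own assumptions, not part of the script above.

```bash
#!/bin/bash
# Illustration only: extensions for a Python project; edit the list as needed.
EXTENSIONS=(py pyx pyi)

# Build: -name "*.py" -o -name "*.pyx" -o -name "*.pyi"
find_args=()
for ext in "${EXTENSIONS[@]}"; do
    if [ "${#find_args[@]}" -gt 0 ]; then
        find_args+=(-o)
    fi
    find_args+=(-name "*.$ext")
done

# Demo tree with two matching files and one that should be ignored.
demo_dir=$(mktemp -d)
touch "$demo_dir/app.py" "$demo_dir/types.pyi" "$demo_dir/logo.png"

matches=$(find "$demo_dir" -type f \( "${find_args[@]}" \) | wc -l)
echo "Matching files: $matches"

rm -rf "$demo_dir"
```

Quoting "${find_args[@]}" passes each pattern to find as its own argument, so the shell does not expand the wildcards first.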

Conclusion

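Combining find and iconv makes it easy to bring an old code base to UTF-8, which is especially useful for embedded projects that carry legacy code. Because each original file is replaced only after a successful conversion, the script is safe to run over a whole project tree.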

Tags: linux, encoding, utf8, euckr, bash, script, embedded

happycpu notepad

2026-05-13