2026-05-13

How to Convert EUC-KR Source Code to UTF-8

I found some old STM32 firmware code whose files were in EUC-KR, a legacy Korean character encoding. Modern tools expect UTF-8, so I needed to convert every file to UTF-8.

The Problem

  • My firmware project (STM32F10x chip)
  • 358 source files (.c, .cpp, .h files)
  • Most files were EUC-KR or ISO-8859-1
  • I wanted to change everything to UTF-8

Solution: A Simple Script

I made a bash script. It uses the iconv command, which converts text from one encoding to another and exits with an error when it meets bytes that are not valid in the source encoding.
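To see what iconv does at the byte level, here is a minimal sketch. The sample file names and contents are made up for illustration: the bytes C7 D1 B1 DB are the EUC-KR encoding of "한글", written with octal escapes so the example does not depend on your terminal encoding.

```bash
# Hypothetical sample: a C comment containing "한글" as raw EUC-KR bytes.
workdir=$(mktemp -d)
printf '// \307\321\261\333\n' > "$workdir/sample_euckr.c"

# Convert to UTF-8. iconv exits non-zero if the input is not valid EUC-KR.
iconv -f EUC-KR -t UTF-8 "$workdir/sample_euckr.c" > "$workdir/sample_utf8.c"

# EUC-KR stores each Hangul syllable in 2 bytes and UTF-8 in 3,
# so the converted file is slightly larger.
wc -c < "$workdir/sample_euckr.c"   # 8 bytes
wc -c < "$workdir/sample_utf8.c"    # 10 bytes
```

The size difference is a quick sanity check that a file containing Korean text was really re-encoded and not just copied.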

The Conversion Script

#!/bin/bash

# Change source files from EUC-KR to UTF-8

BASE_DIR="/path/to/your/project"
SUCCESS_COUNT=0
FAIL_COUNT=0
SKIP_COUNT=0

echo "=== Starting UTF-8 conversion ==="
echo "Working directory: $BASE_DIR"

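# Find all source files.
# The loop reads from process substitution instead of a pipe so that it runs
# in the current shell. With "find ... | while", the counters are updated in
# a subshell and the summary at the end always prints 0.
# -print0 and read -d '' keep file names with spaces or newlines intact.
while IFS= read -r -d '' file; do

    # Check current encoding
    ENCODING=$(file -b --mime-encoding "$file")

    # Skip if already UTF-8 or plain ASCII (ASCII is valid UTF-8)
    if [[ "$ENCODING" == "utf-8" || "$ENCODING" == "us-ascii" ]]; then
        echo "[SKIP] $file (already $ENCODING)"
        ((SKIP_COUNT++))
        continue
    fi

    # Everything else is treated as EUC-KR. The file command often reports
    # EUC-KR text as iso-8859-1 or unknown-8bit, so $ENCODING is only logged.
    if iconv -f EUC-KR -t UTF-8 "$file" > "$file.utf8.tmp" 2>/dev/null; then
        # Success: replace original file
        mv "$file.utf8.tmp" "$file"
        echo "[OK] $file ($ENCODING -> UTF-8)"
        ((SUCCESS_COUNT++))
    else
        # Failed: remove temp file, original stays untouched
        rm -f "$file.utf8.tmp"
        echo "[FAIL] $file (conversion failed)"
        ((FAIL_COUNT++))
    fi
done < <(find "$BASE_DIR" -type f \( \
    -name "*.c" -o -name "*.cpp" -o -name "*.cc" -o \
    -name "*.h" -o -name "*.hpp" -o \
    -name "*.txt" -o -name "*.md" \
\) -print0)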

echo ""
echo "=== Conversion done ==="
echo "Success: $SUCCESS_COUNT"
echo "Failed: $FAIL_COUNT"
echo "Skipped: $SKIP_COUNT"

How to Use

  1. Save the script

    vi /tmp/convert_to_utf8.sh
    # Paste the script above
    # Change BASE_DIR to your project path
  2. Make it executable

    chmod +x /tmp/convert_to_utf8.sh
  3. Run it (save log)

    /tmp/convert_to_utf8.sh 2>&1 | tee /tmp/convert_utf8.log
  4. Check results

    # Count statistics
    grep -c "^\[OK\]" /tmp/convert_utf8.log      # success
    grep -c "^\[SKIP\]" /tmp/convert_utf8.log    # skipped
    grep -c "^\[FAIL\]" /tmp/convert_utf8.log    # failed

    # See failed files
    grep "^\[FAIL\]" /tmp/convert_utf8.log
  5. Verify encoding

    # Check encoding after conversion
    file -b --mime-encoding your_source_file.c

    # Expected output: utf-8

Why This Script is Safe

1. Safe Conversion

  • Converts to a temporary file (.utf8.tmp) first
  • Replaces the original only if iconv succeeds
  • Keeps the original file unchanged if conversion fails
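The three points above can be sketched as a small function. This is an illustration of the pattern and not a replacement for the full script; the name convert_in_place and the optional second argument are assumptions made for this example.

```bash
# convert_in_place is a hypothetical helper name for this sketch.
# Usage: convert_in_place FILE [SOURCE_ENCODING]   (default: EUC-KR)
convert_in_place() {
    local src=$1
    local from=${2:-EUC-KR}
    local tmp="$src.utf8.tmp"

    # Write the converted text to a temp file next to the original.
    if iconv -f "$from" -t UTF-8 "$src" > "$tmp" 2>/dev/null; then
        # iconv succeeded: only now is the original replaced.
        mv "$tmp" "$src"
    else
        # iconv failed: discard the partial output, leave the original as is.
        rm -f "$tmp"
        return 1
    fi
}
```

Because the temp file is created in the same directory as the original, the final mv is a rename on the same filesystem, so the original is never left half-written.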

2. No Duplicate Work

  • Uses the file command to check the current encoding
  • Skips files already in UTF-8 or plain ASCII (ASCII text is already valid UTF-8)
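The file command guesses the encoding from the content, so its answer is a heuristic. As a stricter alternative (not what the script above does), one could ask iconv whether the whole file decodes as UTF-8. The helper name is_valid_utf8 is an assumption for this sketch.

```bash
# is_valid_utf8 is a hypothetical helper name.
# It succeeds only if every byte sequence in the file decodes as UTF-8.
# Pure ASCII files also pass, because ASCII is a subset of UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Possible use inside a conversion loop:
#   if is_valid_utf8 "$file"; then
#       echo "[SKIP] $file (already valid UTF-8)"
#   fi
```

This check also makes the conversion safe to run twice: a file that was already converted passes the test and is skipped, instead of being decoded as EUC-KR a second time.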

3. Clear Logs

  • Shows status for each file
  • Gives final statistics

4. Many File Types

  • C/C++ source: .c, .cpp, .cc, .h, .hpp
  • Documents: .txt, .md

Important Tips

  1. Make a backup first: Save your project before converting

    tar -czf project_backup_$(date +%Y%m%d).tar.gz /path/to/project
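  2. If using Git: commit or stash other work first, then commit the conversion on its own so the encoding change is easy to review or revert

    git status  # working tree should be clean before converting
    /tmp/convert_to_utf8.sh
    git diff    # see what changed
    git add .
    git commit -m "Convert source files to UTF-8"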
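  3. Text files only: The script selects files by extension and does not check that they really contain text, so only list extensions that belong to text files, never binary files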

Other Uses

You can change the file types to convert:

# Python projects
-name "*.py" -o -name "*.pyx" -o -name "*.pyi"

# Java projects
-name "*.java" -o -name "*.xml" -o -name "*.properties"

# Web projects
-name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.jsx"
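If you switch file types often, the find expression can be built from a list instead of edited by hand. The following is a minimal sketch; EXTENSIONS and find_sources are hypothetical names that do not appear in the script above.

```bash
# EXTENSIONS and find_sources are hypothetical names for this sketch.
# List the extensions to convert, without the leading dot.
EXTENSIONS=(py pyx pyi)

find_sources() {
    local dir=$1
    local ext
    local -a expr=()

    for ext in "${EXTENSIONS[@]}"; do
        # Put -o between the -name tests, but not before the first one.
        (( ${#expr[@]} )) && expr+=(-o)
        expr+=(-name "*.$ext")
    done

    # Expands to: find DIR -type f \( -name "*.py" -o -name "*.pyx" ... \)
    find "$dir" -type f \( "${expr[@]}" \)
}

# Example: list matching files under the current directory.
# find_sources .
```

EXTENSIONS must contain at least one entry, otherwise find receives an empty pair of parentheses and reports an error.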

Conclusion

Using iconv and find together makes it easy to update old projects, which is especially useful for embedded projects with legacy code. The script only replaces a file after iconv succeeds, and the log records what happened to every file.


Tags: linux, encoding, utf8, euckr, bash, script, embedded
