Converting File Encodings on Linux

Checking a file's type with the file command sometimes turns up something interesting:

[root@feel ~]# file system.cfg
system.cfg: Non-ISO extended-ASCII text, with CRLF, NEL line terminators

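The "with CRLF, NEL line terminators" part indicates that this is a Windows file.

A Linux text file is reported as: ASCII text

A Windows text file is reported as: ASCII text, with CRLF line terminators

Viewing the file with cat -v filename shows a ^M at the end of each line.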
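The ^M check can be tried on a throwaway file. This is a minimal sketch: the sample file and its contents are made up for illustration and are not the system.cfg shown above.

```shell
#!/bin/sh
# Hypothetical sample file with Windows (CRLF) line endings.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'name=value\r\nport=8080\r\n' > "$sample"

# cat -v renders the carriage return before each newline as ^M.
visible=$(cat -v "$sample")
printf '%s\n' "$visible"

# Count the lines that end in a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$sample")
echo "lines ending in CR: $crlf_lines"
```

A count above zero means the file still has CRLF line endings.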

yum install dos2unix -y

dos2unix system.cfg

This converts the line endings to LF. A file that contains only ASCII characters is then reported as ASCII text.
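Where dos2unix cannot be installed, deleting the carriage returns with tr gives a similar result for files that use CRLF throughout. This is a different technique from dos2unix, sketched here on a made-up sample file; note that tr removes every carriage return, including any that is not part of a CRLF pair.

```shell
#!/bin/sh
# Hypothetical sample file with CRLF line endings.
src=$(mktemp)
dst=$(mktemp)
trap 'rm -f "$src" "$dst"' EXIT
printf 'name=value\r\nport=8080\r\n' > "$src"

# Delete every carriage return and write the result to a new file.
tr -d '\r' < "$src" > "$dst"

# od -c prints the raw bytes, so a remaining \r would be visible.
od -c "$dst"

# Count the carriage returns that are left in the output.
remaining=$(tr -cd '\r' < "$dst" | wc -c | tr -d ' ')
echo "carriage returns left: $remaining"
```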

Use enca to detect the file's encoding

yum install enca

[root@feel ~]# enca --list languages
belarusian: CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855 KOI8-U
bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
czech: ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
estonian: ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic
croatian: CP1250 ISO-8859-2 IBM852 macce CORK
hungarian: ISO-8859-2 CP1250 IBM852 macce CORK
lithuanian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
latvian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
russian: KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
slovak: CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
slovene: ISO-8859-2 CP1250 IBM852 macce CORK
ukrainian: CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
chinese: GBK BIG5 HZ
none:

[root@feel ~]# enca -L chinese system.cfg
Unrecognized encoding

When enca finds a match for the given language, it prints the encoding:

[root@feel ~]# enca -L czech system.cfg
Kamenicky encoding; KEYBCS2

iconv --list shows the encodings that iconv supports.

You can write a script that tests the encodings one by one.

The script:

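#!/bin/bash

# Save every encoding name iconv knows, without the trailing "//".
iconv --list | sed 's/\/\/$//' | sort > encodings.list
# Try to decode system.cfg from each encoding and record the outcome.
for a in $(cat encodings.list); do
    printf '%s ' "$a"
    iconv -f "$a" -t UTF-8 system.cfg > /dev/null 2>&1 && echo "ok: $a" || echo "fail: $a"
done | tee result.txt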

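After the script finishes, look at the entries marked ok in result.txt. An ok only means that iconv converted the file without an error, so several encodings may pass; check the converted text to pick the right one.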
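One way to compare the ok candidates is to print the first few converted lines for each and see which one reads correctly. The sketch below assumes a made-up sample file and a hand-picked shortlist of encodings; preview_as is a hypothetical helper, and in practice the shortlist would be the ok names from result.txt.

```shell
#!/bin/sh
# Hypothetical sample: "café" stored as ISO-8859-1 (the byte \351 is é).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'caf\351\n' > "$sample"

# preview_as ENCODING FILE (hypothetical helper)
# Prints the first 5 lines of FILE decoded from ENCODING,
# or returns non-zero if iconv cannot convert the file.
preview_as() {
    out=$(iconv -f "$1" -t UTF-8 "$2" 2>/dev/null) || return 1
    printf '%s\n' "$out" | head -n 5
}

# Assumed shortlist of candidate encodings.
for enc in UTF-8 ISO-8859-1 KOI8-R; do
    if text=$(preview_as "$enc" "$sample"); then
        printf '%-12s %s\n' "$enc" "$text"
    else
        printf '%-12s (conversion failed)\n' "$enc"
    fi
done
```

Only the candidate whose preview shows readable text is worth using as the source encoding.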

Once you know the source encoding, convert the file:

iconv -f SOURCE_ENCODING -t UTF-8 input_file > output_file
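A quick sanity check on the result is to pass it through a UTF-8 to UTF-8 conversion, which fails on bytes that are not valid UTF-8. The sketch below uses a made-up sample file, and is_valid_utf8 is a hypothetical helper. Passing the check shows only that the output is well-formed UTF-8, not that the source encoding was chosen correctly.

```shell
#!/bin/sh
# Hypothetical sample: "café" stored as ISO-8859-1 (the byte \351 is é).
src=$(mktemp)
dst=$(mktemp)
trap 'rm -f "$src" "$dst"' EXIT
printf 'caf\351\n' > "$src"

# Convert the sample to UTF-8, as in the command above.
iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst"

# is_valid_utf8 FILE (hypothetical helper)
# Succeeds only if iconv can read FILE as UTF-8 without an error.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

if is_valid_utf8 "$dst"; then
    echo "converted file is valid UTF-8"
fi
if ! is_valid_utf8 "$src"; then
    echo "original file is not valid UTF-8"
fi
```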
