Linux tutorial: shell script example: batch-comparing the contents of multiple files

Author: VEPHP  Date: 2017-10-04

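To check whether two files have exactly the same content, you can simply use the diff command and look at its exit status (0 means identical). For example: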

diff file1 file2 &>/dev/null;echo $?
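The exit status can also drive an if statement directly. The following is a minimal sketch under assumptions: the helper name same_pair and the temporary files are made up for illustration and do not appear in the original article.

```shell
#!/bin/bash
# same_pair is a hypothetical helper name (an assumption, not from the article).
# It returns 0 if the two files have identical content, non-zero otherwise.
same_pair() {
    diff "$1" "$2" >/dev/null 2>&1
}

# Build three small sample files in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/a"
printf 'hello\n' > "$tmpdir/b"
printf 'world\n' > "$tmpdir/c"

if same_pair "$tmpdir/a" "$tmpdir/b"; then
    echo "a and b are identical"
fi
if ! same_pair "$tmpdir/a" "$tmpdir/c"; then
    echo "a and c differ"
fi

rm -rf "$tmpdir"
```

Wrapping diff in a function keeps the redirection in one place, but it still compares only two files per call.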

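However, diff accepts only two file arguments (a directory is also treated as a file argument), so it cannot compare many files in a single invocation. It is also very inefficient when comparing non-text files or very large files.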

In that case md5sum can be used instead. Compared with diff's line-by-line comparison, md5sum is much faster.

For md5sum usage, see: Linux中文件MD5校驗 (MD5 checksums of files on Linux).
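As a small illustration of the idea, files with the same content produce the same MD5 value whatever their names are. This is only a sketch: the helper md5_of and the sample files are assumptions introduced here.

```shell
#!/bin/bash
# md5_of is a hypothetical helper (an assumption, not from the article).
# It prints only the hash column of md5sum for one file.
md5_of() {
    md5sum < "$1" | cut -d' ' -f1
}

tmpdir=$(mktemp -d)
printf 'same content\n'  > "$tmpdir/x"
cp "$tmpdir/x" "$tmpdir/y"
printf 'other content\n' > "$tmpdir/z"

[ "$(md5_of "$tmpdir/x")" = "$(md5_of "$tmpdir/y")" ] && echo "x and y have the same MD5"
[ "$(md5_of "$tmpdir/x")" != "$(md5_of "$tmpdir/z")" ] && echo "x and z have different MD5"

rm -rf "$tmpdir"
```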

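But md5sum only lets you compare files indirectly, by looking at their MD5 values. To compare a batch of files automatically, it has to be wrapped in a loop. The script is as follows: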

#!/bin/bash
###########################################################
#  description: compare many files at one time            #
#  author     : 駿馬金龍                                   #
#  blog       : http://www.cnblogs.com/f-ck-need-u/       #
###########################################################
# filename: md5.sh
# Usage: ./md5.sh file1 file2 file3 ...
# Note: file names containing spaces are not supported.
IFS=$'\n'
declare -A md5_array
# A "md5sum ... | while read" pipeline runs the loop in a
# subshell, so the array would be empty after the loop. A for
# loop is used instead, with IFS set to $'\n' so that each
# line of md5sum output becomes one loop item.
# md5sum output format: MD5  /path/to/file
# e.g.: 80748c3a55b726226ad51a4bafa1c4aa  /etc/fstab
for line in $(md5sum "$@")
do
    index=${line%% *}    # MD5 value (text before the first space)
    file=${line##* }     # file path (text after the last space)
    md5_array[$index]="$file ${md5_array[$index]}"
done
# Traverse md5_array: one group of files per MD5 value
for i in "${!md5_array[@]}"
do
    echo -e "the same file with md5: $i\n--------------------------\n$(echo "${md5_array[$i]}" | tr ' ' '\n')\n"
done

To test the script, first make several copies of a file and then modify a few of the copies, for example:

[root@linuxidc ~]# for i in `seq -s' ' 6`;do cp -a /etc/fstab /tmp/fs$i;done
[root@linuxidc ~]# echo ha >>/tmp/fs4
[root@linuxidc ~]# echo haha >>/tmp/fs5

Now /tmp contains six files: fs1, fs2, fs3, fs4, fs5 and fs6. Of these, fs4 and fs5 have been modified, and the remaining four files have exactly the same content.

[root@linuxidc tmp]# ./md5.sh /tmp/fs[1-6]
the same file with md5: a612cd5d162e4620b442b0ff3474bf98
--------------------------
/tmp/fs6
/tmp/fs3
/tmp/fs2
/tmp/fs1
the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
--------------------------
/tmp/fs4
the same file with md5: 30dd43dba10521c1e94267bbd117877b
--------------------------
/tmp/fs5

A more general way to use it: compare files with matching names under multiple directories.

[root@linuxidc tmp]# find /tmp -type f -name "fs[0-9]" -print0 | xargs -0 ./md5.sh  
the same file with md5: a612cd5d162e4620b442b0ff3474bf98
--------------------------
/tmp/fs6
/tmp/fs3
/tmp/fs2
/tmp/fs1
the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
--------------------------
/tmp/fs4
the same file with md5: 30dd43dba10521c1e94267bbd117877b
--------------------------
/tmp/fs5

Notes on the script:

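(1). md5sum prints each result in the format "MD5  /path/to/file". Since the output should show each MD5 value together with all the files that share it, an (associative) array is a natural choice.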

(2). At first I used a while loop that read the md5sum result for each file from standard input. The statement was:

md5sum "$@" | while read index file;do
    md5_array[$index]="$file ${md5_array[$index]}"
done

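But the pipe makes the while statement run in a subshell, so the array md5_array assigned inside the loop is no longer available once the loop ends. It can therefore be rewritten as: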

while read index file;do
    md5_array[$index]="$file ${md5_array[$index]}"
done <<<"$(md5sum "$@")"
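The difference between the two forms can be seen with a plain counter instead of the array. This is an illustrative sketch only; the variable names count_pipe and count_here are assumptions, and it relies on bash's default behavior (the lastpipe option is off).

```shell
#!/bin/bash
# Count three input lines, once through a pipe and once through a here-string.

count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
    count_pipe=$((count_pipe + 1))   # runs in a subshell; the change is lost
done

count_here=0
while read -r line; do
    count_here=$((count_here + 1))   # runs in the current shell; the change is kept
done <<<"$(printf 'a\nb\nc\n')"

echo "pipe: $count_pipe"
echo "here-string: $count_here"
```

With default bash settings the first counter is still 0 after its loop, while the second one is 3.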

In the end, though, I still went with the more cumbersome for loop:

IFS=$'\n'
for line in $(md5sum "$@")
do
    index=${line%% *}
    file=${line##* }
    md5_array[$index]="$file ${md5_array[$index]}"
done

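Each line of md5sum output has two columns, and a for loop using the default IFS would split those two columns into two separate values. That is why IFS is changed to $'\n', so that each iteration receives one whole line.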
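The effect of IFS on the for loop can be checked in isolation. A minimal sketch, assuming a single sample line in md5sum's format; the helper count_items is hypothetical.

```shell
#!/bin/bash
# One sample line in md5sum's output format (illustrative value).
sample='80748c3a55b726226ad51a4bafa1c4aa  /etc/fstab'

# count_items is a hypothetical helper: it counts how many items the
# for loop sees when $sample is expanded unquoted (word splitting applies).
count_items() {
    local n=0 item
    for item in $sample; do
        n=$((n + 1))
    done
    echo "$n"
}

default_count=$(count_items)               # default IFS: space, tab, newline
newline_count=$(IFS=$'\n'; count_items)    # IFS changed only inside this subshell

echo "default IFS: $default_count item(s)"
echo "newline IFS: $newline_count item(s)"
```

Setting IFS inside the command substitution keeps the change local, whereas md5.sh changes IFS for the rest of the script.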

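(3). The index and file variables split each line of md5sum output into two parts: the MD5 part becomes the array index, and file becomes part of the array element's value. The array assignment statement is therefore: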

md5_array[$index]="$file ${md5_array[$index]}"
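How the two parameter expansions split a line, and how the assignment accumulates files under one key, can be sketched as follows. The sample lines are illustrative values, not real checksums of the files named.

```shell
#!/bin/bash
declare -A md5_array

# Two sample lines with the same (illustrative) hash in md5sum's format.
for line in '80748c3a55b726226ad51a4bafa1c4aa  /tmp/fs1' \
            '80748c3a55b726226ad51a4bafa1c4aa  /tmp/fs2'
do
    index=${line%% *}   # drop the longest suffix starting at a space: the hash remains
    file=${line##* }    # drop the longest prefix ending in a space: the path remains
    md5_array[$index]="$file ${md5_array[$index]}"
done

echo "index=$index"
echo "file=$file"
echo "value=${md5_array[$index]}"
```

Because each new file is prepended, the files in a group appear in reverse order of the arguments, which matches the fs6, fs3, fs2, fs1 ordering in the sample output.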

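(4). Once the array has been filled, it is traversed. There are several ways to do this; I iterate over the list of array indexes, that is, the MD5 values.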

# Traverse md5_array: one group of files per MD5 value
for i in "${!md5_array[@]}"
do
    echo -e "the same file with md5: $i\n--------------------------\n$(echo "${md5_array[$i]}" | tr ' ' '\n')\n"
done
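As one possible variation on the traversal, the loop could report only the MD5 values shared by two or more files. This is a sketch under assumptions, not part of the original md5.sh: the array is filled with hypothetical sample data in the same space-separated layout the script builds.

```shell
#!/bin/bash
# Hypothetical sample data in the "file file ... " layout built by md5.sh.
declare -A md5_array=(
    [a612cd5d162e4620b442b0ff3474bf98]="/tmp/fs2 /tmp/fs1 "
    [80748c3a55b726226ad51a4bafa1c4aa]="/tmp/fs4 "
)

# Collect only the hashes that belong to two or more files.
duplicates=$(
    for i in "${!md5_array[@]}"; do
        # Split the space-separated value into an array of file names.
        IFS=' ' read -r -a files <<<"${md5_array[$i]}"
        if [ "${#files[@]}" -gt 1 ]; then
            echo "$i: ${files[*]}"
        fi
    done
)
echo "$duplicates"
```

The explicit IFS=' ' on read matters if this loop is placed inside md5.sh, because that script sets IFS to a newline globally.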

When reposting, please cite this page's URL:
http://www.snjht.com/jiaocheng/8870.html
