Find duplicate files using VB.Net


This article was written by Pon Saravanan on 17-Mar-10. Last modified on 05-May-10.

VB.Net Tutorial to find duplicate files

When you receive files from different sources, the same content may arrive under different filenames, so looking for duplicates by file name alone is not sufficient. The file data itself has to be compared, and there are several ways to do that.
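One of those ways, apart from hashing, is a direct byte-by-byte comparison. The following is only a minimal sketch of that idea: the module name DirectCompare and the function SameContent are hypothetical names chosen for illustration, not part of the original sample. It checks the file lengths first, since files of different size cannot be duplicates.

```vbnet
Imports System
Imports System.IO

Module DirectCompare
    ' Hypothetical helper: True when both files have the same length and bytes.
    Function SameContent(ByVal firstPath As String, ByVal secondPath As String) As Boolean
        Dim firstInfo As New FileInfo(firstPath)
        Dim secondInfo As New FileInfo(secondPath)
        ' Files of different length can never be duplicates, so skip reading them.
        If firstInfo.Length <> secondInfo.Length Then Return False

        Using first As New FileStream(firstPath, FileMode.Open, FileAccess.Read), _
              second As New FileStream(secondPath, FileMode.Open, FileAccess.Read)
            Dim current As Integer
            Do
                ' ReadByte returns -1 at the end of the stream.
                current = first.ReadByte()
                If current <> second.ReadByte() Then Return False
            Loop Until current = -1
            Return True
        End Using
    End Function
End Module
```

A direct comparison like this is exact, but it has to read both files for every pair, which is why hashing each file once tends to be preferred when many files are involved.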

Usage of Message Digest (MD5)

To find duplicate files even after they have been renamed, the content of the files has to be compared. Once a file's content has been read as bytes, it can be hashed with the MD5 algorithm, and the resulting hash values can be compared instead of the full content. MD5 is a widely used hash function that produces a 128-bit hash value and is commonly used to check the integrity of files. It is no longer considered secure against deliberately crafted collisions, but it is adequate for spotting accidental duplicates.

MD5 in .Net Framework

The .NET Framework provides hashing and encryption algorithms in the System.Security.Cryptography namespace. To compute an MD5 hash, create an MD5CryptoServiceProvider and call its ComputeHash() method, which accepts either a byte array or a Stream and returns the hash as a byte array.
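As a small illustration of ComputeHash(), the sketch below hashes a byte array and prints the result as hexadecimal text. The module name Md5Demo and the helper Md5Hex are hypothetical names introduced here for illustration; the input "abc" is one of the standard MD5 test vectors.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module Md5Demo
    ' Hypothetical helper: returns the MD5 hash of a byte array as lowercase hex.
    Function Md5Hex(ByVal data As Byte()) As String
        Using hasher As New MD5CryptoServiceProvider()
            ' ComputeHash always returns 16 bytes (128 bits) for MD5.
            Dim hash As Byte() = hasher.ComputeHash(data)
            Dim sb As New StringBuilder(hash.Length * 2)
            For Each b As Byte In hash
                sb.Append(b.ToString("x2"))
            Next
            Return sb.ToString()
        End Using
    End Function

    Sub Main()
        ' Prints 900150983cd24fb0d6963f7d28e17f72
        Console.WriteLine(Md5Hex(Encoding.ASCII.GetBytes("abc")))
    End Sub
End Module
```

Hexadecimal is only one possible text form for the 16 hash bytes; any lossless encoding will do as long as both hashes being compared use the same one.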

Computed Hash to compare

ComputeHash() works on raw bytes, so the file is best hashed directly from its stream. Decoding the content as text with ASCIIEncoding first, or turning the hash bytes into a string with it, is lossy: bytes outside the ASCII range are replaced with "?", which can make different files appear identical. The sample below therefore hashes the stream itself and converts the 16 hash bytes to a Base64 string for comparison. The same function can be used in a recursive call to check a whole directory tree. Once all the files in the directory have been scanned and compared, it is much easier to delete the duplicates found.

Source Code

' Place these Imports at the top of the form's code file.
Imports System.IO
Imports System.Security.Cryptography

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        If CompareFiles("D:\FirstFile.txt", "D:\FirstFile1.txt") Then
            MsgBox("duplicate")
        Else
            MsgBox("diff")
        End If
    End Sub

    ' Two files are treated as duplicates when their MD5 hashes match.
    Public Function CompareFiles(ByVal FirstFile As String, _
        ByVal SecondFile As String) As Boolean
        Return ReadFile(FirstFile) = ReadFile(SecondFile)
    End Function

    ' Returns the MD5 hash of the file's raw bytes as a Base64 string.
    Private Function ReadFile(ByVal Path As String) As String
        ' Using blocks close the stream and release the hasher even if an error occurs.
        Using ReadFileStream As New FileStream(Path, FileMode.Open, FileAccess.Read)
            Using HashData As New MD5CryptoServiceProvider()
                Return Convert.ToBase64String(HashData.ComputeHash(ReadFileStream))
            End Using
        End Using
    End Function
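
The text above mentions extending the comparison to every file in a directory. A minimal sketch of one way to do that follows. Instead of a hand-written recursive call it lets Directory.GetFiles walk the subdirectories via SearchOption.AllDirectories, and it groups files by hash rather than comparing them pair by pair. The module DuplicateScan and the functions HashFile and FindDuplicates are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Module DuplicateScan
    ' Hypothetical helper: MD5 of a file's raw bytes as a Base64 string.
    Function HashFile(ByVal filePath As String) As String
        Using stream As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Using hasher As New MD5CryptoServiceProvider()
                Return Convert.ToBase64String(hasher.ComputeHash(stream))
            End Using
        End Using
    End Function

    ' Hypothetical helper: maps each hash to the files under root that share it,
    ' keeping only hashes that occur more than once.
    Function FindDuplicates(ByVal root As String) As Dictionary(Of String, List(Of String))
        Dim byHash As New Dictionary(Of String, List(Of String))
        ' SearchOption.AllDirectories makes GetFiles descend into subdirectories.
        For Each filePath As String In Directory.GetFiles(root, "*", SearchOption.AllDirectories)
            Dim key As String = HashFile(filePath)
            If Not byHash.ContainsKey(key) Then byHash(key) = New List(Of String)
            byHash(key).Add(filePath)
        Next

        Dim duplicates As New Dictionary(Of String, List(Of String))
        For Each entry As KeyValuePair(Of String, List(Of String)) In byHash
            If entry.Value.Count > 1 Then duplicates.Add(entry.Key, entry.Value)
        Next
        Return duplicates
    End Function
End Module
```

Each list in the result holds the paths of files whose hashes match, so one file per list can be kept and the others reviewed for deletion. Note that GetFiles throws an exception if it meets a directory the program is not allowed to read, so a real tool would probably need error handling around the scan.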

