VB.Net Web Page Link Grabber

Written by Pon Saravanan on 11-Apr-10. Last modified on 11-Apr-10.

Grab Links from a Web Page

There are several ways to get all the links in a web page. I like this approach because it is simple and straightforward to implement. Most of the other approaches amount to text processing with regular expressions. This approach does require a little knowledge of HTML and of the WebBrowser control.

WebBrowser Control

The WebBrowser control has shipped with the .NET Framework since version 2.0. It can be used to work with HTML DOM objects. Because the control deals with the web, the document it loads is in HTML format, so some knowledge of HTML is needed to use it effectively.

DocumentCompleted Event

The Document object is populated only after the page has been loaded into the WebBrowser control, so we have to wait until loading has finished before accessing it. The WebBrowser control raises an event for this, called DocumentCompleted. Unfortunately the event does not behave exactly as its name suggests: it can fire before the whole page is ready, for example once for each frame on a page that contains frames. So we also need to check the ReadyState property to find out whether the document is really complete. Its value should be WebBrowserReadyState.Complete.


The next step is to get all the links once the document is loaded in the control. Here we can use Document.Links to fetch them. Yes, it really is that simple. The drawback is that every item in the Links collection is an HtmlElement, a general-purpose type that can represent any element on the page. So you cannot expect direct properties such as href or target. Instead, use GetAttribute with the attribute name, for example GetAttribute("href"), to read the attributes of the HtmlElement.
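As a minimal sketch of reading attributes from the Links collection, the helper below builds one line of text per link. The module name LinkReader and the function name DescribeLinks are hypothetical and do not come from the original article. Only Document.Links, GetAttribute and InnerText are WebBrowser API calls.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Module LinkReader
    ' Hypothetical helper (not from the original article): returns one
    ' "caption -> href [target=...]" line for every element in doc.Links.
    Public Function DescribeLinks(ByVal doc As HtmlDocument) As List(Of String)
        Dim result As New List(Of String)()
        If doc Is Nothing Then
            Return result
        End If

        For Each link As HtmlElement In doc.Links
            ' HtmlElement has no Href or Target property, so read them by name.
            Dim href As String = link.GetAttribute("href")
            Dim target As String = link.GetAttribute("target")

            ' InnerText may be Nothing, for example for a link that wraps an image.
            Dim caption As String = link.InnerText
            If caption Is Nothing Then
                caption = ""
            End If

            Dim line As String = caption.Trim() & " -> " & href
            If Not String.IsNullOrEmpty(target) Then
                line &= " [target=" & target & "]"
            End If
            result.Add(line)
        Next

        Return result
    End Function
End Module
```

It could be called as DescribeLinks(WebBrowser1.Document) from the DocumentCompleted handler, once ReadyState reports Complete.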

Source Code

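Public Class Form1
    ' Assumes the form contains Button1, WebBrowser1, a TextBox1 holding the URL
    ' and a ListBox1 for the results. TextBox1 and ListBox1 are illustrative names.
    Private Sub Button1_Click(ByVal sender As System.Object, _
                              ByVal e As System.EventArgs) Handles Button1.Click
        ' Start loading the page; the links are read in DocumentCompleted.
        WebBrowser1.Navigate(TextBox1.Text)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted( _
                            ByVal sender As Object, _
                            ByVal e As WebBrowserDocumentCompletedEventArgs) _
                            Handles WebBrowser1.DocumentCompleted
        ' The event alone is not enough, so also check ReadyState.
        If (WebBrowser1.ReadyState = WebBrowserReadyState.Complete) Then
            ListBox1.Items.Clear()
            For Each ClientControl As HtmlElement In WebBrowser1.Document.Links
                ' HtmlElement has no Href property; read the attribute by name.
                ListBox1.Items.Add(ClientControl.GetAttribute("href"))
            Next
        End If
    End Sub
End Class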
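The listing above relies on controls placed with the Visual Studio designer. For readers who want to try the idea without a designer file, here is a rough stand-alone sketch that creates the controls in code. The class name LinkGrabberForm, the field names and the layout numbers are all assumptions made for illustration. It is meant to be compiled as a single file on its own rather than added to an existing Windows Forms project.

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical stand-alone form; every name and size here is illustrative.
Public Class LinkGrabberForm
    Inherits Form

    Private ReadOnly urlBox As New TextBox()
    Private ReadOnly goButton As New Button()
    Private ReadOnly browser As New WebBrowser()
    Private ReadOnly linkList As New ListBox()

    Public Sub New()
        Me.Text = "Link Grabber"
        Me.ClientSize = New System.Drawing.Size(790, 455)

        ' Fixed positions keep the sketch short; a real form might use docking.
        urlBox.SetBounds(10, 10, 660, 22)
        goButton.SetBounds(680, 8, 100, 26)
        goButton.Text = "Grab links"
        browser.SetBounds(10, 45, 480, 400)
        linkList.SetBounds(500, 45, 280, 400)

        Me.Controls.Add(urlBox)
        Me.Controls.Add(goButton)
        Me.Controls.Add(browser)
        Me.Controls.Add(linkList)

        AddHandler goButton.Click, AddressOf OnGoClick
        AddHandler browser.DocumentCompleted, AddressOf OnDocumentCompleted
    End Sub

    Private Sub OnGoClick(ByVal sender As Object, ByVal e As EventArgs)
        ' Ignore clicks while the address box is empty.
        If urlBox.Text.Trim().Length = 0 Then
            Return
        End If
        linkList.Items.Clear()
        browser.Navigate(urlBox.Text)
    End Sub

    Private Sub OnDocumentCompleted(ByVal sender As Object, _
                                    ByVal e As WebBrowserDocumentCompletedEventArgs)
        ' Skip firings that arrive before the control reports Complete.
        If browser.ReadyState <> WebBrowserReadyState.Complete Then
            Return
        End If

        ' Rebuild the list from scratch in case the event fires again.
        linkList.Items.Clear()
        For Each link As HtmlElement In browser.Document.Links
            linkList.Items.Add(link.GetAttribute("href"))
        Next
    End Sub
End Class

Public Module Program
    ' The WebBrowser control requires a single-threaded apartment.
    <STAThread()> _
    Public Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New LinkGrabberForm())
    End Sub
End Module
```

Type an address into the box and press the button. The list on the right should fill with the href value of each link once the page has finished loading.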

