Implementing preg_replace in VBScript

Here are a few VBScript wrapper functions that make the language’s built-in RegExp class feel similar to PHP’s preg_xxx family of functions.

First, a private helper function that initializes an instance of RegExp:

Private Function preg_init(find_re)
    Set preg_init = New RegExp
    With preg_init
        .Global = True
        If Left(find_re, 1) = "/" Then
            '-- PHP-style "/pattern/flags": flags follow the last slash
            Dim pos: pos = InStrRev(find_re, "/")
            .Pattern = Mid(find_re, 2, pos - 2)
            .IgnoreCase = (InStr(pos, find_re, "i") > 0)
            .Multiline = (InStr(pos, find_re, "m") > 0)
        Else
            '-- no delimiters: use the string as the pattern as-is
            .Pattern = find_re
        End If
    End With
End Function

This optionally allows setting the object’s flags in the search string after the trailing slash, e.g. “/test/mi” searches for “test” ignoring case and in multi-line mode. The helper is not meant to be called by client scripts; it only supports the implementation of the actual search-and-replace functions.
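To make the flag handling concrete, here is a minimal sketch of the RegExp setup that “/test/mi” is expected to produce, written out by hand. It assumes the script runs under Windows Script Host (WScript.Echo); the sample text is made up for illustration.

```vbscript
Option Explicit

' Manual equivalent of what preg_init should build for "/test/mi"
Dim re: Set re = New RegExp
re.Global = True
re.Pattern = "test"      ' text between the first and the last slash
re.IgnoreCase = True     ' "i" appears after the last slash
re.Multiline = True      ' "m" appears after the last slash

' Hypothetical sample input with the word on the second line, in upper case
Dim text: text = "first line" & vbLf & "TEST on the second line"

If re.Test(text) Then
    WScript.Echo "matched"       ' IgnoreCase lets "test" match "TEST"
Else
    WScript.Echo "no match"
End If

' With Multiline on, ^ also matches right after a line break
re.Pattern = "^test"
WScript.Echo re.Execute(text).Count   ' 1

' With Multiline off, ^ matches only at the very start of the string
re.Multiline = False
WScript.Echo re.Execute(text).Count   ' 0
```

A search string without a leading slash is used as the pattern as-is, so it is matched case-sensitively and without multi-line mode.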

Next, the implementation of preg_match, preg_replace and preg_split becomes very simple:

Function preg_match(find_re, text)
    preg_match = preg_init(find_re).Test(text)
End Function

Function preg_replace(find_re, replace_arg, text)
    preg_replace = preg_init(find_re).Replace(text, replace_arg)
End Function

Function preg_split(find_re, text)
    Dim esc: esc = ChrW(&HE1B6) '-- U+E000 to U+F8FF - Private Use Area (PUA)
    preg_split = Split(preg_init(find_re).Replace(text, esc), esc)
End Function

These are fairly straightforward and are enough for roughly 99% of search-and-replace tasks. For preg_replace, use replacement placeholders in VBScript notation ($1, $2, etc.) to refer to capture groups.
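As an illustration of what preg_replace and preg_split delegate to, the following sketch calls RegExp directly, so that it runs on its own. The patterns and sample strings are invented for the example, and WScript.Echo assumes Windows Script Host.

```vbscript
Option Explicit

Dim re: Set re = New RegExp
re.Global = True

' $1 and $2 stand for the first and second capture groups
re.Pattern = "(\w+)@(\w+)\.com"
WScript.Echo re.Replace("john@example.com", "$2 at $1")   ' example at john

' Split trick: replace every separator with a Private Use Area character,
' then split the result on that single character
Dim esc: esc = ChrW(&HE1B6)
re.Pattern = "\s*,\s*"
WScript.Echo Join(Split(re.Replace("a , b,c", esc), esc), "|")   ' a|b|c
```

The split trick relies on the input never containing U+E1B6 itself; text that does contain it would be split at those positions too.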

The troublesome 1% with this implementation of preg_replace is that it cannot handle callback functions for replace_arg. So here is a manual implementation of preg_replace that can handle both strings and callback objects for replace_arg.

Function preg_replace_callback(find_re, replace_arg, text)
    Dim matches, match, count, offset, retval
    Set matches = preg_init(find_re).Execute(text)
    If matches.Count = 0 Then
        preg_replace_callback = text
        Exit Function
    End If
    '-- IsObject returns True (-1): two slots per match for a callback, one for a string
    ReDim retval(matches.Count * (1 - IsObject(replace_arg)))
    For Each match In matches
        With match
            '-- text between the previous match and this one
            retval(count) = Mid(text, 1 + offset, .FirstIndex - offset)
            count = count + 1
            If IsObject(replace_arg) Then
                '-- invoke the default method of the callback object
                retval(count) = replace_arg(match)
                count = count + 1
            End If
            offset = .FirstIndex + .Length
        End With
    Next
    '-- remainder of the text after the last match
    retval(count) = Mid(text, 1 + offset)
    If IsObject(replace_arg) Then
        preg_replace_callback = Join(retval, vbNullString)
    Else
        preg_replace_callback = Join(retval, replace_arg)
    End If
End Function

For a callback object, pass an instance of a class with a default method. Here is a sample class that implements PHP notation for replacement placeholders (\1, \2, etc. or \{1}, \{2}, etc.):

Function preg_substitute(replace_arg)
    Set preg_substitute = (New preg_substitute_class).init(replace_arg)
End Function

Class preg_substitute_class
    Private m_esc
    Private m_replace
    Public Function init(replace_arg)
        m_esc = ChrW(&HE1B6) '-- U+E000 to U+F8FF - Private Use Area (PUA)
        '-- swap backslashes for a PUA marker so placeholders are easy to locate
        m_replace = Replace(replace_arg, "\", m_esc)
        Set init = Me
    End Function
    Public Default Function callback(match)
        Dim idx, replace_str
        '-- \0 and \{0} stand for the whole match
        replace_str = match.Value
        callback = Replace(Replace(m_replace, m_esc & "{0}", replace_str), m_esc & "0", replace_str)
        With match.SubMatches
            '-- highest index first, so that \1 does not clobber \10 and above
            For idx = .Count To 1 Step -1
                replace_str = .Item(idx - 1)
                callback = Replace(Replace(callback, m_esc & "{" & idx & "}", replace_str), m_esc & idx, replace_str)
            Next
        End With
        '-- restore backslashes that were not part of a placeholder
        callback = Replace(callback, m_esc, "\")
    End Function
End Class

This can be used like preg_replace_callback("/(test)\s+(this)/mi", preg_substitute("\{2} \{1}"), text).

The most interesting method of the callback class is the default one. Here it’s named callback, but the actual name can be arbitrary. The default method receives a match argument, which is an entry in the RegExp’s matches collection, and returns the string to be used as the replacement.

The Match object exposes the currently matched substring in its Value property, its position in the FirstIndex property, and all captured subgroups in the SubMatches collection. This allows much more sophisticated replacements, for instance lower- or upper-casing entries of the SubMatches collection.
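A case-changing callback of this kind might look like the sketch below. The class name upper_key_class and the helper replace_with_callback are hypothetical: the helper is a condensed stand-in for preg_replace_callback so that the snippet runs on its own, and the sample text is made up. WScript.Echo assumes Windows Script Host.

```vbscript
Option Explicit

' Hypothetical callback class: upper-cases group 1, lower-cases group 2
Class upper_key_class
    Public Default Function callback(match)
        callback = UCase(match.SubMatches(0)) & "=" & LCase(match.SubMatches(1))
    End Function
End Class

' Condensed stand-in for preg_replace_callback (callback objects only)
Function replace_with_callback(re, cb, text)
    Dim match, offset, result
    offset = 0
    For Each match In re.Execute(text)
        ' text before this match, then the callback's replacement
        result = result & Mid(text, 1 + offset, match.FirstIndex - offset) & cb(match)
        offset = match.FirstIndex + match.Length
    Next
    ' remainder of the text after the last match
    replace_with_callback = result & Mid(text, 1 + offset)
End Function

Dim re: Set re = New RegExp
re.Global = True
re.Pattern = "(\w+)=(\w+)"

WScript.Echo replace_with_callback(re, New upper_key_class, "name=Alice; city=PARIS")
' NAME=alice; CITY=paris
```

With the functions from this article loaded, the same class instance could presumably be passed straight to preg_replace_callback as replace_arg.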

The performance of preg_replace_callback is about 20% worse than that of RegExp’s built-in Replace method, which the simple preg_replace uses directly, even for thousands of occurrences of the search regular expression.
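One way to check such a figure on your own machine is a rough timing sketch like the one below. It is not the benchmark behind the number above; the input text, the pattern and the use of Timer are assumptions made for illustration, and results will vary with input and hardware.

```vbscript
Option Explicit

Dim re: Set re = New RegExp
re.Global = True
re.Pattern = "\d+"

' Hypothetical input: 20000 repetitions of a short phrase with one number each
Dim text: text = Replace(Space(20000), " ", "item 42, ")

Dim started, result, matches, match, parts, count, offset

' Variant 1: built-in Replace, as used by the simple preg_replace
started = Timer
result = re.Replace(text, "#")
WScript.Echo "Replace:      " & FormatNumber(Timer - started, 3) & " s"

' Variant 2: Execute plus manual reassembly, as in preg_replace_callback
started = Timer
Set matches = re.Execute(text)
ReDim parts(matches.Count)
count = 0
offset = 0
For Each match In matches
    parts(count) = Mid(text, 1 + offset, match.FirstIndex - offset)
    count = count + 1
    offset = match.FirstIndex + match.Length
Next
parts(count) = Mid(text, 1 + offset)
result = Join(parts, "#")
WScript.Echo "Execute loop: " & FormatNumber(Timer - started, 3) & " s"
```

Timer returns the seconds elapsed since midnight, so a run that crosses midnight gives a meaningless difference; for anything more rigorous, repeat each variant several times and average.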

Full source code of these functions, including sample usage, is available in our pg_conv.vbs conversion script, which we used to convert a Microsoft SQL Server table-definition script to the PostgreSQL dialect.
