File.eachFileRecurse() – speedup when filtering directories

File.eachFileRecurse() is a wonderful GDK addition in Groovy. In fact, this function is exactly the reason I started coding in Groovy in the first place. We were discussing how the “Maven with branches” problem could be solved and concluded that the simplest way would be to modify every <groupId> with a script. That’s when I recalled that Groovy has a nice function for iterating recursively over all files (I had been shown it once).

After using this lovely function to iterate over all the POMs in our source tree (and we have lots of them, believe me), I noticed that the iteration goes into directories I’d rather not visit: “.svn”, “build” and “dist” (the last two are our Maven <outputDirectory> and <directory>). That looked like quite a lot of time that could be saved! Unfortunately, eachFileRecurse() offers no way to stop the recursion at a folder so that it goes no deeper.
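
To see the problem concretely, here is a minimal sketch. It assumes a throwaway temp directory with a made-up layout (one module with a pom.xml plus “.svn” and “build” folders); none of these paths come from the real source tree. The stock eachFileRecurse() hands every entry to the closure, so a name check inside the closure can only discard entries after the walk has already descended into them.

```groovy
import java.nio.file.Files

// Hypothetical layout, created in a temp directory for illustration only.
File root = Files.createTempDirectory('efr-demo').toFile()
new File(root, 'module/.svn/text-base').mkdirs()
new File(root, 'module/build/classes').mkdirs()
new File(root, 'module/pom.xml').text = '<project/>'
new File(root, 'module/.svn/text-base/pom.xml.svn-base').text = ''
new File(root, 'module/build/classes/App.class').text = ''

// Record the name of every entry the stock GDK method passes to the closure.
List<String> visited = []
root.eachFileRecurse { File f -> visited << f.name }

// Entries inside .svn and build are visited too: the closure cannot veto descent.
assert visited.contains('pom.xml.svn-base')
assert visited.contains('App.class')
println "visited ${visited.size()} entries"

root.deleteDir()
```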

So I rewrote it a little:

/**
 * File.eachFileRecurse( Closure ) improvement accepting a filtering closure, which will be passed each file and
 * each directory found.
 * If filtering closure returns "true" - recursion continues as usual.
 * If it returns "false" for file      - the file isn't passed to execution closure (second argument).
 * If it returns "false" for directory - recursion will skip it, this makes a powerful speedup when recursively
 *                                       iterating files while skipping certain folders
 */
private static void eachFileRecurse( File    dir, 
                                     Closure closure, 
                                     Closure filter = { return true } )
{
    for ( file in dir.listFiles())
    {
        if ( filter.call( file ))
        {
            if ( file.isDirectory())
            {
                eachFileRecurse( file, closure, filter );
            }
            else
            {
                closure.call( file );
            }
        }
    }
}

Now, with the following filter (this.skipDirectories is ['.svn', 'build', 'dist'] in my case):

Closure recursionFilter =
{
    file ->
    if ( file.isFile())
    {
        return ( file.name == 'pom.xml' );
    }

    if ( file.isDirectory() && ( this.skipDirectories.any{ it == file.name } ))
    {
        return false;
    }

    return true;
};
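
Putting the two pieces together, the following is a self-contained sketch rather than the exact code from our build scripts. It assumes a made-up temp directory layout and a script-level copy of the function; the filterCalls counter is added only to show how many entries the filter is ever asked about.

```groovy
import java.nio.file.Files

// Same logic as the function above, written as a script-level method.
void eachFileRecurse(File dir, Closure closure, Closure filter = { true }) {
    for (file in dir.listFiles()) {
        if (filter.call(file)) {
            if (file.isDirectory()) { eachFileRecurse(file, closure, filter) }
            else                    { closure.call(file) }
        }
    }
}

// Hypothetical layout for illustration only.
File root = Files.createTempDirectory('filter-demo').toFile()
new File(root, 'module/src').mkdirs()
new File(root, 'module/.svn').mkdirs()
new File(root, 'module/build').mkdirs()
new File(root, 'module/pom.xml').text = '<project/>'
new File(root, 'module/src/App.groovy').text = ''
new File(root, 'module/.svn/entries').text = ''
new File(root, 'module/build/pom.xml').text = '<project/>'   // stale copy that should be ignored

def skipDirectories = ['.svn', 'build', 'dist']   // stands in for this.skipDirectories
int filterCalls = 0

Closure recursionFilter = { File file ->
    filterCalls++
    if (file.isFile()) { return file.name == 'pom.xml' }
    return !(file.name in skipDirectories)       // false prunes the whole subtree
}

List<String> found = []
eachFileRecurse(root, { File f -> found << f.parentFile.name }, recursionFilter)

// Only module/pom.xml is reported; nothing under .svn or build was even listed.
assert found == ['module']
assert filterCalls == 6

root.deleteDir()
```

The filter is consulted for module, its four children and src/App.groovy; the contents of .svn and build are never touched, which is where the time is saved.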

I’m now getting a 3-7 times speedup when iterating over the source tree! Skipping certain directories does matter 🙂
It reminds me of my university days, when we were solving the TSP as part of a “Concurrent And Distributed Programming” course: the major speedup always came from skipping certain paths.

Let’s now try to add this option to the Groovy GDK.
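
For comparison, newer Groovy releases ship File.traverse(), which accepts a preDir closure that can return FileVisitResult.SKIP_SUBTREE to prune a directory. The sketch below is not the function from this post; it assumes a made-up temp directory layout and reuses the same skip list to show the equivalent pruning with the built-in method.

```groovy
import groovy.io.FileType
import groovy.io.FileVisitResult
import java.nio.file.Files

// Hypothetical layout for illustration only.
File root = Files.createTempDirectory('traverse-demo').toFile()
new File(root, 'module/.svn').mkdirs()
new File(root, 'module/build').mkdirs()
new File(root, 'module/pom.xml').text = '<project/>'
new File(root, 'module/build/pom.xml').text = '<project/>'
new File(root, 'module/.svn/entries').text = ''

def skipDirectories = ['.svn', 'build', 'dist']
List<String> poms = []

root.traverse(
    type      : FileType.FILES,          // pass only files to the closure
    nameFilter: ~/pom\.xml/,             // the file name must match this pattern
    preDir    : { File d ->              // called before descending into each directory
        if (d.name in skipDirectories) { return FileVisitResult.SKIP_SUBTREE }
    }
) { File f -> poms << f.parentFile.name }

// build/pom.xml is never reached because the build subtree was skipped.
assert poms == ['module']

root.deleteDir()
```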

Categories: Groovy
  1. August 16, 2009 at 22:52

    After upgrading to Groovy 1.6.4 I see that the scanning time of a huge 10 GB SVN repository went down from ~110-120 seconds to ~80 seconds. Cool!

  2. Liqun
    January 9, 2011 at 22:36

    Paste the following code into one of your Groovy classes and you will be able to filter out .svn or CVS folders. It also adds the methods eachFileRecurseMatch and eachDirRecurseMatch.

    // Requires these imports at the top of the class file:
    //   import groovy.io.FileType
    //   import org.codehaus.groovy.runtime.InvokerHelper
    //   import org.codehaus.groovy.runtime.typehandling.DefaultTypeTransformation
    static {
        def excludeFilter = '\\.svn|CVS'

        // Override eachFile to exclude dir: .svn or CVS
        File.metaClass.eachFile = { FileType fileType, Closure closure ->
            if (!delegate.exists())
                throw new FileNotFoundException(delegate.getAbsolutePath());
            if (!delegate.isDirectory())
                throw new IllegalArgumentException("The provided File object is not a directory: " + delegate.getAbsolutePath());
            final File[] files = delegate.listFiles();
            if (files == null) return;
            for (File file : files) {
                if ((fileType != FileType.FILES && file.isDirectory() && !(file.name ==~ /${excludeFilter}/)) ||
                    (fileType != FileType.DIRECTORIES && file.isFile())) {
                    closure.call(file);
                }
            }
        }

        File.metaClass.eachFile = { Closure closure ->
            delegate.eachFile(FileType.ANY, closure)
        }

        File.metaClass.eachDir = { Closure closure ->
            delegate.eachFile(FileType.DIRECTORIES, closure)
        }

        // Override eachFileMatch to exclude dir: .svn or CVS
        File.metaClass.eachFileMatch = { FileType fileType, Object nameFilter, Closure closure ->
            if (!delegate.exists())
                throw new FileNotFoundException(delegate.getAbsolutePath());
            if (!delegate.isDirectory())
                throw new IllegalArgumentException("The provided File object is not a directory: " + delegate.getAbsolutePath());
            final File[] files = delegate.listFiles();
            if (files == null) return;
            final MetaClass metaClass = InvokerHelper.getMetaClass(nameFilter);
            for (final File currentFile : files) {
                if ((fileType != FileType.FILES && currentFile.isDirectory() && !(currentFile.name ==~ /${excludeFilter}/)) ||
                    (fileType != FileType.DIRECTORIES && currentFile.isFile())) {
                    if (DefaultTypeTransformation.castToBoolean(metaClass.invokeMethod(nameFilter, "isCase", currentFile.getName())))
                        closure.call(currentFile);
                }
            }
        }

        File.metaClass.eachFileMatch = { Object nameFilter, Closure closure ->
            delegate.eachFileMatch(FileType.ANY, nameFilter, closure)
        }

        File.metaClass.eachDirMatch = { Object nameFilter, Closure closure ->
            delegate.eachFileMatch(FileType.DIRECTORIES, nameFilter, closure)
        }

        // Override eachFileRecurse to exclude dir: .svn or CVS
        File.metaClass.eachFileRecurse = { FileType fileType, Closure closure ->
            if (!delegate.exists())
                throw new FileNotFoundException(delegate.getAbsolutePath());
            if (!delegate.isDirectory())
                throw new IllegalArgumentException("The provided File object is not a directory: " + delegate.getAbsolutePath());
            final File[] files = delegate.listFiles();
            if (files == null) return;
            for (File file : files) {
                if (file.isDirectory()) {
                    if (!(file.name ==~ /${excludeFilter}/)) {
                        if (fileType != FileType.FILES) closure.call(file);
                        file.eachFileRecurse(fileType, closure);
                    }
                } else if (fileType != FileType.DIRECTORIES) {
                    closure.call(file);
                }
            }
        }

        File.metaClass.eachFileRecurse = { Closure closure ->
            delegate.eachFileRecurse(FileType.ANY, closure)
        }

        File.metaClass.eachDirRecurse = { Closure closure ->
            delegate.eachFileRecurse(FileType.DIRECTORIES, closure)
        }

        // Expando File class, add method eachFileRecurseMatch
        File.metaClass.eachFileRecurseMatch = { FileType fileType, Object nameFilter, Closure closure ->
            if (!delegate.exists())
                throw new FileNotFoundException(delegate.getAbsolutePath());
            if (!delegate.isDirectory())
                throw new IllegalArgumentException("The provided File object is not a directory: " + delegate.getAbsolutePath());
            final File[] files = delegate.listFiles();
            if (files == null) return;
            final MetaClass metaClass = InvokerHelper.getMetaClass(nameFilter);
            for (File file : files) {
                if (file.isDirectory()) {
                    if (!(file.name ==~ /${excludeFilter}/)) {
                        if (fileType != FileType.FILES) {
                            if (DefaultTypeTransformation.castToBoolean(metaClass.invokeMethod(nameFilter, "isCase", file.getName())))
                                closure.call(file);
                        }
                        file.eachFileRecurseMatch(fileType, nameFilter, closure);
                    }
                } else if (fileType != FileType.DIRECTORIES) {
                    if (DefaultTypeTransformation.castToBoolean(metaClass.invokeMethod(nameFilter, "isCase", file.getName())))
                        closure.call(file);
                }
            }
        }

        // The two convenience overloads must delegate to eachFileRecurseMatch:
        // eachFileRecurse has no three-argument form that takes a name filter.
        File.metaClass.eachFileRecurseMatch = { Object nameFilter, Closure closure ->
            delegate.eachFileRecurseMatch(FileType.ANY, nameFilter, closure)
        }

        File.metaClass.eachDirRecurseMatch = { Object nameFilter, Closure closure ->
            delegate.eachFileRecurseMatch(FileType.DIRECTORIES, nameFilter, closure)
        }
    }

    • January 10, 2011 at 03:06

      Hey, thanks a lot! I’ll be improving the Groovy version of file/directory iteration, so this example will be very handy.

