Thursday, February 26, 2009

Regex File Search with Groovy

I needed something interesting to get me going.  And then I moved to a new team at work, losing access to my nice, tool-rich Linux account and leaving me with only a shiny new Dell Windows laptop and a need to quickly hunt through a large legacy codebase.  Hooray - Groovy to the rescue!

The first pass at this script took me about 2 hours from Groovy Console install to working code.  This cleaned-up version took me maybe another 2 hours of tweaking the regex and collapsing code down into more compact closures.  For reference I leaned heavily on Google and the working with files and I/O section from "Groovy in Action".

(Prerequisite note on a starting point for true beginners: download the installer from groovy.codehaus.org. The following code was developed and run from the GroovyConsole.
My apologies for any syntax bugs introduced in moving the code into this format.)

Quick Summary of lessons in developing this code:
  • If you don't call flush when writing to a file, the file is most often empty when the script exits.
  • Explicitly using java.util.regex.Pattern did not give me the same matching behavior as using the Groovy notation.
  • I'm still a little confused about where it is safe to use single or double quotes; I got lots of errors about parsing the script.  It seemed like comment markers don't completely stop the console from recognizing quote marks.
  • I'm not sure how much patience I'll have in the future for the java.io package!
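
One likely source of the matching difference in the second lesson is find-versus-match semantics rather than the Pattern class itself: the =~ operator builds an ordinary java.util.regex.Matcher, and matches() on it must cover the whole string while find() accepts a partial hit. A minimal sketch, assuming that explanation; the sample line and the targetstring pattern are made up for illustration:

```groovy
import java.util.regex.Pattern

// Hypothetical sample line, for illustration only
def line = 'int x = targetstring + 1;'

// =~ builds a java.util.regex.Matcher; find() succeeds on a partial match
def partialFind = (line =~ /targetstring/).find()

// matches() insists that the whole line fits the pattern
def wholeNarrow = (line =~ /targetstring/).matches()
def wholeWide = (line =~ /.*targetstring.*/).matches()

// ==~ is Groovy shorthand for a whole-string match and yields a boolean
def shorthand = (line ==~ /.*targetstring.*/)

// the explicit java.util.regex API follows the same rules
def javaFind = Pattern.compile('targetstring').matcher(line).find()
def javaMatches = Pattern.matches('targetstring', line)

println "find=${partialFind} narrow=${wholeNarrow} wide=${wholeWide}"
println "shorthand=${shorthand} javaFind=${javaFind} javaMatches=${javaMatches}"
```

If only "does this line contain the target" matters, find() (or using the matcher directly in an if) avoids having to pad the pattern with .* on both sides.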

//*************
//Walk through a set of files
//collects information about a target string into an output file.
//*******************

// no need to define type for variables, just use generic def keyword
def outputfile = new File("myresultsfile.out").newWriter()

//FileNameFinder uses Ant's file scanner under the hood.
//It spiders through directories to get the list of files to search.
//The list of patterns identifies the specific file types of interest.
FileNameFinder dirList = new FileNameFinder();
List sourcefiles = dirList.getFileNames("mysrcdirectory/", "**/*.jsp, **/*.java, **/*.js, **/*.javascript, **/*.properties, **/*.xml, **/*.conf, **/*.props")

//defining a holder for results.  I want unique entries, so I'm using a set
Collection fileList = new HashSet();

//define a simple function to control where results are written.
//with closures the 'method' is simply defined as another type.
//The default input variable is named it
def collectResults = {
    println it
    outputfile.writeLine(it)
}

//function takes three inputs: filename, line number, and line text
//for a line that holds the pattern it
//writes out the information to the output file and
//saves the filename in a set to generate a list of target files

def processfile = { srcfile, srcline, src ->
    //defining specific closure inputs instead of using the default it
    //newbies note the arrow after the parameter list, it's important!

    //test the line against the pattern,
    //no need to mess with java.util.regex classes directly
    //Groovy's added simple syntax to generate a matcher (like Perl)
    //matches() must cover the whole line, hence the .* on both sides
    def finder = ( src =~ /.*targetstring.*/ )
    if (finder.matches())
    {
        //record the file, then write file, line number and matched line
        fileList.add(srcfile)
        //another favorite script feature available in Groovy
        //just use token syntax to generate a mix of variable and static output
        collectResults("${srcfile}^${srcline}^${src}")
    }
    else
    {
        //easy to debug - print information to sysout without all the typing.
        println 'did not find ' + finder.pattern()
    }
}

//Now for the meat of the script.
//Notice the easy for-in iteration syntax:
//define a list or collection and a variable to hold each element (again see Perl)

for (sourceFil in sourcefiles) {
    //open the file
    File fileHandle = new File(sourceFil)
    //for each line - use the Groovy-defined easy file line reader method.
    //this method takes a closure and runs it on each line of the file;
    //a two-argument closure receives the line text and its 1-based line number
    //(the reader from newReader() is a BufferedReader, which has no getLineNumber())
    fileHandle.eachLine { line, lineNumber -> processfile(sourceFil, lineNumber, line) }
}

//when done processing files flush output file to ensure capture of text
outputfile.flush()


//Also write out the list of unique file names captured, if there are any
if (fileList.size() > 0)
{
    def chgFiles = new File("filesofinterest.out").newWriter()
    //First line tells me how many files were identified as containing the target pattern
    chgFiles.writeLine(fileList.size().toString())

    //again using a nifty Groovy method added to every list or collection type,
    //passing it a closure to execute for each element in the list
    fileList.each {
        chgFiles.writeLine(it)
        println(it)
    }
    //ensuring the output gets written
    chgFiles.flush()
}
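
The explicit flush calls in the script can be avoided by letting Groovy manage the writer. The sketch below is one possible variation and not the script as originally run: it assumes withWriter (which flushes and closes the writer when its closure returns) and the two-argument form of eachLine, and it uses throwaway temp files with made-up contents so it runs on its own.

```groovy
// Hypothetical sample input so the sketch is self-contained
def src = File.createTempFile('sample', '.java')
src.text = 'first line\nhas targetstring here\nlast line\n'
def out = File.createTempFile('results', '.out')

// withWriter hands the closure a BufferedWriter and closes it afterwards,
// so nothing is lost if the script exits right after this block
out.withWriter { writer ->
    // the two-argument closure receives the line text and a 1-based line number
    src.eachLine { text, lineNumber ->
        // a Matcher used as a condition is true when find() succeeds
        if (text =~ /targetstring/) {
            writer.writeLine("${src.name}^${lineNumber}^${text}")
        }
    }
}

// read the results back to show they reached the disk
def results = out.readLines()
results.each { println it }

// clean up the temp files
src.delete()
out.delete()
```

The same pattern would apply to the filesofinterest.out writer: wrapping the writes in withWriter removes the need to remember the final flush.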
