
Zip files the Groovy way

I’ve been asked to do some maintenance on one chaotic directory.

This directory receives about 225,000 plain text files per month without any structure (yes, I know, don’t ask). Try a simple ls in this directory after a year’s worth of files and you’ll realize there is a problem.

We decided to store the files in one zip archive per month (based on each file’s last-modified date) while we try to explain the problem to the «developer????» of the process that generates them.

You could probably do it with shell scripting, but I chose Groovy.

There is plenty of room for improvement (exception handling, parameter validation, …), but it’s a starting point and it worked fine for me. Feel free to change the script and share your improvements or comments.

Here is the script:

import java.text.SimpleDateFormat
import groovy.time.TimeCategory
import org.apache.tools.zip.ZipOutputStream
import org.apache.tools.zip.ZipEntry

// args[0]: first day of the month to archive, e.g. "2011-04-01"
// args[1]: directory containing the files

def startd = new SimpleDateFormat("yyyy-MM-dd").parse(args[0])
def endd = startd

use(TimeCategory) {
    endd = startd + 1.month
}

println "Archiving files between $startd and $endd"

// Regular files last modified within [startd, endd),
// skipping archives created by earlier runs of this script
def period = { file ->
    def modified = new Date(file.lastModified())
    file.isFile() && !(file.name ==~ /backup\.\d{6}\.zip/) &&
        modified >= startd && modified < endd
}
def thefiles = new File(args[1]).listFiles().toList().findAll(period)

if (thefiles.size() > 0) {

    // The whole archive is built in memory before being written to disk
    ByteArrayOutputStream baos = new ByteArrayOutputStream()
    ZipOutputStream zipFile = new ZipOutputStream(baos)

    thefiles.each {
        zipFile.putNextEntry(new ZipEntry(it.name))

        it.withInputStream { i ->
            zipFile << i
        }

        zipFile.closeEntry()
    }
    zipFile.finish()

    def month = startd.format('yyyyMM')
    def zipPath = "${args[1]}/backup.${month}.zip"

    // withOutputStream closes the file once the archive has been written
    new File(zipPath).withOutputStream { out ->
        baos.writeTo(out)
    }

    // Delete the originals only after the archive is on disk
    thefiles.each {
        it.delete()
    }
    println "${thefiles.size()} files archived into ${zipPath}"

} else {
    println "No files found for given date"
}
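
Because the script deletes the originals right after writing the archive, it may be worth checking the archive before that step. The following is only a sketch: `archiveContainsAll` is a hypothetical helper name, and it assumes the archive can be read back with the JDK's `java.util.zip.ZipFile` and that comparing entry names and uncompressed sizes is a good enough check.

```groovy
import java.util.zip.ZipFile

// Hypothetical helper (not part of the original script):
// true when every file has an entry with the same name and uncompressed size.
boolean archiveContainsAll(File archive, List<File> files) {
    def zip = new ZipFile(archive)
    try {
        return files.every { f ->
            def entry = zip.getEntry(f.name)
            entry != null && entry.size == f.length()
        }
    } finally {
        zip.close()
    }
}
```

The delete loop could then be guarded with something like `if (archiveContainsAll(new File(zipPath), thefiles))`, where `zipPath` stands for the path of the archive that was just written.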

To execute the script, simply run:

groovy monthly_files.groovy 2012-07-01 /path/to/directory

The parameters are the first day of the month to process and the path to the directory containing the files to archive.
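
Parameter validation is one of the improvements the script still needs. A minimal sketch of what it could look like follows; `parseArgs` and its error messages are hypothetical names chosen for illustration, and the check that the date is the first day of a month is an assumption about how strict you want to be.

```groovy
import java.text.ParseException
import java.text.SimpleDateFormat

// Hypothetical helper (not part of the original script):
// returns [start: Date, dir: File] or throws IllegalArgumentException.
def parseArgs(String[] args) {
    if (args == null || args.length != 2) {
        throw new IllegalArgumentException(
            "usage: groovy monthly_files.groovy <yyyy-MM-dd> <directory>")
    }
    def fmt = new SimpleDateFormat("yyyy-MM-dd")
    fmt.lenient = false   // reject dates such as 2012-13-40
    Date start
    try {
        start = fmt.parse(args[0])
    } catch (ParseException e) {
        throw new IllegalArgumentException("not a yyyy-MM-dd date: ${args[0]}")
    }
    def cal = Calendar.instance
    cal.time = start
    if (cal.get(Calendar.DAY_OF_MONTH) != 1) {
        throw new IllegalArgumentException("date must be the first day of a month: ${args[0]}")
    }
    def dir = new File(args[1])
    if (!dir.isDirectory()) {
        throw new IllegalArgumentException("not a directory: ${args[1]}")
    }
    [start: start, dir: dir]
}
```

At the top of the script this could be called as `def params = parseArgs(args)`, using `params.start` and `params.dir` instead of parsing `args` directly.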

If the number of files is very large you’ll have to give the JVM more memory, because the script builds the whole archive in memory before writing it to disk:

export JAVA_OPTS="-Xmx2048m -XX:MaxPermSize=128m"
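
The extra memory is needed because the archive is assembled in a `ByteArrayOutputStream` first. An alternative that may avoid raising the heap is to write the zip straight to disk, one entry at a time. This is only a sketch under assumptions: it swaps in the JDK's `java.util.zip` classes for the Ant ones the script imports, and `zipToDisk` is a hypothetical helper name.

```groovy
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical helper (not part of the original script):
// streams each file into the target archive and returns the number of entries written.
def zipToDisk(List<File> files, File target) {
    target.withOutputStream { out ->
        def zip = new ZipOutputStream(out)
        files.each { f ->
            zip.putNextEntry(new ZipEntry(f.name))
            // Copies the file through a small buffer instead of holding it in memory
            f.withInputStream { input -> zip << input }
            zip.closeEntry()
        }
        // Writes the central directory; withOutputStream then flushes and closes the file
        zip.finish()
    }
    files.size()
}
```

With this approach only a copy buffer per file is held at a time, although the list returned by `listFiles()` still has to fit in memory.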

I hope you find it useful.
