I’ve been asked to perform some maintenance on one chaotic directory.
This directory receives about 225,000 plain text files per month, with no structure at all (yes, I know, don’t ask). Run a simple ls on it after a year of files (roughly 2.7 million) and you will quickly realize there is a problem.
We decided to create a monthly zip archive of the files (based on their last-modified date) while we try to explain the problem to the «developer????» responsible for the process.
Maybe it could be done with shell scripting, but I chose Groovy.
There are plenty of improvements to make (exception handling, parameter validation, …), but it’s a starting point and it worked fine for me. Feel free to change the script and share your improvements or comments.
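Before archiving anything, it can help to see how the files split by the month of their last-modified date, i.e. how many files each monthly archive would receive. This is a minimal sketch, not part of the original script: `countFilesPerMonth` is a hypothetical helper name, and it only reads metadata, so nothing is moved or deleted.

```groovy
import java.text.SimpleDateFormat

// Hypothetical helper: counts the regular files in a directory per "yyyyMM"
// month of their last-modified date. Read-only: nothing is moved or deleted.
Map<String, Integer> countFilesPerMonth(File dir) {
    def fmt = new SimpleDateFormat('yyyyMM')
    def files = dir.listFiles().toList().findAll { it.isFile() }
    def counts = files.countBy { fmt.format(new Date(it.lastModified())) }
    counts.sort()   // order the result by month key
}
```

For example, `countFilesPerMonth(new File('/path/to/directory')).each { month, n -> println "$month: $n" }` prints one line per month.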
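Parameter validation is one of those improvements. A minimal sketch of what it might look like, assuming the script keeps its two positional arguments; `validateArgs` and its messages are hypothetical, not part of the original script:

```groovy
import java.text.ParseException
import java.text.SimpleDateFormat

// Hypothetical helper: checks the two command-line arguments before the
// script lists, zips or deletes anything. Returns a list of problems;
// an empty list means the arguments look usable.
List<String> validateArgs(String[] argv) {
    List<String> errors = []
    if (argv.length != 2) {
        errors << 'usage: groovy monthly_files.groovy yyyy-MM-01 /path/to/directory'
        return errors
    }
    // The date must be the first day of a month, e.g. 2012-07-01
    if (!(argv[0] ==~ /\d{4}-\d{2}-01/)) {
        errors << 'first argument must be the first day of a month (yyyy-MM-01): ' + argv[0]
    } else {
        def fmt = new SimpleDateFormat('yyyy-MM-dd')
        fmt.lenient = false   // reject impossible dates such as 2012-13-01
        try {
            fmt.parse(argv[0])
        } catch (ParseException ignored) {
            errors << 'invalid date: ' + argv[0]
        }
    }
    if (!new File(argv[1]).isDirectory()) {
        errors << 'not a directory: ' + argv[1]
    }
    errors
}
```

One way to use it would be to call `validateArgs(args)` at the top of the script, print any returned messages, and stop with a non-zero exit status before anything is zipped or deleted.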
Here is the script:
```groovy
import java.text.SimpleDateFormat
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream
import groovy.time.TimeCategory
// java.util.zip offers the same putNextEntry/closeEntry/finish API as Ant's
// org.apache.tools.zip classes, without needing Ant on the classpath.
//import org.apache.tools.tar.TarOutputStream
//import org.apache.tools.tar.TarEntry

// args[0]: first day of the month to archive, e.g. "2011-04-01"
// args[1]: directory containing the files
def startd = new SimpleDateFormat("yyyy-MM-dd").parse(args[0])
def endd = startd
use(TimeCategory) {
    endd = startd + 1.month
}
def dir = new File(args[1])
println "Archiving files between $startd and $endd"

// Regular files modified in [startd, endd). Earlier backup.yyyyMM.zip archives
// are skipped so that a later run does not archive and delete them.
def period = { File file ->
    def modified = new Date(file.lastModified())
    file.isFile() &&
        !(file.name ==~ /backup\.\d{6}\.zip/) &&
        !modified.before(startd) && modified.before(endd)
}
def thefiles = dir.listFiles().toList().findAll(period)

if (thefiles) {
    def month = new SimpleDateFormat("yyyyMM").format(startd)
    def archive = new File(dir, "backup.${month}.zip")
    // Stream entries straight to the archive file instead of
    // buffering the whole zip in a ByteArrayOutputStream
    archive.withOutputStream { out ->
        def zipFile = new ZipOutputStream(out)
        thefiles.each { f ->
            zipFile.putNextEntry(new ZipEntry(f.name))
            f.withInputStream { i -> zipFile << i }
            zipFile.closeEntry()
        }
        zipFile.finish()
    }
    // Delete the originals only after the archive has been written and closed
    thefiles.each { it.delete() }
    println "${thefiles.size()} files archived into ${archive}"
} else {
    println "No files found for the given date"
}
```
To execute it, simply run
```shell
groovy monthly_files.groovy 2012-07-01 /path/to/directory
```
The two parameters are the first day of the month to process and the path to the directory containing the files to archive.
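The date parameter is turned into a one-month window. The sketch below isolates that step as two hypothetical helpers, `monthWindow` and `inWindow`, assuming a half-open interval [start, end) so that a file stamped exactly at midnight on the 1st belongs to that month:

```groovy
import java.text.SimpleDateFormat
import groovy.time.TimeCategory

// Hypothetical helper: turns "yyyy-MM-dd" (first day of a month) into the
// interval [start, end). TimeCategory handles month lengths and year rollover.
List<Date> monthWindow(String firstDay) {
    Date start = new SimpleDateFormat('yyyy-MM-dd').parse(firstDay)
    Date end
    use(TimeCategory) {
        end = start + 1.month
    }
    [start, end]
}

// Hypothetical helper: true when start <= lastModified < end.
boolean inWindow(long lastModified, Date start, Date end) {
    Date modified = new Date(lastModified)
    !modified.before(start) && modified.before(end)
}
```

For example, `monthWindow('2012-12-01')` yields a window ending at midnight on 2013-01-01, so December files never leak into January's archive.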
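If the directory holds a very large number of files, you may need to give the JVM more memory:
```shell
export JAVA_OPTS="-Xmx2048m -XX:MaxPermSize=128m"
```
`-XX:MaxPermSize` only applies to Java 7 and earlier; drop it on Java 8 and later, where the permanent generation no longer exists.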
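Part of that memory goes into `listFiles()`, which builds an array with one `File` per entry in the directory. A possible alternative, sketched here assuming Java 7 or later (`java.nio.file`), is to iterate with a `DirectoryStream` and keep only the regular files that fall in the month; `filesModifiedIn` is a hypothetical helper name, not part of the original script:

```groovy
import java.nio.file.DirectoryStream
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: walks the directory lazily and keeps only regular
// files whose last-modified time is in [start, end). The stream still reads
// every entry, but only the matching paths are held in memory.
List<Path> filesModifiedIn(Path dir, Date start, Date end) {
    List<Path> matches = []
    DirectoryStream<Path> stream = Files.newDirectoryStream(dir)
    try {
        for (Path p : stream) {
            if (!Files.isRegularFile(p)) continue
            long modified = Files.getLastModifiedTime(p).toMillis()
            if (modified >= start.time && modified < end.time) {
                matches << p
            }
        }
    } finally {
        stream.close()   // release the directory handle even on errors
    }
    matches
}
```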
Hope you find it useful
alfonsorv