JOB SCHEDULING À LA CARTE


SYSADMIN
Open Source Job Scheduler

Planning and scheduling jobs can mean a lot of work, especially if they are spread across multiple machines. Here’s a tool to make that task a lot easier.

BY JAMES MOHR

The ability to perform a certain task at a specific time or at regular intervals is a necessity for sys admins. The original cron daemon offers an easy method for job scheduling on Unix-based systems. Although cron has seen a number of improvements over the years, even the newer versions are designed for very basic scheduling. An administrator who wants to do anything unusual must either create a wrapper script or build the additional functionality into whatever script is started by cron.

Imagine how much time you could save if you no longer needed to create wrappers, hack your scripts, or do anything else to get programs to react to error conditions and run at exactly the time you need, in exactly the order they should. Several commercial products offer this functionality, but they can take a big bite out of your IT budget. Luckily, the open source world also provides solutions for beyond-cron scheduling. In this article, I will explain how to get started with a powerful alternative: the Open Source Job Scheduler.

With any scheduling software, the primary administrative unit is the job, which is typically a script or program started by the scheduling software. In many cases, cron is sufficient to handle the simplest scheduling requirements, such as running a certain job once a day (e.g., backups). Even jobs that need to run at more frequent intervals (every 15 minutes), less frequently (once a month), or on specific dates (the first of the month) can be handled by cron.

When you start dealing with dependencies of any kind, you quickly begin to see the limitations of cron, for example, if you want to start a specific program after a certain event occurs.
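
To make the wrapper-script approach concrete, here is a minimal Python sketch of what such a wrapper has to do by hand: run commands in a fixed order, stop at the first failure, and optionally resume from a later step. The function name run_in_order and its start_at parameter are illustrative assumptions, not part of cron or the Job Scheduler.

```python
import subprocess


def run_in_order(commands, start_at=0):
    """Run commands one after another and stop at the first failure.

    commands is a list of argument lists, as accepted by subprocess.run.
    start_at (hypothetical) lets you resume from the middle of the list.
    Returns the commands that finished with exit status 0.
    """
    completed = []
    for cmd in commands[start_at:]:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # A failed step blocks everything that depends on it.
            break
        completed.append(cmd)
    return completed
```

A crontab entry would then start this one wrapper instead of the individual scripts. Every new dependency or error reaction means more hand-written logic of this kind.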
In my work, I have a number of job chains that consist of up to a dozen individual jobs that must be run in a precise order, and each job can only run if the previous job in the chain was successful. If one step fails, the execution of subsequent jobs would create major problems. Even the newest versions of cron cannot handle this, especially if you need to be able to jump into the middle of the chain occasionally and start a job from there.

Events that start jobs can be more than just specific times or the completion of other jobs. Often jobs simply wait until either a specific file is delivered or anything at all arrives in a predefined directory. Naturally, on many occasions, specific jobs must be started on demand rather than waiting for a specific event.

ISSUE 97, DECEMBER 2008

Many tasks need to be managed on multiple machines, so the better scheduling software allows you to manage all of your machines from a central point, remotely start jobs, and so forth. To distribute crontabs to these remote machines, you could configure rsync or use some other mechanism, but this quickly becomes an administrative nightmare when the configuration is different across several machines, when jobs often need to be started manually, and when other tasks exist that are not part of cron’s functionality.

Rabbit to the Rescue

Although you can install open source versions of cron on Windows machines, dealing with different operating systems causes even more problems. For example, some Unix dialects do not support a central /etc/crontab file, so you need to set up files for individual users. Cron

was a useful tool in its time, and often still is, but the requirements of many companies rule it out.

The limitations of cron have not gone unnoticed, and a number of products have come on the market. Commercial products can cost a small fortune and are often licensed on the basis of the number of servers or, in some cases, the number of scripts you start.

Problems with job scheduling and commercial software licensing have not gone unnoticed by the open source community. One amazing solution is the Open Source Job Scheduler, developed by Software- und Organisations-Service (SOS) GmbH in Berlin, Germany. Versions are available for Linux, Solaris, HP-UX (PA-RISC, IA64), AIX, and Windows. It also supports several different databases, including DB2, Oracle, MS SQL Server, PostgreSQL, and MySQL.

Depending on your needs, two different licenses are available: GPL and Guaranteed License. Both license types provide the same basics, including all of the functionality, source code, and upgrades. Note that the product is the same with both licenses.

Although some products provide more features in the commercial version, SOS Managing Director Andreas Püschel sees the company’s business as focused on support and service, not on the product itself. Also, SOS does not market its product, or even its services, in the traditional sense. When demonstrating the product, the goal is not to convince potential customers, but rather to show what the product can do and let the customers decide for themselves whether the product meets their requirements. Simply seeing what this product can do will convince a lot of people.

The commercial license also provides a “responsible person” for each service call with guaranteed response times, and feature requests are given a higher priority for possible implementation.
Also, this license does not have the restrictions of the GPL, so you can, for example, bundle the Job Scheduler with your own application without having to adhere to the GPL. Unlike most open source and commercial software vendors, SOS also provides a limited two-year warranty, as well as an indemnity agreement. According to Püschel, the company feels obligated to give customers what they pay for, and this also extends to discrepancies between the product and documentation.

The central component of the package is the Job Scheduler engine, which runs on every machine on which you want to schedule jobs. Scheduling a job is different from executing it, and the documentation describes a couple of ways to set up remote execution, one of which is to have a scheduler installed on the remote machine acting as a slave for the first machine. However, if necessary, the other servers can function as full schedulers, as well. This mechanism can be expanded to enable load balancing across multiple servers. A job chain is managed from one machine and distributes the requests to the other machines.

One key aspect to consider is the so-called order, which is a token or flag that is passed between jobs. Depending on their configuration, jobs cannot start until they have been given their order. In the simplest form, orders are like a baton in a relay race, handed from one job to the next. However, they can also contain parameters that are passed between jobs (e.g., the current file being processed). Also, you can configure orders with specific start times so they are automatically generated by the system at the specified time, and then the respective job chain can start.

Note that to be able to react to an order, a job must be configured to accept orders, but an order can only be associated with a job chain, as opposed to an individual job. That is, the order is passed from job to job within the job chain, but it is associated with the chain and causes the job chain to start.
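
The relay-baton idea can be made concrete with a small model. The following is only a conceptual sketch in Python, not the Job Scheduler’s API or configuration format: the names Order and run_chain, the example jobs, and the state names are all hypothetical. It shows an order carrying a parameter from one node to the next and a chain that diverts to an error state when a job fails.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical token that travels through a job chain."""
    order_id: str
    params: dict = field(default_factory=dict)
    state: str = "start"


def run_chain(chain, order):
    """Pass the order from node to node and return the states visited.

    chain maps a state name to (job, next_state, error_state), where job
    is a callable that takes the order and returns True on success.
    The run ends when the order reaches a state that has no node.
    """
    visited = []
    while order.state in chain:
        job, next_state, error_state = chain[order.state]
        visited.append(order.state)
        order.state = next_state if job(order) else error_state
    return visited


def fetch_file(order):
    # Leave a parameter in the order for the next job to read.
    order.params["file"] = "incoming.dat"
    return True


def process_file(order):
    # Succeeds only if the previous job handed over its parameter.
    return "file" in order.params


def report_error(order):
    order.params["failed"] = True
    return True


example_chain = {
    "start": (fetch_file, "process", "error"),
    "process": (process_file, "done", "error"),
    "error": (report_error, "done", "done"),
}
```

Calling run_chain(example_chain, Order("demo")) walks the "start" and "process" nodes in turn. If the first job returned False instead, the order would go straight to the "error" node and the second job would never run.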
If you want a single job to start at a specific time, for example, this can be done within the job itself. Another key component is Hot Folders, which are directories that the scheduler monitors for changes, such as new or modified jobs.

Jobs can be configured from the Job Scheduler Editor GUI (Figure 1; hereafter called the Job Editor), or XML files can be edited directly. The Job Editor GUI is a Java-based application that you can use to configure the various jobs, chains, and other aspects of the system. All of the server configuration information is stored in XML files, and all you need to do is open up the respective XML file in the Job Editor to make your changes. From my experience with other job schedulers, being able to use vi on configuration files is more than a blessing when you have to make massive changes.

The XML files can be copied to remote machines and saved via FTP directly from the Job Editor. When they land in a Hot Folder, the files are immediately available to the scheduler engine on the remote machine.

The Job Editor GUI is not as intuitive as it could be, and the purpose of many of the fields is unclear at first. Because the documentation leaves something to be desired, explanations for many of these fields were simply not to be found. In some cases, I could still figure out what was meant from the descriptions of the XML files in the documentation.

Although the configuration information is stored in XML files by default, you can configure the scheduler to use a number of different databases. Jobs stored this way are called “managed jobs,” and every scheduler you set up can be configured to access the jobs in the database, so you do not need to copy the files manually.

The day-to-day operation is handled by the Job Scheduler operations GUI (hereafter called the Operations GUI), which is run through a web browser, thus allowing you to manage your jobs from almost any machine.
With this GUI, you can not only monitor jobs, but also start and stop them, handle errors, and perform many other functions.

The Job Scheduler also provides an API that allows you to manage and control jobs externally. The API supports several languages, including Perl, VBScript, JavaScript, and Java. Surprisingly, PHP is not supported, even though jobs can be managed from a web browser and the documentation includes sample PHP scripts.

Scheduling the Installation

For the examples in this article, I used version 1.3.4 for Linux, which you can download from the Job Scheduler site at SourceForge [1]. If you plan to use a MySQL database, as I did, note that the Job Scheduler does not provide a JDBC driver for MySQL, although one is provided for Oracle and other databases. The MySQL JDBC driver can be downloaded directly from the MySQL website [2]. Simply input the path to the appropriate .jar during the installation.

Before you start, I recommend that you read the PDF installation guide that is included in the package. Also, several other PDFs on the company’s website [3] go into more detail about various topics. Be warned: To get even the most basic information from the documentation, you need to be somewhat familiar with object-oriented programming and XML because the documentation provides almost no background information in these areas. Also, the documentation is not well organized, so expect to do a lot of searching. Although the documentation is extensive, it’s not easy to use, which is something the company plans to improve.

On Linux, installation is through a Java installer and, if done as a normal user, the default is to install it in $HOME/scheduler. If you install it as root, it ends up in /usr/local/scheduler. After you install the product, you will find a README file that recommends you don’t install the Job Scheduler as root, but nothing in the Job Scheduler Installation and Configuration guide mentions this. To avoid potential problems, I re-installed as a normal user.

During the install, you are prompted for the database type and connection information. Whether the “database parameters” are the connection parameters used once the database is running or the connection parameters needed to create the database is unclear. The guide recommends that you create a database and user for the Job Scheduler to use, but unfortunately this is first mentioned after all of the installation steps. Creating the database by hand and assigning DB privileges before you start the installation does the trick.

The installation is fairly intuitive, but it takes a few minutes to create the database tables and complete.
If you are planning to install the scheduler on multiple machines with the same parameters, you are prompted to create an automated installation script at the end of the first installation.

Regardless of whether you install the Job Scheduler as root or as a normal user, you will need to start the scheduler by hand. Control of the Job Scheduler is done like a typical rc script:

    $HOME/scheduler/bin/jobscheduler.sh start
    $HOME/scheduler/bin/jobscheduler.sh stop

If you want to start the scheduler automatically when the system boots, I suggest that you create a specific user just for the Job Scheduler and then create an rc script that does an su to that user and starts the scheduler. Should jobs need to be run as root or another user with more privileges, you can set up an appropriate sudo environment.

Creating Jobs

To create jobs, you can either edit the XML files directly or use the Job Editor GUI. On Linux, run the jobeditor.sh script, which is located by default in /usr/local/scheduler/bin or $HOME/scheduler/bin.

Figure 1: The Job Scheduler Editor GUI.

Figure 2: Inputting the source code directly into the Job Editor.

As an example, consider the typical task of creating a backup of your system configuration. For now, I’ll assume that you want to do this daily with the script /usr/local/bin/config_backup.sh.

First, start the Job Editor and select New | Hot Folder Element | Job (Figure 1). On the first form, begin inputting the basic information for your job. For this example, simply input the Job Name “Configuration Backup”; in the Job Title field, add a description or leave it blank.

In the left-hand panel, click Execute to input the details about the program or script you want to start. Because you want to execute an external script, select the radio button Run executable and input the complete path to the script named above. Here, you also can define additional parameters that are passed to the script.
For example, if the backup is to compress the files it backs up, you might add a -c here.

Also, you could include the individual execution steps by selecting the Script radio button and the type of program code, then inputting the source code in the appropriate box. Note that this is more than the name of a script and can include programming constructs based on the language you select (Figure 2).

To save the job, press the Save button or choose Save from the File menu. The

first time you save the file, you are prompted to input the file name. To do so, navigate to the ./config/live/ directory and save the file as ConfigurationBackup; the .xml extension will be added automatically.

Because you saved the file into the live directory, it is immediately visible to the system. This directory is pre-defined as a Hot Folder, which the system reads regularly. At this point, you have not scheduled the job but simply added it to the system. To start the job, you need to run the Operations GUI by pointing your browser to http://localhost:4444.

When you connect from the browser, you are presented with a GUI similar to that in Figure 3. To see the details of the job, double-click on your backup job in the left-hand column. To run the job immediately, click on the Job menu button and select Start task now.

Because you could easily start the script from the command line, this is pretty unspectacular. If you go back into the Job Editor and click the Run Time entry in the left panel, you can select a time when this job should run. To do so, click on Everyday and then define a new period by clicking the New Period button. For Start Time, input something like 09:00 in the Single Start field. Now click the Save button and this new configuration is active: the job will now start every day at 9:00am.

In terms of creating jobs, so far the only thing the Job Scheduler editor provides that cron does not is a nice GUI, but you have only scratched the surface. When you begin working with job chains, you will start to see the power of job scheduling.

First, assume you have created a second job that does a database backup, which you want to run immediately after the configuration backup has completed. One alternative is to create a single script that first does the configuration backup and then immediately starts the database backup.
However, job chains come in handy in many more complex situations that you cannot simply implement with a single shell script, which I discuss later. For simplicity’s sake, I’ll stick with these first two jobs.

As with the first job, create a new Hot Folder element, but select Job Chains. Here, you input the Chain Name and, if desired, a Title (i.e., a description). Because each element in a chain is referred to as a Node, you need to add a New Chain Node next by clicking the respective button. If you know the name of the job, you can input it manually or use the Browse button to search for the job.

Figure 3: The Job Scheduler operations GUI.

In the State field, you can define a state for this step or job node. By defining states, you can define a more complex job flow. For example, you could define a state named “Error,” and if one of the steps encounters an error, you immediately jump to that job, skipping all of the other jobs. Also, you could have different error jobs run for each of the various steps.

Although creating a job chain consisting of just a single job has certain advantages, don’t stop there. As I mentioned, you want a backup job chain consisting of two steps, so you’ll create a second chain node with the database backup job and configure it similarly to the first node. In the example here, you defined the first Node as state Start and the second as state End, although this is not really necessary.

When you click on the Save button, you will see the new chain under the Job Chains tab in the Operations GUI. Clicking on the Show Jobs checkbox displays the individual jobs in your chain. Double-clicking on the chain opens the details panel on the right-hand side, as it did with the single job.

At this point, the job chain is still not going to be run because it needs its marching orders, which you can create manually by selecting the Job Menu button on the right and selecting Add Order. A new window pops up and allows you to define various characteristics of the order, such as the order ID, start time, and even the state to which the chain should jump. In this example, just leave everything blank, and the system will create an order ID for you.

If you did not define any conditions, the chain starts immediately; however, you could have defined a Time Slot for either of these jobs and the scheduler would wait until that time was reached before starting the job.

Note that the jobs must be able to accept orders to react to them. This step is done in the configuration window for the individual jobs. In the Main Options window is an On Order radio button that needs to be set to Yes; otherwise, the order will not start the job.

So far, you have started everything manually, more or less. Because the job chain needs an order before it can start, you obviously need a way to create orders dynamically. One way would be to create an order at a specific

time, which in turn triggers the job chain. As you might expect, this is done through the menu New | Hot Folder Element | Order. Name the order and, in the Job Chains window, select the specific job chain you want to associate with this order. Note that an order can only be associated with a single chain. Then define a new Time Period and a Single Start period of 09:00. To activate the changes immediately, you need to store the order in the config/live/ directory when you save it.

When you return to the Operations GUI, you will see that an order is now associated with the job chain you created. Below the order, a next start entry shows the date and time that this order will start. Because it is already after 9:00am in this example, the date is tomorrow. Had you used a different start time, such as the first of the month, the next start would be on that date.

Controlling Start Times

In defining the time period for jobs, chains, and orders, you have a couple of choices. First, you can define a specific time slot in which a job can start, for exa
