Setting up recurrent routines : Jobs

Sen2Chain uses jobs to execute whole processing chains (downloading L1C, computing L2A with Sen2Cor, masking clouds, and producing indices) on any tile. All parameters of Sen2Chain’s functions can be specified, so multiple products can be produced in one go.

Jobs can be launched once or at scheduled hours using crontab.

Job config files are stored in ~/sen2chain_data/config/jobs/.

Each job consists of two files:

  • job_jid.cfg, which configures the job

  • job_jid.py, which is created automatically once the job is configured

Here jid is the job identifier, which can be any combination of digits and characters.
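Because the two files follow the job_jid.cfg / job_jid.py naming pattern, the identifiers of the jobs present on disk can be recovered from the file names. The following is a minimal sketch that assumes only this naming pattern; list_job_ids and has_python_script are hypothetical helpers, not part of Sen2Chain (the Jobs class described below is the supported way to list jobs).

```python
from pathlib import Path


def list_job_ids(jobs_dir):
    """Return the sorted jids of all job_<jid>.cfg files in jobs_dir.

    Hypothetical helper for illustration; not part of the Sen2Chain API.
    """
    jobs_dir = Path(jobs_dir).expanduser()
    # Path.stem drops the ".cfg" suffix; the slice drops the "job_" prefix.
    return sorted(p.stem[len("job_"):] for p in jobs_dir.glob("job_*.cfg"))


def has_python_script(jobs_dir, jid):
    """Tell whether job_<jid>.py already exists next to the config file."""
    return (Path(jobs_dir).expanduser() / f"job_{jid}.py").is_file()
```

For example, list_job_ids("~/sen2chain_data/config/jobs/") would return the jids of every config file in the default jobs folder.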

The job config file contains global job parameters and a list of tasks.

A task targets a single tile only, so to download and process products for several tiles you need one task line per tile in your configuration file.

We recommend editing the configuration file manually if you plan to set up a job with multiple tiles, because the command-line methods edit only one task at a time.
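Since each task line targets one tile, a job covering several tiles needs one line per tile. The sketch below builds such lines from a list of tile names, using the column order of the task table in the job configuration file. make_task_lines is a hypothetical helper, not a Sen2Chain function, and the values passed in the example are illustrative; columns that are not supplied are left empty, which this page documents as valid only for some columns, so fill in the ones you rely on.

```python
# Column order of the task table in a job configuration file.
COLUMNS = [
    "tile", "date_min", "date_max", "max_clouds", "provider", "filter_max_pb",
    "download", "compute_l2a", "cloudmasks", "indices", "remove", "comments",
]


def make_task_lines(tiles, **values):
    """Build one semicolon-separated task line per tile.

    Hypothetical helper: columns not given in values are left empty.
    """
    unknown = set(values) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    lines = []
    for tile in tiles:
        row = dict(values, tile=tile)  # same settings, different tile
        lines.append(";".join(str(row.get(col, "")) for col in COLUMNS))
    return lines


# Two tiles sharing the same (illustrative) settings give two task lines.
for line in make_task_lines(
    ["40KCB", "40KEC"],
    date_max="today", max_clouds=80, provider="cop_dataspace",
    filter_max_pb=True, download="l2a", compute_l2a=True,
    indices="NDVI/NDWIGAO", remove="l2a",
):
    print(line)
```

The printed lines can be pasted below the header line of job_jid.cfg.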

Job listing

The Jobs class is used to list all jobs created in your Sen2Chain install.

>>> from sen2chain import Jobs
>>> Jobs()
       job_id  config_file  python_script logging       timing cron_status cron_timing
0  0123456789         True           True   False    0 5 * * *      absent        None
1         335         True           True   False  10 10 * * *      absent        None
2         012         True          False   False    * * * * *      absent        None
3         tes         True          False   False    * * * * *      absent        None

Jobs can be removed with the remove() method, given their jid:

>>> Jobs().remove("335")
10094:2022-03-17 17:03:35:INFO:sen2chain.jobs:Removing Python script...
10094:2022-03-17 17:03:35:INFO:sen2chain.jobs:Removing config file...

Job

Create a new Job

To create a new job or load an existing one, call Job(jid="jid"):

>>> from sen2chain import Job
>>> j = Job(jid="test")
>>> j.save()

This creates a configuration file in ~/sen2chain_data/config/jobs/:

logs = True
timing = 0 20 * * *
provider = cop_dataspace
tries = 2
sleep = 4
nb_proc = 18
copy_L2A_sideproducts = False
clean_before = True
clean_after = True

tile;date_min;date_max;max_clouds;provider;filter_max_pb;download;compute_l2a;cloudmasks;indices;remove;comments
40KCB;;today;80;cop_dataspace;True;l2a;True;CM004-CSH1-CMP1-CHP1-TCI1-ITER1;NDVI/NDWIGAO/MNDWI/NDRE/IRECI/BIGR/BIRNIR/BIBG/EVI/NBR;l2a;Reunion
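To show how the two parts of this file fit together, here is a minimal sketch of a reader for this layout. It is not Sen2Chain's own parser: parse_job_config is a hypothetical helper that assumes global parameters are written as key = value, that the task table starts at the header line beginning with tile;, that lines starting with # are comments, and that task lines starting with ! are commented out.

```python
def parse_job_config(text):
    """Split a job config into (global parameters, list of task dicts).

    Hypothetical helper for illustration; all values are kept as strings.
    """
    params, tasks, header = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if header is None and line.startswith("tile;"):
            header = line.split(";")  # column names of the task table
        elif header is None:
            key, _, value = line.partition("=")
            params[key.strip()] = value.strip()
        else:
            if line.startswith("!"):
                continue  # task commented out with "!" before the tile name
            fields = line.split(";")
            if len(fields) != len(header):
                raise ValueError(f"expected {len(header)} fields: {line!r}")
            tasks.append(dict(zip(header, fields)))
    return params, tasks


sample = """\
logs = True
timing = 0 20 * * *
nb_proc = 18

tile;date_min;date_max;max_clouds;provider;filter_max_pb;download;compute_l2a;cloudmasks;indices;remove;comments
40KCB;;today;80;cop_dataspace;True;l2a;True;CM004-CSH1-CMP1-CHP1-TCI1-ITER1;NDVI/NDWIGAO;l2a;Reunion
"""
params, tasks = parse_job_config(sample)
print(params["timing"], tasks[0]["tile"], tasks[0]["indices"].split("/"))
```

The field-count check reflects the need to keep the table structure intact: a task line with a missing or extra semicolon is rejected instead of being silently misread.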

The first section of the configuration file lists the global parameters for job execution:

  • logs: True | False

  • timing: the schedule, in cron format

  • tries: the number of times the download should loop before stopping, to download OFFLINE products

  • sleep: the time in minutes to wait between loops

  • nb_proc: the number of CPU cores to use for this job, default 8

  • copy_l2a_side_products: True | False. Duplicates msk_cldprb_20m and scl_20m from the l2a folder to the cloudmask folder after l2a production. Useful if you plan to remove l2a products to save disk space but want to keep these 2 files for cloudmask generation and better extraction

  • clean_before: True | False, default True. Whether to call Library.clean(remove=True) on the selected tiles before starting the job

  • clean_after: True | False, default True. Whether to call Library.clean(remove=True) on the selected tiles after finishing the job

  • process_by_line: split job execution into n lines for processing of large datasets

The second section of the configuration file is a list of tasks, each of which is processed on a single tile when the job is executed:

  • tile: tile identifier, format ##XXX; comment out a line by putting ! before the tile name

  • date_min: the start date for this task. Possible values: empty (2015-01-01 will be used) | any date | today-xx (xx being the number of days before today to consider)

  • date_max: the last date for this task. Possible values: empty (9999-12-31 will be used) | any date | today

  • max_clouds: the maximum cloud cover to consider for downloading images and computing l2a, cloudmask and index products

  • provider: the provider from which to get the data (l1c/l2a). Possible values: empty (cop_dataspace will be used) | cop_dataspace | other

  • filter_max_pb: True | False. Set to True to download only products with the maximum processing baseline; in that case you do not have to set pb_min and pb_max

  • pb_min: the minimum processing baseline for this task. Possible values: empty (0 will be used) | any positive float number

  • pb_max: the maximum processing baseline for this task. Possible values: empty (98 will be used) | any positive float number

  • download: the product type to download: l1c | l2a | False

  • compute_l2a: whether to compute l2a using sen2chain / sen2cor: True | False

  • cloudmasks: the cloudmask(s) to compute and use to mask the indices. Possible values range from none (False) to multiple cloudmasks: empty or False | CM001/CM002/CM003-PRB1-ITER5/CM004-CSH1-CMP1-CHP1-TCI1-ITER0/etc.

  • indices: empty or False | All | NDVI/NDWIGAO/etc.

  • remove: used to remove L1C and L2A products, considering either only new products (-new, i.e. downloaded or produced by this job) or the whole time series (-all), optionally restricted to products above a specified cloud cover (-ccXX). Examples of possible values: empty or False | l1c-new-cc80 | l1c-all | l2a-new | l1c-all/l2a-new-cc25

  • comments: free user comments, e.g. tile name, etc.
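The date and remove columns use small shorthand notations. The sketch below shows one way to interpret them, following the rules listed above. resolve_date and parse_remove are hypothetical helpers, not Sen2Chain functions; the YYYY-MM-DD format for explicit dates is an assumption, and the behaviour when neither -new nor -all is given is not documented here, so the sketch leaves the scope unset in that case.

```python
from datetime import date, timedelta


def resolve_date(value, default, today=None):
    """Turn '', 'today', 'today-xx' or 'YYYY-MM-DD' into a date object."""
    today = today or date.today()
    value = value.strip()
    if not value:
        return default  # empty column: fall back to the documented default
    if value == "today":
        return today
    if value.startswith("today-"):
        return today - timedelta(days=int(value[len("today-"):]))
    return date.fromisoformat(value)  # assumed format for explicit dates


def parse_remove(value):
    """Parse e.g. 'l1c-all/l2a-new-cc25' into a list of rule dicts."""
    if value.strip() in ("", "False"):
        return []  # nothing to remove
    rules = []
    for item in value.split("/"):
        product, *options = item.split("-")
        rule = {"product": product, "scope": None, "cloud_cover_above": None}
        for opt in options:
            if opt in ("new", "all"):
                rule["scope"] = opt
            elif opt.startswith("cc"):
                rule["cloud_cover_above"] = int(opt[2:])
        rules.append(rule)
    return rules


# date_min / date_max defaults as documented for an empty column.
print(resolve_date("", date(2015, 1, 1)), resolve_date("", date(9999, 12, 31)))
print(parse_remove("l1c-all/l2a-new-cc25"))
```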

Configure Job

To configure a job with a large number of tasks on different tiles, we recommend editing the configuration file manually, for example with nano ~/sen2chain_data/config/jobs/job_jid.cfg

Make sure to keep the same table structure. A job can also be configured from the Python command line.

Add a task to a job config file with the task_add() method :

>>> from sen2chain import Job
>>> j = Job(jid="jid")
INFO:sen2chain.jobs:Reading existing config...
>>> j.task_add()

Edit a task with task_edit(task_id, **kwargs) :

>>> j.task_edit(task_id=0, tile='40KEC', remove='l1c')

Remove a task with task_remove(task_id):

>>> j.task_remove(task_id=0)

Save and Launch a Job

Save the config file to your local database. If the job you created is not saved, you will not be able to load it next time.

>>> j.save()

To launch a job directly from the command line :

>>> j.run()

Job in cron

You can add, deactivate and delete a job in cron with cron_enable(), cron_disable() and cron_remove().

Cron will run the job at the frequency specified by the timing parameter (minute - hour - day - month - weekday) in job_jid.cfg.

>>> j.save()
>>> j.cron_enable()
>>> j.cron_disable()
>>> j.cron_remove()
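As a reminder of the cron format used by the timing parameter, the sketch below splits a timing string into its five fields. describe_timing is a hypothetical helper for illustration, not part of Sen2Chain; it only labels the fields and does not validate their ranges.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_timing(timing):
    """Label the five whitespace-separated fields of a cron timing string."""
    parts = timing.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {timing!r}")
    return dict(zip(CRON_FIELDS, parts))


# "0 20 * * *" is minute 0 of hour 20 on every day: the job runs daily at 20:00.
print(describe_timing("0 20 * * *"))
```

Checking the field count before enabling a job in cron catches the most common mistake, a timing value with a missing or extra field.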