Data Reduction Concepts and Walk-throughs

This chapter is intended as an introduction and simple reference document for data processing. It will provide examples of how to use the system, but it will not show how to set up the different necessary components. In particular it is assumed the user has a working system, and has access to the database, a parallel computing cluster and his/her own checkout of the code. Once you have obtained the code, an environment variable called $AWEPIPE should point to the installation directory (awe). This variable is referenced repeatedly in the following text.

Processing steps

Several key points can be distinguished in the data reduction process:

  • Ingesting raw data into database/data-server
  • Producing calibration files
  • Producing calibrated science data (applying calibration files)
  • Coaddition of calibrated science data
  • Source extraction
  • User-specific analysis (much more so than the previous steps)

This is also the order in which various pipelines (recipes) need to be run so that the necessary calibration files are present in the database. See sections The bias pipeline, The flat-field pipeline, The photometric pipeline, and The image pipeline for more specific information about the pipelines.

Ingesting raw data into the database

See HOW-TO Ingest raw data into the database.

The first step in data reduction is the ingestion of the raw data into the database. This is handled by a recipe called, which can be found in $AWEPIPE/astro/toolbox/ingest. The recipe is invoked from the Unix command line with the following command :

awe $AWEPIPE/astro/toolbox/ingest/ -i <raw data> -p <purpose> [-commit]

where <raw data> is a list of input data (filenames) to ingest, and <purpose> is the purpose for which the input data were obtained. The input data should be unsplit and uncompressed.

Data processing

This section distinguishes three general ways of using the system to do data reduction. The first, and most laborious, is to work interactively, step by step, at the Python prompt. While unsuitable for processing a lot of data quickly, this gives a lot of insight into the inner workings of the code and shows the strengths of the system clearly. It is also possible to use recipes to reduce data on a single machine, which is helpful while testing. Finally, it is possible to use a parallel cluster, which is the most convenient way to process large volumes of data quickly.

Three characteristics of the Astro-WISE Environment (AWE) should be known prior to calibrating data with it.

  • The newest versions of calibration files present in AWE are considered best by default.
  • Each calibration file in the system has its individual period of validity associated with it, which is called its timestamp.
  • Any calibration file in the system can be flagged as invalid to ensure that it will not be used in data calibration. The name of the flag in the database is .

The implication of these characteristics for automated calibration with AWE can be illustrated well with an example. Assume that a science exposure is to be automatically calibrated. AWE will search for calibration files (e.g., BiasFrame, MasterFlatFrame, etc.) that have a timestamp which encompasses the time at which the exposure was taken and have not been invalidated. If, for example, one BiasFrame exists it will be used. If more than one exists, the last created one will be used (if none exist the system might try to construct one in some situations).
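The selection rule just described (validity period covers the exposure time, not invalidated, newest creation date wins) can be sketched in plain Python. This is a minimal model, not AWE code: `CalFile`, `select_calfile` and the `invalid` attribute are hypothetical names, and the real database flag is not named here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CalFile:
    """Hypothetical stand-in for a calibration file record (not an AWE class)."""
    name: str
    timestamp_start: datetime   # start of the validity period
    timestamp_end: datetime     # end of the validity period
    creation_date: datetime     # decides which file is "newest"
    invalid: bool = False       # hypothetical name for the invalidation flag


def select_calfile(candidates: list, obs_time: datetime) -> Optional[CalFile]:
    """Return the newest valid file whose validity period covers obs_time."""
    usable = [c for c in candidates
              if not c.invalid and c.timestamp_start <= obs_time <= c.timestamp_end]
    if not usable:
        return None  # AWE may try to build one in some cases; this sketch does not
    return max(usable, key=lambda c: c.creation_date)


forever = CalFile("bias-forever", datetime(1990, 1, 1), datetime(2100, 1, 1),
                  datetime(1990, 1, 1))
nightly = CalFile("bias-2000-04-28", datetime(2000, 4, 28, 12),
                  datetime(2000, 4, 29, 12), datetime(2004, 9, 8))
obs = datetime(2000, 4, 28, 23, 30)
print(select_calfile([forever, nightly], obs).name)  # bias-2000-04-28
nightly.invalid = True
print(select_calfile([forever, nightly], obs).name)  # bias-forever
```

Because the "forever valid" file has the oldest creation date, it is only picked when nothing newer and valid covers the exposure.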

An instrument-specific note: to facilitate automated calibration of data from the ESO Wide Field Imager (WFI), calibration files have been created whose timestamps encompass the times of all WFI images. The creation date of these “forever valid” files is set to a date before that of any other calibration data in the system, so any newer valid calibration file covering the same time takes precedence. The calibration files are ReadNoise, BiasFrame, GainLinearity, HotPixelMap, ColdPixelMap, DomeFlatFrame, TwilightFlatFrame, MasterFlatFrame, FringeFrame, photometric zeropoints and extinction curves, and bandpass transformations. The filter-dependent calibration files have been derived for the U (#841, #877), B (#842, #878), V (#843), R (#844) and I (#845, #879) broad-band filters. Thus, for the calibration of WFI science images taken through these filters, calibration files will always be available. For optimal reduction of a specific science dataset, keep in mind that these defaults might not be the best calibration files.

Interactive processing

An example of interactive processing:

awe> from astro.main.BiasFrame import BiasFrame
awe> from astro.main.RawFrame import RawBiasFrame
awe> from astro.main.ReadNoise import ReadNoise
awe> rn ='WFI', chip='ccd50', date='2001-02-01')
awe> r ='WFI', date='2001-02-01', chip='ccd50')
awe> for raw in r: raw.retrieve()
awe> b = BiasFrame()
awe> b.raw_bias_frames = list(r)
awe> b.read_noise = rn
awe> b.process_params.OVERSCAN_CORRECTION = 1
awe> b.set_filename()
awe> b.make()
awe> b.commit()

This will select a read-noise calibration file and the raw bias frames for the night of February 1st, 2001 from the database, retrieve the FITS files from the data-server, create a master bias frame, upload the image to the data-server (store), and finally commit its dependencies and meta-data to the database. Note that a process parameter was tweaked. Whenever parameters can be adjusted, this is done by changing their values in the associated -Parameters class designated by the process_params attribute of BiasFrame. See HOW-TO Configure process parameters for a more exhaustive explanation of how process parameters are set in the system.

Note that in interactive processing you may be tempted to use loops such as the following wrong code:

awe> sl = SourceList()
awe> query ='*MYNAME*ccd52*.fits')
awe> for frame in query:
...      sl.frame = frame
...      sl.make()
...      sl.commit()

This code is incorrect in AWE because of the way objects are stored in (made persistent to) the database. A new instance of SourceList has to be created for every SourceList that you want to commit to the database. That is, the SourceList object should be instantiated within the loop, as in this example:

awe> query ='*MYNAME*ccd52*.fits')
awe> for frame in query:
...      sl = SourceList()
...      sl.frame = frame
...      sl.make()
...      sl.commit()
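The persistence mechanism itself is not described here, so the following is only a toy model (`ToyStore` and `ToySourceList` are hypothetical, not AWE classes) under one assumption: the store identifies objects by instance, so committing the same instance again updates its existing record instead of adding a new one. Under that assumption the single-instance loop leaves one record holding the last frame, while the corrected loop leaves one record per frame.

```python
class ToyStore:
    """Toy persistence layer (hypothetical): committing an object that is
    already stored updates its record instead of adding a new one."""

    def __init__(self):
        self.records = []  # list of (object, snapshot-of-attributes) pairs

    def commit(self, obj):
        snapshot = dict(vars(obj))
        for i, (stored, _) in enumerate(self.records):
            if stored is obj:                 # same instance: overwrite its record
                self.records[i] = (obj, snapshot)
                return
        self.records.append((obj, snapshot))  # new instance: new record


class ToySourceList:
    """Hypothetical stand-in for SourceList with a single attribute."""

    def __init__(self):
        self.frame = None


frames = ["frame1.fits", "frame2.fits", "frame3.fits"]

# Wrong: one instance reused for every frame
wrong = ToyStore()
sl = ToySourceList()
for frame in frames:
    sl.frame = frame
    wrong.commit(sl)

# Right: a fresh instance per frame
right = ToyStore()
for frame in frames:
    sl = ToySourceList()
    sl.frame = frame
    right.commit(sl)

print(len(wrong.records), wrong.records[0][1])  # 1 {'frame': 'frame3.fits'}
print(len(right.records))                       # 3
```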

Non-parallel processing

Python classes are available to do any of the calibration steps; these classes act as recipes. They are located in $AWEPIPE/astro/recipes/ and must be imported into the Python interpreter, and can then be ‘run’:

awe> from import DomeFlatTask
awe> task = DomeFlatTask(instrument='WFI', date='2000-04-28', chip='ccd50',
                         filter='#842', commit=1)
awe> task.execute()

It is possible to call the help file on these classes:

awe> help(DomeFlatTask)

A page containing docstrings, methods etc. defined in DomeFlatTask will be shown. Hit “q” to exit this page.

Parallel processing

See HOW-TO Process data in a distributed (parallel) way.

When the awe-prompt starts, an instance of the class “Processor” is automatically created and given the name “dpu” (Distributed Processing Unit). Using this class you can run tasks in parallel. Start a task as follows:

awe>'ReadNoise', d='2000-04-28', i='WFI', c='ccd50', oc=6, C=1)

Here the first argument is the task name; its possible values can be found in table [calibration_pipelines_atomic_tasks]. The other arguments are query arguments: “d” is the “date” at the start of the observing night for which this ReadNoise object is derived, “i” is the “instrument” identifier, “c” is the “chip” (CCD) identifier (omit this argument to process the data for all the CCDs of “instrument” simultaneously), “oc” is the “overscan” correction method, and “C” is the “commit” switch.
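The short argument names map onto ordinary query attributes. A minimal sketch of that mapping follows; `SHORT_KEYS` and `expand_query_args` are hypothetical helpers, not the Processor implementation, and the “f” (filter) and “o” (object) keys are taken from the examples later in this chapter. Unknown keys raise an error, in the spirit of the input checking described next.

```python
# Short option names used with the DPU; the helper itself is a sketch.
SHORT_KEYS = {
    "d": "date",
    "i": "instrument",
    "c": "chip",
    "f": "filter",
    "o": "object",
    "oc": "overscan",
    "C": "commit",
}


def expand_query_args(**short_args):
    """Translate short DPU-style arguments into descriptive names."""
    unknown = set(short_args) - set(SHORT_KEYS)
    if unknown:
        raise ValueError(f"unknown argument(s): {sorted(unknown)}")
    return {SHORT_KEYS[key]: value for key, value in short_args.items()}


print(expand_query_args(d="2000-04-28", i="WFI", c="ccd50", oc=6, C=1))
# {'date': '2000-04-28', 'instrument': 'WFI', 'chip': 'ccd50', 'overscan': 6, 'commit': 1}
```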

Any invalid input to the processor is caught and a usage message is printed. Also note that if you have a local checkout of the code, this code and any changes to it are sent to the DPU and used there.

The bias pipeline

See HOW-TO Calibrations: overview.

Recipes used:


The flat-field pipeline

See HOW-TO Calibrations: overview.

Recipes used:


The photometric pipeline

See HOW-TO Calibrations: overview.

Recipes that can be used:

  • (this recipe represents the OmegaCAM pipeline)

These recipes and their underlying Task classes are described in detail in the chapters dedicated to the photometric pipeline.

The image pipeline

See HOW-TO Image pipeline: overview.

The image pipeline is used to process raw science data and needs the outputs from the various calibration pipelines. The calibration steps performed are de-biasing, flatfielding, astrometric calibration and photometric calibration. The recipes that relate to the image pipeline are:


The Reduce recipe de-biases and flat-fields the raw science data. Astrometry can be done in two ways. First, the Astrometry recipe derives an astrometric solution. After Astrometry has been run, it is possible to try to improve the astrometric solution by using the overlap regions of all the images in a dither pattern. To this end, GAstrometricSourceList creates a SourceList that may be used when running the GAstrom recipe. Regrid resamples a ReducedScienceFrame onto a new grid of pixels, so that RegriddedFrames can easily be coadded into CoaddedRegriddedFrames.

Starting the image pipeline in single-CCD mode is quite easy. To reduce the data of a given raw science frame from the awe-prompt:

awe> r = ReduceTask(raw_filenames=['<input name>'], commit=1)
awe> r.execute()

where <input name> is the filename of the raw science frame to calibrate. Alternatively:

awe> r = ReduceTask(instrument=<instrument>, date=<date>, filter=<filter>,
                    chip=<chip>, object_name=<object name>, commit=1)
awe> r.execute()

The recipe for running the image pipeline in parallel mode is the same one as used for running the calibration pipelines. In this case, however, the -task switch is set to either Reduce or Science. The command issued to run the pipeline is:

awe>'Reduce', i=<instrument name>, d=<date>, f=<filter name>,\
...          o=<object name>, C=<commit: 0 or 1>)

Raw images that are ingested into the database have an attribute “OBJECT”, which is matched against “object name” in the above statement. This OBJECT is the value of the header keyword OBJECT from the raw image. The wildcards “?” and “*” can be used in the object name; they act like Unix command line wildcards.
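Database queries typically implement such wildcards with SQL LIKE, where “%” matches any run of characters and “_” a single character. The sketch below shows that translation and uses Python's `fnmatch` to show which OBJECT values a pattern selects. `wildcard_to_like` is a hypothetical helper; whether AWE escapes literal underscores (as done here, assuming an `ESCAPE '\'` clause) is an assumption.

```python
import fnmatch


def wildcard_to_like(pattern: str) -> str:
    """Translate Unix-style wildcards into an SQL LIKE pattern.

    '*' becomes '%' and '?' becomes '_'. Literal '%', '_' and '\\' are
    escaped with a backslash, which assumes the query declares ESCAPE '\\'.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in "%_\\":
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


print(wildcard_to_like("CDF4_B_?"))                  # CDF4\_B\__
print(fnmatch.fnmatchcase("CDF4_B_3", "CDF4_B_?"))   # True
print(fnmatch.fnmatchcase("CDF4_B_10", "CDF4_B_?"))  # False: '?' is one character
```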

Photometric calibration in the image pipeline

The photometric calibration in the image pipeline is achieved by writing the zeropoint and extinction information from calfile 563 into the header of the science frame. For this to work, these calfiles (obviously) have to be present in the database for every combination of chip and filter. The quick creation of these calfiles without having to run the photometric pipeline is described in the relevant chapters on the photometric pipeline.


For the smooth running of the image pipeline, some manual adjustments of the contents of the database are sometimes necessary. This is particularly true for the timestamping of the various calibration files, because the selection of the right calibration file depends on these timestamps.

Every calibration file has three timestamps, of which two determine the validity range of the file. These timestamps are timestamp_start, timestamp_end, and creation_date, respectively. The default timestamps that are created in the calibration pipeline are set to reflect the calibration plan of OmegaCAM. However, these timestamps are not really suited for ‘random’ sets of data, or for data which are not subjected to a rigorous calibration plan. It is therefore necessary to adjust the timestamps of the calfiles produced so that these fit the ‘observing schedule’ of the data at hand. This can be done using the database timestamp editor (see

Interfaces to other programs

SQL interface, interaction with the database

See HOW-TO Query the database from Python for information about this interface.

Eclipse interface

For image arithmetic, the C library Eclipse is used. To use this library in AWE, a Python wrapper/interface was written. The interface has three main classes, image, cube, and pixelmap, representing commonly used data structures. Here is an example of its use:

awe> import eclipse
awe> bias = eclipse.image.image('bias.fits')
awe> flat = eclipse.image.image('flat.fits')
awe> sci = eclipse.image.image('science.fits')
awe> result = (sci-bias) / flat

Note that in the above example “science.fits” is a trimmed image that has to be equal in shape and size to the bias and flat. Master bias and master flat files retrieved from the database are trimmed, while raw science data is not. Also note that a new header is created for “result” in the example above. You may want to keep the header of the science image though:

awe> hdr = eclipse.header.header('science.fits')
awe>'sci_red.fits', hdr)

NOTE: Eclipse headers can be used at this low level, but for compatibility and advanced functionality like header verification, AWE uses DARMA headers based on the PyFITS interface (see for more details).
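The arithmetic `(sci-bias) / flat` in the Eclipse example is a pixel-by-pixel operation that only makes sense for frames of identical shape, which is why the science frame must be trimmed like the master bias and flat. A pure-Python sketch of the same operation on small 2-D lists follows; `shape` and `debias_flatfield` are hypothetical helpers for illustration, not part of Eclipse.

```python
def shape(img):
    """(rows, columns) of a rectangular 2-D list."""
    return (len(img), len(img[0]) if img else 0)


def debias_flatfield(sci, bias, flat):
    """Return (sci - bias) / flat pixel by pixel; all inputs must share a shape."""
    if not (shape(sci) == shape(bias) == shape(flat)):
        raise ValueError(f"shape mismatch: sci {shape(sci)}, "
                         f"bias {shape(bias)}, flat {shape(flat)}")
    return [[(s - b) / f for s, b, f in zip(srow, brow, frow)]
            for srow, brow, frow in zip(sci, bias, flat)]


sci = [[110, 210], [310, 410]]
bias = [[10, 10], [10, 10]]
flat = [[1.0, 2.0], [1.0, 0.5]]
print(debias_flatfield(sci, bias, flat))  # [[100.0, 100.0], [300.0, 800.0]]
```

An untrimmed (larger) raw frame fails the shape check instead of silently producing a wrong result.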

Regions can be cut from images:

awe> region = result.extract_region(1,1,100,100)

If you specify the header when saving the region, values such as “NAXIS1” and “NAXIS2” are adjusted to reflect the real size of the image.

Statistics can be calculated in the following way (assume that “coldpixels.fits” is an 8-bit pixelmap FITS file locating cold pixels):

awe> coldpixels = eclipse.pixelmap.pixelmap('coldpixels.fits')
awe> mask = ~coldpixels
awe> stats = result.stat_opts(pixelmap=mask, zone=[1,1,100,100])
awe> stats.median

Note the bitwise negation operator (~), which switches between “masks” (bad pixels 0) and “flags” (bad pixels 1). A mask is optional for calculating the statistics.
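The flag/mask distinction and a masked statistic can be illustrated without Eclipse. In this sketch (hypothetical data and helper, not the Eclipse implementation) the inversion is written as `1 - f`, because on plain Python integers `~1` equals `-2` rather than `0`; the Eclipse pixelmap's `~` takes care of this for 8-bit maps.

```python
from statistics import median

image = [[1, 2, 3],
         [4, 100, 6]]
flags = [[0, 0, 0],      # "flags": 1 marks a bad (e.g. cold) pixel
         [0, 1, 0]]

# Plain Python ints: ~1 == -2, so invert 0/1 values with 1 - f instead of ~
mask = [[1 - f for f in row] for row in flags]   # "mask": 1 marks a good pixel


def masked_median(img, msk):
    """Median of the pixels whose mask value is non-zero."""
    good = [v for img_row, m_row in zip(img, msk)
            for v, m in zip(img_row, m_row) if m]
    return median(good)


print(mask)                        # [[1, 1, 1], [1, 0, 1]]
print(masked_median(image, mask))  # 3
```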

Images (i.e. image objects) can be stacked in a cube:

awe> b1 = eclipse.image.image('bias1.fits')
awe> b2 = eclipse.image.image('bias2.fits')
awe> b3 = eclipse.image.image('bias3.fits')
awe> c = eclipse.cube.cube([b1,b2,b3])
awe> med_av = c.median()
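`c.median()` combines the stacked frames pixel by pixel, which suppresses outliers such as cosmic-ray hits in a single frame. A pure-Python sketch of a pixel-wise median over equally sized 2-D lists (`median_combine` is a hypothetical helper, not the Eclipse cube method):

```python
from statistics import median


def median_combine(images):
    """Pixel-wise median of a list of equally sized 2-D images."""
    # zip(*images) yields the i-th row of every image; zip(*rows) then
    # yields the values of one pixel position across all images.
    return [[median(pixels) for pixels in zip(*rows)]
            for rows in zip(*images)]


b1 = [[10, 12]]
b2 = [[11, 50]]   # 50: an outlier, e.g. a cosmic-ray hit
b3 = [[9, 13]]
print(median_combine([b1, b2, b3]))  # [[10, 13]]
```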

Other functionalities such as Fourier transforms, image filtering, etc. are supported. For further information, import eclipse in Python and use the help functionality provided by Python (see HOW-TO Documentation).

SWarp interface

SWarp is an image coaddition program that performs pixel remapping, projections, etc. The interface is straightforward: it writes a configuration file of the kind the program reads (as is done for SExtractor) and then calls the program itself.

awe> from astro.external import Swarp
awe> from astro.main.Config import create_config
awe> swarpconfig = create_config('swarp')
awe> swarpconfig.COMBINE = 'N'
awe> swarpconfig.RESAMPLE = 'Y'
awe> files = ['file1.fits', 'file2.fits', 'file3.fits']
awe> Swarp.swarp(files, config=swarpconfig)

The first argument of Swarp.swarp is a list of files to be SWarped. The second is the optional Config object, whose options can be set in multiple ways (see below), including direct setting as shown above.
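What such a wrapper does can be sketched in a few lines: write the options as “KEYWORD value” lines and build a command that passes the images plus `-c <config file>` to the `swarp` executable. This is a sketch under those assumptions, not the AWE `Swarp` module; `write_config` and `swarp_command` are hypothetical names, and the command is only built, not run.

```python
import os
import tempfile


def write_config(path, options):
    """Write a SWarp/SExtractor-style config file: one 'KEYWORD value' per line."""
    with open(path, "w") as fh:
        for key, value in options.items():
            fh.write(f"{key:<16} {value}\n")


def swarp_command(files, config_path, executable="swarp"):
    """SWarp takes input images as positional arguments and a config file via -c."""
    return [executable, *files, "-c", config_path]


with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "coadd.swarp")
    write_config(cfg, {"COMBINE": "N", "RESAMPLE": "Y"})
    with open(cfg) as fh:
        print(fh.read(), end="")
    cmd = swarp_command(["file1.fits", "file2.fits", "file3.fits"], cfg)
    print(cmd[0], cmd[1:4])
    # subprocess.run(cmd, check=True) would run SWarp if it is installed
```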

SExtractor interface

SExtractor is used to extract sources from images. While this is handled by the Catalog class, one can also call the SExtractor interface directly.

awe> from astro.external import Sextractor
awe> from astro.main.Config import create_config, create_params
awe> sexconf = create_config('sextractor')
awe> sexconf.set_from_keys(DETECT_THRESH=2.0, CATALOG_NAME='')
awe> sexparams = create_params('sextractor')
awe> sexparams.update_list(['FLUX_ISOCOR'])
awe> sci = 'sci_1.fits'
awe>, params=sexparams, config=sexconf)

In general, the first argument of is the detection (and measurement) image, the second is an optional measurement image, the third is possible extra output parameters other than those specified in the interface in the form of a Parameters object (from astro.main.Config). The final argument is the configuration (a Config object), which can be updated from separate keyword arguments in KEYWORD1=’value1’, KEYWORD2=’value2’, etc. format as shown in the ‘set_from_keys’ call.
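On the command line, SExtractor expresses the same ideas: a separate measurement image is given as `detect.fits,measure.fits` (dual-image mode), the configuration file follows `-c`, the output-parameter file is named by `PARAMETERS_NAME`, and any configuration keyword can be overridden as `-KEYWORD value`. The sketch below only builds such a command; `sextractor_command` is a hypothetical helper, `default.sex` and `default.param` are placeholder file names, and newer installations may name the executable `source-extractor` instead of `sex`.

```python
def sextractor_command(detect_image, config_path, param_path,
                       measure_image=None, executable="sex", **overrides):
    """Build a SExtractor command line.

    Dual-image mode is requested as 'detect.fits,measure.fits'; keyword
    overrides are passed as '-KEYWORD value' after the config file.
    """
    images = detect_image if measure_image is None else f"{detect_image},{measure_image}"
    cmd = [executable, images, "-c", config_path, "-PARAMETERS_NAME", param_path]
    for key, value in overrides.items():
        cmd += [f"-{key}", str(value)]
    return cmd


print(sextractor_command("sci_1.fits", "default.sex", "default.param",
                         DETECT_THRESH=2.0))
# ['sex', 'sci_1.fits', '-c', 'default.sex', '-PARAMETERS_NAME', 'default.param', '-DETECT_THRESH', '2.0']
```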

The catalog is a FITS table. Here follows an example of one way to work with the data in

awe> import pyfits
awe> hdu ='')
awe> flux_isocor = hdu[2].data.field('FLUX_ISOCOR')

The use of the Catalog class is discouraged, but explained below for completeness:

awe> from astro.main.Catalog import Catalog
awe> from astro.main.BaseFrame import BaseFrame
awe> cat = Catalog(pathname='')
awe> cat.frame = BaseFrame(pathname='sci_1.fits')
awe> cat.sexparam = ['FLUX_ISOCOR']
awe> cat.sexconf['DETECT_THRESH'] = 2.0
awe> cat.make()

The above can be extended with:

awe> cat.make_skycat()

This will make a Skycat catalog called “mycatalog.scat”, which can be overlaid on the FITS image (“sci_1.fits”) when using ESO’s Skycat viewer.

LDAC interface

LDAC (Leiden Data Analysis Center) tools are used in the system to do tasks such as astrometry and photometry. In particular, these tools provide a way to manipulate and associate binary FITS catalogs. Hence catalogs as created in the previous section can be manipulated from Python with the LDAC interface.

awe> from astro.external import LDAC
awe> incat = ''
awe> outcat = ''
awe> ldac = LDAC.LDAC()
awe> ldac.filter(incat, outcat, table_name='OBJECTS', sel='FLUX_RADIUS > 5.0')

These few lines will filter a catalog in file “” so that only astrophysical objects with a half-light radius larger than 5.0 pixels are placed in the output catalog. Note that LDAC is very picky about the syntax of the “sel” selection statement, so be careful here.
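The effect of a single-comparison selection such as `FLUX_RADIUS > 5.0` can be mimicked on in-memory rows. This sketch does not reproduce LDAC's selection grammar (which is stricter and richer); `parse_selection` and `filter_rows` are hypothetical helpers, and the rows are made-up dictionaries standing in for catalog entries.

```python
import operator
import re

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
        "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def parse_selection(sel):
    """Parse 'COLUMN <op> number' into (column, comparison function, value)."""
    m = re.fullmatch(r"\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*([-+0-9.eE]+)\s*", sel)
    if m is None:
        raise ValueError(f"unsupported selection: {sel!r}")
    column, op, value = m.groups()
    return column, _OPS[op], float(value)


def filter_rows(rows, sel):
    """Keep the rows (mappings) that satisfy the selection."""
    column, op, value = parse_selection(sel)
    return [row for row in rows if op(row[column], value)]


rows = [{"FLUX_RADIUS": 2.1}, {"FLUX_RADIUS": 7.4}, {"FLUX_RADIUS": 5.0}]
print(filter_rows(rows, "FLUX_RADIUS > 5.0"))  # [{'FLUX_RADIUS': 7.4}]
```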

A short example


When reducing data, instrumental footprints are removed from the science data, which are then calibrated astrometrically and photometrically. This is done by what we call the image pipeline. This short demo skips the creation of the calibration data (biases, flat-fields etc.) and shows how to reduce science data, and then find and inspect the results in the database.

Start the awe-prompt by typing awe.

The image pipeline

In this case we want to reduce data observed on the night of April 28, 2000, with the WFI on ESO’s 2.2m telescope, using the Johnson B filter (identifier: #842). This information must be given on the command line so that the data can be found in the database:

awe>'Reduce', d='2000-04-28', i='WFI', f='#842', C=1)

The job will be submitted to the queue. Check the DPU web page (Groningen). Wait for the jobs to finish. Logs of the processing can be retrieved as follows:

awe> dpu.get_logs()

Finding the result in the database

Now we want to check the database for the created files, and obtain the reduced images to check the result:

awe> query ='WFI', date='2000-04-28',
                                        filter='#842', chip='ccd50',
awe> for frame in query: print(frame.filename, frame.quality_flags,
...                  ,,
...                            frame.OBJECT, frame.creation_date)
Sci-DEMO-WFI-----#842-ccd50---Sci-53256.5263356.fits 0 ccd50 #842 CDF4_B_3
2004-09-08 12:38:06.00

The last item is the creation date of the ReducedScienceFrame; here you can check that the selected ReducedScienceFrame(s) include those that were just made. It is possible to fully track the history of each image this way.

Do not close the awe-prompt at this point (see next section).

Retrieving the images to check the results

It is now possible to download the images from the data-server(s). To do so, select the images from the result of the above query:

awe> q = ReducedScienceFrame.filename == 'Sci-DEMO-WFI-----#842-ccd50---Sci-532
awe> frame = q[0]
awe> frame.retrieve()

Note that the result of the query is in the form of a list, even if the result of the query is only one object. Hence obtain the first element and retrieve the image. The image can now be viewed with your favourite FITS viewer.

A lengthy example

This section recaps the preceding ones to show how to proceed from having a data tape to answering a question such as: give me a plot of half-light radius versus magnitude for the objects in this field. As the data set for this example, one quarter of the Capodimonte Deep Field (B filter) is used. This area has a size of approximately 30′ × 30′ (the WFI field of view). The observations for this data set were done on the 28th of April, 2000.

Ingesting (skip in case of demo, process on local machine)

It is assumed that we have copied all the data for this date from tape (presumably) to hard disk, so that it is located in for example /Users/users/data/. It is now necessary to know the type of (raw) data for each file (bias, twilight flat, dark etc.). The data set needs to be ingested into the database: the multi-extension FITS files are split into single CCD parts and stored on the data-server, and RawFrame objects are created in the database.

Since the total number of files is considerable, it is convenient to list them by type in ASCII files. In our case a file called bias_files.txt containing the bias file names looks like:


These biases can be ingested into the database by looping over this file as follows:

unixprompt>foreach i ( `cat bias_files.txt` )
foreach? awe $AWEPIPE/astro/toolbox/ingest/ -i $i -t bias -commit
foreach? end

Repeat (use option -t dome and -t twilight) for the dome- and twilight flats. The raw calibration data should now be present in the database and data-server as RawBiasFrame, RawDomeFlatFrame and RawTwilightFlatFrame instances.
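The csh loop can also be expressed in Python, which makes it easy to build the commands for all three list files in one go. This is a sketch only: `INGEST_SCRIPT` is a placeholder because the ingest recipe's file name is not given in this text, `ingest_commands` is a hypothetical helper, and the file names in the example are dummies. The commands are printed rather than run.

```python
import shlex
import tempfile
from pathlib import Path

# Placeholder: the ingest recipe's file name is not given in this text.
INGEST_SCRIPT = "$AWEPIPE/astro/toolbox/ingest/<ingest recipe>"


def ingest_commands(list_file, raw_type, commit=True):
    """Build one ingest command per non-empty line (file name) in list_file."""
    names = [line.strip() for line in Path(list_file).read_text().splitlines()
             if line.strip()]
    commands = []
    for name in names:
        cmd = ["awe", INGEST_SCRIPT, "-i", name, "-t", raw_type]
        if commit:
            cmd.append("-commit")
        commands.append(cmd)
    return commands


with tempfile.TemporaryDirectory() as tmp:
    listing = Path(tmp) / "bias_files.txt"
    listing.write_text("bias1.fits\nbias2.fits\n")   # hypothetical file names
    for cmd in ingest_commands(listing, "bias"):
        print(shlex.join(cmd))
```

Once the placeholder holds the real recipe path, each command could be passed to `subprocess.run(cmd, check=True)`; note that `$AWEPIPE` is not expanded without a shell, so `os.path.expandvars` would be needed first.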

One can check this by using the database viewer (

Image calibration files

Assuming everything went well, we are ready to start creating calibration files. This will be done using a parallel computing cluster.

Use the following command to create read noise objects, which are necessary to create master biases:

awe>'ReadNoise', i='WFI', d='2000-04-28', C=1)

Check your DPU queue webpage to view the status of your jobs and wait for them to finish. Then use the following command to create master biases for each CCD:

awe>'Bias', i='WFI', d='2000-04-28', C=1)

Once a job is finished the log file for it can be obtained from the DPU:

awe> dpu.get_logs()

Once the biases finish the other necessary calibration steps need to be performed (in this order) as follows:

awe>'HotPixels', i='WFI', d='2000-04-28', C=1)
awe> # Wait for jobs to finish (check web page for queue)
awe>'DomeFlat', i='WFI', d='2000-04-28', f='#842', C=1)
awe> # Wait for jobs to finish (check web page for queue)
awe>'ColdPixels', i='WFI', d='2000-04-28', f='#842', C=1)
awe> # Wait for jobs to finish (check web page for queue)
awe>'TwilightFlat', i='WFI', d='2000-04-28', f='#842', C=1)
awe> # Wait for jobs to finish (check web page for queue)
awe>'MasterFlat', i='WFI', d='2000-04-28', f='#842', C=1)
awe> # Wait for jobs to finish (check web page for queue)
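The calibration tasks above must run in a fixed order, each waiting for the previous one. If you script this sequence, it can help to encode the order as a dependency chain and let Python derive it. The linear chain below simply follows the order given in the text and is a simplifying assumption (the real dependencies between calibration files may be more detailed); `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks that must have finished before it starts.
# The chain follows the order given in the text (a simplifying assumption).
DEPENDS_ON = {
    "Bias": {"ReadNoise"},
    "HotPixels": {"Bias"},
    "DomeFlat": {"HotPixels"},
    "ColdPixels": {"DomeFlat"},
    "TwilightFlat": {"ColdPixels"},
    "MasterFlat": {"TwilightFlat"},
}

order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
# ['ReadNoise', 'Bias', 'HotPixels', 'DomeFlat', 'ColdPixels', 'TwilightFlat', 'MasterFlat']
```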

Photometric calibration files

In order to run the image pipeline and do photometry, a PhotometricParameters object is required. There are two ways to proceed here, as described in the sections below.

Manual values (essentially no photometry)

For relative photometric calibration in the case of WFI observations, values for the zeropoint and extinction can be entered manually:

awe $AWEPIPE/astro/toolbox/photometry/ -z 25.00 -c ccd50 \
    -f '#842' -e 0.19 -start 2000-01-01 -end 2000-12-31
awe $AWEPIPE/astro/toolbox/photometry/ -z 24.95 -c ccd51 \
    -f '#842' -e 0.19 -start 2000-01-01 -end 2000-12-31

where the option “z” is the zeropoint and “e” the extinction. Repeat this for each CCD (ccd50-ccd57) in the WFI detector. With these default PhotometricParameters objects in place, it is possible to run the image pipeline (see Image pipeline below).
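To see how a zeropoint and an extinction coefficient enter a magnitude, the usual relation is m = -2.5 log10(flux) + ZP - k·X, with X the airmass. The sketch below applies it; that AWE uses exactly this convention (for instance, flux per second of exposure) is an assumption, and `calibrated_magnitude` is a hypothetical helper.

```python
import math


def calibrated_magnitude(flux_adu_per_s, zeropoint, extinction, airmass):
    """Standard calibration: m = -2.5 log10(flux) + ZP - k * X (assumed convention)."""
    if flux_adu_per_s <= 0:
        raise ValueError("flux must be positive")
    return -2.5 * math.log10(flux_adu_per_s) + zeropoint - extinction * airmass


# ZP = 25.00 and k = 0.19 as in the ccd50 example; airmass 1.2 is illustrative
print(round(calibrated_magnitude(1000.0, 25.00, 0.19, 1.2), 3))  # 17.272
```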

Using standard star fields (absolute photometry)

See HOW-TO Make a photometric source catalog, HOW-TO Photometric pipeline(2): transformation tables, and HOW-TO Photometric pipeline(3): extinction and zeropoint for detailed instructions on how to do accurate photometry. This is outside the scope of this example.

Image pipeline

Now that all calibration files that are necessary have been produced, we can continue by applying all these to the science data. This is done by running a recipe that represents the so-called image pipeline:

awe>'Reduce', i='WFI', d='2000-04-28', f='#842', o='CDF4_B_?', C=1)

The data used in this example consist of 10 dithered exposures that we intend to coadd into one image. The above command selects RawScienceFrames from the database, in particular using the “like” functionality of the SQL interface to match the OBJECT header keyword, and applies the calibration data. Since each WFI exposure covers 8 CCDs (ccd50-ccd57), this results in 10 × 8 = 80 ReducedScienceFrames that are stored in the database. In addition, these reduced science frames are resampled onto a new grid. The grid centers for this system are fixed, so that pixels in these RegriddedFrames can be combined without resampling again.

After the job completes, there should be 80 new ReducedScienceFrames and 80 new RegriddedFrames in the database. One can check this from the AWE/Python interpreter as follows:

awe> s ='WFI', date='2000-04-28',
...                                 filter='#842')
awe> len(s)

If this search turns up more than 80 science frames as above, this means other data has been reduced (possibly by other persons) for this filter and for this night. To get a better idea of what is present in the database for this night one could proceed as follows:

awe> for f in s: print(f.raw.filename,,,

(press enter when prompted with ‘…’ to close the statement block in the above loop)


The RegriddedFrames created in the previous step can be coadded into a single mosaic, to form the intended contiguous region on the sky. In order to coadd the data, one can do the following:

awe>'Coadd', i='WFI', d='2000-04-28', f='#842', o='CDF4_B_?', C=1)

A lot of files now need to be retrieved from the data-server, namely all the RegriddedFrames and all the WeightFrames associated with them. After processing finishes, you should have a nice image of one quarter of the Capodimonte Deep Field.

Source lists

See HOW-TO SourceLists in the Astro-WISE System.

To make a source list from the image created above, so that the information about its sources is available from the database, one can do the following:

awe> sl = SourceList()
awe> query ='Sci*Coadd*.fits')
awe> sl.frame = query[0]
awe> sl.frame.retrieve()
awe> = 'DEMO-sourcelist'
awe> sl.sexconf.DETECT_THRESH = 2.0
awe> sl.sexparam = ['MAG_AUTO', 'MAGERR_AUTO', 'FLUX_RADIUS']
awe> sl.make()
awe> sl.commit()

One can now select the source list from the database and check its global properties or even specific information about the sources in the source list.

awe> query = == 'DEMO-sourcelist'
awe> sl = query[0]
awe> len(sl.sources)
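The lengthy example set out to produce a plot of half-light radius versus magnitude. A minimal sketch of that last step follows, assuming the sources can be read as mappings from SExtractor column name (MAG_AUTO, FLUX_RADIUS) to value; how to obtain such records from a SourceList is not covered here, and both function names are hypothetical. Plotting uses matplotlib, which is imported only when the plot is actually made.

```python
def radius_vs_magnitude(sources, mag_key="MAG_AUTO", radius_key="FLUX_RADIUS"):
    """Return (magnitude, half-light radius) pairs sorted by magnitude,
    skipping sources where either value is missing."""
    pairs = [(s[mag_key], s[radius_key]) for s in sources
             if s.get(mag_key) is not None and s.get(radius_key) is not None]
    return sorted(pairs)


def plot_radius_vs_magnitude(pairs, path="radius_vs_mag.png"):
    """Scatter-plot the pairs to a PNG file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    mags, radii = zip(*pairs)
    fig, ax = plt.subplots()
    ax.scatter(mags, radii, s=4)
    ax.set_xlabel("MAG_AUTO")
    ax.set_ylabel("FLUX_RADIUS [pixels]")
    fig.savefig(path)
    plt.close(fig)


sources = [{"MAG_AUTO": 20.1, "FLUX_RADIUS": 2.5},
           {"MAG_AUTO": 18.3, "FLUX_RADIUS": 3.1},
           {"MAG_AUTO": None, "FLUX_RADIUS": 1.0}]
print(radius_vs_magnitude(sources))  # [(18.3, 3.1), (20.1, 2.5)]
```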