HOW-TO Ingest raw data into the database

Before any data can be processed by the Astro-WISE system, it must be ingested. Ingestion here means splitting the images, storing them on the fileserver, and making an entry in the database. This chapter describes the necessary preparations for ingestion and the ingestion process itself.

Preparations for the ingest

The images to be ingested should first be classified according to their intended use. See Table 1 in HOW-TO Schedule Astro-WISE compliant observations.

For locally stored, uncompressed copies of each image proceed as follows (if the images are compressed, decompress them first):

  • Identify the files by collecting relevant header items. This can be done with a command like gethead <header items> *.fits. However, if there are too many files, the expanded command line becomes too long and the shell refuses to run it. In this case, use a loop, foreach (csh) or for (bash), and append the output to a file:

    > foreach i ( *.fits )
    foreach? gethead $i <header items> >> headers.txt
    foreach? end

    Explanation: foreach is a C-shell looping construct, and gethead is a WCSTools program that reads header items from FITS files. Hence, for each FITS file, a few relevant header items are read and appended to the file "headers.txt". WCSTools is most likely already installed on your system. Website:

  • Group the images by the purpose for which these have been observed. This grouping is based on the header information retrieved in the previous step. For example:

    > grep "BIAS, READNOISE" headers.txt > readnoise.txt
    > grep "FLAT, DOME" headers.txt > domes.txt
    > grep "FLAT, SKY" headers.txt > twilight.txt
    > grep "STD, ZEROPOINT" headers.txt > photom.txt
    > grep "NGC" headers.txt > science.txt

    This will be easy if the guidelines for scheduling observations given in HOW-TO Schedule Astro-WISE compliant observations have been followed.

  • Use, for example, the editor vim to remove everything except the file names from the text files produced in the previous step:

    > vim readnoise.txt

    Then type the following command (the leading “:” enters command-line mode; Esc cancels):

    :%s/fits.*/fits/

    This regular expression substitution replaces, on each line, “fits” and everything after it with “fits”. “%” means all lines, “s” stands for substitute, the slashes separate the search and replace expressions, and “.*” matches zero or more characters of any kind. Save the file and quit with :wq.

  • You now have files containing a list of FITS filenames (one per line) named after the purpose for which the data was obtained. Now move the FITS files (or links) to subdirectories named after this purpose, for example:

    > mkdir READNOISE
    > foreach i (`cat readnoise.txt`)
    foreach? mv $i READNOISE
    foreach? end
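The text mentions the bash for loop as an alternative but only shows csh. The same preparation can be sketched in bash as three shell functions; this is a minimal sketch, not part of Astro-WISE. The names collect_headers, select_files and move_files are hypothetical, gethead is assumed to be on the PATH, and sed stands in for the interactive vim substitution. The file name is printed explicitly at the start of each line, as the echo -n line does in the compressed-file example further on, so the lists do not depend on whether gethead prints it.

```shell
# Sketch (hypothetical helpers) of the preparation steps for bash users.
# Assumes gethead (WCSTools) is on the PATH; sed replaces the vim step.

# Step 1: write "<file name> <header values...>" for every FITS file in
# the current directory. A loop avoids an over-long command line.
collect_headers() {
    local out="$1" f
    shift                               # remaining args: header items
    : > "$out"                          # start with an empty file
    for f in *.fits; do
        [ -e "$f" ] || continue         # pattern matched nothing
        printf '%s ' "$f" >> "$out"     # file name first on each line
        gethead "$f" "$@" >> "$out"
    done
}

# Steps 2 and 3: keep only the file names of the lines that contain a
# purpose string such as "BIAS, READNOISE" (fixed string, not a regex).
select_files() {
    local pattern="$1" headers="$2" out="$3"
    grep -F -- "$pattern" "$headers" | sed 's/fits.*/fits/' > "$out"
}

# Step 4: move the listed files (or links) into a subdirectory.
move_files() {
    local list="$1" dir="$2" f
    mkdir -p -- "$dir"
    # "|| [ -n "$f" ]" also handles a last line without a newline
    while IFS= read -r f || [ -n "$f" ]; do
        if [ -n "$f" ]; then mv -- "$f" "$dir"/; fi
    done < "$list"
}
```

For example, assuming the purpose is stored in the OBJECT header item:

    > collect_headers headers.txt OBJECT
    > select_files "BIAS, READNOISE" headers.txt readnoise.txt
    > move_files readnoise.txt READNOISE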

This completes the preparation. There are, of course, many other ways to do it, but this one is fast for any number of files.

Tips and possible complications

It may be helpful, especially when trying to ingest many files, to place links to the location of the raw MEF files in your current working directory:

> foreach i ( /ome03/users/data/WFI/2004/10/*.fits )
foreach? ln -s $i
foreach? end

In case the files are compressed with the common Unix compression programs compress, gzip or bzip2, just make the links to the compressed files in the same way (example for .Z files):

> foreach i ( /ome03/users/data/WFI/2004/10/*.fits.Z )
foreach? ln -s $i
foreach? end

You now have links to all the files you want to ingest in your current working directory.

To read the header items of compressed images without decompressing them completely, you could work as follows (example for bzip2):

> foreach i ( *.fits.bz2 )
foreach? dd bs=500k count=1 if=$i | bzip2 -qdc > hdr.txt
foreach? echo -n "$i " >> headers.txt
foreach? gethead hdr.txt <header items> >> headers.txt
foreach? end

(Explanation: dd reads one block of size 500k from the input file $i. The output is decompressed by bzip2 and redirected to the file hdr.txt. gethead can then be used on this file to get the header items, which are appended, after the file name written by echo, to the same file “headers.txt”.)
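The same approach extends to the other compression programs. The shell sketch below is a variant rather than the method above: instead of cutting the compressed stream with dd, it decompresses on the fly and keeps only the first 500 kB of decompressed data with head -c, so the result does not depend on how much compressed input is needed to yield the header. It relies on gzip -dc also reading compress (.Z) files, and assumes that gethead accepts the truncated temporary file just as it accepts hdr.txt above. The function names first_bytes and headers_from_compressed are hypothetical.

```shell
# Sketch (hypothetical helpers): header items from compressed FITS files.
# Assumes gethead (WCSTools) is on the PATH and accepts a truncated FITS
# file with an arbitrary name, as it accepts hdr.txt in the text above.

# Print the first NBYTES (default 500 kB) of the decompressed file.
# Decompression is cut off once head has read enough bytes.
first_bytes() {
    local f="$1" nbytes="${2:-512000}"
    case $f in
        *.bz2)    bzip2 -dc "$f" 2>/dev/null | head -c "$nbytes" ;;
        *.gz|*.Z) gzip -dc "$f" 2>/dev/null | head -c "$nbytes" ;;
        *)        head -c "$nbytes" "$f" ;;
    esac
}

# Append "<file name> <header values...>" for every compressed FITS file
# in the current directory (appending, like the csh loop above).
headers_from_compressed() {
    local out="$1" tmp f
    shift                                # remaining args: header items
    tmp=$(mktemp) || return 1
    for f in *.fits.bz2 *.fits.gz *.fits.Z; do
        [ -e "$f" ] || continue          # pattern matched nothing
        first_bytes "$f" > "$tmp"
        printf '%s ' "$f" >> "$out"
        gethead "$tmp" "$@" >> "$out"
    done
    rm -f -- "$tmp"
}
```

Usage:

    > headers_from_compressed headers.txt <header items>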

Other commands that may be of use:

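> fgrep [-v] -f <file1> <file2>  -- Print the lines of <file2> that contain (with -v: that do not
                                    contain) any line of <file1>; useful to compare file lists, and
                                    much faster than diff on large files.
> wc <file>                      -- Count lines, words and bytes (wc -l counts lines only).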
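As an example of the fgrep command, the sketch below lists the raw files that did not end up in any of the per-purpose lists made during preparation, i.e. files that none of the grep patterns caught. It uses grep -F, the current spelling of fgrep, with -x so that only complete lines (whole file names) match; unclassified is a hypothetical helper name.

```shell
# Sketch (hypothetical helper): print the lines of ALL that do not appear,
# as whole lines, in any of the LIST files, e.g. raw files not caught by
# any grep pattern. grep -F is the current spelling of fgrep.
# Usage: unclassified ALL LIST...
unclassified() {
    local all="$1" pats status
    shift
    [ "$#" -gt 0 ] || return 2           # at least one list is required
    pats=$(mktemp) || return 1
    cat -- "$@" > "$pats"                # all listed names, one per line
    # -F fixed strings, -x whole-line match, -v print non-matching lines
    grep -F -x -v -f "$pats" -- "$all"
    status=$?                            # 1 means: every file is listed
    rm -f -- "$pats"
    return "$status"
}
```

Here all.txt is assumed to list every raw file name, one per line, for example made with ls | grep '\.fits$' > all.txt before the files are moved:

    > unclassified all.txt readnoise.txt domes.txt twilight.txt photom.txt science.txt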

Ingesting data

The actual ingestion of the data is handled by a recipe, which can be found in $AWEPIPE/astro/toolbox/ingest. If your username is AWJSMITH, the recipe is invoked from the Unix command line with the following command:

env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/ -i <raw data> -t <type> [-commit]

where <raw data> is one or more file names (for example WFI*.fits) and <type> is the type of the data to be ingested. Setting the environment variable project ensures that the data is ingested in that project; see HOW-TO Use your Astro-WISE Context to set data scopes for a description of the context. To get a list of all possible values for the -t parameter, run the recipe without arguments:

awe $AWEPIPE/astro/toolbox/ingest/

Example for read noise data:

> env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/ -i *.fits -t readnoise -commit

Example for science data:

> foreach i (*.fits)
foreach? env project='AWJSMITH' awe $AWEPIPE/astro/toolbox/ingest/ -i $i -t science -commit
foreach? end

Important note: because of the way the ingestion script works, a loop like this one, which ingests one file per call, can only be used for lists of individual science images.
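The ingest commands can be wrapped so that each subdirectory from the preparation is ingested with its data type. This is a hypothetical sketch: ingest_group and ingest_science are illustrative names, and INGEST_RECIPE is a variable you would set to the ingest recipe in $AWEPIPE/astro/toolbox/ingest/, whose file name is not given here. Calibration data are passed to the recipe in one call, science images one file per call as the note above requires, and -commit is only passed on when given, so a first run without it stores and commits nothing.

```shell
# Sketch (hypothetical helpers): ingest the per-purpose subdirectories.
# INGEST_RECIPE is a hypothetical variable that you must set to the
# ingest recipe in $AWEPIPE/astro/toolbox/ingest/. The optional last
# argument is passed on as-is, e.g. -commit; without it nothing is
# stored or committed.

# ingest_group PROJECT DIR TYPE [-commit]: all files of DIR in one call.
ingest_group() {
    local project="$1" dir="$2" dtype="$3" commit="${4:-}"
    (
        : "${INGEST_RECIPE:?set INGEST_RECIPE to the ingest recipe}"
        cd -- "$dir" || exit 1          # local copies are written here
        env project="$project" awe "$INGEST_RECIPE" \
            -i *.fits -t "$dtype" ${commit:+"$commit"}
    )
}

# ingest_science PROJECT DIR [-commit]: one call per science file.
ingest_science() {
    local project="$1" dir="$2" commit="${3:-}"
    (
        : "${INGEST_RECIPE:?set INGEST_RECIPE to the ingest recipe}"
        cd -- "$dir" || exit 1
        for f in *.fits; do
            [ -e "$f" ] || continue
            env project="$project" awe "$INGEST_RECIPE" \
                -i "$f" -t science ${commit:+"$commit"} || exit
        done
    )
}
```

For example (the SCIENCE directory name is only an example):

    > ingest_group AWJSMITH READNOISE readnoise
    > ingest_group AWJSMITH READNOISE readnoise -commit
    > ingest_science AWJSMITH SCIENCE -commit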

The input data of the ingest script should be in the form of Multi-Extension FITS files (MEFs); most wide-field cameras write the data from their multi-CCD detector block in this form. The ingestion step splits an MEF file into its extensions, creates an object (a software construct) for each extension, stores each extension separately on a dataserver, and then commits the objects, with the relevant header items attached to them, to the database.

Note that each extension is also saved locally, so make sure there is enough free space in the directory where you run the ingest script; after ingesting, these local copies of the FITS files can be removed. The -commit switch is required to actually store and commit data: if it is not specified, nothing is written to the dataserver or committed to the database. A log of the ingest process is also generated; its file name ends in .log.
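Because each extension is also written locally, a quick free-space check before ingesting can save a failed run. A minimal sketch, assuming the local copies need about as much space as the uncompressed input files; this 1:1 ratio is an assumption, and for compressed inputs the estimate is too low. enough_space is a hypothetical helper; du -L follows symbolic links, so linked raw files are measured at their real size.

```shell
# Sketch (hypothetical helper): succeed only if the current directory has
# at least as much free space as the given input files. Assumption: the
# local copies of the extensions need about as much space as the
# uncompressed inputs; for compressed inputs this underestimates.
enough_space() {
    local need_kb avail_kb
    # -L follows symbolic links, -c adds a grand total as the last line
    need_kb=$(du -k -L -c -- "$@" | tail -n 1 | cut -f 1)
    # POSIX df output: the 4th field of the second line is the free space
    avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

Usage:

    > enough_space READNOISE/*.fits && echo "enough space"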

Each file to be ingested must be named according to our file naming convention: the name of the MEF file consists of the instrument name, followed by a date and time stamp (to millisecond precision) and the extension .fits.

Example: WFI.2001-02-13T01:02:03.123.fits

If the file to be ingested is not named according to this convention, a symbolic link with the correct name is created, and the image is ingested under that name. Hence the ingested image may not keep its original filename.
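To see in advance which files will be ingested under a new name, the names can be checked against the convention. The regular expression in this sketch is inferred from the single example above (an instrument name, a date, and a time with milliseconds) and is an assumption, not the official definition; conforms and nonconforming are hypothetical helper names.

```shell
# Sketch (hypothetical helpers): check raw file names against the naming
# convention. The pattern is inferred from WFI.2001-02-13T01:02:03.123.fits
# (instrument, date, time with milliseconds) and is an assumption.
conforms() {
    # ${1##*/} drops any leading directory part
    printf '%s\n' "${1##*/}" |
        grep -Eq '^[A-Za-z0-9_-]+\.[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\.fits$'
}

# Print the FITS files in the current directory whose names do not
# conform; the ingest script would ingest these under a new name.
nonconforming() {
    local f
    for f in *.fits; do
        [ -e "$f" ] || continue
        conforms "$f" || printf '%s\n' "$f"
    done
}
```

Usage:

    > nonconforming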