HOW-TO Query the database from Python

In order to query data in the database or commit data to it, an interface to SQL (Structured Query Language, the standard command driven query interface for databases) was written. Using the interface it is possible to query the database from Python scripts or the Python interpreter and obtain complete Python objects, with their entire history in the form of their normal hierarchy intact. Conversely Python objects with their entire history intact can be committed to the database for later retrieval. The interface supports a number of often used SQL constructs that are described in the following subsections.

General syntax, comparison operators, AND and OR

A database query from Python generally has this structure:

awe> query = <class>.<attribute> <comparison operator> <value>

Where class can be any DBObject (objects that are stored in the database) and attribute can be any attribute of any DBObject and the hierarchy of a DBObject can be followed as deep as it goes. Comparison operators can be all the usual: ==, !=, >, >=, <, <= (equal to, not equal to, greater than, greater than or equal, smaller than, smaller than or equal respectively).

Note that a list of persistent properties (those properties that can be queried on in the database) can be obtained for all ProcessTargets (use the class, NOT an instance of the class) as follows:

awe> BiasFrame.get_persistent_properties()
['chip', 'creation_date', 'filename', 'globalname', 'imstat', 'instrument',
'is_valid', 'object_id', 'observing_block', 'prev', 'process_params',
'process_status', 'quality_flags', 'raw_bias_frames', 'read_noise',
'timestamp_end', 'timestamp_start']

and

awe> RawScienceFrame.get_persistent_properties()
['AIRMEND', 'AIRMSTRT', 'DATE', 'DATE_OBS', 'EXPTIME', 'LST', 'MJD_OBS',
'OBJECT', 'OBSERVER', 'UTC', 'astrom', 'chip', 'extension', 'filename',
'filter', 'globalname', 'imstat', 'instrument', 'is_valid', 'object_id',
'observing_block', 'overscan_x_stat', 'overscan_y_stat', 'prescan_x_stat',
'prescan_y_stat', 'process_status', 'quality_flags', 'raw_fits_data',
'template']

So, an example of a query is:

awe> query = RawScienceFrame.EXPTIME >= 300.0

or

awe> query = RawTwilightFlatFrame.imstat.median < 30000.0

Queries can be comprised of multiple parts separated by AND (&) or OR (|) operators, where the different parts are between parentheses:

awe> q = (RawScienceFrame.OBJECT == 'ngc6822')
awe> q = (RawScienceFrame.OBJECT == 'ngc6822') | \
         (RawScienceFrame.OBJECT == 'ngc 6752 - Field')
awe> q = (RawScienceFrame.OBJECT == 'ngc6822') & \
         (RawScienceFrame.chip.name == 'ESO_CCD_#65')

Note that the backslashes at the end of the lines only indicate that the command continues on the next line. Lengths of queries (number of results) can be obtained using the Python len function:

awe> len(q)
110

Attributes of the obtained objects can be printed as follows:

awe> for f in q: print(f.filename, f.OBJECT, f.filter.name, f.chip.name, f.EXPTIME)
...
OMEGACAM.2012-06-16T05:51:25.429_1.fits ngc6822 OCAM_u_SDSS ESO_CCD_#65 580.0
OMEGACAM.2012-06-16T06:46:22.166_1.fits ngc6822 OCAM_u_SDSS ESO_CCD_#65 580.0
OMEGACAM.2012-07-26T02:30:04.947_1.fits ngc6822 OCAM_H_ALPHA ESO_CCD_#65 580.0
OMEGACAM.2012-07-19T03:17:24.651_1.fits ngc6822 OCAM_g_SDSS ESO_CCD_#65 580.0
OMEGACAM.2012-06-16T06:25:32.663_1.fits ngc6822 OCAM_u_SDSS ESO_CCD_#65 580.0
OMEGACAM.2012-06-16T05:30:37.246_1.fits ngc6822 OCAM_u_SDSS ESO_CCD_#65 580.0
OMEGACAM.2012-06-16T07:07:10.329_1.fits ngc6822 OCAM_u_SDSS ESO_CCD_#65 580.0
OMEGACAM.2012-06-15T07:23:26.839_1.fits ngc6822 OCAM_H_ALPHA ESO_CCD_#65 580.0
OMEGACAM.2012-06-15T07:33:51.550_1.fits ngc6822 OCAM_H_ALPHA ESO_CCD_#65 580.0
OMEGACAM.2012-07-26T02:19:39.575_1.fits ngc6822 OCAM_H_ALPHA ESO_CCD_#65 580.0
etc.

It is also possible to construct a query object without a query clause. This query can be used to iterate through all objects of the specific class.

awe> query = <class>.select_all()

Using wildcards (like)

It is possible to use wildcards in particular when selecting using strings. Wildcards are implemented as they are in the common UNIX shells, (? for any character, * for any number of characters).

awe> q = (RawScienceFrame.instrument.name == 'OMEGACAM') & \
         (RawScienceFrame.OBJECT.like('ngc*'))

awe> q = RawScienceFrame.filename.like('OMEGACAM.2013-06-02T06:51:04.629_?.fits')
awe> for f in q: print(f.filename, f.OBSERVER, f.DATE_OBS, f.filter.name, f.EXPTIME)
...
OMEGACAM.2013-06-02T06:51:04.629_1.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_2.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_3.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_4.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_5.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_6.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_7.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_8.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0
OMEGACAM.2013-06-02T06:51:04.629_9.fits UNKNOWN 2013-06-02 06:51:04 OCAM_r_SDSS 600.0

Querying list attributes (contains)

If an attribute points to a list the method contains can be used to query for elements in this list. The elements in this list can be simple types like int, float and string, or persistent classes. The method contains accepts a singe element or a list of elements. In case of a list all elements in this list must be present in the queried attribute. The order of the elements in the list is not taken into account.

awe> q1 = CoaddedRegriddedFrame.instrument.name == 'OMEGACAM'
awe> regrid1 = q1[0].regridded_frames[0]
awe> regrid2 = q1[0].regridded_frames[1]
awe> q2 = CoaddedRegriddedFrame.regridded_frames.contains(regrid1)
awe> len(q2)
2
awe> q2 = CoaddedRegriddedFrame.regridded_frames.contains( [regrid1, regrid2] )
awe> len(q2)
2

First the CoaddedRegriddedFrames are found which have regrid1 in their regridded_frames attribute. Then all CoaddedRegriddedFrames which have both regrid1 and regrid2.

awe> q = AstrometricParameters.FITPARMS.contains(0.0023721800000000002)
awe> len(q)
2
awe> q = AstrometricParameters.FITPARMS.contains([-0.0023634099999999998, -5.41757e-06])
awe> len(q)
1

In the above examples the FITPARMS attribute of AstrometricParameter are queried. First for a single value, then a list.

Ordering by attribute values (order_by)

It is possible to order a query by one of the attributes of the objects. Note that this alters the returned list.

awe> q = (RawScienceFrame.OBJECT == 'ngc6822') & \
         (RawScienceFrame.chip.name == 'ESO_CCD_#88')
awe> for f in q: print(f.filename, f.DATE_OBS, f.filter.name, f.EXPTIME)
OMEGACAM.2012-06-16T06:46:22.166_28.fits 2012-06-16 06:46:22 OCAM_u_SDSS 580.0
OMEGACAM.2012-06-16T05:51:25.429_28.fits 2012-06-16 05:51:25 OCAM_u_SDSS 580.0
OMEGACAM.2012-07-19T03:17:24.651_28.fits 2012-07-19 03:17:24 OCAM_g_SDSS 580.0
OMEGACAM.2012-06-16T06:25:32.663_28.fits 2012-06-16 06:25:32 OCAM_u_SDSS 580.0
OMEGACAM.2012-07-26T02:30:04.947_28.fits 2012-07-26 02:30:04 OCAM_H_ALPHA 580.0
OMEGACAM.2012-06-16T07:07:10.329_28.fits 2012-06-16 07:07:10 OCAM_u_SDSS 580.0
etc.
etc.
awe> for f in q.order_by('DATE_OBS'): print(f.filename, f.DATE_OBS, f.filter.name, f.EXPTIME)
OMEGACAM.2012-06-02T06:26:52.601_28.fits 2012-06-02 06:26:52 OCAM_u_SDSS 580.0
OMEGACAM.2012-06-02T06:37:18.057_28.fits 2012-06-02 06:37:18 OCAM_u_SDSS 580.0
OMEGACAM.2012-06-02T06:47:41.774_28.fits 2012-06-02 06:47:41 OCAM_u_SDSS 580.0
OMEGACAM.2012-06-15T07:23:26.839_28.fits 2012-06-15 07:23:26 OCAM_H_ALPHA 580.0
OMEGACAM.2012-06-15T07:33:51.550_28.fits 2012-06-15 07:33:51 OCAM_H_ALPHA 580.0
OMEGACAM.2012-06-15T07:44:16.252_28.fits 2012-06-15 07:44:16 OCAM_H_ALPHA 580.0
etc.
etc.

Ordering returning maximum, minimum (max, min)

It is possible to select from a selection of objects the one with the maximum or minimum of a particular attribute:

awe> q = (RawScienceFrame.OBJECT == 'ngc6822') & \
         (RawScienceFrame.chip.name == 'ESO_CCD_#88')
awe> ma = q.max('EXPTIME')
awe> print(ma.DATE_OBS, ma.filter.name, ma.EXPTIME)
2013-05-07 08:59:37 OCAM_g_SDSS 600.0
awe> mi = q.min('EXPTIME')
awe> print(mi.DATE_OBS, mi.filter.name, mi.EXPTIME)
2012-06-16 06:46:22 OCAM_u_SDSS 580.0
awe> latest = q.max('DATE_OBS')
awe> print(latest.filename, latest.DATE_OBS, latest.EXPTIME)
OMEGACAM.2013-07-02T04:43:48.792_28.fits 2013-07-02 04:43:48 600.0

Querying project specific data (project_only)

It is possible to restrict the results of a query to objects of the currently set project or a specific project. The following example first shows the length of a query for all the public data, then for the currently set project and last for a specific project. Note that the project_only method is sticky, it will affect future usage of the query object.

awe> context.set_project('ALL')
awe> q = CoaddedRegriddedFrame.instrument.name == 'OMEGACAM'
awe> len(q)
213
awe> len(q.project_only())
0
awe> len(q.project_only('KIDS'))
204

First the project ALL is set, then all CoaddedRegriddedFrames are queried which have as instrument name OMEGACAM. Then only those specific to project ALL, and finally those visible from project ALL, but in project KIDS. Instead of the project name the (numerical) project id can also be used to indentify the project.

If you set the environment variable PROJECT_ONLY to a project name or id then all queries will use the project_only method automatically with this project. If set to True then the current project will be used, and if set to False the environment variable will be ignored.

Querying user specific data (user_only)

It is possible to restrict the results of a query to objects created by the current or a specific user. The following example first shows the length of a query for all the public data, then for the current user and last for a specific user. Note that the user_only method is sticky, it will affect future usage of the query object.

awe> from common.database.Database import database
awe> context.set_project('ALL')
awe> print(database.username().upper())
'AWEHELMICH'
awe> q = CoaddedRegriddedFrame.instrument.name == 'OMEGACAM'
awe> len(q)
213
awe> len(q.user_only())
9
awe> len(q.user_only(user='AWJMCFARLAND'))
204

First the project ALL is set and the current user is printed. Then all CoaddedRegriddedFrames are queried which have as instrument name OMEGACAM Then only the RegriddedFrames created by the current user are printed. And finally the RegriddedFrames created by AWJMCFARLAND are shown. Instead of the user name the (numerical) user id can also be used to indentify the user.

If you set the environment variable USER_ONLY to an user name or id then all queries will use the user_only method automatically. If set to True then the current user will be used, and if set to False the environment variable will be ignored.

Querying privileges specific data (privileges_only)

It is possible to restrict the results of a query to objects having specific privileges. When an argument is omitted the current privileges are used, otherwise the specified. The following example first shows the length of a query for all the visible data, then for the current privileges (1) and last for privileges of 5. Note that the privileges_only method is sticky, it will affect future usage of the query object.

awe> context.set_project('ALL')
awe> context.set_privileges(1)
awe> q = CoaddedRegriddedFrame.instrument.name == 'OMEGACAM'
awe> len(q)
213
awe> len(q.privileges_only())
9
awe> len(q.privileges_only(5))
0

First the project ALL is set and the current privileges are set to 1. Then all CoaddedRegriddedFrames are queried which have as instrument name OMEGACAM. Then only those with privileges of 1 are printed. Finally those with privileges 5 are shown.

If you set the environment variable PRIVILEGES_ONLY to a privileges number then all queries will use the privileges_only method automatically. If set to True then the current privileges will be used, and if set to False the environment variable will be ignored.

Project favourite (project_favourite)

The project_favourite flag is intended to favor the (calibration) data owned by the project above data from other projects. This is implemented by adjusting the order in which results from a query are returned. The creation_date will still be used to get the newest version, but if data is present in the current project that will be used instead of (possible) newer data from other projects.

The project_favourite flag can be enabled in two ways; on query level and environment level. To make a query project_favourite call the project_favourite method on the query. To make all queries project_favourite set the Environment setting PROJECT_FAVOURITE to True. The default setting is not to use project_favourite.

The following example shows the usage of the project_favourite on the command line. It shows the maximum creation_date of all BiasFrames in the KIDS project, and then the maximum creation_date of all BiasFrames in the current project (ALL) :

awe> context.set_project('KIDS')
awe> q = BiasFrame.instrument.name == 'OMEGACAM'
awe> q.max('creation_date').creation_date
datetime.datetime(2014, 8, 21, 7, 37, 58)
awe> context.set_project('ALL')
awe> q.project_favourite().max('creation_date').creation_date
datetime.datetime(2014, 1, 16, 15, 50, 33)

The select method, quicker queries

Note that this method automatically ignores invalid data

Constructing queries as above can be a somewhat verbose affair. To facilitate easier querying for which less input is necessary, the select method has been implemented for all ProcessTargets. The above queries can be written for example as follows:

awe> q = RawScienceFrame.select(instrument='OMEGACAM', chip='ESO_CCD_#65', object='ngc6822')

For a complete list of possible arguments of the select method see its docstring:

awe> help(DomeFlatFrame.select)
Help on method select in module astro.main.ProcessTarget:

select(cls, **searchterms) method of astro.database.DBMeta.DBObjectMeta instan
ce
    Class method to select RawFrames, Calfiles and ReducedScienceFrames
    from the database.

    Syntax example:

    s = RawScienceFrame.select(instrument='WFI', filter='#842',
        chip='ccd50', time_from='2000-01-02 04:45:46',
        time_to='2000-01-02 05:03:00')

    Possible search terms:
    ----------------------
    chip        - select of the same CCD ('ccd50', 'ccd51', etc.)
    date        - select of the same date (i.e. date at the start of
                  observing night, in yyyy-mm-dd format)
    exptime     - select frames with similar exposure time
                  (EXPTIME-0.8sec to EXPTIME+0.8 sec)
    extension   - select (raw) frames for a certain extension of its
                  RawFitsData object
    filename    - select a frame(!) by its filename
    filter      - select of the same filter ('#842', '#843', etc.)
    instrument  - select of the same instrument ('WFI', 'WFC', 'OCAM')
    object      - select for OBJECT header keyword, uses "like"
                  functionality, which allows wildcards "*" and "?"
    time_from   - precise form of date, in yyyy-mm-dd hh:mm:ss format
    time_to     - required when using time_from

More examples

Question: How do I query using dates? Answer: In general you need to make a datetime object specifying an exact time in UTC for your date to be recognized. All times and dates in the database are in UTC.

awe> date = datetime.datetime(2014,7,1)
awe> query = RawDomeFlatFrame.DATE_OBS > date
awe> query = (ReducedScienceFrame.creation_date > date) &\
...  (ReducedScienceFrame.creation_date < date+datetime.timedelta(1))

In other cases dates are not datetime objects, in particular when given as arguments to methods or objects. In these cases the dates are meant as the starting date of a night. A night is defined as the period between noon on one day and noon the next day. This concept is used to define whether or not calibration files are applicable to a given set of science images.

awe> task = ReduceTask(date='2014-06-10', instrument='OMEGACAM', \
                       filter='OCAM_r_SDSS', chip='ESO_CCD_#77)
awe> bias = BiasFrame.select(date='2014-07-05', instrument='OMEGACAM', \
                             chip='ESO_CCD_#96')

The final query, when not using the select method looks like this:

awe> midnight = datetime.datetime(2014,7,5) + datetime.timedelta(1)
awe> instrument = (Instrument.name == 'OMEGACAM')[0]
awe> midnight = instrument.convert_local_to_ut(midnight)
awe> query = (BiasFrame.timestamp_start < midnight) & \
             (BiasFrame.timestamp_end > midnight)
awe> query &= (BiasFrame.instrument.name == 'OMEGACAM')
awe> query &= (BiasFrame.chip.name == 'ESO_CCD_#96')
awe> bias = query.max('creation_date')

Question: When can I use wildcards in queries? Answer: When using the “like” method and only for strings, or in Tasks and the select method in the object argument:

awe> query = RawDomeFlatFrame.filename.like('OMEGACAM.2014-08-11*')
awe> task = ReduceTask(date='2014-08-10', instrument='OMEGACAM', object='KIDS*')
awe> query = ReducedScienceFrame.select(object='KIDS*', chip='ESO_CCD_#65')

Suppose you’ve just processed a significant amount of data, and are then interested in finding out some properties that you know are stored in the database. How do you get this information?

Question: Give me all image statistics (for example median values) of all OMEGACAM raw bias frames of a particular CCD, observed between two dates:

awe> q = (RawBiasFrame.instrument.name == 'OMEGACAM') & \
         (RawBiasFrame.chip.name == 'ESO_CCD_#77') & \
         (RawBiasFrame.DATE_OBS > datetime.datetime(2014,7,1)) & \
         (RawBiasFrame.DATE_OBS < datetime.datetime(2014,7,10))
awe> for f in q.order_by('DATE_OBS'): print(f.filename, f.imstat.median)
etc.
etc.
OMEGACAM.2014-07-02T11:20:14.159_21.fits 256.0
OMEGACAM.2014-07-02T11:20:56.539_21.fits 256.0
OMEGACAM.2014-07-02T11:37:41.991_21.fits 256.0
OMEGACAM.2014-07-02T11:38:24.101_21.fits 256.0
OMEGACAM.2014-07-03T10:54:22.714_21.fits 263.0
OMEGACAM.2014-07-03T10:55:04.814_21.fits 263.0
OMEGACAM.2014-07-03T10:55:48.834_21.fits 263.0
etc.
etc.

A very long list of filename, median pixel value pairs will be printed on screen. (You can abort with Ctrl-C.)

Question: Give me all the RawScienceFrames for the OMEGACAM instrument, ccd #88, filter r and for object starting with “NGC”.

awe> query = (RawScienceFrame.instrument.name == 'OMEGACAM') &\
             (RawScienceFrame.chip.name == 'ESO_CCD_#88') &\
             (RawScienceFrame.filter.name == 'OCAM_r_SDSS') &\
             (RawScienceFrame.OBJECT.like('NGC*'))
awe> for f in query: print(f.filename, f.instrument.name, f.chip.name, \
...  f.filter.name, f.OBJECT, f.EXPTIME)
OMEGACAM.2011-10-30T07:35:44.043_28.fits OMEGACAM ESO_CCD_#88 OCAM_r_SDSS NGC 1399 280.0
OMEGACAM.2011-10-30T06:50:51.776_28.fits OMEGACAM ESO_CCD_#88 OCAM_r_SDSS NGC 1399 280.0
OMEGACAM.2011-10-30T06:45:26.873_28.fits OMEGACAM ESO_CCD_#88 OCAM_r_SDSS NGC 1399 280.0
OMEGACAM.2011-10-30T06:56:15.459_28.fits OMEGACAM ESO_CCD_#88 OCAM_r_SDSS NGC 1399 280.0
etc.
etc.

Question: Give me all RawTwilightFlatFrames observed between two points in time.

awe> query = RawTwilightFlatFrame.select(time_from='2014-08-18T16:00:00', \
...  time_to='2014-08-19T16:00:00')

Question: Select the most recent OMEGACAM MasterFlatFrame from the database, that is valid for the specified night.

awe> flat = MasterFlatFrame.select(instrument='OMEGACAM', date='2014-07-13', \
                                  filter='OCAM_g_SDSS', chip='ESO_CCD_#90')