Introduction to sdss_brain

sdss_brain provides a set of core classes and helper functions to aid in the development of user-facing tools and interfaces. It combines the utility of other core SDSS packages, e.g. sdss-access, sdss-tree, sdssdb, and sdsstools, to enable a more streamlined and simplified SDSS user experience.

This package provides the following:

  • Multi-Modal data access with the MMAccess and Brain classes

  • Convenient starter tools for spectra with the Spectrum class

Multi-Modal Data Access System (MMA)

The MMAccess is a bare-bones class to be mixed with any other class. When mixed in, it adds MMA functionality to that class. The MMA provides three operating modes: auto, local, and remote.

  • auto: Automatically tries to load objects locally, and upon failure loads them remotely.

  • local: Load objects locally first from a database, and upon failure from a local filepath.

  • remote: Load objects remotely over an API.

Depending on the mode and the logic performed, the MMA will load data from a data_origin of file, db, or api. See the Mode Decision Tree for a workflow diagram.

When subclassing MMAccess, you must define the following abstract methods (a minimal sketch follows the list):

  • _parse_input: Defines the logic to parse the input string into an object id or filename

  • _set_access_path_params: Defines parameters needed by sdss_access to generate filepaths
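
For example, a minimal stub mixing MMAccess into a new class might look like the following. This is only a sketch; the import location of MMAccess is an assumption, so adjust it to wherever the class actually lives in sdss_brain.

# the import location of MMAccess is an assumption; adjust as needed
from sdss_brain.mixins.mma import MMAccess

class MyTool(MMAccess):
    path_name = 'mangacube'  # the name of an existing sdss_access path template

    def _parse_input(self, value):
        # decide whether the input string is a filename or an object id
        return {'filename': value, 'objectid': None}

    def _set_access_path_params(self):
        # the keyword values sdss_access needs to fill in the path template
        self.path_params = {}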

The Brain class is a convenience class that creates a basic object template with the MMAccess already applied. It also provides a repr and some placeholder logic to load objects based on the data_origin. When subclassing from Brain, there are several abstract methods that you must define.

  • _load_object_from_file: Defines the logic for loading a local file from disk

  • _load_object_from_db: Defines the logic for loading an object from a database

  • _load_object_from_api: Defines the logic for loading an object remotely over an API

The Brain and MMAccess are designed for building classes around files that have valid path entries in sdss_access. Multi-modal data access can still be provided for files without defined paths in sdss_access by using the MMAMixIn class instead of MMAccess. The main difference is that, when using MMAMixIn, you will need to define two additional abstract methods:

  • get_full_path: Returns a local filepath to a data file

  • download: Downloads a file from a remote location to a local path on disk

There is also a version of the Brain with this MMAMixIn already applied. Subclassing from BrainNoAccess will give you the functionality of the Brain but without reliance on sdss_access paths.
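
A rough sketch of such a tool is shown below. The BrainNoAccess import location and the exact method signatures are assumptions; check the sdss_brain API reference for the real definitions.

# the import location of BrainNoAccess is an assumption; adjust as needed
from sdss_brain.core import BrainNoAccess

class MyCatalog(BrainNoAccess):

    def get_full_path(self):
        # return the local filepath to the data file; no sdss_access template needed
        return '/path/to/local/mycatalog.fits'

    def download(self):
        # download the file from a remote location to the local path on disk,
        # e.g. with urllib, requests, or an rsync call
        ...

    # ... plus the usual _parse_input and _load_object_from_* methods ...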

Note

The MMA by itself does not contain the logic for accessing data from a filename, database, or over an API. That logic must be created by the user. Methods and classes containing default logic will be provided at a later time. The logic for the remote API access mode is not yet implemented. It will be unavailable until an SDSS API to serve data has been created.

Example Usage

Let’s step through the creation of a new class to interface with MaNGA data cubes using the Brain convenience class, highlighting how to integrate the MMA into a new tool.

import re
from sdss_brain.core import Brain
from sdss_brain.helpers import get_mapped_version, load_fits_file
from sdssdb.sqlalchemy.mangadb.datadb import Cube

class MangaCube(Brain):
    _db = Cube
    mapped_version = 'manga' # set the release mapping key
    path_name = 'mangacube'  # set path name for sdss_access

    def _set_access_path_params(self):
        ''' set sdss_access parameters '''

        # set path keyword arguments
        drpver = get_mapped_version(self.mapped_version, release=self.release, key='drpver')
        self.path_params = {'plate': self.plate, 'ifu': self.ifu, 'drpver': drpver}

    def _parse_input(self, value):
        ''' parse the input value string into a filename or objectid '''

        # match for plate-ifu designation, e.g. 8485-1901
        plateifu_pattern = re.compile(r'(?P<plate>\d{4,5})-(?P<ifu>\d{3,5})')
        plateifu_match = re.match(plateifu_pattern, value)

        # create the output dictionary
        data = dict.fromkeys(['filename', 'objectid'])

        # match on plate-ifu or else assume a filename
        if plateifu_match is not None:
            data['objectid'] = value

            # extract and set additional parameters
            self.plateifu = plateifu_match.group(0)
            self.plate, self.ifu = plateifu_match.groups()
        else:
            data['filename'] = value
        return data

    def _load_object_from_file(self):
        self.data = load_fits_file(self.filename)

    def _load_object_from_db(self):
        pass

    def _load_object_from_api(self):
        pass

To set up database access for your tool, set the _db class attribute to an appropriate sdssdb database connection, ORM model, or ORM schema relevant for the tool. Since we’re creating a tool for MaNGA cubes, we use the datadb.Cube ORM model from the mangadb database in sdssdb. If there is no relevant database input to attach, leave the _db attribute blank. When a tool is instantiated with a valid database input, a DatabaseHandler is created. See Connecting to Database Objects for more information on what this means.
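
For reference, the _db attribute accepts any of the following forms of input. The import paths below, other than datadb.Cube from the example above, are assumptions about sdssdb's usual layout:

# hedged sketch of the three kinds of input accepted by _db; import paths
# other than datadb.Cube are assumptions about sdssdb's layout
from sdssdb.sqlalchemy.mangadb import database as mangadb   # a database connection
from sdssdb.sqlalchemy.mangadb import datadb                 # an ORM schema module
from sdssdb.sqlalchemy.mangadb.datadb import Cube            # a single ORM model

# inside the class definition, any one of these works:
#     _db = mangadb
#     _db = datadb
#     _db = Cube

Later examples on this page use a mangadb variable as the database-connection form of this input.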

Next, we set up our tool to interface with sdss_access. To do so, we must specify the sdss_access path template name and the keyword parameters needed to build complete file paths. The template name is set as a class attribute, the required string parameter path_name. The template keywords are set as a dictionary, self.path_params, inside the _set_access_path_params method defined for our tool. If path_name or path_params is not set, an error will be raised. For MaNGA DRP cubes, the sdss_access name is mangacube, and it takes three keyword arguments, a plate id, an IFU designation, and the DRP version, to define a complete filepath. To understand what the get_mapped_version function is doing, see version mapping.
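
The sketch below illustrates what the MMA effectively does with path_name and path_params under the hood via sdss_access. It is illustrative only; the exact template keywords depend on the tree release, so treat the keyword values as placeholders.

from sdss_access.path import Path

path = Path(release='DR15')

# the keyword names the mangacube template expects; these are the keys the
# tool must supply in self.path_params
keys = path.lookup_keys('mangacube')

# resolve a full local filepath from the template name plus the keywords;
# add or drop keywords (e.g. wave) to match whatever lookup_keys reports
filepath = path.full('mangacube', drpver='v2_4_3', plate=8485, ifu=1901)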

Next we define the _parse_input method. This method contains the logic for determining what kind of input has been passed, either an object id or a filepath. Here we add some logic to check whether the input string is a plate-IFU designation; otherwise we assume it is a filepath. This method must return a dictionary containing, at minimum, the keys filename and objectid.

These two methods combine to instruct the Brain how to take a custom input “object id” and turn it into a valid filename path, database entry, or remote API call. There are convenience helpers available to simplify the boilerplate process of defining logic for _parse_input and _set_access_path_params. See Conveniences for the MMA for more information.

Finally, we define the _load_object_from_file method to load FITS file data using the load_fits_file helper function. These methods can perform any number of tasks related to handling the data. In this example we keep it simple and only load the data itself into the data attribute, which is a common place to store any data loaded from a file, a db, or over the API. Note that we must define all abstract methods even if we aren’t ready to use them, so we also define placeholders for the db and API load methods.

Now that we have our class defined, let’s see it in use. We can explicitly load a filename.

>>> ff = '/Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'
>>> cube = MangaCube(filename=ff, release='DR15')
>>> cube
<MangaCube filename='/Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz', mode='local', data_origin='file'>

The data_origin has been set to file and the mode is local. The Brain also takes a single positional argument as a generic “data_input”; it will attempt to determine whether the input is a valid filename or an object id. We can pass the filename directly as this argument.

>>> ff = '/Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz'
>>> cube = MangaCube(ff, release='DR15')
>>> cube
<MangaCube filename='/Users/Brian/Work/sdss/sas/dr15/manga/spectro/redux/v2_4_3/8485/stack/manga-8485-1901-LOGCUBE.fits.gz', mode='local', data_origin='file'>
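
Assuming the Brain’s placeholder logic has called our _load_object_from_file during instantiation (since data_origin is file), the loaded content is now available on the data attribute; exactly what object that is depends on what load_fits_file returns, most likely the opened FITS data.

>>> cube.data  # whatever _load_object_from_file stored, e.g. the opened FITS data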

We defined the _parse_input method to instruct the Brain on what kind of “objectid” to expect, in this case a “plateifu” designation, which is a 4-5 digit plate id plus a 3-5 digit IFU bundle number. Now we can pass a “plateifu” directly as input. If we specified a database input to use during class definition, the default local action is to attempt to connect via the db.

>>> cube = MangaCube('8485-1901')
>>> cube
    <MangaCube objectid='8485-1901', mode='local', data_origin='db'>

The data_origin has been set to db and the mode is local. We can override the default database input we defined on our class with the use_db keyword during instantiation.

>>> cube = MangaCube('8485-1901', use_db=mangadb)

Or we can ignore the database altogether with the ignore_db keyword. If you don’t have a database, it defaults to using local files. You can also turn off the database globally by setting the ignore_db option in your custom configuration.

>>> cube = MangaCube('8485-1901', ignore_db=True)
>>> cube
    <MangaCube objectid='8485-1901', mode='local', data_origin='file'>

Now the data_origin is set to file. If we don’t have the file locally, or if we explicitly set mode='remote', the tool uses the remote API.

>>> # explicitly set the mode to remote
>>> cube = MangaCube('8485-1901', mode='remote')
>>> cube
    <MangaCube objectid='8485-1901', mode='remote', data_origin='api'>

>>> # load a cube we don't have
>>> cube = MangaCube('8485-1902')
>>> cube
    <MangaCube objectid='8485-1902', mode='remote', data_origin='api'>

Now that we’ve seen how to create a tool, take a look at Convenience Tools for a set of starter tools to begin using, to start customizing with advanced science-specific features, or simply as alternative examples of how to create new tools.

Conveniences for the MMA

There are several conveniences available when developing a new tool using the Brain.

Decorators

A few class decorators are provided as a convenience to help reduce boilerplate code when creating new classes from the Brain. Available class decorators are:

  • access_loader: decorator to aid in defining _set_access_path_params

  • parser_loader: decorator to aid in defining _parse_input

  • sdss_loader: all-purpose loader combining the others

Using the sdss_loader decorator, we can rewrite the above example as

@sdss_loader(name='mangacube', defaults={'wave':'LOG'}, mapped_version='manga:drpver', pattern=r'(?P<plate>\d{4,5})-(?P<ifu>\d{3,5})')
class MangaCube(Brain):
    _db = mangadb

    def _load_object_from_file(self):
        self.data = load_fits_file(self.filename)

    def _load_object_from_db(self):
        pass

    def _load_object_from_api(self):
        pass

which effectively converts to the following:

class MangaCube(Brain):
    _db = mangadb
    mapped_version = 'manga'
    path_name = 'mangacube'

    @property
    def drpver(self):
        return get_mapped_version(self.mapped_version, release=self.release, key='drpver')

    def _set_access_path_params(self):
        ''' set sdss_access parameters '''

        keys = self.access.lookup_keys(self.path_name)
        self.path_params = {k: getattr(self, k) for k in keys}

    def _parse_input(self, value):
        ''' parse the input value string into a filename or objectid '''

        keys = self.access.lookup_keys(self.path_name)
        # 'pattern' here stands for the regex string supplied to the decorator
        data = parse_data_input(value, regex=pattern, keys=keys)
        return data

with the following automatically added attributes, extracted from the parsed input and the sdss_access template keys (see the usage sketch after this list):

self.plate - the extracted plate ID
self.ifu - the extracted IFU bundle designation
self.wave - the default sdss_access key value, set to "LOG"
self.parsed_groups - a list of all matched group parameters extracted by the regex parsing function
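
A hedged usage sketch of the decorated class; the attribute values follow the parse_data_input example shown further below:

>>> cube = MangaCube('8485-1901')
>>> cube.plate, cube.ifu, cube.wave
    ('8485', '1901', 'LOG')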

The sdss_loader decorator is equivalent to stacking multiple decorators, for example

@access_loader(name='mangacube', defaults={'wave':'LOG'}, mapped_version='manga:drpver')
@parser_loader(pattern=r'(?P<plate>\d{4,5})-(?P<ifu>\d{3,5})')
class MangaCube(Brain):
    _db = mangadb

    def _load_object_from_file(self):
        self.data = load_fits_file(self.filename)

    def _load_object_from_db(self):
        pass

    def _load_object_from_api(self):
        pass

Regex Pattern Parser

To simplify the boilerplate code needed to determine the proper data input and parse an object identifier within the _parse_input method, there is a convenience function, parse_data_input, which attempts to determine the type of input and parse it using regex. It minimally returns a dictionary with the keys filename and objectid. If the objectid can be further parsed to extract named parameters, those parameters are included as key-value pairs in the dictionary.

>>> # passing a filename to the parser
>>> parse_data_input('/path/to/a/file.txt')
    {'filename': '/path/to/a/file.txt', 'objectid': None, 'parsed_groups': None}

>>> # passing a custom regex pattern to parse an object id
>>> parse_data_input('8485-1901', regex=r'(?P<plate>\d{4,5})-(?P<ifu>\d{3,5})')
    {'filename': None, 'objectid': '8485-1901', 'plate': '8485', 'ifu': '1901', 'parsed_groups': ['8485-1901', '8485', '1901']}

To read more, see Parsing the Data Input Argument.