Introduction#

This lesson is a brief introduction to TOAST: how data is represented in memory and how to build processing workflows. First we import some packages we will use in this notebook.

# Built-in modules
import sys
import os
from datetime import datetime

# External modules
import numpy as np
import matplotlib.pyplot as plt
import astropy.units as u

# For interactive visualization of observations
import ipywidgets as widgets
from IPython.display import display, clear_output
import plotly.graph_objects as go

# TOAST
import toast
import toast.widgets as tw

# Capture C++ output in the jupyter cells
%load_ext wurlitzer

# Display inline plots
%matplotlib inline

Runtime Environment#

The toast module can be influenced by a few environment variables, which must be set before importing toast:

help(toast)
toast?

You can get the current TOAST runtime configuration from the “Environment” class.

env = toast.Environment.get()
print(env)

The logging level can be changed by either setting the TOAST_LOGLEVEL environment variable to one of the supported levels (VERBOSE, DEBUG, INFO, WARNING, ERROR, CRITICAL) or by using the set_log_level() method of the Environment class. The maximum number of threads is controlled by the standard OMP_NUM_THREADS environment variable.
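For example, we can change the logging level at runtime using the method mentioned above. This is a minimal sketch reusing the env object from the previous cell; setting TOAST_LOGLEVEL or OMP_NUM_THREADS in the shell would instead need to happen before importing toast.

# Raise the logging verbosity and confirm the change
env.set_log_level("DEBUG")
print(env)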

Data Model#

The basic data model in a toast workflow consists of a set of Observation instances, each of which is associated with a Focalplane on a Telescope. Note that a Focalplane instance is typically just a subset of detectors on the actual physical focalplane. These detectors must be co-sampled and likely have other things in common (for example, they are on the same wafer or are correlated in some other way). For this notebook we will create these objects manually, but usually they will be loaded or created by some experiment-specific function.

MPI is completely optional in TOAST, although it is required to achieve good parallel performance on systems with many (e.g. 4 or more) cores. Most of the parallelism in TOAST is MPI process-based, not threaded. In this section we show how interactive use of TOAST can be done without any reference to MPI. In a separate notebook in this directory we show how to make use of distributed data and operations in parallel workflows.
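As a quick check that no MPI setup is required here, we can construct the default communicator object used throughout this notebook (a small sketch; with no MPI world communicator it simply describes a single process):

# The default Comm object works without MPI and describes one process in one group
comm = toast.Comm()
print(comm)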

# Start by making a fake focalplane

from toast.instrument_sim import (
    fake_hexagon_focalplane,
    plot_focalplane,
)

focalplane_pixels = 7 # (hexagonal, pixel zero at center)
field_of_view = 10.0 * u.degree
sample_rate = 10.0 * u.Hz

focalplane = fake_hexagon_focalplane(
    n_pix=focalplane_pixels,
    width=field_of_view,
    sample_rate=sample_rate,
    epsilon=0.0,
)
# Make a plot of this focalplane layout.

detpolcol = {
    x: "red" if x.endswith("A") else "blue" for x in focalplane.detectors
}

plot_focalplane(
    focalplane=focalplane,
    width=1.3 * field_of_view,
    height=1.3 * field_of_view,
    show_labels=True,
    pol_color=detpolcol
)
# Now make a fake telescope

telescope = toast.Telescope(name="fake", focalplane=focalplane, site=toast.SpaceSite(name="L2"))

Now that we have a fake telescope created, we can create an observation:

# Make an empty observation

samples = 10

ob = toast.Observation(
    toast.Comm(),
    telescope, 
    name="2020-07-31_A", 
    n_samples=samples
)

print(ob)

Here we see that our observation contains only the starting information we passed to the constructor. Next we will discuss the three types of data objects that can be stored in an Observation: detector data products, shared telescope data, and arbitrary metadata.

Metadata#

By default, the observation is empty. You can add arbitrary metadata to the observation; it acts just like a dictionary.

hk = {
    "Temperature 1": np.array([1.0, 2.0, 3.0]),
    "Other Sensor": 1.2345
}

ob["housekeeping"] = hk

print(ob)

Metadata like this is not synchronized in any way between processes. A user or Operator can put any keys here to store small data objects.
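For instance, scalars, strings, or quantities with units can be stored under any key you like (the key names below are purely illustrative):

# Store a couple of arbitrary metadata values (hypothetical keys)
ob["observer"] = "simulation"
ob["mean_temperature"] = 0.1 * u.Kelvin
print(ob)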

Detector Data#

Detector data has some unique properties that we often want to leverage in our analyses. Each process has some detectors and some time slice of the observation. In the case of a single process like this example, all the data is local. Before using data we need to create it within the empty Observation. Here we create a “signal” object for the detectors. The detector data is accessed under the detdata attribute of the observation:

Here we create, and initialize to zero, some detector data named “signal”. This has one value per sample per detector, and each value is a 64-bit float.

ob.detdata.create("signal", dtype=np.float64)
print(ob.detdata)
print(ob.detdata["signal"])

You can create other types of detector data, and there is some shortcut notation that can be used to create detector data objects from existing arrays. For example:

# This takes an existing N_detector x N_sample array and creates the detector data from it

some_data = 3.0 * np.ones(
    (
        len(ob.local_detectors), 
        ob.n_local_samples
    ),
    dtype=np.float32
)

ob.detdata["existing_signal"] = some_data
print(ob.detdata["existing_signal"])
# This takes one detector's worth of data and replicates it to all detectors
# while creating a new data object.

ob.detdata["replicated"] = 5 * np.ones(ob.n_local_samples, dtype=np.int32)
print(ob.detdata["replicated"])
# You can also create detector data objects from a dictionary
# of single-detector arrays
other = dict()
for i, d in enumerate(ob.local_detectors):
    other[d] = i * np.ones(ob.n_local_samples, dtype=np.int32)

ob.detdata["other_signal"] = other
print(ob.detdata["other_signal"])

By default you will get detector data with one element per sample and float64 dtype. However, you can specify the shape of each detector sample:

# Example of data with different shape

ob.detdata.create("pointing", sample_shape=(4,), dtype=np.float32)
print(ob.detdata["pointing"])

Details of Detector Data#

In the commands above we created named data objects, and each one seems to contain an array for each detector. However, this container actually allocates memory in a single block, and you can slice the object in both the detector and sample directions. For example:

# Access one detector by name
ob.detdata["signal"]["D0A"] = np.arange(samples, dtype=np.float64)

# Access one detector by index
ob.detdata["signal"][1] = 10.0 * np.arange(samples, dtype=np.float64)

# Slice by both detector and sample
ob.detdata["signal"][["D2A", "D2B"], 0:2] = 5.0

print(ob.detdata["signal"])
# Access the whole thing as a 2D array
print(ob.detdata["signal"][:])

Shared Data#

Many types of data are common to multiple detectors. Some examples would be telescope pointing, timestamps, other sensor data, etc. When running in parallel we want to have just one copy of this data per node in order to save memory. The shared data is accessed under the “shared” attribute of the observation. For this serial notebook you do not need to worry about the details of communicators, but they become important when running in parallel; here the shared attribute will look very much like a dictionary of numpy arrays. See the “parallel” intro notebook for more examples of using shared data when each observation is distributed across a grid of processes.

# Shared data that is common to multiple detectors is shared across each "column"
# of the process grid within a group.

# Some fake timestamps:
ob.shared.create_column("times", (ob.n_local_samples,)) # Defaults to float64
ob.shared["times"][:] = np.arange(ob.n_local_samples, dtype=np.float64)
print(ob.shared["times"])

# Create and initialize to zero some boresight quaternions
ob.shared.create_column("boresight_radec", shape=(ob.n_local_samples, 4), dtype=np.float64)
print(ob.shared["boresight_radec"])

You can see that the data objects are a special “MPIShared” object from the pshmem package. Shared data objects can be read with slicing notation just like normal numpy arrays:

print(ob.shared["boresight_radec"][:])

However, they are intended to be “write once”, “read many” objects. You cannot simply assign data to them. The reason is that the data is replicated across nodes and so setting array values must be a collective operation.

nullquat = np.array([0.0, 0.0, 0.0, 1.0])
full_data = np.tile(nullquat, ob.n_local_samples).reshape((-1, 4))

# In the serial case, simple assignment works just like array assignment
ob.shared["boresight_radec"][:, :] = full_data

# When running with MPI, the set() method avoids some communication
ob.shared["boresight_radec"].set(full_data, fromrank=0)

print(ob.shared["boresight_radec"])

Intervals#

Each Observation may contain one or more “interval lists” which act as a global (within the observation) list of time / sample ranges where some feature of the data is constant. Interval lists support sample-wise inversion, intersection, and union operations using the standard Python bitwise operators (^, &, and |).

Intervals are not intended to act as individual sample quality flags. Per-sample flags should be created either as a shared timestream (for flags common to all detectors) or as a detector data object (for per-detector flags). Intervals can be used to represent things changing less frequently, for example: left or right moving telescope scans, satellite repointing maneuvers, calibration measurements, etc.

A single Interval consists of a time and a (local) sample range:

? toast.Interval
# The observation starts with no lists of intervals

ob.intervals

To add a new interval list, use the create() method. Remember, in this notebook we have only one process, so we do not have to worry about which process this information is coming from:

help(ob.intervals.create)

Here we create one list of intervals. We specify the time ranges and the local array of timestamp values. Inside the code, the timestamps are used to convert these input time ranges into Interval objects.

ob.intervals.create("good", [(1.5, 3.5), (4.5, 6.), (7., 8.5)], ob.shared["times"])
# Now there is one interval list in the observation

print(ob.intervals)
# The create method converted the time ranges into actual Interval instances:

print(ob.intervals["good"])

Now create another list of intervals:

ob.intervals.create("stable", [(0.5, 2.5), (3.5, 5.), (6., 7.5)], ob.shared["times"])
print(ob.intervals)

As mentioned before, we can combine these in different ways:

ob.intervals["stable-and-not-good"] = ob.intervals["stable"] & ~ob.intervals["good"]

print(ob.intervals)
print(ob.intervals["stable-and-not-good"])
ob.intervals["not-stable-or-not-good"] = ~ob.intervals["stable"] | ~ob.intervals["good"]

print(ob.intervals)
print(ob.intervals["not-stable-or-not-good"])

Views#

Typically you define data intervals, as in the last section, because you want to do something with only the data falling in those sample ranges. Each observation has the ability to provide a “view” into the detector and shared data given by a previously defined interval list. Views are created on the fly on first access and are deleted automatically if the underlying interval list is deleted. First, examine a view of the “good” interval list we defined in the previous section:

print(ob.view["good"])

The string representation of a view is just a list of sample slices. However, the real power is that we can get a view of any of the observation detdata or shared objects. For example, we could get a view of the detector signal data. Recall that the full data for this is:

ob.detdata["signal"][:]

A view of the signal data falling in the “good” intervals is:

ob.view["good"].detdata["signal"][:]

This view is a list of arrays which have sliced the data in the time direction. These are not copies; they provide read/write access to the underlying buffer. If you are doing many operations with a view, it is easier to assign it to a shorter name:

sng = ob.view["stable-and-not-good"]
sng.detdata["signal"]

Again, we can use a view to assign data to a subset of the full samples:

sng.detdata["signal"] = 7.0

print(ob.detdata["signal"][:])

We can access shared data as well with this view, but it is read-only from the view (the set() method of the shared objects or a collective assignment must be used to modify shared data):

ob.view["good"].shared["boresight_radec"]
sng.shared["boresight_radec"]

Data Container#

The Observation instances discussed previously are usually stored as a list inside a top-level container class called Data. This class also stores the TOAST MPI communicator information. For this serial example you can just instantiate an empty Data class and add things to the observation list:

test_data = toast.Data()

print(test_data)

print(test_data.obs)

Obviously this Data object has no observations yet. We’ll fix that in the next section!
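Since the obs attribute is an ordinary list, an observation can also be added by hand. A minimal sketch, using a separate Data instance so that test_data stays empty for the next section:

# Manually append the observation we built earlier
manual_data = toast.Data()
manual_data.obs.append(ob)
print(manual_data)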

Processing Model#

The TOAST processing model consists of Operator class instances running in a sequence on a subset of data. These sequences could be nested within other sequences (see the Pipeline operator below).

The Operator base class defines the interfaces for operators working on data. Operators are configured by defining class traits (attributes) which can be set during construction. An operator has an exec() method that works with Data objects (potentially just a subset of the data). Operators also have a finalize() method which is designed to do any final calculations after all passes through the timestream data are done. We will start by looking at the SimSatellite operator to simulate fake telescope scan strategies for a generic satellite. We can always see the options and default values by using the standard help function or the ‘?’ command:

from toast import ops

?ops.SimSatellite

You can instantiate a class directly by overriding some defaults:

simsat = ops.SimSatellite(
    num_observations=2, 
    observation_time=5 * u.minute,
)

print(simsat)

If you are using multiple instances of an operator in your pipeline with different configurations, then you should also pass a unique “name” to the constructor. This allows keeping the operators distinct when using config files (see more below):

other_simsat = ops.SimSatellite(
    name="other_simsat",
    num_observations=2, 
    observation_time=5 * u.minute,
)

print(other_simsat)

After the operator is constructed, the parameters can be changed directly. For example:

from toast.schedule_sim_satellite import create_satellite_schedule

simsat.telescope = telescope
simsat.schedule = create_satellite_schedule(
    prefix="test_",
    mission_start=datetime(2023, 2, 23),
    observation_time=0.5 * u.hour,
    gap_time=0 * u.second,
    num_observations=3,
    prec_period=90 * u.minute,
    spin_period=10 * u.minute,
)

print(simsat)

And now we have an Operator that is ready to use. This particular operator creates observations from scratch with telescope properties generated and stored. We can create an empty Data object and then run this operator on it:

# This is equivalent to a single call to "exec()" with all processes,
# and then a call to "finalize()".

simsat.apply(test_data)
print(test_data)

For this trivial case, we use the apply() method of the operator, which simply calls exec() once and then finalize(). When running a more complicated pipeline, the exec() method might be called multiple times on different detector sets (for example) before calling finalize().
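A rough sketch of that equivalence, run here on a separate empty Data object so the observations in test_data are left untouched (exec() also accepts an optional list of detectors when batching):

# Roughly what apply() does: one exec() pass followed by finalize()
equiv_data = toast.Data()
simsat.exec(equiv_data)
simsat.finalize(equiv_data)
print(equiv_data)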

Observation Quicklook#

There is an interactive “widget” we can use to quickly look at the contents of one of our observations. Try creating some more plot tabs using the button at the bottom of the summary tab.

tw.ObservationWidget(test_data.obs[0])

Pipelines#

TOAST includes a special operator (the Pipeline class), which is designed to run other operators (including other Pipeline instances). The purpose of this operator is to run sequences of other operators over sets of detectors, in order to reduce the memory cost of intermediate products and / or to group together operators that support the use of accelerators to avoid memory copies to the host system.

? ops.Pipeline

As an example, we can create two simple operators and put them in a pipeline:

simsat = ops.SimSatellite(
    telescope=telescope
)
simsat.schedule = create_satellite_schedule(
    prefix="test_",
    mission_start=datetime(2023, 2, 23),
    observation_time=5 * u.minute,
    gap_time=0 * u.second,
    num_observations=2,
    prec_period=90 * u.minute,
    spin_period=10 * u.minute,
)

default_noise = ops.DefaultNoiseModel()
pipe = ops.Pipeline(
    operators=[simsat, default_noise]
)

Now we can start with an empty Data object and run the pipeline on it:

data = toast.Data()

pipe.apply(data)

print(data)

You can see here that the same satellite simulation was run, and then a default noise model (using the focalplane properties in each observation) was created.
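We can peek at the result. The DefaultNoiseModel operator stores its output in each observation’s metadata; the key name below is the operator’s default trait value (assumed here):

# Inspect the noise model created for the first observation
print(data.obs[0]["noise_model"])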

Configuration of Operators#

Operators are configured through class traits which can be passed as keyword arguments to the constructor. We can also dump information about these traits (name, type, help string) to an intermediate config dictionary and then write that to files in TOML or JSON format. These config dictionaries can also be used to instantiate operators directly.

import toast.config as tc

import tempfile
from pprint import PrettyPrinter

pp = PrettyPrinter(indent=1)

tmpdir = tempfile.mkdtemp()
toml_file = os.path.join(tmpdir, "test.toml")
json_file = os.path.join(tmpdir, "test.json")

As an example, we can take a previous operator and look at the “round trip” from class or instance, to a config dictionary, to a file, and back into creating a new operator instance from that:

# This gives us the config for an existing instance

conf = other_simsat.get_config()
pp.pprint(conf)
# This gives us the default config values for a class

default_conf = ops.SimSatellite.get_class_config()
pp.pprint(default_conf)
tc.dump_toml(toml_file, conf)
tc.dump_json(json_file, conf)

Now we can see what this config looks like dumped to TOML and JSON:

!cat {toml_file}
!cat {json_file}

And then we can load the config back in to a dictionary:

newconf = tc.load_config(toml_file)
pp.pprint(newconf)

Finally, we can create new instances of operators from this config dictionary:

run = tc.create_from_config(newconf)
print(run)

Now we access our new operator and use it:

new_simsat = run.operators.other_simsat
print(new_simsat)

Running the Test Suite#

TOAST includes extensive tests built into the package. Running all of them takes some time, but you can also run just one test by specifying the name of a file in the toast/tests directory (without the “.py” extension):

import toast.tests

# Run just a couple simple tests in toast/tests/env.py
toast.tests.run("env")
# Now run **ALL** the (serial) tests
# toast.tests.run()