

[ ]:
# this page contains live examples of casa usage from the following installation
!pip install casaconfig  > /dev/null

External Data

Each CASA distribution requires a runtime configuration and a minimal repository of binary data to function properly. These are contained in a casarundata tar file, which must be installed before CASA can be used. The data repository includes Measures Tables that deal with the Earth Orientation Parameters (EOPs), reference frames, ephemeris data, antenna configurations, beam models, and calibration corrections. In particular, the EOPs include predictions for the near future, which drift until the values are well determined. The casaconfig module provides functions, along with a default configuration file, that properly set up and maintain the data repository contents.

Once an initial data repository is installed, CASA maintains this system automatically, placing small daily updates of measures data in the measurespath location. Other parts of the data repository are updated occasionally as new versions become available. The default measurespath is ~/.casa/data. Site installations of CASA will typically set measurespath to a shared location that the site administrators maintain (automatic updates are then turned off by the site administrators). Users may modify or disable this functionality by setting configuration values, including pointing measurespath at their own, personally maintained location and controlling which data is automatically updated (all of casarundata or just the measures data).

CASA now uses a hierarchy of configuration files. The default configuration values are first set by config.py in casaconfig. If a casasiteconfig.py is found then that is used next. Finally if the user has a personal config.py in ~/.casa then that is used. The full configuration is evaluated by importing config from casaconfig.
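One way to picture this hierarchy is to treat each configuration file as plain Python executed, in order, into one shared namespace, so that a name assigned in a later file replaces the value set by an earlier one and names a later file does not mention keep their earlier values. The sketch below is a simplified model under that assumption, not casaconfig's actual loader; `load_layered_config` and the file names are hypothetical.

```python
import os
import tempfile

def load_layered_config(paths):
    """Execute each existing config file, in order, into one namespace.

    A name assigned in a later file replaces the value from an earlier file;
    files that do not exist are skipped. Returns (settings, loaded_files).
    """
    namespace = {}
    loaded = []
    for path in paths:
        if path and os.path.isfile(path):
            with open(path) as f:
                exec(f.read(), namespace)
            loaded.append(path)
    # exec() inserts __builtins__; keep only the configuration names
    settings = {k: v for k, v in namespace.items() if not k.startswith("__")}
    return settings, loaded

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # default, then site, then user (hypothetical contents)
        files = {
            "default.py": 'measurespath = "~/.casa/data"\nmeasures_auto_update = True\n',
            "casasiteconfig.py": 'measurespath = "/opt/casa/data"\nmeasures_auto_update = False\n',
            "user_config.py": 'measurespath = "/scratch/me/casadata"\n',
        }
        paths = []
        for name, text in files.items():
            paths.append(os.path.join(d, name))
            with open(paths[-1], "w") as f:
                f.write(text)
        settings, loaded = load_layered_config(paths)
        print(settings["measurespath"])          # /scratch/me/casadata
        print(settings["measures_auto_update"])  # False (site value; user file did not set it)
```

The returned `loaded` list plays a role similar to config.load_success() shown later on this page.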

It is not necessary to set all of the configuration parameters in each of the optional configuration files. The configuration files can consist of any executable python. They should not depend on CASA modules since the configuration must be set before a CASA module can be used.

Monolithic CASA (the casashell module) has command line options that can also be used to change the configuration values. The command line option values always take precedence over any configuration value set in a config file. The location of the user's configuration file can be set on the command line, and the command line can also be used to skip both the site and user configuration files.

The following figure depicts a high-level view of the external data management system, including the casarundata from the CASA site, the modular pip package and installation, the monolithic tarball and installation, and the ASTRON ftp server.

[Figure: casaconfig external data management system]

The following sections illustrate how to manipulate the data contents manually. By default, CASA will generally handle this automatically after the initial installation unless the user overrides the settings with their own ~/.casa/config.py file.

Locating the Data Directory

The measurespath configuration value is used to find the data directory. The default location is ~/.casa/data. Site installations will typically set that to a shared location in a site configuration file found either at the value of a CASASITECONFIG environment variable if set or at /opt/casa/casasiteconfig.py, /home/casa/casasiteconfig.py or a casasiteconfig.py found in the python path (typically in the site-packages directory). An example casasiteconfig.py is provided in casaconfig/private/casasiteconfig_example.py.

The default measurespath is ~/.casa/data, as seen below. Unless you've previously installed data there, this location will not exist.

[ ]:
# get the configuration and show the value of measurespath
from casaconfig import config
print(config.measurespath)
/root/.casa/data
[ ]:
# see what's in it
!ls $config.measurespath
ls: cannot access '/root/.casa/data': No such file or directory
[ ]:
# the default auto update config values are True (enabled)
print(config.data_auto_update)
print(config.measures_auto_update)
True
True
[ ]:
# see what config files were loaded, in the order they were loaded (in this example, there is just one, the default)
print(config.load_success())
# config.load_failure() will contain any errors encountered when loading any of the config files
['/usr/local/lib/python3.10/dist-packages/casaconfig/config.py']

Populating the Data Directory Automatically

It is necessary to first create and populate measurespath. The CASA modules will not create measurespath to avoid surprises that might arise when more than 800 MB of data are installed to a subdirectory of the user’s home directory (~/.casa/data by default).

When CASA starts (when the casatools module is being initialized during import), if measurespath exists but is empty, is owned by the user, and the auto update values are True, then the casaconfig module will be used to populate measurespath automatically.

So for most users the following is sufficient to initially populate measurespath (from the OS prompt):

mkdir -p ~/.casa/data
<path_to_casa_bin>/casa

The casaconfig module methods will then download the casarundata tarball, install it in ~/.casa/data, download the measures tarball from ASTRON, and update the measures tables that were just installed from the casarundata tarball. Regular CASA use can then continue in the casa session just started. Subsequent uses of casa will check for measures and data updates and install them automatically if new versions are available. The auto update steps only check for updates once per day. The measures tables are updated daily, while the full casarundata tarball is rarely updated (typically at each new CASA release and infrequently between releases as necessary).
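The startup conditions for automatic population (measurespath exists, is empty, is owned by the user, and the auto update values are True) can be restated as a small check. This is a simplified model for illustration, not casaconfig's code; `should_auto_populate` is a hypothetical name, and the ownership test relies on the POSIX-only os.getuid().

```python
import os
import tempfile

def should_auto_populate(measurespath, data_auto_update, measures_auto_update):
    """Return True only if every startup condition for automatic population holds."""
    # the auto update settings must be enabled
    if not (data_auto_update and measures_auto_update):
        return False
    # casaconfig does not create measurespath; the user must do that first
    if not os.path.isdir(measurespath):
        return False
    # only an empty location is populated from scratch
    if os.listdir(measurespath):
        return False
    # the location must belong to the user running CASA (POSIX only)
    return os.stat(measurespath).st_uid == os.getuid()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(should_auto_populate(d, True, True))   # True: exists, empty, owned by us
        print(should_auto_populate(d, False, True))  # False: auto updates disabled
```

This is why the mkdir step above matters: without an existing, empty, user-owned directory, nothing is installed automatically.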

The size of the installed data is about 831 MB, which may be more than a user wants to keep in a directory under their home directory. In that case, users should create a personal config file in ~/.casa/config.py (if they don't already have one) and set measurespath to point at their preferred location. As with the default location, if measurespath exists but is empty and owned by the user, starting CASA or importing casatools will populate that location when auto updates are enabled.

# an example config.py setting measurespath
measurespath = "/path/to/measurespath"
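The two preparation steps (create an empty directory at the chosen location and point a personal config.py at it) can be scripted. The sketch below is a hypothetical helper, not part of casaconfig; `write_personal_config` is an illustrative name, and it deliberately refuses to overwrite an existing config.py, since that file may hold other settings.

```python
import os
import tempfile

def write_personal_config(casa_dir, measurespath):
    """Create an empty measurespath and a config.py in casa_dir pointing at it.

    casa_dir would normally be ~/.casa. An existing config.py is never overwritten.
    """
    # the directory must exist (and be empty) for auto population to happen
    os.makedirs(measurespath, exist_ok=True)
    os.makedirs(casa_dir, exist_ok=True)
    config_file = os.path.join(casa_dir, "config.py")
    if os.path.exists(config_file):
        raise FileExistsError(f"{config_file} exists; edit measurespath there instead")
    with open(config_file, "w") as f:
        # repr() produces a valid Python string literal for the path
        f.write(f"measurespath = {measurespath!r}\n")
    return config_file

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cfg = write_personal_config(os.path.join(d, ".casa"), os.path.join(d, "bigdisk", "casadata"))
        with open(cfg) as f:
            print(f.read())
```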

Subsequent installations of CASA can continue to use the same measurespath and automatic updates will continue to install the data as it becomes available.

Populating the Data Directory Manually

The measurespath location may be populated manually. This is how site installations can install and maintain a data directory. Individual users may also find this useful, especially if they choose to disable auto updates (e.g. to ensure that nothing changes across multiple CASA sessions or while different CASA sessions are running concurrently).

The following live examples use the casaconfig command line. These examples are done through the python prompt so that they can be used in this notebook. Normally these would be done from the OS prompt using the python environment where casaconfig is installed (e.g. “python3” in a monolithic CASA installed from a tarball).

The casaconfig command line uses the same config files that normal casa uses. This method does not require that measurespath already exist, but it does require that the user be able to create it if it needs to be created or that it has been previously populated by casaconfig.

This example creates the measurespath location and populates it with casarundata and the most recent measures data from ASTRON. It then uses a second casaconfig command line option (--current-data) to print out the version strings of the data currently installed at measurespath.

[ ]:
!python -m casaconfig --update-all
!python -m casaconfig --current-data
Checking for updates into /root/.casa/data
pull_data using version casarundata-2024.02.09-1.tar.gz, acquiring the lock ...
downloading casarundata contents to /root/.casa/data (333M) ... done
casarundata installed casarundata-2024.02.09-1.tar.gz at /root/.casa/data
measures_update ... acquiring the lock ...
 ... connecting to ftp.astron.nl ...
  ... downloading WSRT_Measures_20240221-160001.ztar from ASTRON server to /root/.casa/data ...
  ... measures data update at /root/.casa/data
current data installed at /root/.casa/data
casarundata version casarundata-2024.02.09-1.tar.gz installed on 2024-02-21
measures version WSRT_Measures_20240221-160001.ztar installed on 2024-02-21

Use of CASA after this step will not try to update the data: CASA will not check for new versions of the data until 24 hrs after the last check unless the “force” option is used (on the command line or through the functional interface shown next).

The same casaconfig steps can be done from inside a running python session. The following illustrates one way to install and populate the data from python, this time to a custom location (a custom location can also be provided through the measurespath command line argument in the previous example). To use this custom location, measurespath would need to be set to it in a configuration file or provided through the CASA command line.

Note: Some of the measures tables are read when casatools starts. If the tables are later updated from a running python session, those changes may not be seen by the already running casatools. If this step happens after CASA has started (or the casatools module has been imported), that session should be exited and restarted to ensure that the data changes are used.

[ ]:
from casaconfig import pull_data
# populate some custom location with the data contents
pull_data('./casadata')

!ls ./casadata
pull_data using version casarundata-2024.02.09-1.tar.gz, acquiring the lock ...
downloading casarundata contents to ./casadata (333M) ... done
casarundata installed casarundata-2024.02.09-1.tar.gz at ./casadata
alma  catalogs  data_update.lock  demo  ephemerides  geodetic  gui  nrao  readme.txt

Note: If you are using python-casacore directly (outside of CASA), you will need to set your .casarc file to point to the data directory populated by casaconfig (measurespath).

CASA does not use python-casacore so this step is not necessary when using CASA.

[ ]:
# tell casacore where to find the measures data (measurespath)
from casaconfig import set_casacore_path
set_casacore_path(config.measurespath)
!more ~/.casarc
writing /root/.casarc...
measures.directory: /root/.casa/data
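For a python-casacore setup without casaconfig installed, the same line can be written with the standard library. This sketch assumes, from the output above, that .casarc holds `key: value` lines and that only the `measures.directory` entry matters here; `write_casarc` is a hypothetical helper, not the set_casacore_path implementation, and it keeps any other lines it finds.

```python
import os
import tempfile

def write_casarc(rc_path, measures_dir):
    """Set 'measures.directory' in a casarc-style file, keeping other lines."""
    key = "measures.directory"
    lines = []
    if os.path.exists(rc_path):
        with open(rc_path) as f:
            # drop any previous measures.directory entry
            lines = [ln for ln in f.read().splitlines()
                     if ln.split(":", 1)[0].strip() != key]
    lines.append(f"{key}: {measures_dir}")
    with open(rc_path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rc = os.path.join(d, ".casarc")
        write_casarc(rc, "/root/.casa/data")
        with open(rc) as f:
            print(f.read(), end="")  # measures.directory: /root/.casa/data
```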

Updating the Data Directory

Most of the data tables (such as beam models, antenna and Jy/K correction tables, and the antenna configuration files for the CASA simulator) are versioned by CASA release and seldom change. However, the Casacore Measures tables (i.e. the geodetic subdirectory) must be updated frequently after release. If the measures_auto_update configuration value is True, these updates happen automatically.

The get_data_info function can be used to determine what the installed version of the measures data is, and when it was installed. That information is also found in the readme.txt file located with the measures data (this can also be done for the casarundata).

[ ]:
from casaconfig import get_data_info
meas_info = get_data_info(type='measures')
print('measures version : ' + meas_info['version'])
print('measures install date : ' + meas_info['date'])
!cat ~/.casa/data/geodetic/readme.txt
measures version : WSRT_Measures_20240221-160001.ztar
measures install date : 2024-02-21
# measures data populated by casaconfig
version : WSRT_Measures_20240221-160001.ztar
date : 2024-02-21
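When get_data_info is not at hand (for example in a script that should not import casaconfig), the readme.txt shown above can be read directly. This sketch only handles the `key : value` lines and `#` comments visible in that output; `parse_readme` is a hypothetical helper, and other readme.txt files (such as the one for casarundata) may contain additional content it does not interpret.

```python
def parse_readme(text):
    """Return the 'key : value' fields of a casaconfig readme.txt as a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and comment lines such as the header
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

if __name__ == "__main__":
    sample = ("# measures data populated by casaconfig\n"
              "version : WSRT_Measures_20240221-160001.ztar\n"
              "date : 2024-02-21\n")
    print(parse_readme(sample)["version"])
```

In practice the text would come from a file such as ~/.casa/data/geodetic/readme.txt.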

The measures_update() function is used to download new measures data from the originating source. By default, this function will retrieve the latest data. If you already have the latest data, then nothing will happen.

[ ]:
from casaconfig import measures_update
measures_update()
!cat ~/.casa/data/geodetic/readme.txt
# measures data populated by casaconfig
version : WSRT_Measures_20240221-160001.ztar
date : 2024-02-21

Specific versions of past measures data can be retrieved as well. This may be important when trying to exactly replicate the conditions of a particular data reduction run in CASA. Generally, though, the measures data is appended with time, so past and current versions should have the same values at the same points in time (see the Casacore Measures section below). Similar functions are also provided for installing specific versions of casarundata.

[ ]:
# show the 3 most recent versions of the measures available
from casaconfig import measures_available
versions = measures_available()
print(versions[-3:])
['WSRT_Measures_20240219-160001.ztar', 'WSRT_Measures_20240220-160001.ztar', 'WSRT_Measures_20240221-160001.ztar']
[ ]:
# retrieve a version from a while back
measures_update(version=versions[-12])
!cat ~/.casa/data/geodetic/readme.txt
measures_update ... acquiring the lock ...
 ... connecting to ftp.astron.nl ...
  ... downloading WSRT_Measures_20240210-160001.ztar from ASTRON server to /root/.casa/data ...
  ... measures data update at /root/.casa/data
# measures data populated by casaconfig
version : WSRT_Measures_20240210-160001.ztar
date : 2024-02-21
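To replicate an earlier reduction, it can help to pick the newest measures version that existed on a given date rather than counting back through the list by index. This sketch assumes the version names follow the `WSRT_Measures_YYYYMMDD-HHMMSS.ztar` pattern seen in the output above; `version_time` and `latest_version_before` are hypothetical helpers.

```python
import re
from datetime import datetime

_VERSION_RE = re.compile(r"WSRT_Measures_(\d{8})-(\d{6})\.ztar$")

def version_time(name):
    """Timestamp encoded in a measures version name (assumed pattern)."""
    m = _VERSION_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected measures version name: {name}")
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")

def latest_version_before(versions, when):
    """Most recent version created on or before `when`, or None if there is none."""
    candidates = [v for v in versions if version_time(v) <= when]
    return max(candidates, key=version_time) if candidates else None

if __name__ == "__main__":
    versions = ["WSRT_Measures_20240219-160001.ztar",
                "WSRT_Measures_20240220-160001.ztar",
                "WSRT_Measures_20240221-160001.ztar"]
    print(latest_version_before(versions, datetime(2024, 2, 20, 23, 0)))
```

With the notebook's `versions` list, the result could then be passed as `measures_update(version=...)`.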

Note: measures_update() requires that the expected readme.txt file exists at the location being updated. This helps to protect against updating a location that is being maintained outside of the casaconfig methods.

Note: when auto updates are enabled (measures_auto_update or data_auto_update is True), this step happens each time that CASA is started (during initialization of the casatools module). Auto updates require that the user owns the location being updated, in addition to the requirement that the expected readme.txt file exists. This helps to protect against updating a location that is shared by multiple users (e.g. a site measurespath). The casaconfig code only looks for new updates at most once per day unless the force argument is used.
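The "at most once per day unless forced" rule can be modeled as a simple time comparison. Where casaconfig records the time of the last check is not described here, so `last_check` is an assumed input; `should_check_for_updates` is a hypothetical name for illustration.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(hours=24)

def should_check_for_updates(last_check, now, force=False):
    """Model of the once-per-day update check; force always checks."""
    if force or last_check is None:
        return True
    return now - last_check >= CHECK_INTERVAL

if __name__ == "__main__":
    t0 = datetime(2024, 2, 21, 9, 0)
    print(should_check_for_updates(t0, t0 + timedelta(hours=23)))              # False
    print(should_check_for_updates(t0, t0 + timedelta(hours=23), force=True))  # True
```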

Data Locations

There are two configuration values related to where CASA finds the external data described here: measurespath and datapath.

The datapath value is a list of locations to be searched, in order, for the desired data. That value is used by the resolve() method of the utils tool (ctsys in a running CASA environment). The datapath value can be changed after CASA starts and the new value will be used in subsequent searches.
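The ordered search over datapath can be sketched as "return the first entry in which the relative path exists". This is a model of the described behavior, not the casatools resolve() implementation; `resolve_in_datapath` is a hypothetical name, and returning the input unchanged when nothing matches is an assumption about the fallback.

```python
import os
import tempfile

def resolve_in_datapath(relpath, datapath):
    """Return the first datapath entry joined with relpath that exists."""
    for root in datapath:
        candidate = os.path.join(root, relpath)
        if os.path.exists(candidate):
            return candidate
    # assumed fallback: hand back the relative path unchanged
    return relpath

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        os.makedirs(os.path.join(second, "alma", "simmos"))
        print(resolve_in_datapath("alma/simmos", [first, second]))  # found in `second`
```

Because the list is searched in order, data placed in an earlier entry shadows the same relative path in later entries, which is one reason CASA recommends keeping measurespath as the first element.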

The measurespath value is used when the casatools module starts to find the location of the IERS tables. Those values are read at that point and used throughout the CASA session. Changing measurespath after casatools has started will NOT cause the IERS tables at the new location to be used. Updating the IERS tables after casatools has started will not change the IERS values used by that session of CASA. CASA must be restarted to see any changes in the IERS tables.

The default datapath has measurespath as the first and only element in the list. CASA recommends that measures data be kept at the same location as casarundata and that that location be the first element of datapath. Other arrangements may work but are not tested by CASA and may be confusing.

Except for the IERS data as described previously, data used by CASA is read as needed. Changes to non-IERS data after CASA starts may be seen during that CASA session (depending on whether other tools have already read those files). Updates for casarundata are provided infrequently.

Shared data used by one or more installations maintained at a site are typically updated at a time not controlled by a CASA user. Users concerned about that may choose to install their own copy of the data using the casaconfig module methods as described here. They should set measurespath in their config.py to point to a location of their choosing. Typically they will turn on automatic updates so that their installed data is updated as needed when CASA starts. They may instead choose not to use automatic updates, for example if they run multiple CASA sessions at the same time and want to ensure that the same data is used throughout a long-running session.

Site Installations

Monolithic CASA (distributed in a tarball) may be configured for use by a site where one or more CASA installations are shared by multiple users.

The location of measurespath is set in the site configuration file. The site configuration file can be given as the value of a CASASITECONFIG environment variable. If that value is not set then it is found in any of the following locations (the first location found will be the one used):

  • /opt/casa/casasiteconfig.py

  • /home/casa/casasiteconfig.py

  • A casasiteconfig.py at any location in the python path (e.g. the site-packages directory).

Site managers should choose the location (or CASASITECONFIG value) that is most convenient for them.

Note: If CASASITECONFIG is set but a file does not exist at that location then a warning message will be printed and the configuration will proceed without using any site configuration file. The CASASITECONFIG value must be a fully qualified path.
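The lookup order described above (CASASITECONFIG if set, otherwise /opt/casa, then /home/casa, then the python path) can be sketched as follows. This is an illustrative model, not casaconfig's code; `find_site_config` and its parameters are hypothetical, with the environment and search lists passed in so the behavior can be tried without touching real system paths.

```python
import os
import sys
import warnings

DEFAULT_SITE_LOCATIONS = ["/opt/casa/casasiteconfig.py", "/home/casa/casasiteconfig.py"]

def find_site_config(environ=os.environ, fixed=DEFAULT_SITE_LOCATIONS, search_path=None):
    """Return the site configuration file to use, or None."""
    env_value = environ.get("CASASITECONFIG")
    if env_value:
        if os.path.isfile(env_value):
            return env_value
        # set but missing: warn and continue without any site config
        warnings.warn(f"CASASITECONFIG={env_value} not found; no site config used")
        return None
    # fixed locations, first match wins
    for candidate in fixed:
        if os.path.isfile(candidate):
            return candidate
    # then any directory on the python path
    for directory in (sys.path if search_path is None else search_path):
        candidate = os.path.join(directory, "casasiteconfig.py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

Note that a missing CASASITECONFIG target does not fall through to the other locations in this model, matching the note above.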

The CASA distribution comes with an example site configuration file at private/casasiteconfig_example.py of the casaconfig module.

This is what the example site configuration file looks like:

# An example site config file.
# Place this in a location checked by casaconfig :
#   /opt/casa/casasiteconfig.py
#   /home/casa/casasiteconfig.py
#   the environment value CASASITECONFIG - use the fully qualified path
#   anywhere in the python path, e.g. the site-packages directory in the CASA being used.
#

# This file should be edited to set measurespath as appropriate

# Set this to point to the location where the site maintained casarundata can be found
# by default datapath will include measurespath

measurespath = "/path/to/installed/casarundata"

# turn off all auto updates of data

measures_auto_update = False
data_auto_update = False

Once measurespath has been set in the site configuration file, the location must be populated with the expected data using the methods previously described. The same data installed at measurespath can be used by multiple installations (a single site configuration file can be used by multiple CASA installations). Note that the auto updates are False here so that users don't attempt to update the site measurespath when they start CASA. The auto update mechanism also checks that the user is the owner of measurespath before any attempt to update it is made. If either auto update value is True and measurespath cannot be updated, an error is printed and CASA continues to start.

Legacy Data

Sites with older installations that already have CASA data shared across multiple CASA installations can use the same data location and update methods that they have been using (e.g. the casadata python module or a clone of the casa-data git repository).

In that case, the site would need a site configuration file in which measurespath is set to the location of this data and the auto updates are False. The site would continue to maintain that data independently of the casaconfig methods described here. The version information printed by the “--current-data” casaconfig command line option, or available through the get_data_info function, will be “unknown” for a legacy data installation, and the casaconfig data functions will not install any data into such a location.

If this legacy update mechanism is used, be aware that CASA is no longer distributed with a data directory internal to the distribution. Replacing that directory with a link to the site location is therefore no longer necessary, and doing so will not allow that installation of CASA to find the data. The measurespath must indicate where the CASA data can be found.

Be aware that the update-data script is no longer provided by CASA. Existing update-data scripts found with previous versions of CASA should continue to work, and the resulting updated data can be used with this version of CASA provided that measurespath is set appropriately.

Reference Testing

Testing may require that the casarundata be set to a known version. Monolithic CASA is packaged with information that casaconfig can use to set the casarundata to the version current at the time that release was made. The casaconfig command line --reference-testing argument can be used to set the data at measurespath to that version; it is intended for internal testing only. This sets the data at measurespath to what it would have been had CASA been packaged with data at the time that version of CASA was released. This includes the measures data as of that date (i.e. the measures data will be out of date, which can be useful for testing). The reference-testing version information is only known to monolithic CASA installations.

Note: This option changes the contents at measurespath.

A user doing testing may not wish to change the measurespath that they use regularly or for their personal data reduction. CASA recommends that people doing reference testing use a separate config file (e.g. test_config.py) that sets measurespath to a location for the test data. That configuration file can be given on either command line (casaconfig or casa) with the --configfile option. For example, one could do the following to start casa using a test_config.py in their ~/.casa directory:

$ casa --configfile ~/.casa/test_config.py --reference-testing

If the measurespath set in that test_config.py exists and is empty or was previously populated by casaconfig then this line will make sure that the versions of data installed there match what the reference versions are for this tarball and install them if necessary before CASA continues.

Note: If the casarundata version is the expected reference version but the measures version is different (e.g. it’s more recent because auto updates have happened) then casaconfig will re-install that casarundata version (which has the correct measures version) and not update any measures data after that.

Reference testers will typically turn off auto updates in the test config file being used so that subsequent CASA uses do not automatically update the data (auto updates will not happen in any event for 24 hrs after the reference data is installed because casaconfig will not check for updates during that time).

Data Directory Contents

Casacore Measures

The casacore Measures tables are needed to perform accurate conversions of reference frames. The casacore infrastructure includes classes to handle physical quantities with a reference frame, so-called Measures. Each type of Measure has its own distinct class in casacore, derived from the Measure base class. One of the main functionalities provided by casacore w.r.t. Measures is the conversion of Measures from one reference frame to another using the MeasConvert classes.

Many of the spectral, spatial, and time reference frames are time-dependent and require knowledge of the results of ongoing monitoring of properties of the Earth and astronomical objects by certain service observatories. These data are tabulated in a number of tables (Measures Tables), which are stored in the geodetic subdirectory of the data repository. Older CASA distributions included a snapshot of this repository in each tarball and in the casadata module; current versions install it at measurespath using casaconfig, as described above.

Measures tables are updated daily based on the refinement of the geodetic information from the relevant services like the International Earth Rotation and Reference Systems Service (IERS). Strictly speaking, the Measures tables are part of the casacore infrastructure which is developed by NRAO, ESO, NAOJ, CSIRO, and ASTRON. In order to keep the repository consistent between the partners, the Measures tables are initially created at a single institution (ASTRON) and then copied into the NRAO casadata repository from which all CASA users can retrieve them. As of March 2020, the update of the NRAO CASA copy of the Measures tables in geodetic and the planetary ephemerides in directory ephemerides takes place every day between 18 h UTC and 19 h UTC via two redundant servers at ESO (Garching).

CASA releases need to be updated with recent Measures tables (see above). For observatory use, the update period should not be longer than weekly in order to have the EOPs up-to-date for upcoming observations. The shortest reasonable update interval is daily. For offline data analysis use, the update period should not be longer than monthly. Weekly update is recommended.
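A quick way to compare installed measures data against these recommended intervals is to compute its age from the version name. This sketch assumes the date embedded in names like `WSRT_Measures_20240210-160001.ztar` reflects when the tables were produced; `measures_age_days` and `is_stale` are hypothetical helpers, and the default threshold of 7 days follows the weekly recommendation above.

```python
import re
from datetime import date

def measures_age_days(version, today):
    """Days between the date encoded in a measures version name and `today`."""
    m = re.search(r"WSRT_Measures_(\d{4})(\d{2})(\d{2})-", version)
    if m is None:
        raise ValueError(f"unexpected measures version name: {version}")
    made = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    return (today - made).days

def is_stale(version, today, max_age_days=7):
    """True if the measures data is older than the chosen update interval."""
    return measures_age_days(version, today) > max_age_days

if __name__ == "__main__":
    v = "WSRT_Measures_20240210-160001.ztar"
    print(measures_age_days(v, date(2024, 2, 21)))  # 11
    print(is_stale(v, date(2024, 2, 21)))           # True with the weekly threshold
```

The installed version string could come from get_data_info(type='measures')['version'], shown earlier on this page.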

Legacy installations processing old data do not have to be updated, because the relevant contents of the Measures Tables no longer change for the more distant past.

The following list describes the individual Tables in subdirectory geodetic:

  • IERSeop2000: The IERS EOP2000C04_05 Earth Orientation Parameters using the precession/nutation model “IAU2000” (files eopc04_IAU2000.xx)

  • IERSeop97: The IERS EOPC04_05 Earth Orientation Parameters using the precession/nutation model “IAU 1980” (files eopc04.xx)

  • IERSpredict: IERS Earth Orientation Data predicted from NEOS (from file ftp://ftp.iers.org/products/eop/rapid/daily/finals.daily)

  • IGRF: International Geomagnetic Reference Field Schmidt semi-normalised spherical harmonic coefficients. (Note that this still uses IGRF12. An update to IGRF13 is underway.)

  • IMF (not a Measures Table proper, access not integrated in Measures framework): Historical interplanetary magnetic field data until MJD 52618 (December 2002).

  • KpApF107 (not a Measures Table proper, access not integrated in Measures framework): Historical geomagnetic and solar activity indices until MJD 54921 (April 2009)

  • Observatories: Table of the official locations of radio observatories. Maintained by CASA.

  • SCHED_locations (not a Measures Table proper, access not integrated in Measures framework): VLBI station locations

  • TAI_UTC: TAI_UTC difference (i.e. leap second information) obtained from USNO

Measures Tables in the directory ephemerides:

Ephemeris Data

The ephemeris tables hold a selection of the solar system objects from the JPL-Horizons database. The data tables are generated from the JPL Horizons system's on-line solar system data and ephemeris computation service (https://ssd.jpl.nasa.gov/?horizons). These are primarily used to determine flux models for the solar system objects used in the setjy task. These tables are stored as CASA tables in the data repository under ephemerides/JPL-Horizons. The current ephemeris tables cover ephemerides until December 31, 2030 for those objects officially supported in setjy.

Available objects, which include major planets, satellites, and asteroids, are: Mercury, Venus, Mars, Jupiter, Saturn, Uranus, Neptune, Pluto, Io, Europa, Ganymede, Callisto, Titan, Ceres, Vesta, Pallas, Juno, Lutetia, Sun and Moon (the objects in bold are those supported in the ‘Butler-JPL-Horizons 2012’ standard in setjy).

The table names have the format objectname_startMJD_endMJD_J2000.tab. The tables required by the setjy task are included in the CASA data directory (under ephemerides/JPL-Horizons). The available tables can be listed with the following commands:

# In CASA 6
CASA <1>: import glob
CASA <2>: jpldatapath = ctsys.resolve('ephemerides/JPL-Horizons') + "/*J2000.tab"
CASA <3>: glob.glob(jpldatapath)
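The paths returned by glob can be split into object name and MJD range, and the MJDs turned into calendar dates (MJD 0 is 1858-11-17). This sketch assumes the file names follow the objectname_startMJD_endMJD_J2000.tab format stated above exactly; `parse_ephem_table_name` and `mjd_to_date` are hypothetical helpers, and the example file name is made up.

```python
import math
import os
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # calendar date of MJD 0

def parse_ephem_table_name(path):
    """Split .../objectname_startMJD_endMJD_J2000.tab into (name, start, end)."""
    name = os.path.basename(path.rstrip("/"))
    suffix = "_J2000.tab"
    if not name.endswith(suffix):
        raise ValueError(f"not an ephemeris table name: {name}")
    parts = name[: -len(suffix)].rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected ephemeris table name: {name}")
    obj, start, end = parts
    return obj, float(start), float(end)

def mjd_to_date(mjd):
    """Calendar date containing the given MJD (fractional day dropped)."""
    return MJD_EPOCH + timedelta(days=math.floor(mjd))

if __name__ == "__main__":
    obj, start, end = parse_ephem_table_name("/data/ephemerides/JPL-Horizons/Titan_55197_73415_J2000.tab")
    print(obj, mjd_to_date(start), mjd_to_date(end))
```

In CASA, the same functions could be mapped over the list returned by glob.glob(jpldatapath).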

The following data are retrieved from the JPL-Horizons system (the column no. gives the quantity number listed in the JPL-Horizons system). Refer to https://ssd.jpl.nasa.gov/?horizons_doc for a detailed description of each of these quantities.

Quantities | Column no. | Unit/format | Description | Column label
--- | --- | --- | --- | ---
Date | n.a. | YYYY-MM-DD HH:MM | | Date__(UT)__HR:MN
Astrometric RA & DEC | 1 | degrees | Astrometric RA and Dec with respect to the observer's location (GEOCENTRIC) | R.A._(ICRF)_DEC
Observer sub-long & sub-lat | 14 | degrees | Apparent planetodetic (“geodetic”) longitude and latitude of the center of the target seen by the OBSERVER at print-time | ob-lon, ob-lat
Solar sub-long & sub-lat | 15 | degrees | Apparent planetodetic (“geodetic”) longitude and latitude of the Sun seen by the OBSERVER at print-time | Sl-lon, Sl-lat
North Pole Pos. ang. & dist. | 17 | degrees and arcseconds | Target's North Pole position angle and angular distance from the “sub-observer” point | NP.ang, NP.ds
Helio range & range rate | 19 | AU, km/s | Heliocentric range (r) and range-rate (rdot) | r, rdot
Observer range & range rate | 20 | AU, km/s | Range (delta) and range-rate (deldot) of the target center with respect to the observer | delta, deldot
S-T-O angle | 24 | degrees | Sun-Target-Observer angle | S-T-O

The task getephemtable can be used to retrieve the ephemeris data from the JPL-Horizons system through its web service and convert it to a CASA table (see also the Manipulate Ephemeris Objects page).

CASA <5>: getephemtable(objectname='Titan', timerange='2020/01/30~2020/01/31', interval='15m', outfile='Titan_test_ephem.tab')

The converted table contains the following columns. Note that the columns required by Measures, which is used to read the ephemeris table in CASA, are MJD, RA, DEC, Rho, and RadVel. The other columns are added for use by the setjy task.

Column name | Unit/format | Description
--- | --- | ---
MJD | day | modified Julian date
RA | degree | astrometric right ascension in ICRF/J2000 frame
DEC | degree | astrometric declination in ICRF/J2000 frame
Rho | AU | geocentric distance
RadVel | AU/d | geocentric distance rate
NP_ang | degree | North pole position angle
NP_dist | degree | North pole angular distance
DiskLong | degree | sub-observer longitude
DiskLat | degree | sub-observer latitude
Sl_lon | degree | sub-Solar longitude
Sl_lat | degree | sub-Solar latitude
r | AU | heliocentric distance
rdot | km/s | heliocentric distance rate
phang | degree | phase angle
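Two practical checks follow from this table: whether a converted table has the columns Measures needs, and how to compare RadVel (AU/d) with rdot (km/s). The sketch below uses the IAU 2012 value of the astronomical unit (149 597 870.7 km); `missing_measures_columns` and `au_per_day_to_km_per_s` are hypothetical helpers. In CASA the column names could come from the table tool, e.g. tb.open('Titan_test_ephem.tab') followed by tb.colnames().

```python
AU_KM = 149_597_870.7      # astronomical unit in km (IAU 2012 definition)
SECONDS_PER_DAY = 86_400.0

# columns Measures needs to read an ephemeris table, per the text above
REQUIRED_MEASURES_COLUMNS = ("MJD", "RA", "DEC", "Rho", "RadVel")

def missing_measures_columns(column_names):
    """Required Measures columns absent from column_names, in table order."""
    present = set(column_names)
    return [c for c in REQUIRED_MEASURES_COLUMNS if c not in present]

def au_per_day_to_km_per_s(value):
    """Convert a rate in AU/d (as in RadVel) to km/s (as in rdot)."""
    return value * AU_KM / SECONDS_PER_DAY

if __name__ == "__main__":
    print(missing_measures_columns(["MJD", "RA", "DEC"]))  # ['Rho', 'RadVel']
    print(au_per_day_to_km_per_s(1.0))                     # about 1731.457 km/s
```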

Array Configuration

Array configuration files for various telescopes are distributed with each CASA release. These configuration files can be used to define the telescope for simulator tools and tasks. Currently, configuration files for the following telescopes are available in CASA:

  • ALMA / 12m Array

  • ALMA / 7m ACA

  • VLA

  • VLBA

  • Next-Generation VLA (reference design)

  • ATCA

  • MeerKAT

  • PdBI (pre-NOEMA)

  • WSRT

  • SMA

  • CARMA

The full list of antenna configurations can be found in the CASA Guides on Simulations.

One can also locate the directory with the configurations in the CASA distribution and then list the configuration files, using the following commands in CASA:

CASA <1>: print(ctsys.resolve('alma/simmos/'))
/home/casa/data/distro/alma/simmos/

CASA <2>: ls /home/casa/data/distro/alma/simmos/

If a configuration file is not distributed with CASA but retrieved elsewhere, it can be used by giving the full path to the configuration file in the antennalist parameter of the simulator tasks.

NOTE: the most recent ALMA configuration files may not always be available in the most recent CASA version. ALMA configuration files for all cycles are available for download here. For the Next-Generation VLA reference design, the latest information can be found here.