xarray-ms#

xarray-ms presents a Measurement Set v4 (MSv4) view over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, so that MSv4-compliant applications can be developed on well-understood MSv2 data.

In [1]: import xarray_ms

In [2]: import xarray

In [3]: import xarray.testing

In [4]: from xarray_ms.testing.simulator import simulate

# Simulate a Measurement Set with two (channel count, polarisation) configurations
In [5]: ms = simulate("test.ms", data_description=[
   ...:   (8, ("XX", "XY", "YX", "YY")),
   ...:   (4, ("RR", "LL"))])
   ...: 

In [6]: ms
Out[6]: '/tmp/tmpz9nl2bau/test.ms'

In [7]: dt = xarray.open_datatree(ms)

In [8]: dt
Out[8]: 
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│   │   Dimensions:                     (time: 5, baseline_id: 3, frequency: 8,
│   │                                    polarization: 4, uvw_label: 3)
│   │   Coordinates:
│   │     * time                        (time) float64 40B 2.09e+11 ... 2.09e+11
│   │     * baseline_id                 (baseline_id) int64 24B 0 1 2
│   │     * frequency                   (frequency) float64 64B 8.56e+08 ... 1.712e+09
│   │     * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│   │     * uvw_label                   (uvw_label) <U1 12B 'u' 'v' 'w'
│   │       baseline_antenna1_name      (baseline_id) <U9 108B ...
│   │       baseline_antenna2_name      (baseline_id) <U9 108B ...
│   │       field_name                  (time) <U7 140B ...
│   │       scan_name                   (time) <U11 220B ...
│   │   Data variables:
│   │       EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 120B ...
│   │       FLAG                        (time, baseline_id, frequency, polarization) uint8 480B ...
│   │       TIME_CENTROID               (time, baseline_id) float64 120B ...
│   │       UVW                         (time, baseline_id, uvw_label) float64 360B ...
│   │       VISIBILITY                  (time, baseline_id, frequency, polarization) complex64 4kB ...
│   │       WEIGHT                      (time, baseline_id, frequency, polarization) float32 2kB ...
│   │   Attributes:
│   │       creation_date:     2025-12-15T10:11:35.108275+00:00
│   │       creator:           {'software_name': 'xarray-ms', 'version': '0.3.8'}
│   │       observation_info:  {'observer': ['observed'], 'project_UID': 'project', '...
│   │       processor_info:    {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│   │       schema_version:    4.0.0
│   │       type:              visibility
│   │       data_groups:       {'base': {'correlated_data': 'VISIBILITY', 'descriptio...
│   ├── Group: /test_partition_000/antenna_xds
│   │       Dimensions:                          (antenna_name: 3, cartesian_pos_label: 3,
│   │                                             receptor_label: 2)
│   │       Coordinates:
│   │         * antenna_name                     (antenna_name) <U9 108B 'ANTENNA-0' ... ...
│   │         * cartesian_pos_label              (cartesian_pos_label) <U1 12B 'x' 'y' 'z'
│   │         * receptor_label                   (receptor_label) <U5 40B 'pol_0' 'pol_1'
│   │           mount                            (antenna_name) <U6 72B ...
│   │           telescope_name                   (antenna_name) <U9 108B ...
│   │           station_name                     (antenna_name) <U9 108B ...
│   │           polarization_type                (antenna_name, receptor_label) <U1 24B ...
│   │       Data variables:
│   │           ANTENNA_POSITION                 (antenna_name, cartesian_pos_label) float64 72B ...
│   │           ANTENNA_DISH_DIAMETER            (antenna_name) float64 24B ...
│   │           ANTENNA_EFFECTIVE_DISH_DIAMETER  (antenna_name) float64 24B ...
│   │           ANTENNA_RECEPTOR_ANGLE           (antenna_name, receptor_label) float64 48B ...
│   │       Attributes:
│   │           type:                    antenna
│   │           overall_telescope_name:  telescope
│   │           relocatable_antennas:    False
│   └── Group: /test_partition_000/field_and_source_base_xds
│           Dimensions:                       (field_name: 1, sky_dir_label: 2)
│           Coordinates:
│             * field_name                    (field_name) <U7 28B 'FIELD-0'
│             * sky_dir_label                 (sky_dir_label) <U3 24B 'ra' 'dec'
│               source_name                   (field_name) <U8 32B ...
│           Data variables:
│               FIELD_PHASE_CENTER_DIRECTION  (field_name, sky_dir_label) float64 16B ...
│           Attributes:
│               type:     field_and_source
└── Group: /test_partition_001
    │   Dimensions:                     (time: 5, baseline_id: 3, frequency: 4,
    │                                    polarization: 2, uvw_label: 3)
    │   Coordinates:
    │     * time                        (time) float64 40B 2.09e+11 ... 2.09e+11
    │     * baseline_id                 (baseline_id) int64 24B 0 1 2
    │     * frequency                   (frequency) float64 32B 8.56e+08 ... 1.712e+09
    │     * polarization                (polarization) <U2 16B 'RR' 'LL'
    │     * uvw_label                   (uvw_label) <U1 12B 'u' 'v' 'w'
    │       baseline_antenna1_name      (baseline_id) <U9 108B ...
    │       baseline_antenna2_name      (baseline_id) <U9 108B ...
    │       field_name                  (time) <U7 140B ...
    │       scan_name                   (time) <U11 220B ...
    │   Data variables:
    │       EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 120B ...
    │       FLAG                        (time, baseline_id, frequency, polarization) uint8 120B ...
    │       TIME_CENTROID               (time, baseline_id) float64 120B ...
    │       UVW                         (time, baseline_id, uvw_label) float64 360B ...
    │       VISIBILITY                  (time, baseline_id, frequency, polarization) complex64 960B ...
    │       WEIGHT                      (time, baseline_id, frequency, polarization) float32 480B ...
    │   Attributes:
    │       creation_date:     2025-12-15T10:11:35.180427+00:00
    │       creator:           {'software_name': 'xarray-ms', 'version': '0.3.8'}
    │       observation_info:  {'observer': ['observed'], 'project_UID': 'project', '...
    │       processor_info:    {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
    │       schema_version:    4.0.0
    │       type:              visibility
    │       data_groups:       {'base': {'correlated_data': 'VISIBILITY', 'descriptio...
    ├── Group: /test_partition_001/antenna_xds
    │       Dimensions:                          (antenna_name: 3, cartesian_pos_label: 3,
    │                                             receptor_label: 2)
    │       Coordinates:
    │         * antenna_name                     (antenna_name) <U9 108B 'ANTENNA-0' ... ...
    │         * cartesian_pos_label              (cartesian_pos_label) <U1 12B 'x' 'y' 'z'
    │         * receptor_label                   (receptor_label) <U5 40B 'pol_0' 'pol_1'
    │           mount                            (antenna_name) <U6 72B ...
    │           telescope_name                   (antenna_name) <U9 108B ...
    │           station_name                     (antenna_name) <U9 108B ...
    │           polarization_type                (antenna_name, receptor_label) <U1 24B ...
    │       Data variables:
    │           ANTENNA_POSITION                 (antenna_name, cartesian_pos_label) float64 72B ...
    │           ANTENNA_DISH_DIAMETER            (antenna_name) float64 24B ...
    │           ANTENNA_EFFECTIVE_DISH_DIAMETER  (antenna_name) float64 24B ...
    │           ANTENNA_RECEPTOR_ANGLE           (antenna_name, receptor_label) float64 48B ...
    │       Attributes:
    │           type:                    antenna
    │           overall_telescope_name:  telescope
    │           relocatable_antennas:    False
    └── Group: /test_partition_001/field_and_source_base_xds
            Dimensions:                       (field_name: 1, sky_dir_label: 2)
            Coordinates:
              * field_name                    (field_name) <U7 28B 'FIELD-0'
              * sky_dir_label                 (sky_dir_label) <U3 24B 'ra' 'dec'
                source_name                   (field_name) <U8 32B ...
            Data variables:
                FIELD_PHASE_CENTER_DIRECTION  (field_name, sky_dir_label) float64 16B ...
            Attributes:
                type:     field_and_source

Measurement Set v4#

NRAO and SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, the major highlights are:

  • xarray is used to define the specification.

  • MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.

xarray Datasets are self-describing, which makes them easier to reason about and work with. Additionally, the regularity of the data makes MSv4-based software less complex to write.
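
To make the difference between tabular rows and a regular grid concrete, here is a minimal NumPy sketch. It is not how xarray-ms is implemented; the row values are invented, and each row carries a single number rather than the per-row (channel, correlation) array found in a real MSv2 DATA column. Rows keyed by (time, antenna1, antenna2) are scattered onto a (time, baseline) grid, and a missing row is padded with NaN.

```python
import numpy as np

# Hypothetical MSv2-style rows: one row per (time, antenna1, antenna2).
# The row for time=2.0 and baseline (0, 1) is deliberately missing.
time = np.array([1.0, 1.0, 1.0, 2.0, 2.0])
ant1 = np.array([0, 0, 1, 0, 1])
ant2 = np.array([1, 2, 2, 2, 2])
data = np.array([10.0, 11.0, 12.0, 21.0, 22.0])

# The unique values along each axis define the regular grid, and the
# inverse indices record where every row lands on that grid.
utime, time_idx = np.unique(time, return_inverse=True)
baselines = np.stack([ant1, ant2], axis=1)
ubl, bl_idx = np.unique(baselines, axis=0, return_inverse=True)
bl_idx = bl_idx.reshape(-1)  # some NumPy versions return a 2-D inverse here

# Start from a NaN-padded grid, then scatter the rows into place.
grid = np.full((utime.size, ubl.shape[0]), np.nan)
grid[time_idx, bl_idx] = data

print(grid)
```

Once the data sits on a grid like this, a selection such as "all baselines at the second time step" is a plain slice (grid[1]) instead of a row filter.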

xradio#

casangi/xradio provides a reference implementation that converts CASA Measurement Sets (MSv2) to Zarr-backed MSv4 Measurement Sets using the python-casacore package.

Why xarray-ms?#

  • By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then transition seamlessly to newer formats. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view of the data looks the same to the software developer.

  • xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:

    • xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.

    • Similarly, xarray’s lazy loading mechanism dispatches through the backend.

    • Automatic access to any chunked array type supported by xarray, including, but not limited to, dask.

    • Arbitrary chunking along any xarray dimension.

  • xarray-ms uses arcae, a high-performance backend to CASA Tables that implements a subset of python-casacore’s interface.

  • xarray-ms has limited support for irregular MSv2 data via padding.

  • Refer to the MSv4 compliance and roadmap section for information on adherence to the specification.