io — Utilities for reading or writing data¶
Overview¶
The grand.io module provides utilities for easily reading (writing) Python objects from (to) data files. The following types are supported: numpy array, BaseRepresentation, bytes, float, int, astropy Quantity and str. In addition, any list or tuple of numpy array or astropy Quantity is stored as a numeric table with annotated columns.
Reading and writing coordinate frames is only supported for the grand-specific ECEF and grand.LTP frames. Note that the coordinates data, if any, are not stored. If needed, they must be written explicitly as a separate entry.
Inside a file, data are structured under DataNode objects. The open() function gives access to the root node of a file. Conceptually, a DataNode can be seen as a folder within a file system, while the data would be the files inside that folder.
Note
A C-compliant memory layout is used for storing the data, allowing for an efficient read back from C. Numeric values are annotated, e.g. with a unit, column name or a metatype. Note also that Python objects are preserved when writing to and reading back from a data file, i.e. the object type is restored.
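For instance, a minimal round trip might look like the sketch below, assuming the same imports as in the examples of this page (grand.io as io and astropy.units as u).
>>> with io.open('data.hdf5', 'w') as root:
...     root.write('frequency', 1 * u.Hz)
>>> with io.open('data.hdf5') as root:
...     frequency = root.read('frequency')   # expected to come back as an astropy Quantity, in Hz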
Warning
The HDF5 format is currently used since it allows a hierarchical organisation of the data, has bindings for both C and Python, and provides automatic compression. Note however that several issues have been reported when using HDF5, e.g. regarding reliability and performance. Therefore, the underlying data format might change in the future, e.g. to a tar archive, which actually provides the same features.
Accessing data files¶
Data files are accessed using the open() function. The semantics are the same as for the Python open() or C fopen functions.
- grand.io.open(file, mode='r')¶
Open file and return the root DataNode object. If the file cannot be opened, an OSError is raised.
For example, the following creates a new data file using the root DataNode as a closing context manager.
>>> with io.open('data.hdf5', 'w') as root:
...     pass
In order to read from (append to) a file use the ‘r’ (‘a’) mode.
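As a sketch, an existing file can then be extended by re-opening it in append mode (the 'comment' entry below is purely illustrative).
>>> with io.open('data.hdf5', 'a') as root:
...     root.write('comment', 'added in a second pass')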
Managing data nodes¶
- class grand.io.DataNode(group: h5py._hl.group.Group)¶
A node containing data elements and branches to sub-nodes.
Sub-nodes can be accessed by index, providing their relative path w.r.t. this node. For example, the following gets a reference to the sub-node named apples.
>>> node = root['apples']
Note that an IndexError is raised if the sub-node does not exist. Use the branch() method in order to create a new sub-node.
The read() and write() methods allow reading and writing data to this node.
- property children¶
Iterator over the sub-nodes inside this node.
For example, the loop below iterates over all sub-nodes below the root one.
>>> for node in root.children:
...     pass
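Since children is a plain Python iterator, it can also be fed to the usual built-ins; the line below (illustrative) simply counts the direct sub-nodes of the root.
>>> n_nodes = sum(1 for _ in root.children)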
- property elements¶
Iterator over the data elements inside the node.
For example, the loop below iterates over all data elements in the root node.
>>> for name, data in root.elements:
...     pass
Note
The data are loaded from disk at each loop iteration. Use the read() method instead if you only want to load a specific data element.
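For instance, a single entry can be loaded directly instead of looping, assuming a 'frequency' element exists in the file as in the read() example further below.
>>> frequency = root.read('frequency')   # loads only this element from disk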
- branch(k: str) → grand.io.DataNode¶
Get a reference to a sub-node.
Note
If the node does not exist it is created and initialised empty. Use an indexed access instead if you want to access only existing sub-nodes.
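The two access modes can be combined; the sketch below (with a hypothetical 'oranges' sub-node) only creates the branch when the indexed access fails.
>>> try:
...     node = root['oranges']          # succeeds only if the sub-node already exists
... except IndexError:
...     node = root.branch('oranges')   # otherwise create it, initialised empty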
- close() → None¶
Close the data file containing the current node.
Warning
Closing the data file disables all related nodes, parents and children, which might lead to unexpected results. Therefore it is strongly recommended to wrap all I/Os within a root node context (i.e. using a with statement as shown below) instead of explicitly calling the close() method.
>>> with open('data.hdf5') as root:
...     # Do all I/Os within this context
...     pass
The data file is automatically closed when exiting the root node’s context. Note that only the root node is a closing context. Contexts spawned from a sub-node do not close the data file.
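As an illustration of this behaviour, the sketch below manages the file handle explicitly: the sub-node context does not close the file, so the root node must be closed by hand.
>>> root = io.open('data.hdf5', 'w')
>>> with root.branch('apples') as apples:
...     apples.write('count', 3)
>>> root.write('total', 1)   # the file is still open: only a root node context closes it
>>> root.close()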
- read(*args: str, dtype: Union[numpy.DataType, str, None] = None)¶
Read data from this node.
The optional dtype argument allows specifying the data type to use for the read values. By default, the native data type stored in the file is used.
Multiple data elements can be read at once by providing multiple arguments. For example, the following reads out two data elements from the root node.
>>> frequency, position = root.read('frequency', 'position')
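The dtype can also be given as a numpy dtype string; for instance the following (illustrative) forces the frequency to be read back as a 64-bit float.
>>> frequency = root.read('frequency', dtype='f8')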
- write(k, v, dtype=None, unit=None, columns=None, units=None)¶
Write data to this node.
The data type (dtype) can be explicitly specified as a numpy data type. For example, the following writes an astropy Quantity as a 32-bit floating point.
>>> root.write('frequency', 1 * u.Hz, dtype='f')
Note that if dtype is omitted the native Python precision is used when writing the data to file.
The unit keyword allows specifying the unit to use when writing an astropy Quantity. If omitted, the native unit is used.
The units and columns keywords allow specifying the units and names of the columns when writing a table, i.e. a list or tuple of numpy array or astropy Quantity.
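For instance, the sketch below (with hypothetical 'time' and 'voltage' column names) stores two numpy arrays as a single two-column table with annotated units.
>>> t = np.array((0., 1., 2.))
>>> v = np.array((0., 0.5, 1.))
>>> root.write('trace', (t, v), columns=('time', 'voltage'), units=('s', 'uV'))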
Examples¶
Serialising Python data¶
The following example shows how to write basic Python objects to a data file.
>>> with io.open('data.hdf5', 'w') as root:
...     root.write('example_of_cstring', b'This is a C like string\x00')
...     root.write('example_of_str', 'This is a Python string')
...     root.write('example_of_number', 1)
...     root.write('example_of_array', np.array((1, 2, 3)))
Note
Python str objects differ from C ones. In order to generate a C-like string, a bytes object must be used with an explicit null termination.
Conversely, reading the data back can be done as follows.
>>> with io.open('data.hdf5') as root:
...     cstring = root.read('example_of_cstring')
>>> python_string = cstring.decode()
Working with physical data¶
The following example illustrates how to create a new data file and populate it with some physical data organised under various branches.
>>> with io.open('data.hdf5', 'w') as root:
...     root.write('energy', 1E+18 * u.eV)
...
...     with root.branch('fields/a0') as a0:
...         r = CartesianRepresentation(0, 0, 0, unit='m')
...         a0.write('r', r, dtype='f')  # Store the data with a specific format
...
...         E = CartesianRepresentation(
...             np.array((0, 0, 0)),
...             np.array((0, 1, 0)),
...             np.array((0, 0, 0)),
...             unit='uV/m'
...         )
...         a0.write('E', E, unit='V/m')  # Store the data with a specific unit
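Reading this file back follows the same pattern; the sketch below retrieves the energy from the root node and both fields from the fields/a0 branch.
>>> with io.open('data.hdf5') as root:
...     energy = root.read('energy')
...     r, E = root['fields/a0'].read('r', 'E')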