H5Writer

class pyvisgen.io.datawriters.H5Writer(output_path: Path, dataset_type: str, half_image: bool = True, **kwargs)

Bases: DataWriter

HDF5 file writer for pyvisgen datasets.

This writer saves data arrays to HDF5 files using the h5py library. Each sample is written to a separate .h5 file. The writer automatically crops images to half their height with a small overlap and validates array shapes before writing.

Parameters:
output_path : str or Path

Directory path where HDF5 files will be written.

dataset_type : str

Type of dataset being written (e.g., ‘train’, ‘test’, ‘validation’). This is used in the output filename pattern.

Examples

>>> from pyvisgen.io.datawriters import H5Writer
>>> writer = H5Writer(output_path="./data", dataset_type="train")
>>> writer.write(x_data, y_data, index=0)

Or as a context manager:

>>> import numpy as np
>>> rng = np.random.default_rng()
>>>
>>> with H5Writer(output_path="./data", dataset_type="train") as writer:
...     x_data = rng.uniform(size=(5, 10, 2, 256, 256))
...     y_data = rng.uniform(size=(5, 10, 2, 256, 256))
...
...     for bundle_id, (x, y) in enumerate(zip(x_data, y_data)):
...         writer.write(x, y, index=bundle_id)

Methods Summary

get_half_image(x, y[, overlap])

Extract half height of every image with a small overlap.

test_shapes(array, name)

Validate the shape of input arrays.

write(x, y, index[, name_x, name_y, overlap])

Write FFT pair data to an HDF5 file.

Methods Documentation

get_half_image(x: ndarray, y: ndarray, overlap: int = 5) → tuple[ndarray]

Extract half height of every image with a small overlap.

Parameters:
x : np.ndarray

Simulated data array with shape (B, C, H, W).

y : np.ndarray

Ground truth array with shape (B, C, H, W).

overlap : int, optional

Overlap, in pixels, added to the half-height crop. Default: 5.

Returns:
tuple[np.ndarray, np.ndarray]

Tuple containing the cropped x and y arrays.
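
The crop can be pictured with plain NumPy slicing. The following is a minimal sketch, not the library's implementation: the standalone function `crop_half_height` is hypothetical, and it assumes that the crop keeps the first `H // 2 + overlap` rows along the height axis (axis 2).

```python
import numpy as np


def crop_half_height(x, y, overlap=5):
    """Hypothetical stand-in for H5Writer.get_half_image.

    Assumption: the first H // 2 + overlap rows of axis 2 are kept.
    """
    half = x.shape[2] // 2
    # Slice only the height axis; batch, channel, and width stay untouched.
    return x[:, :, : half + overlap, :], y[:, :, : half + overlap, :]


rng = np.random.default_rng(0)
x = rng.uniform(size=(3, 2, 64, 64))
y = rng.uniform(size=(3, 2, 64, 64))

x_half, y_half = crop_half_height(x, y, overlap=5)
print(x_half.shape)  # (3, 2, 37, 64)
```

With a height of 64 and the default overlap of 5, each cropped image has 64 // 2 + 5 = 37 rows under this assumption.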

test_shapes(array: ndarray, name: str) → None

Validate the shape of input arrays.

Arrays must have the shape (B, C, H, W), where B is the batch size, C is the number of channels (which must be 2), and H and W are the height and width of the images.

Parameters:
array : np.ndarray

Array to validate.

name : str

Name of the array for error reporting.

Raises:
ValueError

If array axis 1 is not size 2.

ValueError

If array does not have exactly 4 dimensions.
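
The two documented checks can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `check_shape` is a hypothetical function, and the order of the checks and the error messages are illustrative rather than taken from the library.

```python
import numpy as np


def check_shape(array, name):
    """Hypothetical re-implementation of the two documented checks."""
    if array.ndim != 4:
        raise ValueError(
            f"{name} must have 4 dimensions (B, C, H, W), got {array.ndim}"
        )
    if array.shape[1] != 2:
        raise ValueError(
            f"{name} must have size 2 along axis 1, got {array.shape[1]}"
        )


# A valid (B, C, H, W) array passes silently.
check_shape(np.zeros((10, 2, 32, 32)), "x")

# Collect the messages raised for a 3-D array and for a 3-channel array.
errors = []
for bad in (np.zeros((2, 32, 32)), np.zeros((10, 3, 32, 32))):
    try:
        check_shape(bad, "x")
    except ValueError as err:
        errors.append(str(err))

print(errors)
```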

write(x, y, index, name_x='x', name_y='y', overlap: int = 5, **kwargs) → None

Write FFT pair data to an HDF5 file.

Creates a new HDF5 file for each sample, named with the pattern samp_{dataset_type}_{index}.h5. The input arrays are cropped to half their height, with an overlap set by the overlap parameter (5 pixels by default), and validated before writing.
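
As an illustration of the filename pattern, the expected paths can be built with `pathlib`. This is a sketch only: `sample_path` is a hypothetical helper, and it assumes that files are placed directly inside `output_path`.

```python
from pathlib import Path


def sample_path(output_path, dataset_type, index):
    """Hypothetical helper mirroring the documented filename pattern."""
    return Path(output_path) / f"samp_{dataset_type}_{index}.h5"


paths = [sample_path("./data", "train", i) for i in range(3)]
print([p.name for p in paths])
# ['samp_train_0.h5', 'samp_train_1.h5', 'samp_train_2.h5']
```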

Parameters:
x : np.ndarray

First array of the FFT pair with shape (batch, 2, height, width). Expected to have 4 dimensions with axis 1 of size 2.

y : np.ndarray

Second array of the FFT pair with shape (batch, 2, height, width). Expected to have 4 dimensions with axis 1 of size 2.

index : int

Bundle index used in the output filename.

overlap : int, optional

Overlap parameter for extracting half-images. Default: 5.

name_x : str, optional

Key of the dataset for x array in the HDF5 file. Default: "x".

name_y : str, optional

Key of the dataset for y array in the HDF5 file. Default: "y".

Raises:
ValueError

If the x or y array does not have the expected shape (4 dimensions with axis 1 of size 2).

Examples

>>> import numpy as np
>>> from pyvisgen.io.datawriters import H5Writer
>>> rng = np.random.default_rng()
>>>
>>> with H5Writer(output_path="./data", dataset_type="train") as writer:
...     x_data = rng.uniform(size=(5, 10, 2, 256, 256))
...     y_data = rng.uniform(size=(5, 10, 2, 256, 256))
...
...     for bundle_id, (x, y) in enumerate(zip(x_data, y_data)):
...         writer.write(x, y, index=bundle_id)
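
To inspect a written sample, the file can be opened with h5py. The sketch below is self-contained and does not call pyvisgen: it first creates a stand-in file with h5py directly, assuming the writer stores plain datasets under the default keys "x" and "y", and then reads both arrays back. The array shapes are illustrative.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "samp_train_0.h5"

    # Stand-in for H5Writer.write (assumption: plain datasets "x" and "y").
    x_in = np.ones((10, 2, 37, 64))
    y_in = np.zeros((10, 2, 37, 64))
    with h5py.File(sample, "w") as f:
        f.create_dataset("x", data=x_in)
        f.create_dataset("y", data=y_in)

    # Read the sample back into memory before the directory is removed.
    with h5py.File(sample, "r") as f:
        x_out = f["x"][:]
        y_out = f["y"][:]

print(x_out.shape, y_out.shape)  # (10, 2, 37, 64) (10, 2, 37, 64)
```

If custom keys were passed through name_x and name_y, the same keys would be needed when reading.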