Typespecs
Data specifications via type hints
Overview
Typespecs is a lightweight Python library that leverages typing.Annotated to manage metadata (category, description, units, …) within the type hints of your data structures.
It offers a dedicated read-only dictionary called a type specification to attach your metadata to your type hints.
This approach keeps your code clean and seamlessly coexists with other Annotated-based libraries such as Pydantic.
Finally, the attached metadata can be extracted and aggregated into a pandas.DataFrame object called a specification DataFrame, which makes the metadata easier to manage with the rich PyData ecosystem.
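The mechanism underneath this is the standard typing.Annotated form, which lets arbitrary objects ride along with a type hint. The following is a minimal standard-library sketch of that mechanism only, not of typespecs itself: plain dicts stand in for type specifications, and the Weather class here is purely illustrative.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints


@dataclass
class Weather:
    # Plain dicts stand in for type specifications in this sketch.
    temp: Annotated[list[float], {"units": "K"}]
    loc: Annotated[str, {"name": "Observed location"}]


# include_extras=True keeps the Annotated wrapper (it is stripped by default).
hints = get_type_hints(Weather, include_extras=True)

for field, hint in hints.items():
    # get_args returns the bare type first, followed by the attached metadata.
    bare_type, *metadata = get_args(hint)
    print(field, bare_type, metadata)
```

Because the metadata lives inside the hint rather than in a default value or decorator, other tools that read the same hint can ignore the objects they do not recognize.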
Installation
pip install typespecs
Basic Usage
You can create a type specification, typespecs.Spec(key=value, ...), and attach it to a type hint in a data structure such as a Python dataclass or a Pydantic model.
The Spec object acts as a read-only dictionary, so your metadata cannot be modified at runtime.
Once your data structure is defined, use typespecs.from_annotated(obj) to extract and aggregate the attached metadata into a specification DataFrame.
By default, the actual data and the metadata-stripped type hints will also be stored in the data and type columns, respectively (you can control this behavior using the data and type parameters in from_annotated).
import typespecs as ts
from dataclasses import dataclass
from typing import Annotated as Ann, ClassVar, TypeVar

@dataclass
class Weather:
    temp: Ann[list[float], ts.Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[float], ts.Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, ts.Spec(category="info", name="Observed location")]

weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data               name           type  units
temp      data  [273.15, 280.15]        Temperature    list[float]      K
wind      data       [5.0, 10.0]         Wind speed    list[float]    m/s
loc       info             Tokyo  Observed location  <class 'str'>   <NA>
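The read-only behavior described for Spec can be illustrated with the standard library. How typespecs implements Spec is not stated here, so the sketch below uses types.MappingProxyType as a stand-in that behaves the same way from the caller's point of view: reads work like a dictionary, writes are rejected.

```python
from types import MappingProxyType

# A read-only view over a dict; the view exposes no mutating methods.
spec = MappingProxyType({"category": "data", "units": "K"})

print(spec["units"])  # dictionary-style reads work as usual

try:
    spec["units"] = "degC"  # item assignment is rejected
except TypeError as error:
    print(f"rejected: {error}")
```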
You can attach multiple Spec objects to a single type hint.
If their metadata conflict, the last one takes precedence (you can control this behavior using the conflict parameter in from_annotated; see also Handling Metadata Conflicts).
Temp = Ann[list[float], ts.Spec(category="data", name="Temperature")]
Wind = Ann[list[float], ts.Spec(category="data", name="Wind speed")]
Loc = Ann[str, ts.Spec(category="info", name="Observed Location")]

@dataclass
class Weather:
    temp: Ann[Temp, ts.Spec(units="K")]
    wind: Ann[Wind, ts.Spec(units="m/s")]
    loc: Ann[Loc, ts.Spec(name="City")]

weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data         name           type  units
temp      data  [273.15, 280.15]  Temperature    list[float]      K
wind      data       [5.0, 10.0]   Wind speed    list[float]    m/s
loc       info             Tokyo         City  <class 'str'>   <NA>
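Stacking works because Python itself flattens nested Annotated forms, placing the inner metadata first and the outer metadata last. The sketch below shows that flattening and a last-one-wins merge using only the standard library; plain dicts stand in for Spec objects, and the merge loop is an illustration of the default behavior described above, not the library's code.

```python
from typing import Annotated, get_args

# Plain dicts stand in for ts.Spec objects in this sketch.
Loc = Annotated[str, {"category": "info", "name": "Observed Location"}]
Stacked = Annotated[Loc, {"name": "City"}]

# Nested Annotated forms are flattened: inner metadata first, outer last.
bare_type, *metadata = get_args(Stacked)

merged = {}
for spec in metadata:
    merged.update(spec)  # later values replace earlier ones

print(bare_type, merged)
```

The outer "City" replaces the inner "Observed Location", while the untouched category key is carried over.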
Advanced Usage
Handling Nested Types
Typespecs simplifies working with nested types. By default, the metadata attached to nested types will be merged into a single parent row.
Float = Ann[float, ts.Spec(dtype="f8")]

@dataclass
class Weather:
    temp: Ann[list[Float], ts.Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[Float], ts.Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, ts.Spec(category="info", name="Observed location")]

weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data  dtype               name           type  units
temp      data  [273.15, 280.15]     f8        Temperature    list[float]      K
wind      data       [5.0, 10.0]     f8         Wind speed    list[float]    m/s
loc       info             Tokyo   <NA>  Observed location  <class 'str'>   <NA>
You can disable this merging behavior using merge=False in from_annotated.
specs = ts.from_annotated(weather, merge=False)
print(specs)

        category              data  dtype               name             type  units
temp        data  [273.15, 280.15]   <NA>        Temperature      list[float]      K
temp/0      <NA>              <NA>     f8               <NA>  <class 'float'>   <NA>
wind        data       [5.0, 10.0]   <NA>         Wind speed      list[float]    m/s
wind/0      <NA>              <NA>     f8               <NA>  <class 'float'>   <NA>
loc         info             Tokyo   <NA>  Observed location    <class 'str'>   <NA>
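The row labels such as temp/0 suggest a path made of the field name and the position of each nested type argument. As a rough illustration of that idea, the sketch below walks a nested hint with the standard library and then folds the nested rows into the parent. The walk helper is hypothetical, plain dicts stand in for Spec objects, and the sketch does not model which side wins when parent and nested metadata share a key.

```python
from typing import Annotated, get_args, get_origin

Float = Annotated[float, {"dtype": "f8"}]
Temp = Annotated[list[Float], {"units": "K"}]


def walk(hint, path):
    """Yield (path, metadata) for a hint and each nested type argument."""
    metadata = {}
    if get_origin(hint) is Annotated:
        # Unwrap the hint and collect everything attached to it.
        hint, *specs = get_args(hint)
        for spec in specs:
            metadata.update(spec)
    yield path, metadata
    for index, arg in enumerate(get_args(hint)):
        yield from walk(arg, f"{path}/{index}")


# One row per nesting level, comparable to merge=False.
rows = dict(walk(Temp, "temp"))
print(rows)

# Folding every row into one dict, comparable to the default merging.
merged = {}
for metadata in rows.values():
    merged.update(metadata)
print(merged)
```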
Finally, you can include the nested type itself as part of the metadata using the special typespecs.ITSELF object.
This is useful when you want to handle the inner type alongside other metadata within the specification DataFrame.
Dtype = Ann[TypeVar("T"), ts.Spec(dtype=ts.ITSELF)]

@dataclass
class Weather:
    temp: Ann[list[Dtype[float]], ts.Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[Dtype[float]], ts.Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, ts.Spec(category="info", name="Observed location")]

weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data            dtype               name           type  units
temp      data  [273.15, 280.15]  <class 'float'>        Temperature    list[float]      K
wind      data       [5.0, 10.0]  <class 'float'>         Wind speed    list[float]    m/s
loc       info             Tokyo             <NA>  Observed location  <class 'str'>   <NA>
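The Dtype alias relies on a standard typing feature: an Annotated form built around a TypeVar is a generic alias, so subscripting it substitutes the type while keeping the metadata. The sketch below demonstrates that substitution and one plausible way a placeholder could be resolved afterwards. The ITSELF object here is a local, hypothetical sentinel standing in for typespecs.ITSELF, and the resolution step is an assumption about the idea, not the library's implementation.

```python
from typing import Annotated, TypeVar, get_args

T = TypeVar("T")
ITSELF = object()  # hypothetical sentinel standing in for ts.ITSELF

Dtype = Annotated[T, {"dtype": ITSELF}]

# Subscripting the alias substitutes T and keeps the attached metadata.
bare_type, spec = get_args(Dtype[float])

# Replace the sentinel with the type it was attached to.
resolved = {
    key: bare_type if value is ITSELF else value
    for key, value in spec.items()
}
print(resolved)
```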
Handling Missing Values
By default, missing metadata is filled with pandas.NA in a specification DataFrame.
You can pass custom fallback values by using the default parameter in from_annotated.
specs = ts.from_annotated(weather, default={"dtype": None, "units": "1"})
print(specs)

      category              data            dtype               name           type  units
temp      data  [273.15, 280.15]  <class 'float'>        Temperature    list[float]      K
wind      data       [5.0, 10.0]  <class 'float'>         Wind speed    list[float]    m/s
loc       info             Tokyo             None  Observed location  <class 'str'>      1
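The fallback logic amounts to a per-column lookup: use the value if the row has one, otherwise the column's custom default, otherwise the global missing marker. A dependency-free sketch of that rule follows; the MISSING string is a stand-in for pandas.NA, and the rows and columns are made up for illustration.

```python
MISSING = "<NA>"  # stand-in for pandas.NA in this dependency-free sketch

rows = {
    "temp": {"dtype": float, "units": "K"},
    "loc": {},  # no dtype or units were attached to this field
}
columns = ["dtype", "name", "units"]
default = {"dtype": None, "units": "1"}

filled = {
    label: {
        # Row value first, then the per-column default, then the marker.
        column: row.get(column, default.get(column, MISSING))
        for column in columns
    }
    for label, row in rows.items()
}
print(filled["loc"])
```

Columns without an entry in default, such as name here, still fall back to the missing marker.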
Handling Metadata Conflicts
When multiple Spec objects define the same key (either stacked on a single type hint or across nested types), the default behavior is to override the older value with the newer one.
You can customize this conflict resolution strategy using the conflict parameter in from_annotated.
For example, choosing "update" instead of "override" for a key merges dictionary-like metadata rather than replacing it.
You can also pass a custom callable to handle more complex conflict resolutions.
Temp = Ann[list[float], ts.Spec(attrs={"sensor": "A", "status": "active"})]
Wind = Ann[list[float], ts.Spec(attrs={"sensor": "A", "status": "active"})]

@dataclass
class Weather:
    temp: Ann[Temp, ts.Spec(attrs={"sensor": "B"})]
    wind: Ann[Wind, ts.Spec(attrs={"sensor": "B"})]

weather = Weather([273.15, 280.15], [5.0, 10.0])
specs = ts.from_annotated(weather)
print(specs)

                attrs              data         type
temp  {'sensor': 'B'}  [273.15, 280.15]  list[float]
wind  {'sensor': 'B'}       [5.0, 10.0]  list[float]

specs = ts.from_annotated(weather, conflict={"attrs": "update"})
print(specs)

                                    attrs              data         type
temp  {'sensor': 'B', 'status': 'active'}  [273.15, 280.15]  list[float]
wind  {'sensor': 'B', 'status': 'active'}       [5.0, 10.0]  list[float]
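The difference between the strategies comes down to how two values for the same key are combined. The sketch below spells that out with a hypothetical resolve helper. The (old, new) argument order for a custom callable is an assumption made for this illustration; the text above does not state the signature that from_annotated expects.

```python
def resolve(old, new, strategy="override"):
    """Combine two values defined for the same key (hypothetical helper)."""
    if callable(strategy):
        return strategy(old, new)  # assumed (old, new) argument order
    if strategy == "update":
        return {**old, **new}  # keep old entries, let new ones win
    return new  # "override": the newer value replaces the older one


inner = {"sensor": "A", "status": "active"}
outer = {"sensor": "B"}

print(resolve(inner, outer))            # {'sensor': 'B'}
print(resolve(inner, outer, "update"))  # {'sensor': 'B', 'status': 'active'}

# A custom rule: keep both sensor names instead of choosing one.
print(resolve(inner, outer, lambda old, new: [old["sensor"], new["sensor"]]))
```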
Handling Configuration Settings
You can define configuration settings directly on an object (or class) to take precedence over the behavior of from_annotated.
This is particularly useful when using wrapper libraries where you cannot pass parameters to from_annotated directly.
To do this, add the __typespecs_config__ attribute and assign a dictionary of your settings.
You can optionally type-hint it with typespecs.Config to benefit from static type checking.
@dataclass
class Weather:
    __typespecs_config__: ClassVar[ts.Config] = {"conflict": {"attrs": "update"}}
    temp: Ann[Temp, ts.Spec(attrs={"sensor": "B"})]
    wind: Ann[Wind, ts.Spec(attrs={"sensor": "B"})]

weather = Weather([273.15, 280.15], [5.0, 10.0])
specs = ts.from_annotated(weather)
print(specs)

                                    attrs              data         type
temp  {'sensor': 'B', 'status': 'active'}  [273.15, 280.15]  list[float]
wind  {'sensor': 'B', 'status': 'active'}       [5.0, 10.0]  list[float]
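One way to picture this lookup is a function that reads the attribute from the object and layers it over whatever was passed at call time. The sketch below assumes, following the wording above, that object-level settings win over call-time arguments; the effective_settings helper is hypothetical and the actual resolution inside from_annotated may differ.

```python
def effective_settings(obj, **kwargs):
    """Layer object-level settings over call-time arguments (hypothetical)."""
    # The attribute is optional, so fall back to an empty mapping.
    config = getattr(obj, "__typespecs_config__", {})
    # Applied last so that object-level settings take precedence.
    return {**kwargs, **config}


class Weather:
    __typespecs_config__ = {"conflict": {"attrs": "update"}}


class Plain:
    pass


print(effective_settings(Weather(), merge=False, conflict="override"))
print(effective_settings(Plain(), merge=False, conflict="override"))
```

Since getattr also finds class attributes through an instance, the same lookup works whether the setting is defined on the class or assigned to a single object.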
Handling Type Hint(s) Directly
You can create a specification DataFrame from type hint(s) using typespecs.from_annotation and typespecs.from_annotations.
This is useful when you want to directly handle type hints without defining them within a data structure.
annotations = {
    "temp": Ann[list[Dtype[float]], ts.Spec(category="data", name="Temperature", units="K")],
    "wind": Ann[list[Dtype[float]], ts.Spec(category="data", name="Wind speed", units="m/s")],
    "loc": Ann[str, ts.Spec(category="info", name="Observed location")],
}
specs = ts.from_annotations(annotations)
print(specs)

      category            dtype               name           type  units
temp      data  <class 'float'>        Temperature    list[float]      K
wind      data  <class 'float'>         Wind speed    list[float]    m/s
loc       info             <NA>  Observed location  <class 'str'>   <NA>

specs = ts.from_annotation(annotations["temp"])
print(specs)

      category            dtype         name         type  units
root      data  <class 'float'>  Temperature  list[float]      K