If you can't wait for the TL;DR, jump to "Lessons Learned" at the end of this article.
I've been playing around with Python 3.7's dataclasses — and so far they've been super awesome. For the most part, they're quite intuitive and easy to use and customize.
Yet, there's one thing I've been struggling with and couldn't find online help for, which is implementing Python properties on dataclasses. And chances are — I am not alone.
That's why I decided to solve the problem once and for all. I wanted to figure out how to reconcile dataclasses and properties.
So I sat down before my computer, fired up an interpreter and wrote down my thought process as I made my way towards a solution. After a night of trial and error and some wording enhancements, the result is this very blog post!
As a pleasant side effect, you'll get to see many features of dataclasses in action. I'll start with a quick overview of dataclasses, and we'll get to the code right after that.
Of course, you'll find all the code supporting this post on GitHub.
Dataclasses: a 10,000-ft overview
Dataclasses are, simply put, classes made to hold data. Their specification in PEP 557 was motivated by the fact that a lot of the classes we write are merely used as editable data containers. When that happens, we spend time writing boilerplate code, which most often results in an ugly __init__() method with tons of arguments and just as many lines for storing them as attributes, not to mention handling default arguments…
There has to be a better way, and there is! The answer is: dataclasses. 🎉
Python implements dataclasses in the aptly named dataclasses module, whose superstar is the @dataclass decorator. This decorator is really just a code generator. It takes advantage of Python's type annotations (if you still don't use them, you really should) to automatically generate boilerplate code you'd otherwise have to type out yourself.
As a point of comparison, here's how you would create a Vehicle class with a wheels attribute using a regular class declaration:

    class Vehicle:
        def __init__(self, wheels: int):
            self.wheels = wheels
Nothing fancy, really. Now, the dataclass version:

    # 0_initial.py
    from dataclasses import dataclass

    @dataclass
    class Vehicle:
        wheels: int
Believe it or not, these two code snippets give you the same __init__()! The dataclass is actually a win, because beyond __init__(), @dataclass generates a bunch of extra stuff for free, including a handsome __repr__():

    >>> car = Vehicle(wheels=4)
    >>> car
    Vehicle(wheels=4)
In short, dataclasses are a simple, elegant, Pythonic way of creating classes that hold data. 🐍
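To get a feel for what that "extra stuff" is, here's a small sketch (my own illustration, not one of the numbered files from the post) that pokes at what @dataclass generates for the Vehicle class above. Besides __init__() and __repr__(), it also writes an __eq__() that compares instances field by field.

```python
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    wheels: int


car = Vehicle(wheels=4)
other = Vehicle(wheels=4)

# __repr__() is generated from the declared fields.
print(car)

# __eq__() compares instances field by field.
print(car == other)

# fields() lists what @dataclass discovered from the annotations.
print([f.name for f in fields(Vehicle)])
```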
I sometimes resort to the @property decorator to implement specific logic when getting/setting an attribute. That's really just the Pythonic way of implementing getters and setters.
Building upon the previous Vehicle class, I would make the wheels attribute private and put an @property on top of it:

    class Vehicle:
        def __init__(self, wheels: int):
            self._wheels = wheels  # note the underscore: it's now private! 👻

        @property
        def wheels(self) -> int:
            print('getting wheels')
            return self._wheels

        @wheels.setter
        def wheels(self, wheels: int):
            print('setting wheels to', wheels)
            self._wheels = wheels

Here's what it looks like:

    >>> car = Vehicle(wheels=4)
    >>> car.wheels = 3
    setting wheels to 3
    >>> car.wheels
    getting wheels
    3
Now the question is — how can I implement such a property on a dataclass?
Wait, so this is the problem?
You may think this to be a trivial question. Mind you, it is not trivial.
Dataclasses generate the __init__() method for you, and that's great. They even provide a __post_init__() hook method in case you want to do some more initialization (see Post-init processing).
However, this means you cannot do the same trick as with normal classes, i.e. storing a public-looking argument (e.g. wheels) into a private attribute (_wheels) that you'll build an @property out of.
That's where the problem comes from. And to be honest, it gave me a bit of a headache.
Because I think the problem solving was interesting, I'll take you through five consecutive attempts to correctly implement that property on a dataclass version of the Vehicle class.
Attempt 1: declaring a private field
First, let's keep things simple. We want to store wheels in a private field and use it in the @property, right? So why not simply declare a _wheels field on the dataclass?

    # 1_private_field.py
    from dataclasses import dataclass

    @dataclass
    class Vehicle:
        _wheels: int

        # wheels @property as before
Unfortunately, this won't work — otherwise this blog post wouldn't be of much use! 😙
The reason is that the constructor now expects a _wheels argument instead of wheels:

    >>> car = Vehicle(wheels=4)
    Traceback (most recent call last)
    <ipython-input-3-9c9de8fb1422> in <module>()
    ----> 1 car = Vehicle(wheels=4)
    TypeError: __init__() got an unexpected keyword argument 'wheels'

To be fair, that's just @dataclass doing its job. Still, that's not what we want.
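If you'd rather check this without reading a traceback, here's a quick sketch. The wheels property is left out because it plays no role in the failure; the generated signature alone tells the story.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Vehicle:
    _wheels: int


# The generated constructor takes `_wheels`, not `wheels`.
print(inspect.signature(Vehicle))

try:
    Vehicle(wheels=4)
except TypeError as exc:
    print("TypeError:", exc)

# Passing the private name works, which is exactly what we don't want.
car = Vehicle(_wheels=4)
```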
Attempt 2: make use of InitVar
If you read through the documentation, you'll learn that InitVar allows you to implement init-only variables. These variables can be passed to the constructor, but won't be stored as fields on the instance. Instead, the variable is passed as an argument to __post_init__().
Why not use this to create an init-only wheels variable and store that in a _wheels field? We just need to give the latter a default (e.g. None) so that it is not required by the constructor:

    # 2_initvar.py
    from dataclasses import dataclass, InitVar

    @dataclass
    class Vehicle:
        wheels: InitVar[int]
        _wheels: int = None  # default given => not required in __init__()

        def __post_init__(self, wheels: int):
            self._wheels = wheels

        # wheels @property as before
__init__() now expects a wheels argument instead of _wheels, which is what we want. However, @dataclass still generates the other boilerplate code and magic methods using _wheels, which is problematic:
    >>> car = Vehicle(wheels=4)
    >>> car
    Vehicle(_wheels=4)  # 😕
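The reason wheels goes missing is that an InitVar is not a real field. Here's a small sketch (property omitted to keep it short, and Optional[int] used so the None default is typed honestly) showing that fields() only reports _wheels, which is what __repr__() and friends are built from.

```python
from dataclasses import InitVar, dataclass, fields
from typing import Optional


@dataclass
class Vehicle:
    wheels: InitVar[int]
    _wheels: Optional[int] = None

    def __post_init__(self, wheels: int) -> None:
        self._wheels = wheels


car = Vehicle(wheels=4)

# Only real fields are listed; the init-only variable is not one of them.
print([f.name for f in fields(Vehicle)])

# The value did make it into the private attribute.
print(car._wheels)
```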
Attempt 3: make use of field()
Digging deeper into the docs, I found that one could fine-tune the field generation behavior using the field() function. You can pass it a default value and it accepts a repr argument to control whether the field should be included in the generated __repr__(). Here's how it looks when used on _wheels:

    # 3_field.py
    from dataclasses import dataclass, field, InitVar

    @dataclass
    class Vehicle:
        wheels: InitVar[int]
        _wheels: int = field(default=None, repr=False)

        # __post_init__() as before
        # wheels @property as before
Sweet, we don't have _wheels included in __repr__() anymore. But we still don't have wheels in there:

    >>> car = Vehicle(wheels=4)
    >>> car
    Vehicle()  # Where is `wheels=4`? 😕😕😕
Attempt 4: make wheels a proper field
In the previous attempts, wheels was an InitVar, not a field. This time, let's declare it as a field in its own right. It will be possible to pass it in the constructor, and it should be included in __repr__() this time around.
The good thing is, the @property definition of wheels declared later in the class will not stop @dataclass from generating the field, because @dataclass relies on type annotations to find fields, and the property is not one.
That might start to be a bit complicated, so let me show you some code. I'll reproduce the wheels property in full this time:

    # 4_wheels_field.py
    from dataclasses import dataclass, field

    @dataclass
    class Vehicle:
        wheels: int  # Now a regular dataclass field
        # The rest just as before:
        _wheels: int = field(default=None, repr=False)

        def __post_init__(self):
            # Note: wheels is not passed as an argument
            # here anymore, because it is not an
            # `InitVar` anymore.
            self._wheels = self.wheels  # (1)

        @property
        def wheels(self) -> int:
            print('getting wheels')
            return self._wheels

        @wheels.setter
        def wheels(self, wheels: int):
            print('setting wheels to', wheels)
            self._wheels = wheels
It looks like we're getting there, aren't we?
Unfortunately, not quite. 😞 There's a catch in this implementation.
Indeed, you may think line (1) puts the value of wheels that was given to the constructor (and stored into wheels) into _wheels. For example, calling Vehicle(wheels=4) would result in having _wheels == 4. Sadly, that is not the case!
Here's why: when executing (1), self.wheels is the value returned by the wheels property's getter, not the value initially passed to __init__()! And that getter returns self._wheels, which the generated __init__() has just overwritten with its default, None.
I know, it's getting all tangled up, but please bear with me:
    >>> car = Vehicle(wheels=4)
    setting wheels to 4
    getting wheels  # hint: this is (1) being executed
    >>> print(car.wheels)
    getting wheels
    None  # nope, nothing in there…
If you think about it, what we're doing in (1) is just replacing _wheels with its own value. Quite useless, if you ask me. We would actually get the same result if we didn't even implement __post_init__().
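To make the order of operations explicit, here's a hand-written approximation of what the generated __init__() does in attempt 4. It's a sketch, not the actual generated source: the class name VehicleByHand is made up, and the print() calls are dropped. Fields are assigned in declaration order, so the setter runs first and then _wheels is overwritten with its default.

```python
from typing import Optional


class VehicleByHand:
    def __init__(self, wheels: int, _wheels: Optional[int] = None):
        self.wheels = wheels    # goes through the setter: _wheels = 4
        self._wheels = _wheels  # overwrites it with the default: None
        self.__post_init__()

    def __post_init__(self) -> None:
        self._wheels = self.wheels  # (1) reads None, writes None

    @property
    def wheels(self) -> Optional[int]:
        return self._wheels

    @wheels.setter
    def wheels(self, wheels: int) -> None:
        self._wheels = wheels


car = VehicleByHand(wheels=4)
print(car.wheels)  # prints None
```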
Duh! So what can we do? 😩
Fortunately, there's hope!
Attempt 5: exclude _wheels from the constructor
Let me warn you — this fifth and final attempt will work, and the reason why, which I'll explain in a minute, is outrageous.
At this point, you'd be right to feel sad — I felt sad myself. But fear not! There is one thing from the documentation that we haven't tried yet.
So far, the _wheels attribute has been declared using field(default=None, repr=False). Using default=None here means that we are able to omit passing a value for _wheels in the constructor, in which case it will be given the value None in __init__(). However, it is still possible to give it a value in the constructor, and that value then wins over the one passed as wheels:

    >>> car = Vehicle(wheels=4, _wheels=3)
    setting wheels to 4
    getting wheels
    >>> car.wheels
    getting wheels
    3
Well, how about we find a way to remove the _wheels argument from the constructor? Will it solve our problem? (Spoiler alert: it will.)
field() accepts an init argument for that exact purpose. The docs on field() say:
init: If true (the default), this field is included as a parameter to the generated __init__() method.
Sounds trivial, right? Well, let's try using it on _wheels (I removed the __post_init__() hook because we previously showed that it was actually useless):

    # 5_init_false.py
    from dataclasses import dataclass, field

    @dataclass
    class Vehicle:
        wheels: int
        _wheels: int = field(init=False, repr=False)

        @property
        def wheels(self) -> int:
            print('getting wheels')
            return self._wheels

        @wheels.setter
        def wheels(self, wheels: int):
            print('setting wheels to', wheels)
            self._wheels = wheels
Well, guess what? This has just solved all of our problems.
Because we used init=False, the constructor generated by @dataclass will not initialize _wheels at all.
However, it will initialize wheels with the value passed to the constructor. If we could extract the generated code, one of the instructions in __init__() would look like this:

    self.wheels = wheels
Now, you tell me — what does this execute exactly?
Yep, that's right. It will execute the setter! 🙀
Look! We've actually been seeing the print statement in the setter since attempt 4!
    >>> car = Vehicle(wheels=4)
    setting wheels to 4  # the `wheels` setter being called

Let me remind you of the code for that setter:

    @wheels.setter
    def wheels(self, wheels: int):
        print('setting wheels to', wheels)
        self._wheels = wheels
It sets the value of _wheels!
As a result, the value for wheels passed in the constructor is put into _wheels, and nowhere else, because on the generated class the name wheels resolves to the @property, so the wheels field has no storage of its own.
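Here's a small check of that claim, with the print() calls dropped for brevity. It's only a sketch, but it confirms that the constructor argument lands in _wheels, that wheels is still reported as a dataclass field, and that the class attribute of the same name is the property.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False)

    @property
    def wheels(self) -> int:
        return self._wheels

    @wheels.setter
    def wheels(self, wheels: int) -> None:
        self._wheels = wheels


car = Vehicle(wheels=4)

# The generated __init__() ran `self.wheels = wheels`, i.e. the setter.
print(car._wheels)

# `wheels` and `_wheels` are both fields as far as @dataclass is concerned...
print([f.name for f in fields(Vehicle)])

# ...but on the class, the name `wheels` is bound to the property.
print(isinstance(Vehicle.__dict__["wheels"], property))
```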
If you think about it, this is exactly what we were doing when implementing the property on good ol' regular classes. Remember?

    class Vehicle:
        def __init__(self, wheels: int):
            self._wheels = wheels
            # This is equivalent to calling:
            # `self.wheels = wheels`
            # which *is* what the __init__() method
            # now generated by @dataclass actually does.
Caveat: this approach only holds because the property's setter is implemented. If we only implemented a getter (i.e. to make a read-only field), the __init__() method wouldn't be able to assign the attribute and would crash with an AttributeError. This is intended behavior, though, because dataclasses were designed to be editable data containers. If you really need read-only fields, a mutable dataclass is not the right tool. Perhaps NamedTuples would be a viable alternative, or a dataclass declared with @dataclass(frozen=True); both give you read-only fields.
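A quick sketch of that failure mode, using a getter-only property. The exact error message varies between Python versions, so the example only relies on the exception type.

```python
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False)

    @property
    def wheels(self) -> int:
        # Getter only: there is no setter for __init__() to call.
        return self._wheels


try:
    Vehicle(wheels=4)
except AttributeError as exc:
    print("AttributeError:", exc)
```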
Anyway, long story short…
We have successfully implemented properties on dataclasses. 🎉
To be honest, it was surprisingly not easy. We've been through five different attempts, navigating through the documentation and painstakingly coding our way towards dataclass properties.
So after all this hassle, can we at least derive a quick recipe for implementing properties on dataclasses?
The answer is: yes, we can. ✌️
Have you noticed a pattern between using an @property on a regular class vs. on a dataclass?
Look, here's the regular class version:

    class Vehicle:
        def __init__(self, wheels: int):
            self._wheels = wheels

        @property
        def wheels(self) -> int:
            return self._wheels

        @wheels.setter
        def wheels(self, wheels: int):
            self._wheels = wheels

And the dataclass version, using a diff syntax to highlight the differences:

    + from dataclasses import dataclass, field
    +
    + @dataclass
      class Vehicle:
    +     wheels: int
    +     _wheels: int = field(init=False, repr=False)
    -     def __init__(self, wheels: int):
    -         self._wheels = wheels

          @property
          def wheels(self) -> int:
              return self._wheels

          @wheels.setter
          def wheels(self, wheels: int):
              self._wheels = wheels
Written in plain words, for you literature freaks:
How to implement a property on a dataclass:
- Declare the property as a field
- Add an associated private variable using field(init=False, repr=False)
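Putting the recipe to work, here's a sketch with a setter that actually does something. The validation rule and the error messages are made up for illustration. One detail worth knowing: because the name wheels is bound to the property by the time @dataclass inspects the class, the property object becomes the field's default value, so Vehicle() with no arguments is not rejected by the generated constructor itself. The isinstance() guard below is one possible way to fail loudly in that case, not the only one.

```python
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    wheels: int
    _wheels: int = field(init=False, repr=False)

    @property
    def wheels(self) -> int:
        return self._wheels

    @wheels.setter
    def wheels(self, wheels: int) -> None:
        # When no argument is given, __init__() passes the field's default,
        # which is the property object itself. Reject it explicitly.
        if isinstance(wheels, property):
            raise TypeError("missing required argument: 'wheels'")
        # Hypothetical validation rule, just to give the setter a job.
        if wheels < 0:
            raise ValueError("wheels must be non-negative")
        self._wheels = wheels


car = Vehicle(wheels=4)
print(car.wheels)

for make_bad_vehicle in (lambda: Vehicle(), lambda: Vehicle(wheels=-1)):
    try:
        make_bad_vehicle()
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, exc)
```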
Way to go!
If you managed to read up to here, congratulations! This blog post dealt with a highly specific and technical topic, yet I've had a lot of fun writing it and figuring out how to implement dataclass properties.
For those wondering — the series on Apache Kafka still goes on! I figured a small break never hurts, and I felt like it gave me the opportunity to write something spontaneous.
I hope you enjoyed this post and, as mentioned in the introduction, you can find all the code on GitHub. See you next time! 💻