tl;dr: I found a bug in Python 3.7.0. Skip ahead to The Bug to try it out yourself.

Background

Python 3.7 added the dataclasses module to make a few common programming use cases easier to manage.

Dataclasses eliminate boilerplate code that one had to write by hand before Python 3.7.

# Python 3.6
class Example:
    def __init__(self, val1: str, val2: str, val3: str):
        self.val1 = val1
        self.val2 = val2
        self.val3 = val3

example = Example("here's", "an", "example")

This code can be rewritten, like so:

# Python 3.7
from dataclasses import dataclass

@dataclass
class Example:
    val1: str
    val2: str
    val3: str

example = Example("here's", "an", "example")

Dataclasses also provide us with automatically generated comparison dunder methods, the ability to make our objects mutable or immutable, and the ability to decompose them into a dictionary of type Dict[str, Any].

Let’s see that in action:

from dataclasses import asdict, dataclass

@dataclass
class Example:
    val1: str
    val2: str
    val3: str

example = Example("here's", "an", "example")

print(asdict(example))
>>> {'val1': "here's", 'val2': 'an', 'val3': 'example'}

Awesome! I’m sure you can find a few situations where this would be useful.
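The comparison and immutability features can be sketched in the same way. This is a minimal illustration, not part of the original example: the Version class and its fields are hypothetical names chosen for the demo, while order=True, frozen=True and FrozenInstanceError come from the dataclasses module.

```python
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical class for illustration only.
# order=True generates __lt__, __le__, __gt__ and __ge__ (__eq__ is generated by default).
# frozen=True makes instances immutable after construction.
@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int


a = Version(3, 6)
b = Version(3, 7)

print(a == Version(3, 6))  # True: fields are compared as a tuple
print(a < b)               # True: (3, 6) < (3, 7)

try:
    a.major = 4  # assignment to a field of a frozen dataclass is rejected
except FrozenInstanceError:
    print("frozen instances cannot be modified")
```

Leaving out frozen=True gives the default, mutable behavior.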

The Bug

What happens when you compose a dataclass with a namedtuple?

from dataclasses import dataclass, asdict
from typing import NamedTuple


class NamedTupleAttribute(NamedTuple):
    example: bool


@dataclass
class Data:
    attr: NamedTupleAttribute


data = Data(NamedTupleAttribute(example=True))
data_dict = asdict(data)
namedtuple_attr = data_dict['attr']

print(namedtuple_attr.example)
>>> <generator object _asdict_inner.<locals>.<genexpr> at 0x107f45408>

Shouldn’t namedtuple_attr.example be of type bool, just like data.attr.example? Why does it evaluate to a generator object instead?

To answer those questions, we’ll need to look at a few things: first, how the tuple and namedtuple constructors differ, and then how asdict() is implemented.

tuple() takes an iterable as its only argument and exhausts it while building a new object. A namedtuple class, however, takes one argument per field and stores each argument as-is, so a generator supplied as an argument is not exhausted.

print(tuple(x for x in range(5)))
>>> (0, 1, 2, 3, 4)

print(NamedTupleAttribute(x for x in range(5)))
>>> NamedTupleAttribute(example=<generator object <genexpr> at 0x107f45318>)

Where does this fit in with asdict()? We’ll need to look at its implementation to understand.

def asdict(obj, *, dict_factory=dict):
    if not _is_dataclass_instance(obj):
        raise TypeError("asdict() should be called on dataclass instances")
    return _asdict_inner(obj, dict_factory)

def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)  # right here
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory), _asdict_inner(v, dict_factory))
                          for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

_asdict_inner() passes a generator expression to the constructor of any object that is an instance of list or tuple, expecting the constructor to consume it the way tuple() does.

Classes created with typing.NamedTuple and collections.namedtuple() are subclasses of tuple, but they override its __new__() so that it takes one argument per field.
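The subclass relationship and the differing constructor signatures can be checked directly. In this small sketch, Pair is a hypothetical namedtuple introduced only for the demo; _make() is the documented classmethod that namedtuple classes provide for building an instance from an iterable.

```python
from collections import namedtuple
from typing import NamedTuple


class NamedTupleAttribute(NamedTuple):
    example: bool


# Hypothetical two-field namedtuple for illustration only.
Pair = namedtuple("Pair", ["left", "right"])

# Both flavors are subclasses of tuple, so isinstance(obj, tuple) matches them.
print(issubclass(NamedTupleAttribute, tuple))  # True
print(issubclass(Pair, tuple))                 # True

# tuple() accepts a single iterable and consumes it.
print(tuple(iter([1, 2])))  # (1, 2)

# The generated __new__() instead takes one argument per field.
print(Pair(1, 2))  # Pair(left=1, right=2)

# _make() is the namedtuple counterpart of tuple(iterable).
print(Pair._make(iter([1, 2])))  # Pair(left=1, right=2)
```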

Here’s what happens when asdict() is called on a dataclass that has a namedtuple field:

1. _asdict_inner() recurses on the dataclass object’s fields.
2. When it reaches a field whose value is a tuple instance, it calls the constructor of the value’s type with a generator expression over the converted items.
   - If it’s a plain tuple, the generator expression is exhausted and a tuple with the generator’s values is produced.
   - If it’s a NamedTuple, the anonymous generator object is not iterated over and is assigned as a field on the NamedTuple.
3. asdict() returns a new Dict[str, Any] with malformed NamedTuples.
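The steps above can be reproduced on any Python version by copying the 3.7.0 logic into a standalone function. This is a sketch under assumptions: buggy_asdict_inner is a hypothetical name, it hardcodes dict instead of taking a dict_factory, and it uses the public is_dataclass() in place of the private _is_dataclass_instance() helper. The extra plain field is added only to contrast the two branches.

```python
import copy
from dataclasses import dataclass, fields, is_dataclass
from typing import NamedTuple


class NamedTupleAttribute(NamedTuple):
    example: bool


@dataclass
class Data:
    attr: NamedTupleAttribute
    plain: tuple  # added for contrast with the namedtuple field


def buggy_asdict_inner(obj):
    """Hypothetical standalone copy of the 3.7.0 recursion (simplified)."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: buggy_asdict_inner(getattr(obj, f.name)) for f in fields(obj)}
    elif isinstance(obj, (list, tuple)):
        # A namedtuple also lands here and receives the generator as one argument.
        return type(obj)(buggy_asdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((buggy_asdict_inner(k), buggy_asdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)


result = buggy_asdict_inner(Data(NamedTupleAttribute(example=True), (1, 2)))

print(result["plain"])                        # (1, 2): the generator was consumed
print(type(result["attr"]).__name__)          # NamedTupleAttribute
print(type(result["attr"].example).__name__)  # generator: stored, never consumed
```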

Proposed Solutions

Both Eric V. Smith and Ivan Levkivskyi have quickly proposed solutions to this issue.

Ivan Levkivskyi has suggested that _asdict_inner() apply the generator expression only to the standard library’s list and tuple types themselves, so that NamedTuples fall through to the branch that deep-copies the object.
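One way to read this suggestion is as an exact type check in place of isinstance(). The snippet below is only my sketch of that idea, not the actual patch: convert() and Point are hypothetical names, and the dataclass and dict branches are left out for brevity.

```python
import copy
from typing import NamedTuple


class Point(NamedTuple):  # hypothetical namedtuple for illustration
    x: int
    y: int


def convert(obj):
    """Hypothetical, simplified stand-in for _asdict_inner()."""
    # Exact type check: subclasses such as namedtuples do not match.
    if type(obj) in (list, tuple):
        return type(obj)(convert(v) for v in obj)
    # Everything else, including namedtuples, is deep-copied.
    return copy.deepcopy(obj)


result = convert([Point(1, 2), (3, 4)])
print(result)  # [Point(x=1, y=2), (3, 4)]
```

The trade-off in this sketch is that anything nested inside a namedtuple is copied as-is rather than converted.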

Eric Smith proposed a solution in which the generator expression is unpacked with star notation as it is passed to the tuple subclass’s constructor.
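Star notation can be illustrated in isolation. This is a minimal sketch of the unpacking idea rather than the proposed patch itself; Point is a hypothetical namedtuple, and multiplying by 10 stands in for the recursive conversion of each item.

```python
from typing import NamedTuple


class Point(NamedTuple):  # hypothetical namedtuple for illustration
    x: int
    y: int


original = Point(1, 2)

# Stand-in for the converted items produced by the recursion.
converted = (v * 10 for v in original)

# Unpacking supplies one positional argument per field.
rebuilt = type(original)(*converted)
print(rebuilt)  # Point(x=10, y=20)

# A plain tuple cannot be built this way, so unpacking only suits namedtuples.
try:
    tuple(*(v for v in (1, 2)))
except TypeError as exc:
    print("plain tuple rejects unpacked arguments:", exc)
```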

In my (very humble) opinion, NamedTuple is a special case of tuple in the standard library, so one solution might be to give namedtuples their own branch with special behavior in _asdict_inner().

def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
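    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # isinstance(obj, typing.NamedTuple) does not work as a check for
        # namedtuple instances, so detect them by their _fields attribute.
        return type(obj)(*(_asdict_inner(v, dict_factory) for v in obj))  # right here
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)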
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory), _asdict_inner(v, dict_factory))
                          for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
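To try the namedtuple branch without patching the standard library, the idea can be copied into a standalone function. This is a sketch under assumptions: asdict_inner_sketch is a hypothetical name, is_dataclass() stands in for the private _is_dataclass_instance() helper, and namedtuples are detected by their _fields attribute.

```python
import copy
from dataclasses import dataclass, fields, is_dataclass
from typing import NamedTuple


class NamedTupleAttribute(NamedTuple):
    example: bool


@dataclass
class Data:
    attr: NamedTupleAttribute


def asdict_inner_sketch(obj, dict_factory=dict):
    """Hypothetical standalone version of the recursion with a namedtuple branch."""
    if is_dataclass(obj) and not isinstance(obj, type):
        result = [(f.name, asdict_inner_sketch(getattr(obj, f.name), dict_factory))
                  for f in fields(obj)]
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # Namedtuple: unpack the converted items, one argument per field.
        return type(obj)(*(asdict_inner_sketch(v, dict_factory) for v in obj))
    elif isinstance(obj, (list, tuple)):
        return type(obj)(asdict_inner_sketch(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((asdict_inner_sketch(k, dict_factory),
                          asdict_inner_sketch(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)


data_dict = asdict_inner_sketch(Data(NamedTupleAttribute(example=True)))
print(data_dict)                  # {'attr': NamedTupleAttribute(example=True)}
print(data_dict["attr"].example)  # True
```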

Conclusion

Python 3.7 introduces new features that will make development even faster and more fun.

Though I ran into a bug, it is an edge case. The response to my bug report was very quick, polite, and professional. A couple of very intelligent people jumped into bugfix mode almost instantly after it was reported.

The rate at which the Python community responds to developer needs and concerns is impressive. Thanks to the contributors who make this project a success!
