Dataclass CSV makes working with CSV files simpler and more reliable than using dictionaries. It leverages Python’s dataclasses to represent each row in a CSV file, while also supporting type annotations for proper type checking and validation.
Represent CSV rows as dataclass instances for cleaner, more structured code.
DataclassReader uses type annotations to validate CSV data automatically.
Supports str, int, float, complex, datetime, and bool, plus any type whose constructor accepts a single string argument.
Pinpoints exactly which line in the CSV contains invalid data, making troubleshooting easier.
Only extracts the fields defined in your dataclass, so you get exactly the data you need.
Works much like Python’s built‑in csv.DictReader, so it feels natural to use.
Leverages dataclass metadata to customize how data is parsed.
Eliminates the need for manual loops to convert types, validate data, or set default values—DataclassReader handles it all.
Alongside reading, the library provides DataclassWriter for generating CSV files from lists of dataclass instances.
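For contrast, the manual approach that the library aims to replace might look like the sketch below. It uses only the standard library and is not part of dataclass-csv; the inline sample data and the helper name `load_users_manually` are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class User:
    firstname: str
    email: str
    age: int


# Inline sample data standing in for a CSV file on disk.
SAMPLE_CSV = "firstname,email,age\nElsa,elsa@test.com,11\nAstor,astor@test.com,7\n"


def load_users_manually(csv_file):
    """Hypothetical helper: convert each DictReader row to a User by hand."""
    users = []
    for row in csv.DictReader(csv_file):
        # DictReader yields every value as a string, so each conversion is explicit.
        users.append(
            User(firstname=row["firstname"], email=row["email"], age=int(row["age"]))
        )
    return users


users = load_users_manually(io.StringIO(SAMPLE_CSV))
print(users)
```

Every new column means another hand-written conversion in the loop, which is the repetition that a dataclass-driven reader removes.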
pip install dataclass-csv
First, add the necessary imports:
from dataclasses import dataclass
from dataclass_csv import DataclassReader
Assuming that we have a CSV file with the contents below:
firstname,email,age
Elsa,elsa@test.com, 11
Astor,astor@test.com, 7
Edit,edit@test.com, 3
Ella,ella@test.com, 2
Let's create a dataclass that will represent a row in the CSV file above:
@dataclass
class User:
    firstname: str
    email: str
    age: int
The dataclass User has three properties: firstname and email, both of type str, and age, of type int.
To load and read the contents of a CSV file, you follow the same approach as when using DictReader from Python’s standard csv module. After opening the file, you create an instance of DataclassReader, passing two arguments: the file object and the dataclass you want to use to represent each row of the CSV. For example:
with open(filename) as users_csv:
    reader = DataclassReader(users_csv, User)
    for row in reader:
        print(row)
Internally, DataclassReader relies on Python’s csv.DictReader to read CSV files. This means you can pass the same arguments that you would normally provide to DictReader. The full list of supported arguments is shown below:
dataclass_csv.DataclassReader(
    f,
    cls,
    fieldnames=None,
    restkey=None,
    restval=None,
    dialect='excel',
    *args,
    **kwds
)
All keyword arguments supported by DictReader are also supported by DataclassReader, with one additional option:
validate_header — When enabled, DataclassReader will raise a ValueError if the CSV file contains duplicate column names. This validation helps prevent data from being overwritten. To disable this check, set validate_header=False when creating a DataclassReader instance. For example:
reader = DataclassReader(f, User, validate_header=False)
Running the reading loop shown earlier against the sample CSV file produces the following output:
User(firstname='Elsa', email='elsa@test.com', age=11)
User(firstname='Astor', email='astor@test.com', age=7)
User(firstname='Edit', email='edit@test.com', age=3)
User(firstname='Ella', email='ella@test.com', age=2)
One of the key advantages of using DataclassReader is its ability to detect when the data types in a CSV file don’t match what your application’s model expects. In such cases, DataclassReader provides clear error messages that help you identify exactly which rows contain problematic values.
For example, if we modify the CSV file from the Getting Started section and change the age of the user Astor to a string value, the error will highlight this mismatch:
Astor, astor@test.com, test
Remember that in the User dataclass, the age property is annotated as an int. If we run the code again, an exception will be raised with the following message:
dataclass_csv.exceptions.CsvValueError: The field `age` is defined as <class 'int'> but
received a value of type <class 'str'>. [CSV Line number: 3]
Note that, in addition to describing the error, DataclassReader also indicates which line of the CSV file contains the problematic data.
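The idea of attaching a line number to a conversion error can be sketched with the standard library alone. This is an illustration of the technique, not the library's implementation; the helper name `parse_ages` and the message wording are assumptions.

```python
import csv
import io

# Sample data with an invalid age on the third line of the file.
SAMPLE_CSV = "firstname,email,age\nElsa,elsa@test.com,11\nAstor,astor@test.com,test\n"


def parse_ages(csv_file):
    """Hypothetical helper: return ages, reporting the CSV line of any bad value."""
    reader = csv.DictReader(csv_file)
    ages = []
    for row in reader:
        try:
            ages.append(int(row["age"]))
        except ValueError as exc:
            # reader.line_num counts the lines read so far, header included.
            raise ValueError(
                f"Invalid age {row['age']!r} [CSV line number: {reader.line_num}]"
            ) from exc
    return ages


try:
    parse_ages(io.StringIO(SAMPLE_CSV))
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

Because the header occupies line 1, the second data row is reported as line 3, which matches the numbering in the error message above.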
DataclassReader can process dataclass fields that define default values. As an example, we’ll modify the User dataclass to assign a default value to the email field. Because Python requires fields with defaults to come after fields without them, email moves to the end of the class:
from dataclasses import dataclass
@dataclass
class User:
    firstname: str
    age: int
    email: str = 'Not specified'
We then update the CSV file, removing the email value for the user Astor:
Astor,, 7
When you run the code, the output will appear as follows:
User(firstname='Elsa', age=11, email='elsa@test.com')
User(firstname='Astor', age=7, email='Not specified')
User(firstname='Edit', age=3, email='edit@test.com')
User(firstname='Ella', age=2, email='ella@test.com')
Note that the User object for Astor now has the default value Not specified assigned to the email property. The order of the fields in the dataclass does not need to match the order of the columns in the CSV file.
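One way such a fallback could work is to skip empty values so that the dataclass supplies its own default. The sketch below uses only the standard library; it is not the library's code, and the helper name `build_with_defaults` is hypothetical. Fields with defaults are declared last, as Python's dataclasses require.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class User:
    firstname: str
    age: int
    email: str = "Not specified"


def build_with_defaults(cls, raw):
    """Hypothetical helper: drop empty strings so dataclass defaults apply."""
    kwargs = {}
    for f in dataclasses.fields(cls):
        value = raw.get(f.name, "").strip()
        if value == "" and f.default is not dataclasses.MISSING:
            continue  # let the dataclass fill in its declared default
        # f.type is the annotated class here (str or int), so call it to convert.
        kwargs[f.name] = f.type(value)
    return cls(**kwargs)


user = build_with_defaults(User, {"firstname": "Astor", "email": "", "age": " 7"})
print(user)
```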
Default values can also be defined using dataclasses.field, as shown below:
from dataclasses import dataclass, field
@dataclass
class User:
    firstname: str
    age: int
    # Fields with defaults must come after fields without them.
    email: str = field(default='Not specified')
By default, DataclassReader automatically maps dataclass properties to CSV columns when their names match. However, there are cases where a column header in the CSV file uses a different name. In such situations, you can explicitly define the mapping using the map method.
For example, consider the following CSV file:
First Name,email,age
Elsa,elsa@test.com, 11
Notice that the column header is now First Name instead of firstname.
To handle this difference, we can use the map method as follows:
reader = DataclassReader(users_csv, User)
reader.map('First Name').to('firstname')
Now, DataclassReader will correctly extract the data from the First Name column and assign it to the firstname property in the dataclass. The name passed to map must match the column header exactly, including capitalization.
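Conceptually, a column mapping is a rename applied to each row before the values reach the dataclass. The following standard-library sketch shows that idea; it is not how the library implements map, and `COLUMN_TO_FIELD` and `rename_columns` are hypothetical names.

```python
import csv
import io

SAMPLE_CSV = "First Name,email,age\nElsa,elsa@test.com,11\n"

# Hypothetical mapping from CSV column header to dataclass field name.
COLUMN_TO_FIELD = {"First Name": "firstname"}


def rename_columns(row, mapping):
    """Return a copy of row with mapped headers renamed; others are unchanged."""
    return {mapping.get(column, column): value for column, value in row.items()}


rows = [
    rename_columns(row, COLUMN_TO_FIELD)
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]
print(rows)
```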
Currently, DataclassReader supports the following types: int, str, float, complex, datetime, and bool.
When working with a datetime property, you must use the dateformat decorator to specify how the date should be parsed. For example:
from dataclasses import dataclass
from datetime import datetime
from dataclass_csv import DataclassReader, dateformat
@dataclass
@dateformat('%Y/%m/%d')
class User:
    name: str
    email: str
    birthday: datetime
if __name__ == '__main__':
    with open('users.csv') as f:
        reader = DataclassReader(f, User)
        for row in reader:
            print(row)
Assuming that the CSV file has the following contents:
name,email,birthday
Edit,edit@test.com,2018/11/23
The output would look like this:
User(name='Edit', email='edit@test.com', birthday=datetime.datetime(2018, 11, 23, 0, 0))
It’s important to note that the dateformat decorator defines the date format used to parse all datetime properties in a dataclass. However, CSV files may sometimes contain multiple date columns with different formats. In these cases, you can assign a format specific to each property by using dataclasses.field.
For example, consider the following CSV file:
name,email,birthday,create_date
Edit,edit@test.com,2018/11/23,2018/11/23 10:43
As you can see, the create_date column includes time information.
The User dataclass can be defined as follows:
from dataclasses import dataclass, field
from datetime import datetime
from dataclass_csv import DataclassReader, dateformat
@dataclass
@dateformat('%Y/%m/%d')
class User:
    name: str
    email: str
    birthday: datetime
    create_date: datetime = field(metadata={'dateformat': '%Y/%m/%d %H:%M'})
Notice that the birthday field does not have a format specified through field metadata. In this case, the format defined in the dateformat decorator will be applied.
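To see how a per-field format can sit alongside a class-wide one, the sketch below reads the metadata back with dataclasses.fields and parses each value with datetime.strptime. It uses only the standard library; `CLASS_DATEFORMAT` stands in for the decorator-level format and `parse_dates` is a hypothetical helper, so treat it as an illustration rather than the library's implementation.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime

# Stands in for the format that the dateformat decorator would supply.
CLASS_DATEFORMAT = "%Y/%m/%d"


@dataclass
class User:
    name: str
    email: str
    birthday: datetime
    create_date: datetime = field(metadata={"dateformat": "%Y/%m/%d %H:%M"})


def parse_dates(cls, raw):
    """Hypothetical helper: parse datetime fields, preferring per-field formats."""
    parsed = {}
    for f in fields(cls):
        if f.type is datetime:
            # Fall back to the class-wide format when the field declares none.
            fmt = f.metadata.get("dateformat", CLASS_DATEFORMAT)
            parsed[f.name] = datetime.strptime(raw[f.name], fmt)
    return parsed


dates = parse_dates(
    User, {"birthday": "2018/11/23", "create_date": "2018/11/23 10:43"}
)
print(dates)
```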
When defining a property of type str in a dataclass, DataclassReader treats values that contain only whitespace as invalid.
To change this behavior, you can use the @accept_whitespaces decorator. When applied to the class, this decorator allows whitespace-only values to be accepted as valid input. For example:
from dataclasses import dataclass
from datetime import datetime
from dataclass_csv import DataclassReader, accept_whitespaces
@accept_whitespaces
@dataclass
class User:
    name: str
    email: str
    birthday: datetime
    created_at: datetime
If only a specific field should accept whitespace, set accept_whitespaces in that field's metadata instead:
@dataclass
class User:
    name: str
    email: str = field(metadata={'accept_whitespaces': True})
    birthday: datetime
    created_at: datetime
You can use any type for a field as long as its constructor accepts a string:
import dataclasses
import re
class SSN:
    def __init__(self, val):
        # fullmatch rejects values with trailing characters after the digits
        if re.fullmatch(r"\d{9}", val):
            self.val = f"{val[0:3]}-{val[3:5]}-{val[5:9]}"
        elif re.fullmatch(r"\d{3}-\d{2}-\d{4}", val):
            self.val = val
        else:
            raise ValueError(f"Invalid SSN: {val!r}")
@dataclasses.dataclass
class User:
    name: str
    ssn: SSN
Reading CSV files with DataclassReader lets you take advantage of dataclasses and type annotations to validate incoming data. Sometimes, though, we need to go in the opposite direction and use dataclasses to produce CSV output. That is what DataclassWriter is for.
Using DataclassWriter is straightforward. Suppose we have a simple User dataclass:
from dataclasses import dataclass
@dataclass
class User:
    firstname: str
    lastname: str
    age: int
And in our program we have a list of users:
users = [
    User(firstname="John", lastname="Smith", age=40),
    User(firstname="Daniel", lastname="Nilsson", age=10),
    User(firstname="Ella", lastname="Fralla", age=4)
]
To generate a CSV file with DataclassWriter, start by importing it from dataclass_csv:
from dataclass_csv import DataclassWriter
Initialize it with the required arguments and call the write method:
with open("users.csv", "w+") as f:
    w = DataclassWriter(f, users, User)
    w.write()
That’s it! Let’s break down what’s happening in the example above.
We start by opening a file named users.csv in write mode. Then we create a DataclassWriter instance. To initialize a writer, we provide three things: the file object, the list of User instances, and the dataclass type itself (User).
The type is required because the writer uses it to determine the CSV header. By default, it takes the field names defined in the dataclass. For our User example, the resulting column titles are firstname, lastname, and age.
Here’s the CSV generated from the list of User objects:
firstname,lastname,age
John,Smith,40
Daniel,Nilsson,10
Ella,Fralla,4
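Deriving a header from a dataclass type can be illustrated with the standard library: dataclasses.fields lists the field names in declaration order, and dataclasses.astuple yields each row's values in the same order. This is a minimal sketch of the idea using csv.writer directly, not the DataclassWriter implementation.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class User:
    firstname: str
    lastname: str
    age: int


users = [User("John", "Smith", 40), User("Daniel", "Nilsson", 10)]

# Write to an in-memory buffer so the result can be inspected as a string.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([f.name for f in fields(User)])  # header from the field names
writer.writerows(astuple(user) for user in users)  # one row per instance
output = buffer.getvalue()
print(output)
```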
DataclassWriter also accepts **fmtparams, which are passed directly to Python’s built‑in csv.writer. You can use this to customize delimiter behavior, quoting, line endings, and other CSV formatting options. For details, see the official CSV documentation: https://docs.python.org/3/library/csv.html#csv-fmt-params
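The effect of a few common formatting parameters can be seen with csv.writer itself. The sketch below calls the standard library directly; that the same keyword arguments behave identically when passed through DataclassWriter is an assumption based on the description above.

```python
import csv
import io

rows = [["firstname", "lastname", "age"], ["John", "Smith", 40]]

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=";",                  # separate fields with semicolons
    quoting=csv.QUOTE_NONNUMERIC,   # quote every non-numeric field
    lineterminator="\n",            # end each row with a plain newline
)
writer.writerows(rows)
formatted = buffer.getvalue()
print(formatted)
```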
There are also cases where you may want to omit the CSV header. The write method provides a skip_header argument for this purpose. It defaults to False, but when set to True, the writer will skip generating the header row.
As mentioned earlier, DataclassWriter uses the dataclass field names as the default CSV header titles. Depending on your use case, you may want to customize these titles. For that, DataclassWriter provides the map method.
Using our User dataclass with the fields firstname, lastname, and age, the example below shows how to rename firstname to First name and lastname to Last name:
with open("users.csv", "w") as f:
    w = DataclassWriter(f, users, User)
    # Add mappings for firstname and lastname
    w.map("firstname").to("First name")
    w.map("lastname").to("Last name")
    w.write()
The CSV output of the snippet above will be:
First name,Last name,age
John,Smith,40
Daniel,Nilsson,10
Ella,Fralla,4
A heartfelt thank you to all the incredible contributors who have supported this project over the years. Special thanks go to @kraktus for setting up GitHub Actions, enhancing automation for package creation, and delivering numerous code improvements.
Copyright (c) 2018 Daniel Furtado. Code released under BSD 3-clause license
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.