Parquet Format

Introduction

Parquet is a column-oriented file format that is often used in data pipelines. Part of the Apache Hadoop ecosystem, it is designed to be compact and efficient for large-scale data analysis.

Regrid Parquet Files

  • Our Parquet files are generated with PyArrow using zstd compression.
  • Files in Parquet format have one additional column in their schema, custom_column_json. This column contains any extra data columns the county provides to us, packaged in a JSON object with the custom column names as keys. Because each county sends different extra data columns, the JSON object has no set schema; its keys vary county by county.
  • Nationwide Premium tier clients will find a 'parquet' directory inside their downloads directory.
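
As a rough illustration of working with custom_column_json, the sketch below reads only that column with PyArrow and decodes each value into a Python dict. It assumes the column is stored as JSON text; the function names and the zoning_code key are hypothetical and used only for illustration. Because the keys vary county by county, the code does not expect any particular key to be present.

```python
import json


def parse_custom_columns(raw):
    """Decode one custom_column_json value into a dict.

    Assumption: the value is JSON text (str or bytes). Missing or empty
    values are returned as an empty dict.
    """
    if raw is None or raw == "" or raw == b"":
        return {}
    return json.loads(raw)


def read_custom_columns(path):
    """Read only custom_column_json from a Parquet file, one dict per row.

    `path` is a placeholder for a downloaded Parquet file. PyArrow handles
    zstd decompression transparently when reading.
    """
    import pyarrow.parquet as pq  # imported here so the parser above needs only the stdlib

    # Parquet is columnar, so requesting a single column avoids reading the rest.
    table = pq.read_table(path, columns=["custom_column_json"])
    values = table.column("custom_column_json").to_pylist()
    return [parse_custom_columns(v) for v in values]
```

Since there is no fixed schema, it is safer to look keys up with dict.get() than to index them directly, for example row.get("zoning_code") on a row returned by read_custom_columns.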

Additional Information

The following links are also recommended for more introductory information on the parquet file format:

If you have any questions about our parquet files, please contact tech@regrid.com.
