To make effective use of Greenplum, architects, designers, developers, and users must be aware of the various methods by which data can be stored, because these choices affect performance in loading, querying, and analyzing datasets. A simple "lift and shift" from a transactional data model is almost always suboptimal. Data warehouses generally prefer a data model that is flatter than a normalized transactional model.

Data model aside, Greenplum offers a wide variety of choices in how data is organized, including the following:

Distribution: Determines into which segment table rows are assigned.
Partitioning: Determines how the data is stored on each of the segments.
Orientation: Determines whether the data is stored by rows or by columns.
Compression: Used to minimize data table storage in the disk system.
Append-optimized tables: Used to enhance performance for data that is rarely changed.
External tables: Provide a method for accessing data outside Greenplum.
Indexing: Used to speed lookups of individual rows in a table.

One of the most important methods for achieving good query performance from Greenplum is the proper distribution of data. All other things being equal, having roughly the same number of rows in each segment of a database is a huge benefit, because each segment then does a similar share of the work. In Greenplum, the data distribution policy is determined at table creation time. Greenplum adds a distribution clause to the Data Definition Language (DDL) for a CREATE TABLE statement.
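To see why the choice of distribution column matters, here is a minimal Python sketch that mimics hash distribution across segments. In Greenplum itself the policy is written in the CREATE TABLE statement as `DISTRIBUTED BY (column)` or `DISTRIBUTED RANDOMLY`. Everything in the sketch is an assumption made for illustration: the four-segment cluster, the `orders` rows, the helper names, and the MD5-based hash, which is only a stand-in and not Greenplum's real hash function.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key value to a segment index.

    MD5 is a deterministic stand-in here; Greenplum uses its own hash.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def rows_per_segment(rows, key_column, num_segments=NUM_SEGMENTS):
    """Count how many rows land on each segment for a given key column."""
    counts = Counter(segment_for(row[key_column], num_segments) for row in rows)
    return [counts.get(seg, 0) for seg in range(num_segments)]


def skew_ratio(counts):
    """Largest segment divided by the average; 1.0 means perfectly even."""
    average = sum(counts) / len(counts)
    return max(counts) / average


# Hypothetical orders table: order_id is unique, status has only two values.
rows = [
    {"order_id": i, "status": "open" if i % 10 == 0 else "closed"}
    for i in range(10_000)
]

by_id = rows_per_segment(rows, "order_id")
by_status = rows_per_segment(rows, "status")

print("distributed by order_id:", by_id, "skew", round(skew_ratio(by_id), 2))
print("distributed by status:  ", by_status, "skew", round(skew_ratio(by_status), 2))
```

A high-cardinality column such as `order_id` spreads rows almost evenly, while a two-valued column such as `status` can fill at most two segments and leaves the others idle. That imbalance is the skew an evenly distributed table avoids.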