The purpose of this article is to explain in simple steps what the Teradata Multi-value compression (MVC) is
in Teradata, how anybody working with Teradata can utilize this and how one can measure the space saving.
What is Teradata compression?
Compression is a technique when used on Teradata tables it lessens the actual physical space utilized by the table. Multiple repeating values could be compressed for a column. So when values are compressed such values are stored only once in that table in the table header and a
reference is provided in actual column.
More about compression:
• Reduces storage cost by saving more logical data per unit of physical capacity
• Improves Performance as less physical data to retrieve while reading.
• Data remains compressed in memory so the cache can hold more rows.
• A lossless compression method i.e. the data are compressed, but no loss of information.
• Answer sets provided to users include the fully uncompressed results, so extraction works automatically.
• Fully transparent to applications, ETL, queries, and views.
• Up to 255 distinct values in each column can be compressed.
• Best candidates are most often occurring values in each column
• one copy of compressed value per column is stored in the table header
What to compress
We should compress regularly occurring values. Here is a list of values/columns which should be considered for compression, as they might be recurring frequently:
With Teradata 13, VARCHAR columns could be compressed. If you can compress smaller values in varchar Column, you could save sufficient storage.
- Default values
- State, county, city
- Binary indicators
How is MVC implemented?
To perform compression on a table. First, we have to analyze the data going to be stored in the table. Once compression values are determined, compression loading the on those values at the time of table
Find Columns for compression
Analyze the data in each column and find potential values for compression which are highly repetitive.
Use the following statement for determining values to be compressed
select count(*) , <column_name> from <table_name> group by 2 order by 1 desc
Try to compress as minimum as possible values.
Do not compress, if you are not convinced.
Compression in Teradata is specified at the time of table creation, or columns are added to an existing table.
Here is an example for defining compression on a column:
Teradata compress syntax:
CREATE TABLE emp (
job CHARACTER(30) COMPRESS (’developer,’’manager’, ’programmer’)
How to measure the space saving?
After applying the compression on a table (in DDL), create two copies of the same table; with compression and without compression. After loading the same data in the both tables, Compare the size of two tables.
If there is no change, then remove the compression applied and try on another column.
More on Teradata compress function
You cannot compress more than 255 distinct values for an individual column.
You cannot modify the COMPRESS attribute for a column after it has been defined without recreating table all over again.
You cannot create a table that has more bytes compressed than there is room to store them in the table header.
You cannot compress values in the following types of columns:
- Component of the primary index
- Identity columns
- Volatile table columns
- Derived table columns