The Metadata Mystique
By Adrienne Tannenbaum
Abstract
"Metadata" is not a globally understood term. There is
no doubt that everyone views yesterday’s metadata as insufficient.
In fact, yesterday’s metadata is often
perceived to be non-existent because it rarely meets the needs
of the current data warehouse world. But recreating it today
often adds to the problem.
More important, there appear to be no
standard tools or methodologies for dealing with metadata,
so its definition and treatment are often left to the interpretation
of individual warehouse professionals. Because these professionals
are usually primarily responsible for the data warehouse itself,
metadata is often an "also ran" in terms of emphasis. Those
"silver bullet" repository tools just never seem to handle
metadata the way we think they should, and by the time the team
is ready to acknowledge metadata’s importance, it becomes
much easier to resort to a repository tool’s interpretation
of the way metadata should be treated. Hence, we continue
our circular metadata mystique.
It is time for a metadata clarification.
This article addresses metadata in its practical entirety
- what it is and how its importance begins. Once it exists,
it never goes away, and because the data warehouse's effectiveness
is directly fueled by the metadata connection, metadata is
a crucial component of the entire data warehouse development
lifecycle. Data warehouses cannot survive without accurate
metadata, and accurate metadata is useless if it is not easily
accessible.
Metadata Defined
Definitions of IT terms typically start
with theory but evolve to represent deployment. In the case
of metadata, the theoretical definition always discusses data
about data, or any vague representation of a documented aspect
of data. To the novice, metadata is understood to consist
of the standard ‘data dictionary’ entries, usually managed
and controlled by a Data Administration group which over time
has become a Data Management organization.
These ‘data dictionary’ entries always included
data element names, definitions, physical attributes such
as length, data type, ranges of allowed values (often called
domains), associated file/database/program names, and/or the
names and contact information of responsible employees. Generally
speaking, documentation that originated in the data dictionary
tended to be physical in nature, and was authored by development
personnel.
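As a rough illustration of the entry just described, the sketch below models one data dictionary record in Python. The class name, field names, the `accepts` helper, and the sample values (`CUST_STATUS`, `CUSTMAST`) are all hypothetical and chosen only for this example; they do not come from any particular dictionary or repository product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataDictionaryEntry:
    """One hypothetical data dictionary record (illustrative only)."""
    name: str                                  # data element name
    definition: str                            # business definition
    data_type: str                             # physical data type
    length: int                                # maximum physical length
    domain: Optional[Tuple[str, ...]] = None   # allowed values, if restricted
    sources: List[str] = field(default_factory=list)  # files/databases/programs
    contact: Optional[str] = None              # responsible employee or group

    def accepts(self, value: str) -> bool:
        """Check a value against the physical length and the domain."""
        if len(value) > self.length:
            return False
        return self.domain is None or value in self.domain


# Sample entry; every value here is made up for the example.
entry = DataDictionaryEntry(
    name="CUST_STATUS",
    definition="Current status of a customer account",
    data_type="CHAR",
    length=1,
    domain=("A", "I", "C"),
    sources=["CUSTMAST"],
    contact="Data Administration",
)
```

Note that everything in this record except the definition and the contact is physical in nature, which matches the observation above about who traditionally authored such entries.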
The definition of metadata expanded in scope
and became a more widely accepted and expected set of information
when data responsibility became multi-faceted and applications
began servicing multiple functional organizations. Hence the
definition began including aspects of data that go well beyond
its physical characteristics. Organizations began developing
metamodels, logical data models which depict the interrelationships
of all pertinent metadata. These illustrations formed the
basis of an extended metadata definition, still in many cases
theoretical, as follows:
The detailed description of data instances.
Depending on the types of data populated, metadata can range
from simple database field names, lengths, and characteristics,
to the underlying tool constructs (http://www.dbdsolutions.com/DBDS).
Simply stated, (metadata) is the definition, format, and
characteristics of populated data.
Metadata Deployed
Based on the wide variation of 'meta-understanding',
it is only reasonable that deployed metadata scenarios be
quite different in terms of origins, scopes, objectives, and
technical implementations. Equally confusing is the variation
in vendor options available in today's data warehouse marketplace
to assist us with our metadata requirements.
So when we look at metadata based on the
way it is actually deployed, the definition becomes quite different.
Consider the fact that metadata components can originate during
one or more of the following time-based data warehouse eras:
- Yesterday, the world of legacy applications
which fuels the majority of our data warehouse efforts.
- Today, during data warehouse development,
when we create and/or re-define metadata to represent a
data warehouse perspective.
- Tomorrow, when our metadata changes based
on its interpretations by non-data warehouse developers
and/or the tools and applications which need to access and
represent it.
Metadata and Data Warehouse Development
The most ironic characteristic of failed
data warehouse metadata is the fact that virtually all metadata
is an obvious byproduct of each data warehouse development
phase. In many situations the outputs from one phase function
as inputs to the next and so on. Unfortunately, most data
warehouse teams do not consider metadata requirements from
a deployment point of view until the latter phases of implementation.
Bridging Today's Metadata Gaps
How does one interchange metadata amongst
today's tools and remain sane? Simply speaking, metadata requirements
must be clearly identified during the planning aspects of
any data warehouse effort. A great beginning involves the
modeling of metadata following a methodology which is very
similar to that of modeling data:
- Identify metadata requirements
- Organize metadata requirements by beneficiary
type
- Categorize metadata based on where it
will be needed:
  - Common Metadata
  - Specific Metadata (one beneficiary category only)
  - Unique Metadata (one beneficiary alone!)
- Create the 'metamodel'
- Relate metamodel instances to the metamodels
of your planned data warehouse architecture
- Develop a 'metadata' flow by identifying
the sources for each implemented metadata instance
As each step progresses, more metadata details
will unfold, and the issues identified earlier will surface.
However, by clearly understanding and relating the requirements
to what can be implemented, usable, accessible metadata is
more likely to result.
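The categorization step in the list above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed method: the function name `categorize`, the sample requirements, and the beneficiary names are hypothetical, and "Common" is interpreted here as metadata needed by more than one beneficiary category, which the list itself does not spell out.

```python
from typing import Dict, List, Set, Tuple

# A beneficiary is modeled as (beneficiary name, beneficiary category).
Beneficiary = Tuple[str, str]


def categorize(requirements: Dict[str, Set[Beneficiary]]) -> Dict[str, List[str]]:
    """Group metadata requirements by how widely they are needed."""
    result: Dict[str, List[str]] = {"common": [], "specific": [], "unique": []}
    for item, beneficiaries in sorted(requirements.items()):
        if not beneficiaries:
            raise ValueError(f"requirement {item!r} has no beneficiary")
        categories = {category for _, category in beneficiaries}
        if len(beneficiaries) == 1:
            result["unique"].append(item)      # one beneficiary alone
        elif len(categories) == 1:
            result["specific"].append(item)    # one beneficiary category only
        else:
            result["common"].append(item)      # assumption: spans categories
    return result


# Hypothetical requirements gathered in the first two steps.
requirements = {
    "element definition": {("analyst A", "business user"), ("dba B", "developer")},
    "load schedule": {("dba B", "developer"), ("etl C", "developer")},
    "report layout notes": {("analyst A", "business user")},
}
groups = categorize(requirements)
```

With this sample input, "element definition" is grouped as common, "load schedule" as specific, and "report layout notes" as unique. The same mapping of requirement to beneficiaries could later be extended with a source for each item to support the metadata flow in the final step.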
For more information on "The Metadata
Mystique," please refer to The Journal of Data Warehousing,
Volume 3, Number 4, Winter 1998.