Companies are now looking for solutions that will allow them to construct and share data dictionaries of their data sources as the need for accessing company data stored in databases becomes more vital. In a data set, a data catalog is a list of tables, files, and columns/fields (database, data warehouse or data lake). To help you create and share such an inventory, TechMag presents you with a variety of tools.
It seems to me that a spreadsheet or word processor would be the most natural instrument for a proof of concept (preferably the former). Most individuals are familiar with these tools for generating and collaborating on documents.
For example, you could take a list of columns from an existing database, run a query, and then paste the results into a spreadsheet for others to fill in the blanks with. In terms of maintenance, making sure it’s up to date with the source might be a time-consuming process.
- Already familiar tool
- Most likely no need for extra licenses
- Easy to edit and read
- Population, and especially maintenance, can be a nightmare
2. DBMS + GUI tool
Another frequent technique is to use a DBMS’s built-in data dictionary, which is also quite popular. DBMSs all have a so-called active data dictionary, which is an inventory of their data structures. Data dictionary items can also be annotated in most DBMSes (called comments, descriptions or extended properties). Teams have choose to keep their metadata in these types of structures. There are a variety of database administration tools accessible to modify comments, and they are available for all databases. It can then be shared with a database documentation generator that generates HTML or another format for simple access to this data dictionary. As a feature in a database GUI or as a standalone application.
So to summarize, the stack looks like this:
- Built-in DBMS data dictionary
- Comment editor (database GUI tool)
- Optional documentation generator
- Metadata with data, directly in the database
- Simple tool stack
- No new fields or items added to the data dictionary.
- As a result of the lack of a global data dictionary, documentation is spread among many databases.
- Does not always make sense (for example, if the vendor does not allow for database changes)
- The database structure must have write access (no offline work)
- Changes to the schema may occur (updating a comment in MySQL alters entire table schema)
- Comment editors that are often clumsy
- There are differences across DBMSes when it comes to stack.
3. Data Catalogs
Data Catalogs (https://dataedo.com/product/data-catalog) are collaborative metadata repositories with very comprehensive search, tagging and profiling features. Data dictionaries are at the heart of these systems, and they can be built by automatically scanning a variety of sources (including NoSQL or data lakes).
- Very rich metadata capabilities
- Advanced search
- Collaboration with commenting, rating, certification of data assets
- Automatic data profiling
- Automatic and intelligent tagging and classification
- Consumers also require a license
4. Database documentation tools
Database documentation tools are another kind of tools that you use. Both of those programs have a lot in common with each other, but they’re specifically built to help with database documentation. You can also export documentation to HTML or PDF using the DBMS data dictionary as well as a (more or less) handy editor of comments.
- More user-friendly comment editors
- Better looking output
- Metadata scope is severely constrained, with only fields and items already existent in the data dictionary being available.
- There is no universal data dictionary and documentation is distributed across databases.
- There is a risk of altering database schema, which is not always desired.
- The database structure must be able to be written to (no offline work)
- Alteration risk of the schema (updating a comment in MySQL alters entire table schema)
- For the most part, they are designed to work with specific databases
5. Data Modeling tools
In addition, data modeling tools fall within the category of software resources. However, none of them offer reverse engineering, which allows you to read the schema of an existing database and import it into your model. It’s possible to export a data dictionary to an HTML page after generating a list of tables and columns in the model, as well as providing descriptions for each table and column.
- Most of them support multiple DBMSs
- Metadata in a separate repository
- More advanced metadata capabilities (e.g. document table relationships)
- Mostly focused on visual modeling with diagrams, rather than data dictionary
- Clunky editing of data dictionary descriptions (a lot of clicking)
- Poor reports
- Very poor and often risky import of changes from the database (works well for the first time)
- Additional cost
6. Data Dictionary tools
One further group of tools, which isn’t exactly defined, is the Data Dictionary tools. It stands out because it contains a distinct metadata repository (as opposed to GUI and documentation tools) and was built for database documentation (in contrast to data modeling tools).
- Rich metadata capabilities (show table relationships, multiple descriptive fields, rich text editor)
- Multiple databases and engines in one repository
- Collaboration capabilities
- Convenient searchable documentation export
- Additional cost
In the end, it depends on what you want to achieve. Is it a long term strategy or are you expecting for an immediate win? Do you already have a set of tools at your disposal? Is it just you or is it a group of people working together? Are the persons involved in the project technically inclined? What if you need read-only access to the data?
For significant and long-term considerations, I recommend Data Dictionary tools and Data Catalogs. A database IDE, or a spreadsheet, is a good option if you’re a techie and don’t want to attempt or buy any new tools.