Classification of logical data models implemented by contemporary DBMS

Tarassov S.V.

doi:10.60797/itech.2025.7.3


Research article


Issue: № 3 (7), 2025

Submitted: 15.10.2024

Accepted: 01.07.2025

Published: 14.07.2025


Abstract

Apart from the relational model, many contemporary DBMS implement other data models, including extensions of the relational one. Most DBMS also implement multiple data models and are therefore called “multi-model”. However, there is no commonly used classification of current data models, which is a disorienting factor for database users: engineers, students, teachers, analysts, etc. This article presents common terms and definitions as well as the history of earlier model classifications proposed by other experts in the database domain since the 1970s. Building on the graph-based and set-based classification approach and on the classical hierarchical-network-relational triad, the article proposes an improved two-axis data model classification that includes new semi-structured models, together with examples of DBMS implementing these models.

Keywords:

DBMS, data model, database, classification, NoSQL, SQL, OLTP, OLAP, semi-structured data, database schema, set theory, graph theory.

1. Introduction

Data models are fundamental entities that provide abstraction for a DBMS (Database Management System). A DBMS is the software that handles all access to the database. Any DBMS implements one or more data models corresponding to the datalogical modeling level.

The concepts of datalogical and infological data models were introduced by B. Langefors. A user-oriented description is called the infological realm of data modeling. The mapping of basic infological concepts into a corresponding computer representation is called the datalogical realm of data modeling. According to the ANSI/SPARC three-level architecture, the datalogical models also correspond to the conceptual level.

To resolve this collision, J. Zachman tried to clearly separate the conceptual, logical, and physical levels of data in models.

Modern computing foundations include the topic of database management; database design and the data-centered approach are important parts of software design strategies and methods.

This article is focused on logical level data models implemented by DBMS.

2. Main part

2.1. Data model definitions

A data model can be defined as a combination of three components:

1. A collection of data structure types.

2. A collection of operators or inferencing rules, which can be applied to any valid instances of the data types listed in (1), to retrieve or derive data from any parts of those structures in any combinations desired.

3. A collection of general integrity rules, which implicitly or explicitly define the set of consistent database states or changes of state, or both.
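As a loose illustration of the three components listed above, the following Python sketch pairs one structure type with one operator and one integrity rule. All names (`Employee`, `select`, `is_consistent`) are hypothetical and serve only to make the definition concrete; a real data model defines these collections far more generally.

```python
from dataclasses import dataclass

# Component 1: a data structure type (hypothetical example type).
@dataclass(frozen=True)
class Employee:
    emp_id: int
    name: str
    salary: float

# Component 2: an operator that retrieves data from valid instances.
def select(records, predicate):
    """Return the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]

# Component 3: an integrity rule that defines consistent database states.
def is_consistent(records):
    """Identifiers must be unique and salaries must be non-negative."""
    ids = [r.emp_id for r in records]
    return len(ids) == len(set(ids)) and all(r.salary >= 0 for r in records)

db = [Employee(1, "Ann", 5000.0), Employee(2, "Bob", 4200.0)]
well_paid = select(db, lambda r: r.salary > 4500)
```

Here `is_consistent(db)` holds for the two records, and adding a second record with `emp_id` 1 would violate the rule.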

Another definition is: “A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints”.

Data models are often confused with data (database) structural schemas. In fact, a model is an abstraction tool, whereas a schema is the result produced by applying that tool. For example, a relational database schema is the result of modeling that uses the relational data model. In software engineering practice, a database schema is also called a database model to distinguish it from DBMS physical schemas, which implement namespaces or security accessors.

2.2. Existing classifications

As of 1979, some 40 or more data models (mostly incomplete) had been proposed for the management of formatted data.

The classical approach was suggested by C. Date in the first edition of his book, published in 1975. The approach separates data models into three categories:

– hierarchical model;

– network model;

– relational model.

This classification was widely reused in textbooks on data management from the mid-1970s up to the 2000s. In the latest editions of his book, C. Date writes about the hierarchical and network models: "We do not discuss these categories in detail in this book because — from a technological point of view, at least — they must be regarded as obsolete".

However, since the 1990s and early 2000s, many non-relational data structures and models have been introduced and reintroduced in different DBMS. There are several important reasons:

– the dominance of the object-oriented approach and the impedance mismatch of object-relational mapping;

– the evolution of OLAP (Online Analytical Processing) DBMS;

– the evolution of Internet/Web data;

– the reintroduction of VLDB (Very Large Databases) storage and analysis as "Big Data".

For example, the authors of "Database System Concepts" use the following categories, which add new ones to the relational model:

– relational model;

– entity-relationship model;

– semi-structured data model;

– object-based data model.

M. Stonebraker and J.M. Hellerstein have suggested the following data models in the context of their historical epochs:

– hierarchical (IMS): late 1960’s and 1970’s;

– network (CODASYL): 1970’s;

– relational: 1970’s and early 1980’s;

– entity-relationship: 1970’s;

– extended relational: 1980’s;

– semantic: late 1970’s and 1980’s;

– object-oriented: late 1980’s and early 1990’s;

– object-relational: late 1980’s and early 1990’s;

– semi-structured (XML): late 1990’s to the present.

As one can see, the categorizations mentioned above:

1. Confuse data models of different levels. For example, the ER (entity-relationship) model was suggested in 1976 as a conceptual model and as a "unified view of data" that allows the modeling of underlying relational and other logical database models.

2. Do not take into account mathematical foundations such as graph theory or set theory.

3. Do not consider the schemaless/schemaful distinction outside the scope of semi-structured models. For example, an XML data model without a schema can be considered semi-structured, but simply adding an XML schema makes the model fully structured.

2.3. Suggested classification principles

Database systems can be conveniently categorized according to the data structures and operators they present to the user. However, the data structures and operations may also be considered elements of a model that is based on some formalism or theory.

2.3.1. Formalism of data models

The first classification axis is the theoretical basis, or the formalism on which a data model is based. The following formalisms have been used since the first DBMS were developed in the 1960s:

– graph theory;

– set theory;

– the notion of a higher-order function from category theory (map).

Data models based on graph theory have the following characteristics:

1. Each data item is represented as a record of some type. In modern systems, an object with properties may be used instead.

2. Each record can be explicitly linked to one or more other records, for example, using physical pointers; the model is called "hierarchical" when cyclic links are disallowed.

3. To access a data item, the user must specify a path made of established links.
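The three graph-model characteristics can be sketched in a few lines of Python. This is a minimal illustration, not the access method of any particular DBMS: the `Record` class, the `follow` helper, and the link names are all hypothetical, and Python object references stand in for physical pointers.

```python
class Record:
    """A record of some type with explicit, named links to other records."""

    def __init__(self, rec_type, **fields):
        self.rec_type = rec_type
        self.fields = fields
        self.links = {}  # link name -> linked Record (stands in for a pointer)

    def link(self, name, target):
        self.links[name] = target


def follow(start, path):
    """Navigate from a start record along a sequence of link names."""
    current = start
    for name in path:
        current = current.links[name]
    return current


company = Record("Company", name="Acme")
dept = Record("Department", name="R&D")
emp = Record("Employee", name="Ann")
company.link("rnd", dept)
dept.link("head", emp)

# The data item is reached only by spelling out the path of links.
head = follow(company, ["rnd", "head"])
```

Because no record links back to an ancestor, this small structure is hierarchical in the sense described above; adding a link from `emp` back to `company` would turn it into a general network.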

Data models based on set theory have the following characteristics:

1. Each data item, called a "tuple", is an ordered set of elements of different data types.

2. There are no explicit links between tuples.

3. To access one or more data items, the user must specify an operation on the data set, for example, the intersection of two sets of tuples.
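A minimal Python sketch of the set-oriented characteristics, using built-in tuples and sets. The sample data and variable names are hypothetical; the point is only that items are retrieved by operations on whole sets rather than by following links.

```python
# Tuples: ordered collections of elements of different data types.
employees = {(1, "Ann", "R&D"), (2, "Bob", "Sales"), (3, "Eve", "R&D")}
project_members = {(1, "Ann", "R&D"), (2, "Bob", "Sales")}

# Selection: a set operation that keeps the tuples matching a condition.
rnd_staff = {t for t in employees if t[2] == "R&D"}

# Intersection of two sets of tuples; no links or paths are involved.
rnd_on_project = rnd_staff & project_members
```

The result of each operation is again a set of tuples, so it can be fed into a further operation.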

Maps have been well known since their introduction in the LISP programming language in 1958. Maps have the following characteristics:

1. Each data item, called a "value", may have any data type.

2. There are no explicit or implicit links between items.

3. To access an item, the user must specify another value called a "key". Every key can be associated with only one value.
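The map characteristics correspond closely to a Python dictionary, which is used below as a stand-in for a key-value store. The keys and values are hypothetical sample data.

```python
store = {}  # key -> value; each key is associated with exactly one value

store["user:1"] = {"name": "Ann"}  # a value may have any data type
store["counter"] = 41
store["counter"] = 42              # same key: the previous value is replaced

value = store.get("counter")
missing = store.get("user:2")      # unknown key: there is nothing to navigate to
```

Nothing in the store relates `"user:1"` to `"counter"`; any such relationship would have to be maintained by the application.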

Table 1 - Principal distinctions between data model classes

DOI:10.60797/itech.2025.7.3.1

Comparison criterion	Graph models	Set-oriented models	Map-based models
Data structure type of an item	Record	Tuple	Value
Links between data items	Explicit	Implicit (set operations)	Not supported
Data integrity rules	Supported	Supported	Not supported (*)
Access to a data item	Explicit path (trajectory)	Implicit (set operations)	Explicit (by key)
Storing of data items	Ordered	Not ordered	Ordered
Output of data items	Ordered	Not ordered (**)	Not ordered

Note: * – may be partially supported by introducing constraints on values and types; ** – an ordered set can be produced by specific operations, for example, ORDER BY in SQL, but the storage does not preserve any item order
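The second note can be illustrated with Python's standard `sqlite3` module and an in-memory database. The table and its rows are hypothetical; the sketch only shows that an ordered output is requested explicitly with ORDER BY, whereas a plain SELECT carries no ordering guarantee in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("pen", 2.5), ("book", 12.0), ("cup", 4.0)],
)

# Without ORDER BY, SQL does not guarantee any particular row order.
unordered = conn.execute("SELECT name FROM item").fetchall()

# ORDER BY produces an ordered result from the unordered set of rows.
ordered = [row[0] for row in conn.execute("SELECT name FROM item ORDER BY price")]
conn.close()
```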

2.3.2. Structuring level classes

The second axis identifies the class of data structuring level supported by a data model:

– structured data models;

– semi-structured data models;

– non-structured data models (out of scope).

In a structured data model, all data items must have a predefined type, including complex types. For example, records of the same type must have the same set of fields.

A semi-structured data model allows the specification of data items where individual items may have no type at all, or where items of the same type may have different structures. For example, "flexible" records, even when based on the same type, may have different fields.
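The difference between the two structuring levels can be sketched as a conformance check. The schema `REQUIRED_FIELDS` and the `conforms` helper are hypothetical; the sketch assumes that "predefined type" means a fixed set of field names with fixed value types.

```python
REQUIRED_FIELDS = {"id": int, "name": str}  # hypothetical predefined record type


def conforms(record, schema=REQUIRED_FIELDS):
    """True if the record has exactly the schema's fields with matching types."""
    return record.keys() == schema.keys() and all(
        isinstance(record[k], t) for k, t in schema.items()
    )


# Structured: every record of the type has the same set of fields.
structured = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]

# Semi-structured: "flexible" records of the same kind differ in their fields.
flexible = [{"id": 1, "name": "Ann"}, {"id": 2, "nickname": "bobby", "tags": ["x"]}]

all_structured = all(conforms(r) for r in structured)
all_flexible = all(conforms(r) for r in flexible)
```

A structured store would reject the second `flexible` record, while a semi-structured store accepts it and leaves interpretation to the reader of the data.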

Non-structured data models lie outside the realm of databases because a database is defined as a structured data store.

2.4. Classifications

Axis 1: Formalism used

The following hierarchy appears sufficient to cover all widely used data models.

1) graph-based models:

◦ Hierarchical model;

◦ Network model;

◦ Document-oriented;

◦ Object-oriented (data only);

◦ Graph model;

◦ RDF (Resource Description Framework).

2) set-oriented models:

◦ Relational;

◦ Multidimensional;

◦ Key-object.

3) map-based models:

◦ Key-value;

◦ EAV/CR;

◦ Column store.

Axis 2: Structuring level

Only two classes are required to distinguish data models:

1. Structured data models (also called “schemaful”).

2. Semi-structured data models (“schemaless”).

Some data models allow both structured and semi-structured facilities to be used. For example, XML, a modern implementation of the document-oriented framework, allows both documents constrained by an XML schema and schemaless documents to be defined. The same holds for JSON.
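The two axes can be encoded as a small lookup structure. In the sketch below, the formalism assignments follow the list given under Axis 1; the names `Formalism`, `FORMALISM_OF`, and `classify` are hypothetical. Passing the structuring level as a flag is an assumption made here because, as noted for XML and JSON, the same model may be used with or without a schema.

```python
from enum import Enum


class Formalism(Enum):
    GRAPH = "graph-based"
    SET = "set-oriented"
    MAP = "map-based"


# Axis 1: formalism of each data model, as listed in this section.
FORMALISM_OF = {
    "hierarchical": Formalism.GRAPH,
    "network": Formalism.GRAPH,
    "document-oriented": Formalism.GRAPH,
    "object-oriented": Formalism.GRAPH,
    "graph": Formalism.GRAPH,
    "RDF": Formalism.GRAPH,
    "relational": Formalism.SET,
    "multidimensional": Formalism.SET,
    "key-object": Formalism.SET,
    "key-value": Formalism.MAP,
    "EAV/CR": Formalism.MAP,
    "column store": Formalism.MAP,
}


def classify(model, has_schema):
    """Return the (formalism, structuring level) pair for a data model."""
    # Axis 2: assumed here to depend only on whether a schema is enforced.
    level = "structured" if has_schema else "semi-structured"
    return FORMALISM_OF[model], level


schemaless_json = classify("document-oriented", has_schema=False)
```

For instance, a JSON document store used without a schema is classified as graph-based and semi-structured, while the same store with an enforced schema moves to the structured class without changing its formalism.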

2.4.1. Classified DBMS examples

The following examples are drawn from a wide range of DBMS, including multi-model, embedded, “on-premises”, cloud SaaS, commercial, and open-source ones. Some DBMS, such as IBM IMS, may be considered discontinued, but they are important from a historical perspective and are still used in business.

Table 2 - Classified DBMS examples

DOI:10.60797/itech.2025.7.3.2

Formalism used	Structured	Semi-structured
Graph based
Hierarchical model	IMS, INES, Caché	LDAP, Windows registry, Caché
Document-oriented	XML: CosmosDB, SQL Server, Oracle, BaseX, MarkLogic JSON: Dynamo, CosmosDB, MongoDB, CouchDB, PostgreSQL, Spark
Network model	IDS, Raima DB, Cronos
Object-oriented	GemStone, Versant, DB4O
Graph	Oracle, SQL Server	Neo4j
RDF	Oracle
Set-oriented
Relational	DB2, Oracle, SQL Server, PostgreSQL, MySQL	Excel, Calc
Multidimensional	SQL Server (MDX), Cognos, SAS	Caché
Key-object	KeySQL
Map based
Key-value	Redis, Couchbase, InfinityDB, RocksDB, Tarantool, Windows INI files
EAV/CR	TrialDB, Magento
Column store	HBase, Cassandra, Riak

The classification may also be presented as a pie-chart diagram.

Figure 1 - Graphical representation of classified DBMS

2.5. Practical cases

Understanding and comparing the supported data models may be a crucial factor in the technical choice of a DBMS.

Different data models have their own limitations that may affect the efficiency of storing and querying data (see the referenced books for more details on different models).

For example, the multidimensional model may require massive recomputation when one of the values changes. This would not be a good solution for transactional processing but is suitable for analytical processing.

NoSQL models are in fact pre-relational: they are not closed under certain operations on the data. This means that one has to write programs to extract the data instead of the simple query pipelines available in SQL. That is why many NoSQL vendors are trying to add SQL-like query languages on top of existing low-level scripting languages.
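Closure can be made concrete with a short Python sketch. It assumes that "closed under an operation" means the result has the same kind as the operands, so operations compose into pipelines. The helper names (`select`, `project`) and the sample data are hypothetical, and the dictionary is only a stand-in for a key-value store.

```python
# Set-oriented style: each operation takes a relation and returns a relation,
# so operations can be chained freely (closure).
def select(relation, predicate):
    return frozenset(t for t in relation if predicate(t))


def project(relation, *positions):
    return frozenset(tuple(t[i] for i in positions) for t in relation)


orders = frozenset({(1, "Ann", 30), (2, "Bob", 75), (3, "Ann", 120)})
big_spenders = project(select(orders, lambda t: t[2] > 50), 1)

# Key-value style: a lookup returns an opaque value, not another store,
# so the same question needs a hand-written scan over the keys.
kv = {"order:1": ("Ann", 30), "order:2": ("Bob", 75), "order:3": ("Ann", 120)}
names = set()
for key in kv:
    customer, amount = kv[key]
    if amount > 50:
        names.add(customer)
```

Both variants find the same customers, but only the first is expressed as a composition of reusable operations over one kind of object.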

Data model classification is also useful in teaching: courses may concentrate on more abstract notions than the names and functions of different DBMS brands.

3. Conclusion

The proposed classification provides a better understanding of the features, opportunities, and constraints of contemporary DBMS based on the data model used. In fact, many DBMS implement more than one data model to broaden their field of use.

Some buzzwords like “NoSQL” or “NewSQL” could be formalized as a subset of the corresponding models. For example, “NoSQL” can be defined as both structured and semi-structured document-oriented models or, marginally, as all non-relational data models.

The proposed classification is not exhaustive, but it is extensible and can include future data models.

Additional materials

Not specified

Financing

The authors received no financial support for the research, authorship, or publication of this article.

Acknowledgements

Not specified

Conflicts of interests

Not specified

References

Codd E.F. Data models in database management / E.F. Codd // ACM SIGMOD record. — 1981. — Vol. 11, № 2. — P. 112–114.
Silberschatz A. Database System Concepts / A. Silberschatz, H.F. Korth, S. Sudarshan. — 7th ed. — New York : McGraw-Hill, 2020. — 1344 p.
Date C.J. An introduction to database systems / C.J. Date. — 8th ed. — Boston : Pearson, 2004. — 1024 p.
Tsichritzis D.C. Data models / D.C. Tsichritzis, F.H. Lochovsky. — Englewood Cliffs : Prentice Hall, 1982. — 454 p.
ANSI/X3/SPARC Study Group on Data Base Management Systems. Interim Report // ACM SIGMOD Bulletin. — 1975. — Vol. 7, № 2.
Bruchez R. [Bases de données NoSQL et Big Data] NoSQL databases and Big Data / R. Bruchez. — 2nd ed. — Paris : Eyrolles, 2015. — 320 p. [in French]
Zachman J. A framework for information systems architecture / J. Zachman // IBM Systems Journal. — 1987. — Vol. 26, № 3. — P. 276–292.
Karpova T.S. Bazy dannykh: modeli, razrabotka, realizatsiya [Databases: Models, Development, Implementation] / T.S. Karpova. — St. Petersburg : Piter, 2001. — 304 p. — ISBN 5-272-00278-4. [in Russian]
Tarasov S.V. Programmirovanie s bazami dannyh [Programming with databases] / S.V. Tarasov. — Moscow : Solon-Press, 2015. — 256 p. — ISBN 978-2-7466-7383-0. [in Russian]
Tarassov S. Programming with databases / S. Tarassov. — 2nd ed. — Amazon, 2018. — 280 p. — ISBN 978-2-9563008-0-9.
Hellerstein J.M. What goes around comes around / J.M. Hellerstein, M. Stonebraker // Readings in Database Systems. — 4th ed. — 2005. — P. 2–41.
Gilula M. The set model for database and information systems / M. Gilula. — Boston : Addison-Wesley, 1994. — 320 p. — ISBN 0-201-59379-3.
Gilula M. Structured Search for Big Data. From Keywords to Key-objects / M. Gilula. — San Francisco : Morgan Kaufmann, 2016. — 280 p. — ISBN 978-0-12-804631-9.
Chen P. The Entity–Relationship model – toward a unified view of data / P. Chen // ACM Transactions on Database Systems. — 1976. — Vol. 1, № 1. — P. 9–36.
North K. Database Systems: The First Generation / K. North // CODASYL Systems Committee. — 1968. — September.
Fry J.P. Evolution of Data-Base Management Systems / J.P. Fry, E. Sibley // ACM Computing Surveys. — 1976. — Vol. 8, № 1. — P. 7–42.
Date C.J. Introduction to Data Base Systems / C.J. Date. — Boston : Addison-Wesley, 1975. — 400 p.
Ireland C. A Classification of Object-Relational Impedance Mismatch / C. Ireland, D. Bowers, M. Newton [et al.] // First International Conference on Advances in Databases, Knowledge, and Data Applications. — 2009. — DOI: 10.1109/DBKDA.2009.11.
Kim J. The Key to Value: Understanding the NVMe Key-Value Standard / J. Kim, B. Martin. — Storage Networking Industry Association, 2020.
McCarthy J. LISP Programmer's Manual / J. McCarthy, K. Maling, S. Russell [et al.]. — 1959.
Gilula M. Key-object data model / M. Gilula // Structured Search for Big Data. — 2016. — DOI: 10.1016/B978-0-12-804631-9.00003-0.
W3C Recommendation. RDFa Core 1.1 – Third Edition. — 2015. — URL: https://www.w3.org/TR/2015/REC-rdfa-core-20150317/ (accessed: 18.12.2024).
Guide to the Software Engineering Body of Knowledge v4.0. / ed. by H. Washizaki. — Waseda University : IEEE Computer Society, 2024.
Hernandez M.J. Database Design for Mere Mortals / M.J. Hernandez. — 4th ed. — Boston : Addison-Wesley, 2020. — 672 p.
Reis J. Fundamentals of Data Engineering. Plan and Build Robust Data Systems / J. Reis, M. Housley. — 1st ed. — Sebastopol : O’Reilly Media, 2022. — 420 p.

Review

All articles are peer-reviewed. However, the reviewer or the author of the article chose not to make the review of this article public. The review can be provided to the competent authorities upon request.

Author information

Affiliation: Paris 1 Panthéon-Sorbonne University, Paris, France

Role: Author
