Over the past several years, MongoDB has established itself as a standard among NoSQL databases. Its ability to handle large volumes of data while offering a highly flexible data schema has made it popular with developers.
In this article, we look at the reasons for this success by explaining how a MongoDB database works and when to use it.
Before learning about MongoDB, we need to understand what a NoSQL database is and how this representation is different from the other popular database family, SQL.
SQL (Structured Query Language) predates the World Wide Web. As websites grew in functionality, developers wanted to build pages from content that could change over time without redeploying code, and Not Only SQL (NoSQL) databases emerged in response. Many NoSQL systems relax some of the ACID properties (atomicity, consistency, isolation, durability) and, in exchange, aim for better performance, better scalability, greater flexibility and reduced complexity.
The main difference from the SQL model is the absence of a relational model: data is no longer represented as tables. This absence (or flexibility, depending on the model) of a data schema makes it possible to evolve the structure of the database over time, which is much more difficult in an SQL context.
NoSQL databases are also designed to scale. They can be distributed across several machines, which makes it possible to support large workloads (thousands of reads and writes per second) while ensuring high availability through data replication.
In return, each tool has its own query language: where most SQL databases share a largely common SQL dialect, each NoSQL database defines its own. Switching from one NoSQL database to another is therefore harder, because it means rethinking the architecture and rewriting the queries.
Among the so-called NoSQL databases, we find several database models.
- Column-oriented databases: Cassandra, HBase.
- Document-oriented databases: MongoDB, Elasticsearch.
- Key/value oriented databases: Redis, Memcached, AWS DynamoDB.
- Graph-oriented databases: Neo4j.
- Time-series databases: InfluxDB.
The choice of NoSQL model depends mainly on the needs and the use case. If we often manipulate temporal data, we will pick a database designed for time series. Column-oriented databases, by contrast, can be interesting when managing e-commerce items, for example.
What is MongoDB?
MongoDB development began in early 2007, when a New York-based company, 10gen, was building a Platform as a Service (PaaS) similar to Heroku, AWS Elastic Beanstalk, or Google App Engine, but based on open source components.
The team's experience on various web projects had taught them that an application that becomes popular runs into scalability issues at the database level. When they looked for a database to integrate into their PaaS product, no open source solution met their needs for scalability and compatibility with a cloud architecture. The 10gen team therefore developed a new document-oriented NoSQL database in-house. They named it MongoDB, after the word “humongous”, meaning “gigantic”, like the data it is meant to host.
MongoDB was built for speed. Data is stored as BSON documents, short for Binary JSON. Because BSON is a binary format, MongoDB can scan documents and locate fields faster than it could by parsing JSON text.
To make queries even more efficient, MongoDB encourages denormalizing data within documents. Where good practice in SQL is to use separate tables and foreign keys that are resolved with joins, MongoDB encourages duplicating data where it is read.
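To make the idea of denormalization concrete, here is a minimal sketch in plain Python, with no database involved. The `authors` and `posts_normalized` data and the `denormalize` helper are hypothetical names chosen for illustration. The sketch contrasts a relational-style layout with the embedded layout that a document database favours.

```python
# Normalized layout, as a relational design would store it: two separate
# "tables" linked by a foreign key (author_id).
authors = {1: {"name": "Ada Lovelace", "country": "UK"}}
posts_normalized = [
    {"post_id": 10, "title": "Notes on the Analytical Engine", "author_id": 1},
]


def denormalize(post, authors_by_id):
    """Return a copy of the post with the author's data embedded in it."""
    embedded = {key: value for key, value in post.items() if key != "author_id"}
    # Duplicate the author data inside the post document.
    embedded["author"] = dict(authors_by_id[post["author_id"]])
    return embedded


posts_embedded = [denormalize(post, authors) for post in posts_normalized]

# Reading a post now needs a single lookup, with no join on `authors`.
print(posts_embedded[0])
```

The trade-off is that the author's data now lives in several places, so changing it means updating every post that embeds it.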
MongoDB was designed for the age of cloud and distributed infrastructure. To ensure availability, one of its key concepts is to always keep more than one copy of the database, so that the data remains quickly accessible even if the main machine fails.
MongoDB was designed for flexibility. Unlike in SQL databases, the data in a MongoDB collection can be completely heterogeneous. This is called being schemaless. The advantage of not having a strict data structure is that the structure can be changed quickly.
This flexibility is valuable at every stage of an application’s maturity. At the beginning of a project, modifying a data schema in SQL is relatively easy, but the same modification can become very painful in a project with several hundred linked tables. MongoDB addresses this problem by letting you adjust the properties of individual documents in a collection without having to modify all the others.
However, the schemaless approach has its drawbacks. It becomes more difficult to analyze the data if the documents do not all follow the same structure. This is why it is also possible to impose a schema on a collection.
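One way to impose a schema is to attach a validator written in MongoDB's `$jsonSchema` syntax when creating a collection. The sketch below assumes the PyMongo driver. The `users` collection, its fields, and the `create_validated_collection` helper are hypothetical, and the helper expects a database object that is already connected to a running server.

```python
# Validator document in MongoDB's $jsonSchema syntax.
# The "users" collection and its fields are hypothetical.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}


def create_validated_collection(db):
    """Create the `users` collection with the validator attached.

    `db` is assumed to be a pymongo Database connected to a running server.
    """
    return db.create_collection("users", validator=user_validator)
```

With such a validator in place, inserts that omit `name` or `email` would be rejected by the server, while optional fields such as `age` remain free to appear or not.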
How does MongoDB work?
MongoDB stores data as documents, which are data structures made of field and value pairs. Documents are the basic unit of data in MongoDB. They are similar to JSON objects but use a variant called Binary JSON (BSON), which has the advantage of supporting more data types.
The fields of these documents are similar to the columns of a relational database. Values can be of various data types, including other documents, arrays, and arrays of documents. Each document also has a primary key, the _id field, which serves as its unique identifier.
Sets of documents, called collections, are the equivalent of relational database tables. A collection can contain documents of any shape, but it belongs to a single database: its data cannot be split across different databases.
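As an illustration of this structure, the following sketch models a document as a plain Python dictionary and a collection as a list of such dictionaries. The order data is invented for the example, and a real MongoDB document would be stored as BSON rather than as a Python object.

```python
import json

order = {
    "_id": "order-1001",                               # unique identifier
    "customer": {"name": "Alice", "city": "London"},   # embedded document
    "tags": ["priority", "gift"],                      # array of values
    "items": [                                         # array of documents
        {"sku": "A1", "qty": 2, "price": 9.5},
        {"sku": "B7", "qty": 1, "price": 20.0},
    ],
}

# A collection groups documents; they do not have to share the same fields.
orders = [
    order,
    {"_id": "order-1002", "customer": {"name": "Bob"}, "note": "no items yet"},
]

# Nested values are reached by walking the structure.
total = sum(item["qty"] * item["price"] for item in order["items"])

print(json.dumps(order, indent=2))
print(total)
```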
To filter these documents, MongoDB queries can look for exact matches, use greater-than or less-than comparisons, or use regular expressions. These methods work well in many situations, but they are insufficient for filtering fields that contain rich textual data.
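The three kinds of filter can be written as MongoDB-style filter documents. The sketch below pairs them with a tiny in-memory `matches` function, a hypothetical helper that understands only `$gt`, `$lt` and `$regex`, so the filters can be tried without a server. It is an illustration of the syntax, not MongoDB's own matching logic. With PyMongo, the same dictionaries would be passed to `collection.find(...)`.

```python
import re


def matches(document, query):
    """Return True if `document` satisfies a small subset of the filter syntax."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"price": {"$gt": 40}}
            for op, operand in condition.items():
                if op == "$gt":
                    ok = value is not None and value > operand
                elif op == "$lt":
                    ok = value is not None and value < operand
                elif op == "$regex":
                    ok = isinstance(value, str) and re.search(operand, value) is not None
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != condition:
            # Plain value: exact match
            return False
    return True


products = [
    {"name": "Espresso machine", "price": 120, "category": "kitchen"},
    {"name": "Coffee grinder", "price": 45, "category": "kitchen"},
    {"name": "Desk lamp", "price": 30, "category": "office"},
]

exact = {"category": "kitchen"}                  # exact match
price_range = {"price": {"$gt": 40, "$lt": 100}} # comparisons
pattern = {"name": {"$regex": "^Coffee"}}        # regular expression

for query in (exact, price_range, pattern):
    print(query, [p["name"] for p in products if matches(p, query)])
```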
MongoDB provides a useful feature to remedy this: text search. It lets you query string fields for specific words or phrases. Text searches are performed with the $text operator, which relies on a text index.
A text index can cover any field whose value is a string or an array of strings. To perform text searches, the collection must have a text index. A collection can have only one text index, but that index can cover multiple fields.
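The following sketch shows what a multi-field text index and a `$text` query could look like through the PyMongo driver. The `title` and `body` field names and the three helper functions are hypothetical. `search_articles` assumes it receives a PyMongo collection connected to a running server.

```python
def text_index_spec(*fields):
    """Build the key specification for one text index covering `fields`."""
    return [(field, "text") for field in fields]


def text_query(terms):
    """Build a filter that searches the collection's text index for `terms`."""
    return {"$text": {"$search": terms}}


def search_articles(collection, terms):
    """Make sure the text index exists, then run the text search.

    `collection` is assumed to be a pymongo Collection.
    """
    # A single text index that covers two fields (hypothetical names).
    collection.create_index(text_index_spec("title", "body"))
    return list(collection.find(text_query(terms)))
```

Because a collection can hold only one text index, all fields that should be searchable are listed in the same index specification.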
The MongoDB Ecosystem
MongoDB offers a collection of products to make working with data easier. In addition to the MongoDB document database, here are some examples of the company’s products:
MongoDB Atlas is the cloud-based Database as a Service (DBaaS) solution. Atlas allows you to deploy a managed MongoDB server on an Amazon Web Services, Google Cloud Platform or Microsoft Azure cloud, in the region of your choice. You can choose the size of your cluster while having the advantage of having your database managed by the MongoDB engineering team.
Realm is a lightweight database embedded in the mobile client. In the case of a mobile application, Realm makes it possible to store part of the data directly on the device and to coordinate synchronizations with the main database according to different events. This is ideal to avoid network requests and allow better offline use of the application.
Atlas Search is one of the newest members of the MongoDB Cloud family. It aims to compete with Algolia and Elasticsearch in the field of search engines. It lets you index your data differently in order to offer a finer and smarter search than a simple query with filters.
Charts lets you create graphs to visualize your data directly from MongoDB. You can build several types of charts from the data in your Atlas cluster and embed them in your site's HTML. Charts lets you exploit your data quickly without having to develop a dedicated frontend for this need.
Cloud Manager is a comprehensive performance monitoring and optimization tool for clusters running MongoDB Enterprise Advanced. It exposes several database metrics that help you analyze performance and understand the requests made by your application. An alert system can be configured to anticipate incidents, and it connects natively to Slack, DataDog or PagerDuty.
Some of the MongoDB Tools
MongoDB Compass is a powerful GUI for querying, aggregating, and analyzing your MongoDB data in a visual environment. Compass is free to use and source available, and can be run on macOS, Windows, and Linux.
The MongoDB Shell,
The MongoDB Kafka connector is a Confluent-verified connector that persists data from Kafka topics as a data sink into MongoDB as well as publishes changes from MongoDB into Kafka topics as a data source.
MongoDB Charts is a tool to create visual representations of your MongoDB data. Data visualization is a key component to providing a clear understanding of your data, highlighting correlations between variables and making it easy to discern patterns and trends within your dataset. MongoDB Charts makes communicating your data a straightforward process by providing built-in tools to easily share and collaborate on visualizations.
When to use MongoDB?
MongoDB can be used for many different use cases, including:
- Management of large databases, thanks to its scalability and flexibility in terms of data model.
- Development of web and mobile applications, due to its simplicity of use and easy integration with many programming languages.
- Storage of semi-structured data, such as messages, events, and activity logs.
- Real-time data analysis, thanks to features such as data aggregation and fast indexing.
- Management of configuration data, metadata and application settings, thanks to its flexibility in terms of data schemas.
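As an example of the data aggregation mentioned in the list above, here is a sketch of an aggregation pipeline that counts error events per service. The `service` and `level` field names and the sample events are hypothetical. With PyMongo, the `pipeline` list would be passed to `collection.aggregate(...)`. The plain-Python function next to it computes the same result for an in-memory list, so the logic can be checked without a server.

```python
from collections import Counter

# Aggregation pipeline: filter, group and count, then sort by count.
pipeline = [
    {"$match": {"level": "error"}},
    {"$group": {"_id": "$service", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def count_errors_by_service(events):
    """Plain-Python equivalent of the pipeline for an in-memory list of dicts."""
    counts = Counter(e["service"] for e in events if e.get("level") == "error")
    # most_common() returns the services ordered by descending count.
    return [{"_id": service, "count": n} for service, n in counts.most_common()]


events = [
    {"service": "auth", "level": "error"},
    {"service": "auth", "level": "info"},
    {"service": "billing", "level": "error"},
    {"service": "auth", "level": "error"},
]

result = count_errors_by_service(events)
print(result)
```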