When transitioning from a monolithic to a microservices architecture, one of the critical tasks is data refactoring. This involves adjusting your database's table structures, data, stored procedures, and triggers so that the design becomes more efficient while retaining its behavior and information.
Microservices architecture decentralizes data management, leading to scalability, resilience, and faster development cycles. However, choosing the right database is crucial. Will your data benefit from the consistency and durability of a structured database, the flexibility and availability of an unstructured database, or the ultra-fast access provided by an in-memory solution?
In this article, we'll explore these database options and walk through the refactoring process using a sample application for Pedal, a fictional bike-rental startup platform. But first, let's take a look at Pedal's current database setup and understand how it will benefit from a modern microservices architecture.
The old monolithic database approach
The current monolithic database approach used by the Pedal app stores data for bikes, users, and orders in a single database schema. This interconnected storage, processing, and management system means that changing one part of the database can have unforeseen impacts elsewhere in the application. While this approach simplifies development for applications not requiring dynamic data management, it makes scaling complex and inefficient.
Microservices often have disparate data requirements, with some services needing structured relational databases and others relying on unstructured or in-memory storage for rapid data retrieval. The monolithic database approach is ill-suited to these diverse demands: its one-size-fits-all design often compromises performance, scalability, and flexibility.
For example, a traditional relational database's rigid, persistent structure inhibits a service requiring fast, ephemeral data access, where an in-memory database would be more suitable. Additionally, scaling a monolithic database to meet high demands in one area of the application can over-resource other areas, reducing efficiency and increasing costs. In contrast, a microservices architecture allows each service to use a database suited to its specific requirements, enabling more targeted scaling and optimization.
The new microservices data approach
Pedal’s future success hinges on its transition to a modern microservices architecture, a paradigm that divides the application into contextually defined services, each with its unique database and dependencies. This approach's data plane is responsible for storing and transporting data between the various components, making data refactoring essential to maintaining the integrity of the data, app functionality, code, processes, and metadata.
Implementing this approach requires a deep understanding of how data fits within each service's scope. Efficient data management involves breaking the application into parts, with services and databases connecting to form an ecosystem. Each microservice owns a bounded context, with its own database, that encapsulates its specific functions.
This segmentation promotes independence while enhancing scalability. When demands on one service increase, the architecture can efficiently allocate resources without encountering bottlenecks commonly found in monolithic systems.
Furthermore, this approach enables optimized storage solutions for each data type: for example, structured databases for transactions, NoSQL stores for unstructured data, and in-memory databases for quick access.
This article in our Pedal app series illustrates migrating Pedal to a microservices architecture to accommodate scaling, demonstrating the essence of this data approach. Databases and services can work together harmoniously, enabling scalability and unparalleled flexibility. The microservices data approach helps the services thrive while letting the data adapt successfully to this distributed environment.
Data refactoring
In traditional monolithic web applications, a single database handles all data requirements, streamlining development and deployment with a unified data store. This centralized design simplifies transactions and data management, but as the application scales, it can lead to performance bottlenecks and complex schema changes impacting the entire application. Scalability is limited to vertical scaling (upgrading hardware) due to tight coupling with a single database.
In contrast, a microservices architecture features dedicated databases for each service, known as database per service. This approach aligns with microservices principles, promoting decoupling and autonomy. Each service's database is tailored to its needs, allowing the use of different database types (SQL, NoSQL, graph, etc.). This separation enables independent development, deployment, scaling, and maintenance of services and databases, enhancing application resilience by isolating failures.
However, this architecture introduces challenges in ensuring data consistency and integrity across services. Maintaining transactional consistency requires implementing distributed transaction patterns, such as sagas. Data duplication and synchronization between services add operational complexity, requiring management and monitoring of multiple databases.
Despite these challenges, the microservices database approach offers greater flexibility, scalability, and resilience, making it suitable for large-scale, complex applications with evolving data requirements.
For instance, Pedal, a fictional bike rental platform, originally designed as a monolithic system with PostgreSQL and SQL Server databases, is transitioning to a microservices architecture. This transition requires careful consideration of data storage to ensure each service remains independent and loosely coupled. In a microservices architecture, changes in one service's database schema or requirements should not impact other services.
When decomposing the application, the steps in this article series created services with as narrow a focus as possible. For example, the Bike Service deals only with functions for the stored rental bikes. The refactored Bike Service can be found here.
Pedal’s Bike Service example
The functions our Bike Service exposes are in the class below. These functions each propagate through to a function that directly interacts with the database for this service.
import java.util.List;

import jakarta.inject.Inject;             // javax.inject.Inject on older stacks
import jakarta.transaction.Transactional; // javax.transaction.Transactional on older stacks

public class BikeService {

    @Inject
    BikeRepository bikeRepository;

    @Transactional
    public List<Bike> retrieveBikes() {
        return bikeRepository.listAll();
    }

    public Bike retrieveBike(Long id) {
        return bikeRepository.getBike(id);
    }

    public Bike postBikeAd(Bike bike) {
        return bikeRepository.postBikeAd(bike);
    }

    public void deleteBikeAd(Bike bike) {
        bikeRepository.deleteBikeAd(bike);
    }
}
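Each of these functions delegates to a BikeRepository that talks to the database. The repository itself isn't shown in this article, but here is a minimal sketch of what it might look like, assuming Quarkus with Hibernate ORM Panache (the listAll() call above matches the Panache repository API; getBike, postBikeAd, and deleteBikeAd are thin wrappers):

import io.quarkus.hibernate.orm.panache.PanacheRepository;
import jakarta.enterprise.context.ApplicationScoped; // javax.enterprise.context on older stacks

@ApplicationScoped
public class BikeRepository implements PanacheRepository<Bike> {

    // listAll() is inherited from PanacheRepository

    public Bike getBike(Long id) {
        return findById(id); // findById is also inherited
    }

    public Bike postBikeAd(Bike bike) {
        persist(bike);
        return bike;
    }

    public void deleteBikeAd(Bike bike) {
        delete(bike);
    }
}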
The only functions pertinent to the Bike Service deal with creating, deleting, or retrieving bike records from the database. So, when you create your database for this service, create a separate database for the Bike table’s information.
Streamline the PostgreSQL database
Now that you know what data the Bike Service database requires, decide which kind of database is most suitable. This data belongs in a transactional, structured database so the service can retrieve bike-related information easily.
The best option is to create a more streamlined PostgreSQL database with a single table. Build this database using the following straightforward create script.
CREATE DATABASE BikeStorage;

-- Connect to the newly created database
\c BikeStorage

-- Create a sequence for generating unique IDs
CREATE SEQUENCE bikes_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

-- Create the 'bikes' table
CREATE TABLE IF NOT EXISTS public.bikes
(
    image smallint,
    price integer,
    date_created timestamp(6) without time zone,
    id bigint NOT NULL DEFAULT nextval('bikes_id_seq'::regclass),
    model character varying(255) COLLATE pg_catalog."default",
    name character varying(255) COLLATE pg_catalog."default",
    warranty_status character varying(255) COLLATE pg_catalog."default",
    CONSTRAINT bikes_pkey PRIMARY KEY (id)
);
To secure this database, create a dedicated database user that only the Bike Service uses and grant that user access to the bikes table. Connection attempts from any other service are then denied.
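Here is a minimal sketch of that setup; the role name and password are placeholders to replace with your own values:

-- Create a dedicated login role for the Bike Service (credentials are placeholders)
CREATE ROLE bike_service WITH LOGIN PASSWORD 'change-me';

-- Lock the database down, then grant access only to the Bike Service role
REVOKE CONNECT ON DATABASE BikeStorage FROM PUBLIC;
GRANT CONNECT ON DATABASE BikeStorage TO bike_service;

-- Allow the service to read and write the bikes table and use its ID sequence
GRANT SELECT, INSERT, UPDATE, DELETE ON public.bikes TO bike_service;
GRANT USAGE ON SEQUENCE bikes_id_seq TO bike_service;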
Crafting a more sophisticated approach
However, a microservices architecture requires a more sophisticated data approach to achieve a suitable response time for a distributed, high-traffic application. At high volumes, it can be taxing for every service requiring bike information to call the Bike Service and connect to its database.
To remedy this situation, use an in-memory database to cache frequently accessed bike information. The application can quickly retrieve available bikes, pricing, and image information without repeatedly querying the primary database. The in-memory cache acts as a high-speed intermediary, reducing latency and database load by serving requests directly from memory.
Additionally, this setup supports real-time updates, letting the cache reflect bike availability or location changes almost instantaneously. The application can more efficiently handle user queries, improving the overall user experience as a result.
This approach also enables offloading read operations from the main database. It helps the system handle many concurrent users without degrading performance, which boosts its ability to scale.
A fitting in-memory caching solution should deliver high performance, support real-time updates, and handle various data types, such as bike availability, pricing, and image data. Red Hat Data Grid (RHDG) is optimized for high-throughput, low-latency access, making it suitable for applications with significant traffic, and it is straightforward to set up and start using.
To enhance the Bike Service's performance and provide customers with quick access to bike information, you can update the service to interact with RHDG. Java applications typically integrate with RHDG using the Infinispan client, which is part of the Red Hat Data Grid ecosystem.
Include the following Infinispan dependency in your application's pom.xml file to enable this interaction.
<dependencies>
    <dependency>
        <groupId>org.infinispan</groupId>
        <artifactId>infinispan-client-hotrod</artifactId>
        <version>15.0.0</version>
    </dependency>
</dependencies>
With the Infinispan client included, you can use the following Java code to push information to RHDG from the Bike Service.
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class BikeServiceRHDG {
    public static void main(String[] args) {
        // Configure the connection to the Red Hat Data Grid server
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer()
               .host("<RHDG_SERVER_HOST>") // replace with your server host
               .port(11222);               // default Hot Rod port; adjust if needed

        RemoteCacheManager cacheManager = new RemoteCacheManager(builder.build());

        // Access or create the cache that stores bike information;
        // "default" names a basic cache configuration template
        RemoteCache<String, String> cache =
                cacheManager.administration().getOrCreateCache("bikeCache", "default");

        // Sample bike data (placeholders to replace with real values)
        String bikeId = "<BIKE_ID>";
        String bikeModel = "<BIKE_MODEL>";
        String bikeName = "<BIKE_NAME>";
        double bikePrice = 0.0; // replace with the actual price
        String bikeImage = "<IMAGE_URL>";
        String warrantyStatus = "<WARRANTY_STATUS>";

        String bikeData = String.format(
                "{\"model\":\"%s\", \"name\":\"%s\", \"price\":%f, \"image\":\"%s\", \"warranty_status\":\"%s\"}",
                bikeModel, bikeName, bikePrice, bikeImage, warrantyStatus);

        // Store the bike data in the cache, keyed by bike ID
        cache.put(bikeId, bikeData);

        // Properly stop the cache manager to release the connection
        cacheManager.stop();
    }
}
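On the read path, services can then try the cache first and fall back to the primary database only on a miss. Here is a minimal cache-aside sketch; loadBikeFromDatabase is a hypothetical helper standing in for the PostgreSQL lookup:

import org.infinispan.client.hotrod.RemoteCache;

public class BikeCacheReader {

    // Cache-aside read: serve from RHDG when possible, fall back to the database on a miss
    public String getBikeData(RemoteCache<String, String> cache, String bikeId) {
        String bikeData = cache.get(bikeId);
        if (bikeData == null) {
            // Cache miss: load from the primary PostgreSQL database
            bikeData = loadBikeFromDatabase(bikeId);
            // Populate the cache so subsequent reads are served from memory
            cache.put(bikeId, bikeData);
        }
        return bikeData;
    }

    // Hypothetical helper: in practice, this queries the Bike Service's PostgreSQL database
    private String loadBikeFromDatabase(String bikeId) {
        return "{}"; // placeholder
    }
}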
Establishing access control
To establish effective access control in RHDG, configure user roles and permissions using its role-based access control (RBAC) system. To begin, define two essential user roles: the bikeService role, with permission to modify data in the cache (including write, update, and delete operations), and the readOnlyService role, with read-only access.
This setup’s core involves configuring security realms within the RHDG server. Security realms manage user authentication and authorization. You would typically accomplish this step using the RHDG server’s command-line interface or management console.
Within a defined security realm, such as BikeAppRealm, add users and assign them the appropriate roles. For example, create a user named bikeServiceUser and assign them the bikeService role, granting them full access to modify data. Similarly, create readOnlyServiceUser with the readOnlyService role for read-only access.
After establishing the user roles and adding them to the security realm, turn your attention to configuring cache security. This approach involves applying specific security settings to your cache to enforce the role-based permissions.
Specify which roles can perform certain actions on the cache. For example, in the server configuration file (like infinispan.xml or standalone.xml), define the bikeService role with ALL permissions, and the readOnlyService role with READ permissions.
Add the following code to your RHDG server configuration files to do so.
<!-- Illustrative sketch: exact element names vary by RHDG version,
     and users are often created with the server's CLI user tool instead -->
<security>
    <security-realm name="BikeAppRealm">
        <!-- User and role configuration -->
        <local-realm>
            <user name="bikeServiceUser" password="password">
                <role name="bikeService"/>
            </user>
            <user name="readOnlyServiceUser" password="readOnlyPassword">
                <role name="readOnlyService"/>
            </user>
        </local-realm>
    </security-realm>
</security>

<cache-container>
    <security>
        <authorization>
            <!-- Define roles and their cache permissions -->
            <role name="bikeService" permissions="ALL"/>
            <role name="readOnlyService" permissions="READ"/>
        </authorization>
    </security>
</cache-container>
Once these configurations are in place, it’s important to update the RHDG server’s configuration file and restart the server to activate the new settings. This approach ensures that the server recognizes the newly defined roles and enforces the specified permissions.
Finally, in your application that connects to RHDG, make sure to use the correct user credentials when establishing the connection. This step is crucial as it enforces the server configuration’s role-based permissions. Remember to handle passwords and user credentials securely, following best practices for password management and security.
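For example, the Hot Rod client can authenticate as bikeServiceUser when it connects. The following is a sketch; the realm name and SASL mechanism are assumptions you should match to your server setup:

import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class AuthenticatedConnection {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer()
               .host("<RHDG_SERVER_HOST>") // replace with your server host
               .port(11222);               // default Hot Rod port
        // Authenticate as the Bike Service user; in production, read these
        // credentials from a secret store rather than hard-coding them
        builder.security().authentication()
               .username("bikeServiceUser")
               .password("password")
               .realm("default")             // assumption: adjust to your security realm
               .saslMechanism("DIGEST-MD5"); // assumption: match your server configuration

        RemoteCacheManager cacheManager = new RemoteCacheManager(builder.build());
        // ... use caches as shown earlier ...
        cacheManager.stop();
    }
}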
By carefully following these steps, you can effectively replicate an RBAC system in RHDG. This system ensures that only authorized users have the appropriate level of access to the data, aligning with the principles of least privilege and maintaining robust security within your application’s data management framework.
Additional data considerations
We’ve shown the process for a single microservice. However, every microservice in the application needs similar changes.
Other services may have different data considerations. For example, if the Pedal application starts saving geospatial data that tracks rental bike locations, you would likely want to store that information in an unstructured, NoSQL database.
Creating a NoSQL database to store and manage geospatial data for tracking rented e-bikes involves selecting a database system that efficiently handles geospatial queries and data structures. MongoDB is a popular NoSQL database due to its native support for geospatial data and queries. It can store geospatial coordinates as GeoJSON objects and provides geospatial indexes and query operators suitable for tracking real-time e-bike locations.
In MongoDB, you would typically create a collection (similar to a table in relational databases) dedicated to storing bike information, including geospatial data. This collection can represent each e-bike as a document (similar to a record or row).
You can store geospatial data, such as the bike’s current location, in GeoJSON format, which MongoDB natively supports. Additionally, creating a geospatial index on the location field enables efficient execution of geospatial queries, like finding all e-bikes within a given point’s radius (for example, near a user's location).
Create the database and collection using the following short mongosh script.
use ebike_rentals;
// Create a collection named 'ebikes'
db.createCollection("ebikes");
// Create a 2dsphere index on the 'location' field for geospatial queries
db.ebikes.createIndex({ "location": "2dsphere" });
This database can then be populated with a simple call that looks like this:
db.ebikes.insertOne({
    "bike_id": "bike123",
    "location": {
        "type": "Point",
        "coordinates": [-73.856077, 40.848447] // Longitude, Latitude
    }
});
This approach lets you store and retrieve the bikes’ geospatial information.
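For example, with the 2dsphere index in place, a query to find every e-bike within 500 meters of a user's location might look like this (the coordinates and radius are sample values):

// Find all e-bikes within 500 meters of a given point (longitude, latitude)
db.ebikes.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [-73.856, 40.848] },
            $maxDistance: 500 // meters
        }
    }
});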
Migrating data
Migrating data from a traditional monolithic PostgreSQL database to a new microservices architecture, where Pedal uses a new PostgreSQL database and an in-memory database (like RHDG), is a two-part process. First, you’ll transfer data to the new PostgreSQL database, typically the persistent store for structured data such as bike models, prices, and user information. Second, you’ll load relevant subsets of this data, particularly those requiring rapid access, such as bike availability or real-time location data, into the in-memory database.
You can use data export and import methods to migrate the original PostgreSQL database to the new PostgreSQL database. First, export the relevant tables or datasets from the old database using a tool like pg_dump, which can produce plain SQL scripts or archive files for use with pg_restore.
Once you’ve exported the data, you can import it into the new PostgreSQL database using psql commands or the pg_restore utility, depending on the data dump’s format. This migration might also be an opportunity to restructure the data if required to fit the new microservices architecture, for example, by normalizing or denormalizing tables or modifying schema to better align with the service boundaries.
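As a sketch, with placeholder hosts, users, and database names, the export and import might look like this:

# Export only the bikes table from the old monolithic database
pg_dump -h old-db-host -U old_user -t bikes -f bikes.sql monolith_db

# Import the exported table into the new Bike Service database
psql -h new-db-host -U bike_service -d BikeStorage -f bikes.sql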
To populate the in-memory database, you’d typically write a script or application logic to read the necessary data from the new PostgreSQL database and load it into RHDG. This method could involve, for example, fetching the latest bike availability data and locations and storing them in RHDG with appropriate key-value pairs. To insert this data, you can use the RHDG command-line interface or a client library for your programming language (such as the Infinispan Hot Rod client for Java).
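Here is a minimal sketch of that load step, assuming JDBC access to the new PostgreSQL database and the Hot Rod client shown earlier; the connection details are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class CacheLoader {
    public static void main(String[] args) throws Exception {
        // Connect to the new PostgreSQL database (placeholder credentials)
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://new-db-host:5432/BikeStorage", "bike_service", "change-me")) {

            // Connect to RHDG via the Hot Rod client
            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.addServer().host("<RHDG_SERVER_HOST>").port(11222);
            RemoteCacheManager cacheManager = new RemoteCacheManager(builder.build());
            RemoteCache<String, String> cache =
                    cacheManager.administration().getOrCreateCache("bikeCache", "default");

            // Read each bike row and cache it as a small JSON document keyed by ID
            try (Statement stmt = db.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT id, model, name, price FROM bikes")) {
                while (rs.next()) {
                    String json = String.format("{\"model\":\"%s\", \"name\":\"%s\", \"price\":%d}",
                            rs.getString("model"), rs.getString("name"), rs.getInt("price"));
                    cache.put(String.valueOf(rs.getLong("id")), json);
                }
            }
            cacheManager.stop();
        }
    }
}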
Plan for and implement a strategy to keep this data up-to-date, which might involve regular refreshing from the PostgreSQL database or using a message broker to update the in-memory data in response to persistent store changes.
This migration process requires careful planning to handle data integrity, especially if the system remains operational. You might use blue-green deployment, feature toggles, or temporary data duplication techniques to ensure smooth migration without downtime or data loss.
Conclusion
Data refactoring is vital when shifting from a monolithic to a microservices architecture, reshaping how the application stores data. It transforms the Pedal application from a single, shared repository into a network of purpose-driven databases. Services gain independence, scalability, and better performance when each uses data storage tailored to its specific requirements.
When transforming your own monolithic application, you may move data from structured relational databases to NoSQL stores, or from persistent transactional databases to in-memory ones, depending on each service’s requirements. Migrating from a monolithic architecture to microservices is more than a structural change. It encompasses harmonizing data and services with foresight into future needs.
Data refactoring unlocks the potential for scalable and agile applications that are adaptable and resilient in an evolving software landscape. Consider this article’s techniques when migrating your own monolithic application to microservices.
Learn more about the migration from monolith to hybrid cloud with Red Hat:
- Red Hat Data Grid
- MariaDB and PostgreSQL on OpenShift
- MongoDB on OpenShift