Migrate RDBMS Dinosaurs to the Cloud


This is an article from DZone’s 2022 Database Systems Trend Report.

For more:

Read the Report

Dinosaurs are not extinct. Many of today’s top businesses have either migrated to the cloud or are currently migrating. It’s common for their IT organizations to run one or more large relational database management systems (RDBMS) at the core of the business. These monstrous dinosaurs often hold the most mission-critical of all company data, and far from being extinct, they can also act as an anchor that holds back a full migration to the cloud. No matter the cloud strategy, these monolithic databases are essential to the ecosystem and must be part of the migration strategy for it to succeed.

Figure 1: Cloud migration example 

A common mistake is attempting to separate the application or smaller systems connected to the large relational databases, as demonstrated in Figure 1. To be successful, the relational databases and all connected resources, whether applications, secondary databases, web servers, or others, must migrate as one. That success also requires a strategy for migrating large amounts of relational data, multiple servers, software installations, jobs, and network configurations as part of the data ecosystem.
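
One way to make "migrate as one" concrete is to treat the system inventory as a dependency graph and collect everything transitively connected to the core database. The sketch below is a minimal Python illustration under assumed inputs: the system names and the `migration_group` helper are hypothetical, and connections are treated as undirected because a job that feeds the database must move with it just as much as an application that reads from it.

```python
from collections import deque


def migration_group(dependencies: dict[str, set[str]], core: str) -> set[str]:
    """Return every resource transitively connected to `core`."""
    # Build an undirected adjacency map: a dependency in either direction
    # ties two resources together for migration purposes.
    graph: dict[str, set[str]] = {}
    for src, targets in dependencies.items():
        graph.setdefault(src, set())
        for dst in targets:
            graph[src].add(dst)
            graph.setdefault(dst, set()).add(src)

    # Breadth-first search outward from the core database.
    seen = {core}
    queue = deque([core])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical inventory: each key depends on (reads from or feeds) its values.
inventory = {
    "erp_app": {"erp_db"},
    "etl_jobs": {"erp_db", "reporting_db"},
    "web_tier": {"erp_app"},
    "ftp_server": {"etl_jobs"},
    "hr_app": {"hr_db"},  # unrelated system that can move separately
}

print(sorted(migration_group(inventory, "erp_db")))
# ['erp_app', 'erp_db', 'etl_jobs', 'ftp_server', 'reporting_db', 'web_tier']
```

Here `hr_app` and `hr_db` fall outside the group, so they could be scheduled as a separate migration wave without splitting the ERP ecosystem.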

Beyond all of this complexity, the network is the last bottleneck and will be one of the biggest challenges to overcome in this herculean effort.

How Large Relational Databases Impact Cloud Migrations

Relational systems historically have a minimum of two tiers: a relational database and an application or access tier. More complex designs add multiple application server tiers, servers that manage FTP access, ETL/ELT, web servers, middleware, and the corresponding databases that either feed or are fed by the main relational system. Some platforms, such as Oracle, are architected around schemas, which historically results in larger databases that are more difficult to migrate unless they are moved as a whole.

The Dichotomy of the Relational Dinosaur

The relational database dinosaur’s natural life is one of growth. With an RDBMS built on a schema design rather than a smaller tenancy architecture, each database can hold terabytes and sometimes petabytes of data. Depending on how interconnected the data is with other systems, the database’s size can create its own gravity, pulling systems closer to the source to provide the best user experience. In the cloud, this pull is amplified by the massive real estate an enterprise cloud covers.

Data gravity will pull applications, connected data estates, and resources to the largest body of data, most often a legacy relational database possessing critical business data. 

As more data travels from applications and databases to the larger relational system (via ETL/ELT processing or database links), all systems involved need to be closely connected to the larger relational body to minimize latency. This, in essence, is data gravity.

Figure 2: RDBMS data gravity

When architecting an RDBMS for the cloud, data gravity must be taken into consideration, not just for infrastructure choices but for services as well. A cloud solution must be aware of application and database connections so it can deploy them for optimal performance. Design begins with the largest systems and then radiates out to the smallest components/services, ensuring the most impactful systems receive the focus they require in the architecture design.
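
To illustrate starting from the largest systems and radiating outward, the following minimal Python sketch orders an inventory by size and by daily data exchanged with the core RDBMS, then flags the systems that exchange enough data to warrant co-location. All names, volumes, and the 100 GB/day threshold are hypothetical placeholders, not figures from this article.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    size_gb: float           # data held by the system
    daily_gb_to_core: float  # data exchanged with the core RDBMS per day


def design_order(systems: list[System]) -> list[System]:
    """Largest systems first; ties broken by traffic to the core RDBMS."""
    return sorted(systems, key=lambda s: (s.size_gb, s.daily_gb_to_core), reverse=True)


def colocation_candidates(systems: list[System], threshold_gb_per_day: float) -> list[str]:
    """Systems whose daily traffic to the core is heavy enough to keep them nearby."""
    return [s.name for s in design_order(systems) if s.daily_gb_to_core >= threshold_gb_per_day]


# Hypothetical inventory; erp_db is the core RDBMS itself.
systems = [
    System("reporting_db", 2_000, 150),
    System("erp_db", 40_000, 0),
    System("web_tier", 5, 2),
    System("etl_server", 50, 300),
]

print([s.name for s in design_order(systems)])
# ['erp_db', 'reporting_db', 'etl_server', 'web_tier']
print(colocation_candidates(systems, threshold_gb_per_day=100))
# ['reporting_db', 'etl_server']
```

A real assessment would draw sizes and traffic from monitoring data rather than estimates, but the ordering principle stays the same.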

All or Nothing to the Cloud

As customers migrate to the cloud, they may first dip their toes in with a few migrated systems and then decide to move everything in earnest. With this in mind, the goal becomes leaving nothing on-premises, which requires an understanding of archaic relational systems and the requirements for migrating them to the cloud.

One of the most significant weaknesses of the trickle-to-the-cloud strategy is that earlier, smaller cloud migration projects may have shifted various workloads across multiple clouds. If those systems exchange data, multi-cloud dependencies surface and must be discovered and addressed. The network becomes the last bottleneck, and no one has yet found a way to overcome it. Nearby data center locations with peered networks and accelerated networking may reduce some of the latency, but as demonstrated in Figure 3, this challenge will continue until new networking technology is developed. Multi-cloud solutions can offer some benefits for sharing data between cloud providers, but they will never perform like a single-cloud solution.

Figure 3: Network latency differences between cloud providers can vary between regions and geographies 

The first goal in overcoming a cross-cloud latency issue is to identify what data must move between the environments daily, weekly, and so on. The second is to review how developers performed their work on-premises and optimize it for cloud development, eliminating excess wherever possible. Always minimize any additional IO created when pulling or pushing data across the network.
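
A rough model shows why trimming round trips matters more than raw bandwidth across clouds: total time is roughly the number of round trips multiplied by the round-trip latency, plus the payload divided by bandwidth. The Python sketch below is a simplification that ignores TCP windowing, protocol overhead, and parallelism; the 30 ms RTT, 1 Gbps link, and row counts are assumed values.

```python
import math


def transfer_seconds(rows: int, row_bytes: int, rows_per_round_trip: int,
                     rtt_ms: float, bandwidth_mbps: float) -> float:
    """Rough transfer time: latency cost of round trips plus payload on the wire."""
    round_trips = math.ceil(rows / rows_per_round_trip)
    latency_s = round_trips * rtt_ms / 1000
    payload_s = rows * row_bytes * 8 / (bandwidth_mbps * 1_000_000)
    return latency_s + payload_s


# Assumed workload: 1 million 500-byte rows over a 30 ms RTT, 1 Gbps cross-cloud link.
chatty = transfer_seconds(1_000_000, 500, rows_per_round_trip=1, rtt_ms=30, bandwidth_mbps=1000)
batched = transfer_seconds(1_000_000, 500, rows_per_round_trip=10_000, rtt_ms=30, bandwidth_mbps=1000)

print(f"row by row: {chatty:,.0f} s")   # row by row: 30,004 s
print(f"batched:    {batched:,.0f} s")  # batched:    7 s
```

In this model the payload cost (4 s) is identical in both cases; the difference is entirely round-trip latency, which is why batching and removing chatty calls is usually the first lever to pull.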

All cross-cloud data processing should be tested fully to ensure it can meet the demands of the business and is acceptable even with potential data growth over time. 
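
Testing for growth can start with a back-of-the-envelope projection before any load test. This minimal Python sketch, assuming compound annual growth and a fixed sustained throughput (both hypothetical inputs), returns the first year in which a daily cross-cloud transfer no longer fits its batch window. Units are decimal (1 GB = 1,000 MB).

```python
from typing import Optional


def first_year_window_exceeded(daily_gb: float, annual_growth: float,
                               throughput_mb_s: float, window_hours: float,
                               max_years: int = 20) -> Optional[int]:
    """Return the first year (0 = today) the daily transfer overruns its window, else None."""
    for year in range(max_years + 1):
        volume_gb = daily_gb * (1 + annual_growth) ** year  # compound growth
        hours = volume_gb * 1000 / throughput_mb_s / 3600   # GB -> MB -> seconds -> hours
        if hours > window_hours:
            return year
    return None


# Hypothetical: 500 GB/day growing 30% a year, 50 MB/s sustained, 4-hour nightly window.
print(first_year_window_exceeded(500, 0.30, 50, 4))  # 2
```

A transfer that fits comfortably today (about 2.8 hours here) can overrun its window within two years, which is the kind of result that full testing should confirm or refute.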

Infrastructure as a Service vs. Platform or Software as a Service

When investigating cloud migrations, customers will find Platform as a Service (PaaS) and Software as a Service (SaaS) repeatedly marketed as attractive options for all on-premises technology. Users are thrilled to hear they may be able to spend less on supporting infrastructure and platforms, but they forget how much technical debt has already been built into the relational environments they want to move to the cloud.

Why Are Very Large RDBMS Limited So Often to IaaS?

Once it becomes apparent that PaaS and SaaS will require users to give up many customizations and functionalities, they are back to considering Infrastructure as a Service (IaaS). This happens due to a combination of factors, but most of the challenges revolve around years of complexity built into the systems and a lack of features in SaaS/PaaS offerings. When deciding which cloud option suits a data estate that is moving to the cloud, follow these simple guiding principles:

  • SaaS:
    • You are working on a greenfield (new) project 
    • There is no customized code required at the database layer 
    • The system uses application-driven development and has simple data storage requirements
    • You are working with smaller user bases and simple recovery point objectives (RPOs)/recovery time objectives (RTOs) 
  • PaaS: 
    • You are working on a greenfield project 
    • The resource usage for vCPU, memory, and IO fits easily within PaaS limits
    • There are few IT resources to manage infrastructure, or there is a desire to remove this requirement  
    • Few advanced features or customized options are implemented in the database tier
  • IaaS:
    • You are working with large, terabyte-petabyte relational systems
    • You require the same or similar architecture as your on-premises application
    • You have unique demands on resources — IO, vCPU, and/or memory
    • You have very demanding workloads with complex RPOs/RTOs and development demands
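
The guiding principles above can be encoded as a simple first-pass triage. The Python sketch below is one hypothetical reading of the list, not a formal decision model: the field names are invented for illustration, treating 1 TB as the point where "large" begins is an assumption, and defaulting non-greenfield systems to IaaS is a simplification. Any real decision still needs the detailed review described in this section.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    greenfield: bool
    custom_db_code: bool             # stored procedures, triggers, extensions, etc.
    app_driven_simple_storage: bool  # application owns the logic; storage needs are simple
    size_tb: float
    fits_paas_limits: bool           # vCPU, memory, and IO within the PaaS offering's limits
    needs_onprem_architecture: bool  # must keep the same or a similar architecture
    complex_rpo_rto: bool


def suggest_service_model(w: Workload) -> str:
    # IaaS signals: large systems (1 TB cutoff is an assumption), mirrored
    # architecture, resource demands beyond PaaS limits, or complex RPO/RTO.
    if (w.size_tb >= 1 or w.needs_onprem_architecture
            or not w.fits_paas_limits or w.complex_rpo_rto):
        return "IaaS"
    if not w.greenfield:
        # Simplification: existing systems tend to carry customizations.
        return "IaaS"
    if not w.custom_db_code and w.app_driven_simple_storage:
        return "SaaS"
    return "PaaS"


legacy_erp = Workload(False, True, False, 40, False, True, True)
new_app = Workload(True, False, True, 0.05, True, False, False)
new_service = Workload(True, True, False, 0.2, True, False, False)

print(suggest_service_model(legacy_erp))   # IaaS
print(suggest_service_model(new_app))      # SaaS
print(suggest_service_model(new_service))  # PaaS
```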

If IaaS is the right fit, keep in mind that although cloud vendors provide solutions for an incredible array of workloads, relational workloads are unique and require the correct IaaS solution to meet their requirements.

How To Build Out an RDBMS Migration Strategy

Migrations are challenging, and preparation is the best way to succeed. Relational databases with multitier systems, whether built on an archaic client/server architecture or a mainframe, require planning to ensure success. Although each project is unique, certain aspects are universal and, if addressed as part of the plan, will help ensure a successful migration. The universal list often includes:

  • Database size and complexity
  • Data loads and connected ecosystems
  • Application, job, web, and other servers
  • Network latency

What Important Metrics Must Be Identified in RDBMS?

Most relational workloads are resource heavy — in other words, they are more demanding on infrastructure than other workloads. But as much as we may focus on CPU and memory, relational workloads, especially ones such as Oracle, can require high IO storage solutions. 

Most storage IO specifications and benchmarks focus heavily on requests per second (IOPS); however, request sizes vary, so these values can be skewed for marketing purposes. In my experience, it is better to focus less on IOPS and instead ensure that the chosen solution, at both the virtual machine and storage IO limits, can handle the required megabytes per second (throughput).
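
The gap between IOPS and throughput comes down to request size: throughput ≈ IOPS × IO size. The Python sketch below, using hypothetical VM and disk limits, checks both the IOPS and throughput ceilings and takes the lower cap of the VM and the storage, since whichever is smaller bounds the workload. Sizes are in KiB and MiB/s.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Throughput implied by an IOPS figure at a given request size."""
    return iops * io_size_kib / 1024


def iops_needed(required_mib_s: float, io_size_kib: float) -> float:
    """IOPS required to sustain a throughput target at a given request size."""
    return required_mib_s * 1024 / io_size_kib


def meets_io_requirement(required_mib_s: float, io_size_kib: float,
                         vm_mib_s: float, vm_iops: float,
                         disk_mib_s: float, disk_iops: float) -> bool:
    # Both the VM and the storage impose caps; the lower of each pair applies.
    effective_mib_s = min(vm_mib_s, disk_mib_s)
    effective_iops = min(vm_iops, disk_iops)
    return (required_mib_s <= effective_mib_s
            and iops_needed(required_mib_s, io_size_kib) <= effective_iops)


# The same 20,000 IOPS figure implies very different throughput at different IO sizes.
print(throughput_mib_s(20_000, 4))    # 78.125
print(throughput_mib_s(20_000, 128))  # 2500.0

# Hypothetical workload: 600 MiB/s of 128 KiB reads needs only 4,800 IOPS...
print(iops_needed(600, 128))  # 4800.0
# ...but a disk capped at 250 MiB/s fails despite its generous 20,000 IOPS.
print(meets_io_requirement(600, 128, vm_mib_s=1000, vm_iops=40_000,
                           disk_mib_s=250, disk_iops=20_000))  # False
print(meets_io_requirement(600, 128, vm_mib_s=1000, vm_iops=40_000,
                           disk_mib_s=900, disk_iops=20_000))  # True
```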

Creating Tiers of RDBMS Complexity

As services, high availability, and backups change in the cloud, all decisions around storage and solutions must focus on RPO and RTO. Any required customer uptime SLAs that may be different from the RPO/RTO should also be considered because services could be bundled into storage solutions chosen as part of the architecture. 
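
RPO and RTO can be sanity-checked against a proposed backup design before choosing storage. The minimal Python sketch below assumes worst-case data loss equals the log backup interval and recovery time equals a full restore plus log replay; the `RecoveryPlan` class and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    log_backup_interval_min: float  # how often transaction logs are backed up or shipped
    db_size_gb: float
    restore_gb_per_hour: float      # measured restore rate from backup storage
    log_replay_hours: float         # time to roll logs forward after the restore

    def worst_case_data_loss_min(self) -> float:
        # Anything written since the last log backup can be lost.
        return self.log_backup_interval_min

    def estimated_recovery_hours(self) -> float:
        return self.db_size_gb / self.restore_gb_per_hour + self.log_replay_hours


def check_plan(plan: RecoveryPlan, rpo_min: float, rto_hours: float) -> dict[str, bool]:
    return {
        "rpo_ok": plan.worst_case_data_loss_min() <= rpo_min,
        "rto_ok": plan.estimated_recovery_hours() <= rto_hours,
    }


# Hypothetical 20 TB database, 15-minute log backups, 2 TB/hour restores, 1 hour of replay.
plan = RecoveryPlan(15, 20_000, 2_000, 1)
print(plan.estimated_recovery_hours())            # 11.0
print(check_plan(plan, rpo_min=15, rto_hours=4))  # {'rpo_ok': True, 'rto_ok': False}
```

A result like this signals that restoring from backup alone cannot meet the RTO for a database of this size, which is exactly the kind of finding that should shape the storage and high-availability choices.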

Ensure that all architectural decisions follow recommended cloud design practices rather than simply replicating what a customer built into their on-premises architecture. Replicating the on-premises design is a common mistake in the cloud, and it creates gaps and redundancy.

A good starting point is to lift and shift the relational database workload itself, which removes any infrastructure debt built into the existing on-premises hardware. If the hardware is set aside and all focus goes to the relational workload, a new architecture can be designed around the workload’s actual needs.

Rinse and Repeat to Success

Because most data ecosystems require the main database and connected systems not only to be migrated but also duplicated for non-production copies, it is important to build a framework that can be simplified, automated, and deployed as part of a DevOps practice. Performing every action sequentially without a framework each time would be incredibly time-consuming and error-prone.

Build a Framework

Building a cloud migration framework starts with documenting what is required to deploy a relational system to the cloud from end to end. The beginning outline can look similar to the high-level example shown in Figure 4 and be built out to complete a migration project plan. 

Once this is built out, use tools and scripts to automate as much of it as possible while keeping enough flexibility to reuse them for numerous systems and architectures going forward.

Figure 4: An example of a high-level framework for a cloud migration

Ensure the scripting language and tools can scale as your cloud migrations do, and verify that they can manage the infrastructure, the relational system, and the data. As issues arise and are resolved, document them so they aren’t repeated, allowing efficiency to grow with each cloud migration.
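
A framework does not have to start sophisticated. As a minimal sketch, the Python runner below executes an ordered list of steps for each environment (production and non-production copies alike), stops at the first failure, and records issues so fixes can be folded back into the framework. The step names and functions are hypothetical stand-ins, not the contents of Figure 4; real steps would call your infrastructure-as-code and database tooling.

```python
from typing import Callable

# A step is a human-readable name plus a function that acts on one environment.
Step = tuple[str, Callable[[str], None]]


def run_migration(environment: str, steps: list[Step], issue_log: list[str]) -> bool:
    """Run each step in order for one environment; stop at the first failure."""
    for name, action in steps:
        try:
            action(environment)
        except Exception as exc:
            # Record the failure so the fix can be folded back into the framework.
            issue_log.append(f"{environment}/{name}: {exc}")
            print(f"[{environment}] {name}: FAILED ({exc})")
            return False
        print(f"[{environment}] {name}: ok")
    return True


# Hypothetical placeholder steps; real ones would call IaC and database tooling.
def provision_infrastructure(env: str) -> None: ...
def install_rdbms(env: str) -> None: ...
def validate(env: str) -> None: ...


def load_data(env: str) -> None:
    if env == "test":  # simulate a failure in one environment
        raise RuntimeError("staging storage not reachable")


steps: list[Step] = [
    ("provision infrastructure", provision_infrastructure),
    ("install RDBMS", install_rdbms),
    ("load data", load_data),
    ("validate", validate),
]

issues: list[str] = []
results = {env: run_migration(env, steps, issues) for env in ["dev", "test", "prod"]}
print(results)  # {'dev': True, 'test': False, 'prod': True}
print(issues)   # ['test/load data: staging storage not reachable']
```

Keeping each environment's run independent means one failed copy does not block the others, while the shared issue log becomes the documentation this section recommends.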

Conclusion

Large relational databases are often targeted as the first to disappear from the technical landscape, like an asteroid aimed at the dinosaurs, yet these archaic systems are more often the center point of many cloud migrations. Once they are moved to the cloud, multiple projects may be proposed to modernize and eliminate these dinosaurs, but more often their bones become the foundation for new application strategies, with the data residing in the same relational systems as it did on-premises. Limited resources, a lack of ROI, or the sheer effort required to modernize often remove the urgency to change the system.

As businesses continue to move to the cloud, recommended practices for moving large RDBMS as part of data center and data estate migrations will remain necessary, given the role these relational systems still play in the data estate.
