Migrating an SQL database to NoSQL is always challenging.
SQL models relational data while NoSQL can be designed for a very wide variety of data storage use cases.
I recently had to migrate a legal database from MySQL to Amazon's key-value database, DynamoDB.
Amidst the challenges, the migration also forced me to make some tradeoff decisions to better adapt to DynamoDB’s scalable architecture.
I’ll break down the migration process into the following three elements:
Understanding the use cases and access patterns
Understanding the relationships between data
Choosing the best tradeoff with the existing constraints
Let me take you through how each one posed its own challenge and what I did to resolve them, in the hope of offering a broad guideline if you face a similar migration workload.
Understanding Use Cases & Access Patterns
Since I was migrating the data to DynamoDB, the first step was to understand the nature of the application and its access patterns.
The group of lawyers had various types of data stored on their MySQL database.
Some of this data included:
- Customer data
- Dossier data
- Meeting data
- Transaction data
The main data access patterns were the following:
- Getting a customer's dossier(s)
- Getting customers with a given status (in progress, accepted, rejected)
- Getting all cases in the current month
- Getting invoices for a given customer (by month/year)
- Getting bill payments (transactions) for a given month
There were a few more, but for the sake of brevity I’ll focus on these main ones.
Understanding the Relationships Between Data
With MySQL, the database fetches would make a few joins to query customers with dossiers, customers with meetings, and customers with transactions (for billing purposes).
When a lawyer created a new customer dossier, a customer record, a dossier, and some initial invoicing data were created.
Then, as the lawyer met with the customer, meeting records were added (these held the total duration and hourly rate, among other metadata).
If there was a court case (quite frequent), a case record was created with additional invoices (to be added as transactions when paid).
Designing the Data for DynamoDB
To migrate the data from MySQL to DynamoDB, I started with carefully designing a data model based on the access patterns identified.
Getting a customer’s dossier(s)
All customer dossier items in DynamoDB were transformed to have the following partition key (pk) and sort key (sk):
pk: "customer#101#dossiers",
sk: "dossier#2025-01-01#201",
entityType: "dossier"
In MySQL, a dossier record had a foreign key referencing the customer's ID, used to query for the dossiers belonging to that customer.
In DynamoDB, I created an item collection — customer#<customerId>#dossiers — for all dossier items belonging to that customer.
The sort key was prefixed with the creation date (to sort dossiers by date created) and suffixed with a random UUID.
(Note: for customer IDs and dossier IDs I'm using plain numbers like 101 and 201 for simplicity. I do the same for the rest of the data below.)
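To make this concrete, here's a minimal sketch of that query with Python and boto3 — the table name LegalApp and the helper name are my own placeholders, not from the actual migration:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("LegalApp")  # hypothetical table name

def get_customer_dossiers(customer_id):
    # One Query against the item collection returns every dossier
    # for this customer, sorted by the date prefix in the sort key.
    # (Pagination omitted for brevity.)
    response = table.query(
        KeyConditionExpression=Key("pk").eq(f"customer#{customer_id}#dossiers")
        & Key("sk").begins_with("dossier#"),
        ScanIndexForward=False,  # newest dossiers first
    )
    return response["Items"]
```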
Getting customers with a given status
It was important for my client to be able to filter their customers by status. Each customer would have a status defined based on whether their dossier was accepted, rejected or in progress.
In MySQL this was a plain value. Here’s how I designed it in DynamoDB.
I created a GSI with attributes to satisfy this access pattern:
pk: "customer#101#dossiers",
sk: "dossier#2025-01-01#201",
entityType: "dossier",
GSI1PK: "dossiers#accepted",
GSI1SK: "dossier#2025-01-01#201"
When creating a dossier item, I had to add the two attributes "GSI1PK" and "GSI1SK" — the first held the value "dossiers#" suffixed with the status, while "GSI1SK" kept the same value as the base table's "sk".
Using this GSI, I could easily get all dossiers with a given status by passing the status into the "GSI1PK" value.
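A sketch of that GSI query, again with boto3 (the index name GSI1 and table name are assumed, as before):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("LegalApp")  # hypothetical table name

def get_dossiers_by_status(status):
    # status would be "accepted", "rejected", or "in progress"
    # (exact status strings are my assumption).
    response = table.query(
        IndexName="GSI1",  # assumed index name
        KeyConditionExpression=Key("GSI1PK").eq(f"dossiers#{status}"),
    )
    return response["Items"]
```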
Getting all cases in the current month
Cases were the least frequently written items in the database, since they involved the lawyers going to court with their customers.
Given that low write volume, it was safe to store all case items in a single partition keyed by the current year.
Here’s the primary key design:
pk: "cases#2025",
sk: "case#2025-01-01#301",
entityType: "case"
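Because the sort key starts with the full date, a begins_with condition on the year-month prefix answers the "cases this month" pattern — a sketch under the same assumed table name:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("LegalApp")  # hypothetical table name

def get_cases_for_month(year, month):
    # e.g. get_cases_for_month(2025, 1) returns all items in "cases#2025"
    # whose sort key starts with "case#2025-01".
    response = table.query(
        KeyConditionExpression=Key("pk").eq(f"cases#{year}")
        & Key("sk").begins_with(f"case#{year}-{month:02d}"),
    )
    return response["Items"]
```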
Getting invoices for a given customer
Invoice items are grouped into a per-customer item collection.
Here’s how they are represented in the new database:
pk: "customer#101#invoices",
sk: "invoice#2025-01-01#401",
entityType: "invoice"
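Fetching a customer's invoices by month/year then becomes another begins_with on the date prefix — a minimal boto3 sketch, same assumed table name:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("LegalApp")  # hypothetical table name

def get_customer_invoices(customer_id, year, month):
    # The date prefix in the sort key makes month/year filtering
    # a simple begins_with condition on the key, no filter expression needed.
    response = table.query(
        KeyConditionExpression=Key("pk").eq(f"customer#{customer_id}#invoices")
        & Key("sk").begins_with(f"invoice#{year}-{month:02d}"),
    )
    return response["Items"]
```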
Getting bill payments for a given month
Bill payment records follow the same model as invoices. They are partitioned by customer, since it is the customer who makes the payments.
pk: "customer#101#payments",
sk: "payment#2025-01-01#501",
entityType: "payment"
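The payment query mirrors the invoice one, with only the prefixes changed — a sketch under the same assumptions:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("LegalApp")  # hypothetical table name

def get_customer_payments(customer_id, year, month):
    # Same begins_with pattern as invoices, with the payment prefixes.
    response = table.query(
        KeyConditionExpression=Key("pk").eq(f"customer#{customer_id}#payments")
        & Key("sk").begins_with(f"payment#{year}-{month:02d}"),
    )
    return response["Items"]
```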
Choosing the Best Tradeoffs
The main tradeoff was giving up the simplicity of MySQL's join operations in exchange for more effort on primary key design.
This meant more complex partition and sort key modeling to "pre-join" items of different entities.
For example, one access pattern would fetch a customer item, their invoices for the current month as well as any payments associated with it.
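Under this key design, invoices and payments live in separate item collections, so in practice the "join" becomes a couple of targeted Queries assembled in application code — a sketch reusing the hypothetical helpers from the earlier snippets (I leave out the customer item itself, since its key shape isn't shown here):

```python
def get_customer_billing_view(customer_id, year, month):
    # Two targeted Queries replace a multi-way SQL JOIN; the real
    # "pre-join" work happened at key-design time.
    return {
        "invoices": get_customer_invoices(customer_id, year, month),
        "payments": get_customer_payments(customer_id, year, month),
    }
```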
Other tradeoffs included replicating items versus normalizing data.
Much of the data that was normalized in the SQL database had to be denormalized and sometimes duplicated in DynamoDB.
For example, if two lawyers worked on the same case, the case item had to be duplicated, one copy for each lawyer. This enabled the access pattern where a lawyer could retrieve all cases assigned to them.
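A sketch of how that duplication might look at write time — the lawyer#<id>#cases key shape is my own assumption; the article's design doesn't specify it:

```python
import boto3

table = boto3.resource("dynamodb").Table("LegalApp")  # hypothetical table name

def put_case_for_lawyers(case_item, lawyer_ids):
    # Write one copy of the case per assigned lawyer so each lawyer's
    # partition can answer "all cases assigned to me" with a single Query.
    with table.batch_writer() as batch:
        for lawyer_id in lawyer_ids:
            copy = dict(case_item)
            copy["pk"] = f"lawyer#{lawyer_id}#cases"  # assumed key shape
            batch.put_item(Item=copy)
```

The flip side of this duplication is that any update to a case now has to touch every copy.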
The Migration Process
The migration went mostly smoothly after a lot of preparation.
The data model for each record type was carefully planned before the migration.
Most of the normalized data was denormalized beforehand as well.
For the actual migration, I wrote one script per table.
I scanned each table, read each row, and wrote it to my single DynamoDB table, adding the primary keys and other attributes.
For the data that was normalized, I had to take that into account and create additional items in DynamoDB.
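As an illustration, here's roughly what one of those per-table scripts could look like for dossiers — the column names, the pymysql client, and the LegalApp table name are all my assumptions:

```python
import boto3
import pymysql

table = boto3.resource("dynamodb").Table("LegalApp")  # hypothetical table name

def migrate_dossiers(conn):
    # Read every MySQL dossier row and re-shape it into the
    # single-table design, including the GSI attributes.
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SELECT id, customer_id, status, created_at FROM dossiers")
        rows = cur.fetchall()

    with table.batch_writer() as batch:
        for row in rows:
            # Assumes created_at is a DATE/DATETIME column.
            sk = f"dossier#{row['created_at']:%Y-%m-%d}#{row['id']}"
            batch.put_item(Item={
                "pk": f"customer#{row['customer_id']}#dossiers",
                "sk": sk,
                "entityType": "dossier",
                "GSI1PK": f"dossiers#{row['status']}",
                "GSI1SK": sk,
            })
```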
Once all scripts ran successfully, I made a few small adjustments to ensure all items were consistent.
Then came the post-migration work, which required creating serverless functions with AWS Lambda to satisfy the main access patterns for writing new data to the DynamoDB table.
The rest of that process can make for an interesting follow up article…
Conclusion
Migrating my client’s database from SQL to DynamoDB required careful planning and tradeoff decisions to adapt to DynamoDB’s context.
While the process involved tradeoffs like denormalizing data and designing complex primary key structures, it ultimately allowed for much more scalable and efficient data access patterns.
The migration taught me a lot about how to think in NoSQL versus SQL, and how data modeling in the two systems involves different thinking and solves different problems.
👋 My name is Uriel Bitton and I’m committed to helping you master AWS, serverless and DynamoDB.
🚀 Build the database your business needs with DynamoDB — subscribe to my email newsletter Excelling With DynamoDB.
Thanks for reading and see you in the next one!