Refactoring databases — or evolutionary database design | Thoughtworks
Brief summary
Pramod Sadalage co-authored the book Refactoring Databases 15 years ago. The concepts remain hugely relevant today for those exploring microservices. We caught up with Pramod and Martin Fowler to hear about the genesis of the book and explore how the principles of refactoring work in a world of NoSQL databases.
Please note: owing to a technical issue, the sound quality for one of our speakers is not ideal. We hope this doesn't detract from your enjoyment of this episode.
Podcast transcript
Rebecca Parsons:
Hello everyone. My name is Rebecca Parsons, Chief Technology Officer for Thoughtworks, and I'd like to welcome you to another edition of the Thoughtworks Technology Podcast. And I'm here with my cohost Neal.
Neal Ford:
Welcome, everyone. My name is Neal Ford. I'm a director, software architect, and a meme wrangler here at Thoughtworks. And we are pleased today to be joined by two of our colleagues, Martin Fowler and Pramod Sadalage. Welcome gentlemen, Martin. Hello.
Martin Fowler:
Hello there.
Pramod Sadalage:
Hello.
Rebecca Parsons:
So what we'd like to talk about today is a book that I consider the book with the greatest disconnect between its level of importance and the understanding and knowledge of the book in the market. And that is a book that Pramod wrote several years ago with Scott Ambler called Refactoring Databases, which does have the subtitle of Evolutionary Database Design. Neal continues to suggest that Pramod just rerelease the book, but swap the title and the subtitle for Evolutionary Database Design and then Refactoring Databases.
Neal Ford:
Which would make it one of the most popular books in the microservices world, without a doubt.
Rebecca Parsons:
So, we'd like to talk a little bit about the history of the book, the concepts. So let's start. Pramod, where'd the idea come from?
Pramod Sadalage:
So I joined Thoughtworks back in May of '99. And I was put on this project called Atlas at that time and we had a big binder of use cases, thousands of them. And then we also had an ER diagram with 600-plus tables and a bunch of stuff like a big upfront design. We were not even doing any Agile stuff at that time. And then Martin and Ward basically came in to coach us and then I remember they would rise, met them all. Shah and a bunch of us, getting excited about what is this new Agile stuff.
Pramod Sadalage:
So iteration one planning came about like, "Okay, let's plan this iterative stuff." Iteration one at that time was a month-long iteration and we said, "We are only going to try to achieve this much functionality." And then I was looking at my database with its 600 tables and this big ER diagram and all we really needed was nine tables or so. And I was wondering, should I carry all the 600 as we go and iterate, or will I need this in the future or not? So, one fine day I decided, let's drop the 591 tables, delete them from everywhere, from the ER diagram, from every place. And we had nine tables and just worked with that.
Pramod Sadalage:
And then all the bunch of iterations after that, how do I work with developers? What are they requesting? What story cards they're working, story kickoff, iteration kickoff. Should I participate? How do I participate? What is test data? How many environments? All these questions came about, not just doing this X down. Basically, there's a fire hose of stuff coming at me and I'm just trying to solve it. And one day, Martin called me into his room like, "From all this tech, talk to me like what you are doing." And we had this big room, a whiteboard and these questions about what I do and that kind of stuff. And he said, "Oh, there's something that’s interesting, you should talk about this. Not a lot of people are doing this." And I said, "Oh, okay."
Pramod Sadalage:
And then he graciously drove me to Madison. And I still remember I'd just come back from India for vacation and I slept all the way from Chicago to Madison. And then stood there in front of 600 plus people, because everyone was there to see Martin, not me. But I was sitting behind and all this stuff, so somehow I managed to give a talk. And then after that he drove me back and stupid me slept through the whole drive also, because I had jet lag.
Pramod Sadalage:
So that's how the journey started. And then at some point, I think it was 2003 or maybe late 2002, I met up with Scott Ambler, he was also doing some kind of talk at one of these Agile conferences. And then Martin suggested that, "Maybe both of you should pair up and write about this thing." How do you execute evolutionary design? Or how do you evolve databases in an Agile project? That was few texts, I think. And one other thing, at least the way I remember it, is that the refactoring book that Martin had worked on before that had such a good interface, like IntelliJ, like Shift+F6 and it did something. And everybody could say the same name and press the same keys and the code did the same thing. A name for a pattern, and when I say the name, everybody understands what I'm trying to do.
Pramod Sadalage:
I think that's the underlying desire to come up with the refactoring names for these things. Before that, I don't think I had names for these things, and Martin and Scott and I sat together and came up with these names. We had a long list, we trimmed it down and then also put them into categories and things like that. That all came from that. Martin, correct me if I'm missing anything here.
Martin Fowler:
Yeah. Something else I'd throw in, though, is the article that was published on my site. That was in the beginning of 2003. So that was kind of what helped begin to get the idea out there. It didn't go into the individual database refactorings, but it talked about the key practices that make this kind of evolutionary database design work. I think, if I recall correctly for the first iteration of the article, I mostly did the writing, but it was all Pramod's ideas. And then Pramod revised the article again a few years ago. So it's still out there and up to date, but it was, to some extent, it was fairly, I know it was somewhat influential because I remember talking about it with David Heinemeier Hansson and that was directly that article that influenced him to put the migrations into Rails. Which was one of the first examples of one of these frameworks taking that approach of this evolutionary migration driven database work into a sort of more widely used framework.
Rebecca Parsons:
Yeah. One of my more reliable laugh lines when I give my talk on evolutionary architecture has to do with this whole approach. Since you mentioned migrations, because I feel that pre these ideas, DBAs were probably one of the only roles within a software development team where they had a legitimate gripe with Agile methods, because you have to do data migration. As soon as you've got something out there and you want to change things, you have to do a data migration. And it sounds so simple. You copy data from here to there. What could possibly go wrong? But it always goes wrong. And, to me, that was such a strong motivating factor for people in the database community to really push back on these Agile methods, just because of how difficult it could make their job. And so this migration focus, I've always felt, is one of the really powerful aspects of this work.
Pramod Sadalage:
Yeah. And I think lately, I have also started to mix it with the whole notion of shift left. Shift left security, shift left testing, shift left migration, too. Because the migration used to happen when, okay, I'm ready at the release candidate. Now, what data does this need? What data do I have in production? Let's make a diff of that, come up with the migration script, and then all of that. What if I shifted that notion of migration all the way left to story development? Because that's where the fragility of the context is at the highest. What is needed to be done? What do I have? And what is the pre and what is the post? And all the business requirements are all in one place at that time. What if you did that work there?
Pramod Sadalage:
Of course, when I was doing this back in '99, 2000, I was not thinking of these concepts. But later on, as I've talked to more people, I've figured out the communication aspect of why doing this earlier in the cycle makes a lot more sense than later in the cycle. Other than just going along with shift left of everything, shift left on this database migration makes a lot more sense. One is it reduces risk, and all the knowledge you need to do that comes into play right at that spot where the migration happens.
Neal Ford:
Well, I think the temporality of data makes a lot of things more difficult that we don't suffer from in code. So I want to go back to something that you said. And I frankly can't imagine the courage that it took to hit the enter key when it said, "Drop 591 tables," because that seems to me like it screams against every instinct that you have as a DBA to drop 591 tables at one time. Because what if somebody is using those? I mean, there's a lot of fear and trepidation in that world because of longevity and the implicit integration of that data. So that makes a lot of these things much more difficult.
Pramod Sadalage:
Yeah, I learned a lot when I came to Thoughtworks, like the notion of version controlling things. Of course, I was not doing a migration-based approach yet at that time, but I was version controlling all of my work. Even with the ER diagram, any changes I made I checked in so I could go back to my original. Or even the database, when I deleted 591 tables, I made sure I had a backup before I did that. So that kind of stuff, I think I'll credit David Rice or Greg Warren for instilling that developer mentality of things. Because [inaudible 00:10:47], the DBAs [inaudible 00:10:51], when you wanted to try something and that's it. So the techniques of what the developer does were very interesting things to learn, like [inaudible 00:11:00].
Martin Fowler:
And this is a really huge part of the picture, because, I mean, we talk about the database refactoring and the use of migrations to build up the database, the technical things. But what is as important, if not more important than this, is the whole attitude of the way in which database people work with developers. I mean, we have all these kinds of silos in the software world, but I mean, the database one was one of the worst of them. We somehow separated out database people from application developers and application developers would say, "Oh, yes, they're the people behind the filing cabinets with the beware of the leopard signs and don't go near them." One of the most important things that Pramod did was to come out and spend time with developers, listen to them and teach them how to do things. So it was a much more collaborative approach and that collaborative approach is every bit as important as the technical side of what's happening here, because that is what really allows things to flow.
Pramod Sadalage:
Yes, certainly. I think Martin said it succinctly: it is the collaborative approach. Going up, I remember we had five or six pods in our office at that time and I used to visit the pods regularly, four or five times a day, to see what people were doing. And if someone had something they'd just walk over into my cubicle; of course, we were not yet on those long tables then. But they would visit, or I'd just walk over to their desk, and three people, two pairs of developers, or one pair of developers and me, would pair together and figure out what needs to be done table structure-wise or views or whatever else we were developing. And then, okay fine, this work we are going to do and check it in right there, and then off they go and off I go to the next pod to pair with other developers.
Rebecca Parsons:
So Pramod, earlier, you mentioned the power of patterns and such and the connection with the refactoring book. Martin, do you want to address how you see the connection to the fundamentals of code refactoring?
Martin Fowler:
Yeah. Well, it's fundamentally a very similar thing. I mean, what you're trying to do is alter the schema of the database in a way that doesn't alter the overall behavior of the application. So when you say that you say, well, actually, no, I can't just alter the schema. I have to alter two other things as well. I have to alter all the access code in the application that touches the data because if I change the data schema, obviously any access code has to change as well. And I also have to migrate any data that's in the database, either in any test database that we're using for test purposes. But of course, even more importantly in the production database as well. But the core idea of refactoring, or at least what to me is the core idea of refactoring, is refactoring is all about taking even a very large change, but breaking it down to tiny little changes. I often say with refactoring, you want to make each change so small that it's not worth doing.
Martin Fowler:
And so basically, if you do that on the database side, you've got a very small schema change, just one field being broken up into two. And then you find the access code that uses that bit of data and you make all of those changes. And then you write the tiny little piece of data migration script that will just take that one field and break it up into two separate columns. Now that's all very, very small, but the point is that once you've got that, it's really relatively easy to understand, relatively easy to test, and you can then string hundreds of those tiny little changes together and run them with great confidence, just as you can in the refactoring sense: you can make those changes and then run them all together. And that's what allows you to be able to do that with a production database. If you tried to write a migration script that would do a big migration covering a month or so's work from one schema to another, you'd be scared. But when that big migration is a hundred tiny little migrations that are all really simple and you're just running them all in a row, then it becomes much more tractable. And that's the great power of this, is that you've got a very composable system. The refactorings just compose nicely together.
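To make the idea concrete, here is a minimal sketch of one such small refactoring, splitting a single name field into two columns. It is an illustration only, not code from the book: the `customer` table, its columns, and the helpers `run_steps` and `full_name` are hypothetical, and it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real one.

```python
import sqlite3

# One small refactoring: split customer.name into first_name and last_name.
# Each step is trivial on its own; table and column names are hypothetical.
SPLIT_NAME_STEPS = [
    # Schema change: add the two new columns.
    "ALTER TABLE customer ADD COLUMN first_name TEXT",
    "ALTER TABLE customer ADD COLUMN last_name TEXT",
    # Data migration: copy the text before and after the first space.
    """UPDATE customer
          SET first_name = substr(name, 1, instr(name, ' ') - 1),
              last_name  = substr(name, instr(name, ' ') + 1)
        WHERE instr(name, ' ') > 0""",
]


def run_steps(conn, steps):
    """Run the small steps in order, then commit."""
    for step in steps:
        conn.execute(step)
    conn.commit()


def full_name(conn, customer_id):
    """Access code, updated to read the two new columns."""
    first, last = conn.execute(
        "SELECT first_name, last_name FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return f"{first} {last}"


# Set up the "before" state, then apply the refactoring.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada Lovelace')")
run_steps(conn, SPLIT_NAME_STEPS)
print(full_name(conn, 1))
```

The three pieces Martin lists are all visible: the schema change, the data migration, and the changed access code. A larger change would be a longer list of equally small steps run one after another.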
Rebecca Parsons:
And it also makes it much easier to spot where the migration may have gone wrong. And this is where I think the tie to even non-Agile projects is so important because you can identify, okay, these are the records that fail because I tried to split this thing into two fields. Oh, that's right, in 1984, we were using that field for a different purpose. And that's why data migrations are so difficult. And so that focus on that small change, it allows you to not just incrementally change the database, but incrementally fix all of those little data bombs that are leftover from decades and decades of the data being around.
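As a hedged illustration of spotting the records that would fail a small split, the sketch below flags rows whose name does not contain exactly one space before any migration runs. The `customer` table, the sample rows, and the `rows_that_will_not_split` helper are assumptions made up for this example, and "exactly one space" is only one possible rule for what counts as a clean split.

```python
import sqlite3


def rows_that_will_not_split(conn):
    """Return (id, name) for rows that are not exactly two space-separated parts."""
    return conn.execute(
        """SELECT id, name
             FROM customer
            WHERE name IS NULL
               OR length(trim(name)) - length(replace(trim(name), ' ', '')) != 1
            ORDER BY id"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [
        (1, "Ada Lovelace"),     # splits cleanly
        (2, "Prince"),           # no space: nothing to split on
        (3, "Jean Luc Picard"),  # two spaces: ambiguous split
        (4, None),               # missing value
    ],
)
suspects = rows_that_will_not_split(conn)
print(suspects)
```

Because the change is so narrow, the list of suspect rows is short and each one points at a specific historical quirk in the data that can be fixed on its own.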
Pramod Sadalage:
And also it enables you to take the same change and apply it multiple times. When you talk about it from a continuous delivery pipeline perspective, it's the change that's written once, deployed many times. So we are not writing the change every time. Let's say we're going from QA to UAT in line one, we are not writing the change at that time. The change was written back in development, we're just deploying that change. And before it gets to UAT, it has already run in QA, probably run against dev, probably run in CI, and probably run hundreds of times on developer machines before it came to that point. So, as it goes deeper into the pipeline, the risk of it failing reduces considerably.
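The "written once, deployed many times" idea can be sketched as a tiny migration runner that records which versions each database has already seen. This is an assumption-laden illustration rather than how any particular tool works: the `schema_version` table, the `MIGRATIONS` list, and the environment names are hypothetical, and in-memory SQLite databases stand in for real environments.

```python
import sqlite3

# (version, SQL) pairs: written once during development, never rewritten later.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON customer (email)"),
]


def migrate(conn):
    """Apply every migration this database has not seen yet.

    Returns the list of versions applied on this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    applied = []
    for version, sql in MIGRATIONS:
        if version not in done:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            applied.append(version)
    conn.commit()
    return applied


# The same unchanged migrations run against every environment in turn.
environments = {
    name: sqlite3.connect(":memory:") for name in ("dev", "ci", "qa", "uat")
}
results = {name: migrate(conn) for name, conn in environments.items()}
rerun = migrate(environments["uat"])  # nothing left to apply the second time
print(results, rerun)
```

By the time the list reaches the last environment, the exact same statements have already succeeded in every earlier one, which is where the reduced risk comes from.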
Rebecca Parsons:
So you wrote the book over a decade ago, what's your perspective on its relevance for today? Have things changed significantly that it's no longer as important?
Pramod Sadalage:
I wouldn't say it's no longer as important. Some of it has become standard operating practice for a lot of people. There are so many frameworks that, without using the word refactoring, pushed the notion of a migration-based approach, like Flyway or the Rails migrations that Martin mentioned, or there's Liquibase, there's tactical. There's a bunch of even commercial tools from Redgate and things like that, that support the notion of a migration-based approach. So in turn, people may not be using the word or the refactoring book or stuff as a reference, but that's what they are doing in the back.
Pramod Sadalage:
But the relevance, I think, comes into play even more as I'm seeing more and more legacy applications or currently deployed applications. With the notion of let's break this up into microservices, or let's split up for scalability and all the other things that people are trying to do, the question is how do I split this. And the first place you have to start is let's refactor it so we can split. So that's how I approach it. You see a big monolith where everything is talking to everything, there are views that are spanning across all the tables, stored procedures coupled to all the tables, application code referencing everything in the database, and now I have to split this around, like, a domain bounded context. I have to split around some kind of a data domain and things like that.
Pramod Sadalage:
How do I get to that point so that I can take this domain to a different server? To get to that point, you have to refactor it. Without that, you cannot split it, and without splitting you cannot have microservices that talk to their own databases in the first place. So I am seeing a lot more usage there of techniques on how I can refactor, how I can keep the existing current database alive while in the back I'm doing a little bit of surgery on the database. So I think the first time we did this was a project that you also did, Rebecca, in the UK, when we had two, three, four versions of the application talking to the same database, because some of the installations had moved on to newer versions and some installations were still talking to the older version of the code, and that means they were using an older version of the database.
Pramod Sadalage:
So the notion of using the database schema as an interface to your data came about. Your schema is nothing but an interface to your data. So if you look at your database from that perspective, you can say, "Oh, probably I can give you multiple interfaces to the same data." So we had this notion of a core schema; that's the terminology I'm using here, but this works for anything. We had a core schema in Oracle, which had all the data and was always the current version, at that time version four, but we gave you projections of that data into different versions. So version three talked to the same database, but with a different interface layer in the middle. And version two talked to the same database, but with a different interface layer in the middle of views and sequences and triggers and that kind of stuff. And we had these four or five versions talking to the same core database in the back. And that's when I realized that we can use these techniques to refactor the front and give a different interface to different versions, and in the back we have a database that's well designed and performing to the front.
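A minimal sketch of the schema-as-interface idea, under stated assumptions: the project described used Oracle schemas, whereas this illustration uses SQLite with a name prefix (`v3_`) to stand in for a separate per-version schema. The `customer` table, the `v3_customer` view, and the helper functions are all hypothetical, and only reads are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Core schema: always the current design (names stored separately).
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT
    );
    INSERT INTO customer (first_name, last_name) VALUES ('Ada', 'Lovelace');

    -- Interface for an older application version that expects one name column.
    CREATE VIEW v3_customer AS
        SELECT id, first_name || ' ' || last_name AS name
          FROM customer;
    """
)


def read_as_v3(conn):
    """What the older application version sees."""
    return conn.execute("SELECT id, name FROM v3_customer").fetchall()


def read_as_v4(conn):
    """What the current application version sees."""
    return conn.execute(
        "SELECT id, first_name, last_name FROM customer"
    ).fetchall()


def retire_v3(conn):
    """Once no installation uses version three, drop only its interface."""
    conn.execute("DROP VIEW v3_customer")


old_view = read_as_v3(conn)
current_view = read_as_v4(conn)
retire_v3(conn)
after_retirement = read_as_v4(conn)
print(old_view, current_view, after_retirement)
```

Both versions read the same stored row through different shapes, and retiring the old version touches nothing in the core table. Supporting writes from the older version would need more than a plain view, for example INSTEAD OF triggers on the view.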
Pramod Sadalage:
So as older versions got cut off, all we had to do was delete that schema and no longer support version two. But that code kept going and we didn't have to do anything. So the trunk kept moving forward and the laggards had time to catch up.
Neal Ford:
Well, in fact, the origin of the frequent joke that I tell, that Rebecca mentioned earlier, that you should rerelease this book with the subtitle promoted to the title, is because as many teams went headlong into microservices, they realized that, oh, as I'm breaking apart my architecture, I also have to break up my database. How do you do that? Oh, well, here's a book that tells you how to do that. So I've long maintained that your book was just about a decade ahead of its time, that you could have waited about 10 years and released it and it would have been red hot, because this is exactly what everybody's looking at now. Because once you get past hello world in microservices, you have to start having some serious conversations about how you partition data within this architecture. And in fact, Pramod and I have been having some very deep conversations recently about the intersection of architecture and data. And we believe that this is going to be one of the very interesting subjects over the next few years, as we figure out exactly how the new equilibrium between those things will fall in this new distributed world.
Pramod Sadalage:
I think that a lot of practice has been done since then. How do you do database testing? So the notion of, I think Martin uses this word very well, is the database is not in the database layer, it's in the persistence layer. That's where the database starts. And then you have to figure out how your application is using the database [inaudible 00:22:56] and all that stuff, because when you refactor, that layer is the one that gets affected the most. And that's where the database operator should be operating, guiding the developers on what they could do: I could put a view here, I could put a worksheet column here, I can add a function for you here, and this is how you can use that.
Pramod Sadalage:
So, the consulting aspect of a database person with the developers has profound effects at the persistence layer, in that you can guide the developers into refactoring the database, because ultimately, that's what you want to refactor or affect, the persistence layer there.
Rebecca Parsons:
So what about NoSQL databases? This book was clearly written at a time when relational databases were still overwhelmingly dominant. But that's changed now. Tell me a little bit about how you view this patterns-based approach to refactoring the databases when we're in a NoSQL world.
Pramod Sadalage:
Yeah, some of the patterns may not actually apply, but the notion that you should be caring about your data from version one to version two, or code version one to code version two, still applies. And the schema is not in the database anymore but it is in code. And then when it is in code, you have a little bit more option on how you can migrate the data, how you can do certain things. You don't have to migrate all the rows or all the data at the same time. In relational databases, when you split a column, at that point of time it's true for everyone. All the rows in the database plus all the usages of the particular column from the outside.
Pramod Sadalage:
But that's not true in the NoSQL sense. Let's say for example, in the document database you decide from now on I'm going to use first name, last name instead of name. Fine, you can keep all three operating at the same time in the database and you can do this concept... I think we tackled this in the NoSQL book, Martin, if I remember correctly: the notion of lazy migration. Your code should know the previous schema and the current schema both at the same time and when it is reading, you should look at the document and say, "Oh, this is version three of the document. I'm going to deploy this kind of a parser to read the document and then when I've read it, I'm going to write it back as if it was the latest version."
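A minimal sketch of lazy migration on read, with everything in it an assumption for illustration: a plain Python dict stands in for a document database, and the `schema_version` field, the `UPGRADES` table, and the function names are hypothetical rather than taken from any real client library.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Version 1 stored a single 'name'; version 2 stores first and last name."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {key: value for key, value in doc.items() if key != "name"}
    upgraded.update(first_name=first, last_name=last, schema_version=2)
    return upgraded


# One small upgrade function per old version the code still understands.
UPGRADES = {1: upgrade_v1_to_v2}


def read_document(store, doc_id):
    """Read a document, upgrading it to the latest shape if it is old.

    Documents without a schema_version field are treated as version 1.
    An upgraded document is written back, so each one is migrated at most once.
    """
    doc = store[doc_id]
    version = doc.get("schema_version", 1)
    was_old = version < CURRENT_VERSION
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    if was_old:
        store[doc_id] = doc  # write back in the latest shape
    return doc


# A dict standing in for a document store holding mixed versions.
store = {
    "a": {"name": "Ada Lovelace"},
    "b": {"first_name": "Alan", "last_name": "Turing", "schema_version": 2},
}
doc_a = read_document(store, "a")
doc_b = read_document(store, "b")
print(doc_a, doc_b)
```

Old and new documents coexist in the store, and each old one is brought up to date the first time it is read. Every entry in `UPGRADES` is another layer the code has to carry until all documents of that version are gone.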
Pramod Sadalage:
So this notion that I can migrate every document when I read the document, or when I have immediate access to the document, is interesting; it gives you very interesting dimensions on how you can deploy certain things. You can deploy as many refactorings as you want without having to worry about what happens to the data, the locks and all that kind of stuff that you have to worry about in conventional databases. But at the same time, you're building these layers and layers of parsers in your code that you have to be aware of, like how many layers do I keep in the code? Do I keep 15 versions of them? Or are two or four good enough for me? That's a trade-off of this versioning that you need to make. And even within that, when you split, for example, the first name, last name, what we talked about, the name gets written to two things, first name, last name, then you have to make sure you migrate that data. The data migration happens in code instead of in the database but it still happens. So the concepts apply, I'm not that sure that the main guidance applies. Martin?
Martin Fowler:
Yeah, I mean, that makes sense. I mean, the biggest thing that we had to talk about when we talked about NoSQL databases when we wrote the book was just pointing out to people that just because they call their database schema-less, doesn't mean it doesn't have a schema. And that the schema has shifted from an explicit schema into an implicit one, hidden inside all the access code. Which gives you some nice extra flexibilities, but in some ways it makes it worse because it's much harder to find the schema because no one's declared it. You've just got to figure it out. And if you've organized your access code badly, that can be quite a hard job to figure out. And you still have to go through the same data migration steps, but as Pramod says, you've got a few more options as to how you can do things because of the fact that the database isn't imposing storage constraints. But I mean, I feel the fundamental ideas still apply: take small steps, migrate your code, schema changes if necessary, and the data together. And lots of small steps combined together can make a big change.
Pramod Sadalage:
I personally think a lot of that worked in the whole notion of splitting apart monolithic architectures that are talking to monolithic databases. That's one big area of work that has seen interesting challenges in different places, different types of usages and lots of differences, different types of differences and things like that. Especially places where you don't have tests or you don't have enough knowledge of what all of the access patterns are and things like that.
Pramod Sadalage:
The other notion is what is the intersection of evolutionary database design and the DevOps practices. How do they help you make a better database team or a better database, interactive database design team as such. So I'm focusing a lot on the intersection of those things to see what happens and how it helps, what other practices can be put in place, what kind of testing, what kind of test data provisioning and a bunch of related things, so we can make the deployment pipeline to production, from dev ready or dev created to deployed, make that pipeline smoother. That's where most of the focus has been.
Rebecca Parsons:
Well, thank you, Pramod, thank you, Martin, for helping us understand the genesis of this whole notion of refactoring databases and why it's still a very relevant thing. And I would encourage anyone listening to this, if you're not familiar with the concepts, please go out and find them. The book is called Refactoring Databases, coauthored by Pramod and Scott Ambler. So thank you, Pramod, thank you, Martin, and a pleasure as always, Neal.