Lessons from building a Multi-Tenant Web App - Part 1 - Databases
Multi-Tenant apps are the bread and butter of the SaaS push the past few years. You need to be able to offer your customers instant setup & provisioning while at the same time segmenting their information away from others.
Shared Database vs Databases per Tenant
The key decision you have to make when setting up your MTA is whether you need to share any information between tenants.
If so, you might want to take a more traditional approach of throwing everyone in the same database and adding a
tenant_id field to each table to segment who owns what data. Frameworks like Laravel have features known as query scopes that can help facilitate this. They segment the data by automatically adding the WHERE clauses to your queries so you don't inadvertently mix customer data up.
However, if you do not need to share data between your customers (which is what I would think the majority of SaaS apps would not want to do) there is no reason to keep everything in one database. In fact it's in your code's best interest not to do things that way.
I know this because at RunMyBusiness we originally started with the shared approach and realized this wouldn't scale in the long run. The amount of data coming in from marketing campaigns & the complexity to shard that data later on when a client wanted dedicated hardware was too great to stay on a single shared database.
So about 2 years ago while we were upgrading to a new version of our codebase we took the steps to split everyone into their own database with a single shared database to act as the "router" of sorts.
When a request comes in, we check the domain name against the shared database to lookup the tenant. If found, we then bootstrap an new database connection (named "tenant_db" to the correct tenant database named "tenant_1234") through a service provider.
We are using Laravel 5 as our application framework, so we can use Eloquent's
$connection variable on our models to choose which connection each model should use. The shared tables (tenant & tenant_domain) use the "shared_db" connection while tables such as users, properties, & photos use the "tenant_db" connection.
Here are a list of pros/cons for a separate database per tenant:
- Simplified architecture -- Most of your code doesn't need to worry about the tenant as it is set in a service provider at the start of the request.
- Better information security -- Similar to above, if there is only 1 place that has to worry about choosing the right tenant you can ensure your customer's information isn't being exposed to other customers.
- Ability to use off the shelf packages -- Most third party packages you may want to use in your application do not take the "Shared Database" approach into consideration. To use these packages you would have to modify/overload their functionality to include the
tenant_idparameter we mentioned above. By using this split database architecture you can use plugins without modification.
- Data Portability -- If your client decides they want to leave your service you can export the data without hassle for import into another service.
- Enterprise Clients -- If your client gets too big for your shared service and wants their own dedicated hardware its a cinch to export their data and move it to dedicated machines.
- Client Clusters -- Much like the enterprise clients, you could also split half of your customers to an east coast datacenter and the other half to the west coast datacenter depending on where they were located. This would allow you to run multiple "clusters" of your application so if one went down your other customers would still be available. (Obviously you would want to have other failover measures here as well).
- Beta Features -- If you want to deploy beta features to a segment of clients its not difficult to only migrate some of your clients' data and keep the others running on the more stable branches.
- More complicated migrations -- When doing database migrations you must migrate all tenants which may take longer.
- Must pass tenant_id to background jobs -- If you're using a background job queue like beanstalkd you're going to need to pass your
tenant_idin each job to make sure the worker inits the tenant properly before processing the job.
- Must segment other services -- If you use third party services you must segment your clients' data on those as well. For instance if you use Mandrill for outbound emails you would want to make sure that you include a namespace in your identifier to correctly route responses & stats back to the correct client.
As you can see, there are some pretty good reasons to use the split database architecture over the shared architecture but only you will be able to make the decisions for your own app. I plan to release a package for Laravel based on the code our app is currently using that will help you tackle the Cons listed above, most of which are just making sure you're always using the correct tenant when using third-party services.
Please let me know what you think in the comments!