The problem
Diving into Domain Driven Design and trying to take colleagues with me, I find myself returning to the question of persistence or, more specifically in the case of DDD, Persistence Ignorance. DDD deals with fetching data through two mechanisms: repositories and traversal. Repositories give access to aggregate roots. From there, traversing relations enables navigation to related business entities.
In regards to persistence, the idea of relations brings with it concepts such as “Lazy Loading” and “Eager Fetching”. Should a given relation be loaded in the same shot as its parent, or should it be loaded once a client requests the use of it? Because most computers consist of a combination of fast volatile and slow persistent storage, these decisions can have a great impact on the performance of an application. A number of very technical frameworks and solutions exist only to assist in designing this behavior.
The focus of Domain Driven Design is a domain model, clearly communicating business logic, unpolluted by such technical concerns.
Over a period of time I have come across various solutions to implementing both repository and traversal. I’ve seen several very well thought through solutions – all of which have helped me in understanding more of the persistence aspect. I have yet to come across a solution that feels completely natural, but in the following I will try to shed some light on some of the solutions I have come across and how these could be combined to achieve yet another implementation.
Existing solutions
The simplest solution is a static configuration. A call to OrderRepository.GetOrder() will return an order with the same pre-specified relations activated each and every time. This solution quickly becomes inadequate due to the diverse ways an entity might be used by a service.
In this post by Ayende Rahien, he describes a service that specifies to the repository, for each call, what relations should be eagerly fetched. This leaves the client (service) free to decide exactly what it wants. But it leaves it up to the developer of the client to make sure a given entity method will execute in an optimized way. The developers must know that the Order.CalculateTotalCost method makes use of all OrderLines and associated Products. If these are not fetched eagerly, performance will suffer. In other words, developers are required to have knowledge of the inner workings of model entity methods to be able to execute them in a manner that retains performance.
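To make the trade-off concrete, here is a minimal sketch of a repository that lets the caller name the relations to fetch eagerly. It is not Ayende's actual API: OrderFetch, FetchingOrderRepository and AyendeStyleDemo are hypothetical names, and the repository only reports what it was asked for instead of querying a database.

```csharp
using System;

// Hypothetical flags naming the relations a caller may ask for.
[Flags]
public enum OrderFetch
{
    None = 0,
    OrderLines = 1,
    Products = 2,
    Customer = 4
}

// Hypothetical repository: it describes the request instead of running a query.
public class FetchingOrderRepository
{
    public string FindById(Guid orderId, OrderFetch eagerlyFetch)
    {
        return "Order " + orderId + " loaded with: " + eagerlyFetch;
    }
}

public static class AyendeStyleDemo
{
    public static string Run()
    {
        var repository = new FetchingOrderRepository();
        // The service has to know that CalculateTotalCost touches order lines and products.
        return repository.FindById(Guid.Empty, OrderFetch.OrderLines | OrderFetch.Products);
    }
}
```

The knowledge about what CalculateTotalCost needs sits in the calling service, which is exactly the burden described above.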
Udi Dahan, in response to the post by Ayende, suggests introducing interfaces on the domain objects that can be used to determine the proper fetching strategy when requested from a repository. OrderRepository would know to load Order with OrderLines when requested in the form of IOrderCalculator. IOrderCalculator is partially in the language of the domain, in that we DO want to perform calculation on an order. And it feels good to remove the concerns about the concrete fetching strategy from the service layer. But IOrderCalculator is still only in there to determine fetching strategy. The domain model’s freedom from technical concerns described earlier – in this case its Persistence Ignorance – is being compromised.
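A rough sketch of the idea, assuming a repository that looks up the relations to load from the interface it is asked for. This is not Udi's implementation: RoleAwareOrderRepository, IOrderCustomerInfo and the relation strings are hypothetical and only illustrate the lookup.

```csharp
using System;
using System.Collections.Generic;

// Role interfaces on the domain object. IOrderCustomerInfo is a hypothetical second role.
public interface IOrderCalculator { }
public interface IOrderCustomerInfo { }

public class RoleAwareOrderRepository
{
    // Maps each requested role to the relations that should be fetched eagerly.
    private static readonly Dictionary<Type, string[]> Strategies = new Dictionary<Type, string[]>
    {
        { typeof(IOrderCalculator), new[] { "Order.OrderLines", "OrderLine.Product" } },
        { typeof(IOrderCustomerInfo), new[] { "Order.Customer" } }
    };

    // Returns the relations to load for the requested role, or none if the role is unknown.
    public string[] RelationsFor<TRole>()
    {
        string[] relations;
        return Strategies.TryGetValue(typeof(TRole), out relations) ? relations : new string[0];
    }
}
```

A service would ask for `RelationsFor<IOrderCalculator>()` indirectly, by requesting the order as an IOrderCalculator, and never mention order lines or products itself.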
Both solutions outlined have pros and cons. Ayende’s solution retains the integrity of the model and offers flexibility to the developer, but requires a greater knowledge of the inner workings of entities. Udi’s solution frees the service layer of specific fetching strategy concerns. It enables the developer to take advantage of business functionality without as much knowledge about internal implementation. But persistence concerns are leaked to the model.
Ideally a persistence framework would handle the optimization of fetching strategy automatically. The service layer should not have to concern itself with persistence. The domain model definitely should not concern itself with persistence. In a perfect world all persistence would be handled by the infrastructure layer. In the case of Domain Driven Development, that means the concrete implementation of repositories and whatever mechanism (be it proxying or something else) is put in place to handle traversal.
The proposed solution
I presume that most will agree it is practically impossible to construct a repository that can predict how the entities it provides will be used. When it comes down to it, if the service doesn’t know it and the model can’t speak about it, it is hard to know what combined set of relations a service will require from an entity without actually looking at the code. One conclusion is to attempt to construct a repository that investigates the code in entity methods to predict what loading behavior should be applied. I doubt this can be done practically. Another solution is optimizing for all situations. While this “infrastructure side” optimization would keep both the service layer and the domain model free of persistence concerns, we would be back to the first solution described and the outcome would most likely be a lot of unused entities being loaded into memory.
What if a smaller part of the model could be optimized specifically for the task at hand? Would “infrastructure side” optimization still hold a viable solution? Many applications are already divided into such tasks. They are called transactions. In Domain Driven Design, transactions are expressed in the Unit Of Work pattern. Unit of work is used to coordinate the writing of information back to the persistent store – but maybe it could be used for coordinating the loading of information too?
Returning to our example of Order.CalculateTotalCost, a corresponding SalesService.GetTotalOrderCost() would then start out by creating a new named UnitOfWork instance. This instance would be passed to the repository (or repository factory) and would be able to internally configure a data source for eager fetching of certain relations, based on the specified Unit Of Work name.
For the sake of example, let’s say Linq 4 Sql or NHibernate was the actual persistence mechanism. Lazy loading would be handled transparently, there would be a simple facility for specifying eager loading behavior, and there would be the option to output the resulting calls to the underlying database to a console.
If a developer who knew nothing about the CalculateTotalCost method were to use it without any optimization, the result would be a number of SELECT statements: one to the Order table, one to the OrderLine table and finally one to the Product table for each OrderLine associated with the order. The optimization that could be applied here should be quite obvious, even based on just the SQL statements: configure the order lines to automatically load their products to get rid of the n+1. If the order was also configured to load its order lines, we would effectively be down to one trip to the database.
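The arithmetic behind the n+1 can be checked with a small simulation. The sketch below does not touch a real database: FakeDatabase simply counts how many times something is loaded, and LazyOrderLine, LazyProduct and NPlusOneDemo are hypothetical stand-ins for lazily loaded entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a database: every Select call counts as one SELECT statement.
public class FakeDatabase
{
    public int SelectCount { get; private set; }

    public T Select<T>(Func<T> load)
    {
        SelectCount++;
        return load();
    }
}

public class LazyProduct
{
    public int Price;
}

public class LazyOrderLine
{
    private readonly Lazy<LazyProduct> product;
    public int Quantity;

    public LazyOrderLine(FakeDatabase db, int quantity, int price)
    {
        Quantity = quantity;
        // The product is only "selected" the first time it is accessed.
        product = new Lazy<LazyProduct>(() => db.Select(() => new LazyProduct { Price = price }));
    }

    public LazyProduct Product => product.Value;
}

public static class NPlusOneDemo
{
    public static int CountSelects(int lineCount)
    {
        var db = new FakeDatabase();
        // One SELECT for the order row itself.
        db.Select(() => "order");
        // One SELECT for all of its order lines.
        List<LazyOrderLine> lines = db.Select(() =>
            Enumerable.Range(0, lineCount).Select(i => new LazyOrderLine(db, 2, 10)).ToList());
        // Summing the cost touches every product: one SELECT per order line.
        lines.Sum(line => line.Quantity * line.Product.Price);
        return db.SelectCount;
    }
}
```

For an order with three lines, CountSelects(3) comes to 1 + 1 + 3 = 5 round trips, where a fully eager configuration would need one.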
Now say the service had to implement a method that would return the name of the customer associated with an order, based on an order id. Because optimization is related to a named Unit Of Work, for this particular method we would be able to configure the order to automatically load with its associated customer instead of order lines and products.
A concrete implementation
Below are concrete implementations of the Sales service, the Order repository, the Order class and the Unit Of Work. As this is only an example, I did not wrap everything in interfaces as I normally would, and I just used the Linq 4 Sql objects to represent the domain entities. In a real implementation I would of course not have my entities polluted with Linq 4 Sql specific attributes.
The Sales service
using System;

public class SalesService
{
    public int GetTotalOrderCost(Guid orderId)
    {
        // The name identifies this unit of work so the infrastructure can pick a fetching strategy.
        UnitOfWork unitOfWork = new UnitOfWork("SalesService.CalculateOrderCost");
        try
        {
            OrderRepository orderRepository = new OrderRepository(unitOfWork);
            Order order = orderRepository.FindById(orderId);
            return order.CalculateTotalCost();
        }
        finally
        {
            // Release the data context even if the lookup or calculation throws.
            unitOfWork.Completed();
        }
    }

    public string GetOrderCustomerName(Guid orderId)
    {
        UnitOfWork unitOfWork = new UnitOfWork("SalesService.GetCustomerOrderName");
        try
        {
            OrderRepository orderRepository = new OrderRepository(unitOfWork);
            Order order = orderRepository.FindById(orderId);
            return order.Customer.Name;
        }
        finally
        {
            unitOfWork.Completed();
        }
    }
}
The Order repository
using System;
using System.Linq;

public class OrderRepository
{
    private readonly UnitOfWork unitOfWork;

    public OrderRepository(UnitOfWork unitOfWork)
    {
        this.unitOfWork = unitOfWork;
    }

    // Returns null when no order has the given id.
    public Order FindById(Guid orderId)
    {
        return (from o in unitOfWork.DataSource.Orders
                where o.OrderId == orderId
                select o).FirstOrDefault();
    }
}
The Order class
public partial class Order
{
    public int CalculateTotalCost()
    {
        int totalCost = 0;
        // Touches every order line and its product, so both relations should be loaded eagerly.
        foreach (OrderLine currentOrderLine in this.OrderLines)
        {
            totalCost += currentOrderLine.Quantity * currentOrderLine.Product.Price;
        }
        return totalCost;
    }
}
The Unit Of Work class
using System;
using System.Data.Linq;

public class UnitOfWork
{
    public SalesDataContext DataSource { get; private set; }

    public UnitOfWork(string unitOfWorkId)
    {
        Console.WriteLine("Starting unit of work: " + unitOfWorkId);
        DataSource = new SalesDataContext();
        DataLoadOptions dataLoadOption = new DataLoadOptions();
        // Pick the eager loading configuration that belongs to the named unit of work.
        switch (unitOfWorkId)
        {
            case "SalesService.CalculateOrderCost":
                dataLoadOption.LoadWith<Order>(o => o.OrderLines);
                dataLoadOption.LoadWith<OrderLine>(ol => ol.Product);
                break;
            case "SalesService.GetCustomerOrderName":
                dataLoadOption.LoadWith<Order>(o => o.Customer);
                break;
        }
        // Load options have to be assigned before the first query runs on the context.
        DataSource.LoadOptions = dataLoadOption;
        // Echo the generated SQL to the console.
        DataSource.Log = Console.Out;
    }

    public void Completed()
    {
        DataSource.Dispose();
    }
}
The Linq 4 Sql .dbml
One thing I would be sure to do when implementing this solution in a production system would be to remove the actual configuration from the UnitOfWork class and replace it with a pluggable system, where custom configuration classes could be injected / applied. Also, string comparison might not be the most elegant of techniques. I contemplated using enums instead, and this may indeed be the best solution. I just didn’t feel like communicating such a decision without further investigation.
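One possible shape for such a pluggable system, sketched under the assumption that units of work are identified by an enum and that each configuration class registers itself for one of them. None of these types exist in the sample above: UnitOfWorkId, FetchPlan, IFetchPlanConfiguration and FetchPlanRegistry are hypothetical, and FetchPlan just collects relation names where a real version would fill in DataLoadOptions.

```csharp
using System.Collections.Generic;

// Hypothetical enum replacing the magic strings.
public enum UnitOfWorkId
{
    CalculateOrderCost,
    GetCustomerOrderName
}

// Hypothetical stand-in for DataLoadOptions: it only records relation names.
public class FetchPlan
{
    public List<string> EagerRelations { get; } = new List<string>();

    public void LoadWith(string relation)
    {
        EagerRelations.Add(relation);
    }
}

// One configuration class per unit of work, injected from outside the UnitOfWork.
public interface IFetchPlanConfiguration
{
    UnitOfWorkId AppliesTo { get; }
    void Configure(FetchPlan plan);
}

public class CalculateOrderCostConfiguration : IFetchPlanConfiguration
{
    public UnitOfWorkId AppliesTo { get { return UnitOfWorkId.CalculateOrderCost; } }

    public void Configure(FetchPlan plan)
    {
        plan.LoadWith("Order.OrderLines");
        plan.LoadWith("OrderLine.Product");
    }
}

public class FetchPlanRegistry
{
    private readonly Dictionary<UnitOfWorkId, IFetchPlanConfiguration> configurations =
        new Dictionary<UnitOfWorkId, IFetchPlanConfiguration>();

    public void Register(IFetchPlanConfiguration configuration)
    {
        configurations[configuration.AppliesTo] = configuration;
    }

    // Units of work without a registered configuration get an empty plan (pure lazy loading).
    public FetchPlan Build(UnitOfWorkId id)
    {
        var plan = new FetchPlan();
        IFetchPlanConfiguration configuration;
        if (configurations.TryGetValue(id, out configuration))
        {
            configuration.Configure(plan);
        }
        return plan;
    }
}
```

With this split the UnitOfWork would only ask the registry for a plan, and adding a new optimization would mean adding a configuration class instead of editing a switch statement.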
The cons
I do believe the solution I outlined here could be used with good results in a production system – but I have yet to try it (or even decide to try it) on a real project. Therefore I would like to share some of the possible downsides and mitigations I have come to observe.
First off, it might be difficult for developers to optimize anything based on an output window full of SQL statements. While I personally feel that this could be done, it might be a deal breaker for some. One way of making it easier to see what is going on could be to speak the domain language, e.g. “Order.OrderLines was activated”, “OrderLine.Product was activated” – or perhaps build some sort of object graph for the developer to examine. In this regard my proposed solution is more like Ayende’s and falls a bit short of Udi’s.
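A minimal sketch of what such domain-language output could look like, assuming the infrastructure is able to call a hook whenever a relation is lazily loaded. ActivationLog and its methods are hypothetical; neither Linq 4 Sql nor NHibernate is claimed to offer this hook in this form.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical log that the infrastructure would notify on every lazy load.
public class ActivationLog
{
    private readonly List<string> activations = new List<string>();

    public void RelationActivated(string entity, string relation)
    {
        activations.Add(entity + "." + relation);
    }

    // One line per relation, in order of first activation, with how often it was hit.
    public IEnumerable<string> Describe()
    {
        return activations
            .GroupBy(activation => activation)
            .Select(group => group.Key + " was activated " + group.Count() + " time(s)");
    }
}
```

A relation that shows up many times within a single unit of work, such as OrderLine.Product once per order line, would be an obvious candidate for eager loading.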
Second, there will be situations where the same Unit Of Work will need varying entities and relations, based on some control logic. I guess this is to be expected in any scenario and is not particular to my solution. For some scenarios it will be a judgment call on the part of the developer – will a particular piece of data be needed most of the time, or should it only be loaded when accessed? Unit tests that exercise the business methods would help in making such decisions.
My conclusion
Thinking it through, the solution I just described really doesn’t differ much from the two previous ones – only on a few key points. Like the solution outlined by Udi, optimization is based on intended use and details are shielded from the service. But like Ayende’s solution, there is no “meta data” added to the model.
In my opinion, optimizing on the infrastructure side, based on Units Of Work, is worth investigating further. I find it to be a fine middle ground, building on the best of the two solutions discussed earlier, while achieving a greater separation of concerns.
All in all I will definitely be experimenting more with this – maybe even on a future project.
That said I will reserve final judgment till I find out how this works out in day to day development.
Tags: Domain Driven Design, Domain Driven Development, Eager Fetching, Infrastructure layer, Lazy Loading, Model Driven Design, Persistence, Persistence concerns, Persistence Ignorance, Repository, Unit Of Work
June 23, 2009 at 6:03 am |
The interfaces you describe as meta data aren’t.
They represent the public API of the domain model.
The service layer should not interact with the domain model outside of the use of those interfaces and the exposed events.
For more information on domain events, see this more recent post:
http://www.udidahan.com/2009/06/14/domain-events-salvation/
The order calculator is actually a very poor example since it doesn’t represent a proper use case. The interfaces exposed should each represent one use case.
Oh – and the implementation of fetching strategies I described is already pluggable and doesn’t require any magic strings.
Hope that makes more sense.
June 23, 2009 at 9:45 am |
Hi Udi
I am sure I am speaking out of turn here, as you obviously have a lot more experience with this than I do – I hope you understand that this is only an expression of my enthusiasm for the subject and desire to contribute, and was in no way meant as a criticism.
I just went thru your posts on Domain Events – great stuff!
But..
Outside of the IOrderCalculator being a bad example, why would an Order need a public IOrderCalculator API and not just IOrder, if it was not for the varying fetching strategy? I see how the interface is the perfect concept for specifying what can be accessed – in an optimized fashion – in a certain context. But isn’t this still leaking technical concerns to the model?
Relating fetching strategy to Use Cases sounds good to me, since I guess they are a natural part of the model. Yet I am not sure that interfaces on individual entities are the best way to identify them. Could a Use Case not be something involving multiple entities?
And as I commented on my own sample implementation, I am aware of strings as a weak spot.
June 23, 2009 at 1:09 pm |
Fetching strategies are outside the domain model – the only thing that is going on is that we’re being explicit about the use case we’re in.
There’s a presentation I gave in QCon a while back online describing this pattern:
http://www.infoq.com/presentations/Making-Roles-Explicit-Udi-Dahan
Hope that helps.
July 30, 2009 at 1:58 pm |
Interesting read.
Do you think it would be possible to create something that can automatically map what it needs? E.g. the first time something is called, the objects it needed are added to a lookup. Then any subsequent calls would check this and get what they need. This way you don’t need to create a strategy for each bad case. The only downside I can think of here is that if you have a really bad case, the first time it is called the performance would suffer.
I’m not keen on having the SQL in my application as Udi’s example suggests; I prefer the use of stored procedures. Would it be possible to create a batch of Select commands, so yes, it does several queries, but still on a single connection?
August 2, 2009 at 12:16 am |
I am glad you enjoyed it
Your idea is not a bad one. A jit-compileish solution might actually be workable, but I personally find it hard to foresee all the possible scenarios. Maybe a half way solution, where fetching strategies would be auto generated, but still baked in would be a viable solution? This would allow for fine tuning as well..
Your question of stored procedures is not one for me to answer. I would prefer to have as little to do with SQL as possible, by using some sort of O/R technology, be it NHibernate or Linq for SQL or whatever.