Profiling Elixir
I just started playing with Elixir and I’ve been very impressed with the tooling that exists for it, already. One of my first projects was getting a very basic N-Triples Parser working, and I wanted to document how I went about improving its performance.
Initial Implementation
I’m going to skip through to the first implementation, and not talk much about how the code works. It basically just splits n-triples by newlines, runs regex on each line, and then organizes all the statements into a big Map.
defmodule NTriples.Parser do
alias RDF.Literal
@ntriples_regex ~r/(?<subject><[^\s]+>|_:([A-Za-z][A-Za-z0-9\-_]*))[
]*(?<predicate><[^\s]+>)[
]*(?<object><[^\s]+>|_:([A-Za-z][A-Za-z0-9\-_]*)|"(?<literal_string>(?:\\"|[^"])*)"(@(?<literal_language>[a-z]+[\-A-Za-z0-9]*)|\^\^<(?<literal_type>[^>]+)>)?)[ ]*./i
def parse(content) do
content
|> String.split(".\n")
|> Enum.filter(fn(str) -> String.length(str) > 0 end)
|> Enum.map(&capture_triple_map/1)
|> Enum.reduce(%{}, &process_capture/2)
end
defp capture_triple_map("") do
%{}
end
defp capture_triple_map(string) do
Regex.named_captures(@ntriples_regex, string)
end
defp process_capture(nil, accumulator) do
accumulator
end
defp process_capture(map, accumulator) when map == %{} do
accumulator
end
defp process_capture(capture_map, accumulator) do
subject = process_subject(capture_map["subject"])
predicate = process_subject(capture_map["predicate"])
object = process_object(capture_map)
append_triple(accumulator, {subject, predicate, object})
end
defp append_triple(map, {subject, predicate, object}) do
case map do
%{^subject => %{^predicate => existing_value}} when is_list(existing_value) ->
new_value = existing_value ++ [object]
Map.put(map, subject, Map.merge(map[subject], %{predicate => new_value}))
%{^subject => %{^predicate => existing_value}} ->
new_value = [existing_value, object]
Map.put(map, subject, Map.merge(map[subject], %{predicate => new_value}))
%{^subject => existing_subject_graph} ->
Map.put(map, subject, Map.merge(map[subject], %{predicate => object}))
_ ->
Map.put(map, subject, %{predicate => object})
end
end
defp process_subject("<" <> subject) do
subject
|> String.rstrip(?>)
end
defp process_object(%{"object" => "<" <> object_uri}) do
uri = process_subject("<" <> object_uri)
%{"@id" => uri}
end
defp process_object(%{"literal_string" => value, "literal_language" => language}) do
%Literal{value: value, language: language}
end
end
Initial Benchmark
Let’s get an initial benchmark, using the Benchfella tool.
- Add the dependency to your mix.exs file
  defp deps do
    [
      {:benchfella, "~> 0.3.0", only: [:dev, :test]}
    ]
  end
- mix deps.get
- Create a benchmark file at bench/ntriples_bench.exs
  defmodule NTriplesBench do
    use Benchfella
    @content elem(File.read("test/fixtures/content.nt"),1)
    bench "parse large file" do
      NTriples.parse(@content)
    end
  end
- Run first benchmark via mix bench
Settings:
duration: 1.0 s
## NTriplesBench
[12:28:22] 1/1: parse large file
Finished in 1.47 seconds
## NTriplesBench
parse large file 5 223998.60 µs/op
So there’s our first data point: 224 ms. That’s not awful, but in the context of a web application it’s not great.
Profiling Slowness
Let’s use ExProf to profile where the slow bits are.
- Add the dependency to mix.exs
  defp deps do
    [
      {:benchfella, "~> 0.3.0", only: [:dev, :test]},
      {:exprof, "~> 0.2.0", only: [:dev, :test]}
    ]
  end
- mix deps.get
- Make a little profiler at lib/profilers/ntriples.ex
  defmodule Profilers.NTriples do
    import ExProf.Macro
    @content elem(File.read("test/fixtures/content.nt"),1)
    def profile do
      profile do
        run
      end
    end
    def run do
      NTriples.parse(@content)
    end
  end
- Launch a shell with iex -S mix
- Run it
iex(2)> Profilers.NTriples.profile
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- ------- ---- [----------]
'Elixir.NTriples.Parser':parse/1 1 0.00 0 [ 0.00]
'Elixir.Profilers.NTriples':run/0 1 0.00 0 [ 0.00]
erlang:send/2 1 0.00 0 [ 0.00]
'Elixir.NTriples':parse/1 1 0.00 1 [ 1.00]
binary:split/3 1 0.00 1 [ 1.00]
'Elixir.String':split/3 1 0.00 1 [ 1.00]
'Elixir.Enum':map/2 1 0.00 1 [ 1.00]
'Elixir.Enum':reduce/3 2 0.00 1 [ 0.50]
lists:reverse/1 1 0.00 2 [ 2.00]
'Elixir.String':split/2 1 0.00 2 [ 2.00]
'Elixir.Enum':filter/2 1 0.00 3 [ 3.00]
'Elixir.Profilers.NTriples':'-profile/0-fun-0-'/0 1 0.00 4 [ 4.00]
binary:get_opts_split/2 2 0.00 7 [ 3.50]
lists:reverse/2 1 0.01 60 [ 60.00]
'Elixir.Regex':named_captures/2 6021 0.06 403 [ 0.07]
'Elixir.Keyword':'-delete/2-lists^filter/1-0-'/2 6021 0.07 449 [ 0.07]
'Elixir.Regex':named_captures/3 6021 0.07 461 [ 0.08]
'Elixir.String.Graphemes':length/1 6022 0.07 464 [ 0.08]
'Elixir.NTriples.Parser':capture_triple_map/1 6021 0.09 573 [ 0.10]
'Elixir.Enum':into/2 6021 0.10 617 [ 0.10]
'Elixir.NTriples.Parser':process_capture/2 6021 0.10 654 [ 0.11]
binary:part/2 6022 0.11 684 [ 0.11]
maps:put/3 6021 0.12 763 [ 0.13]
'Elixir.Enum':zip/2 6021 0.14 872 [ 0.14]
'Elixir.Enum':'-map/2-lists^map/1-0-'/2 6022 0.14 896 [ 0.15]
'Elixir.NTriples.Parser':'-parse/1-fun-1-'/1 6021 0.14 902 [ 0.15]
'Elixir.String':length/1 6022 0.15 941 [ 0.16]
'Elixir.NTriples.Parser':'-parse/1-fun-2-'/2 6021 0.15 960 [ 0.16]
'Elixir.Enum':'-filter/2-fun-0-'/3 6022 0.15 969 [ 0.16]
binary:do_split/5 6022 0.15 970 [ 0.16]
lists:keyfind/3 12042 0.15 988 [ 0.08]
'Elixir.NTriples.Parser':'-parse/1-fun-0-'/1 6022 0.15 993 [ 0.16]
'Elixir.Regex':names/1 6021 0.16 1015 [ 0.17]
'Elixir.Keyword':put/3 6021 0.16 1022 [ 0.17]
'Elixir.Regex':run/3 6021 0.16 1042 [ 0.17]
'Elixir.Keyword':delete/2 6021 0.20 1286 [ 0.21]
'Elixir.Enum':'-reduce/3-lists^foldl/2-0-'/3 12045 0.21 1356 [ 0.11]
'Elixir.Access':get/3 18061 0.22 1389 [ 0.08]
maps:from_list/1 6021 0.22 1432 [ 0.24]
'Elixir.NTriples.Parser':append_triple/2 6021 0.26 1668 [ 0.28]
'Elixir.String':replace_trailing/3 18057 0.30 1910 [ 0.11]
'Elixir.Keyword':get/3 12042 0.30 1921 [ 0.16]
'Elixir.String':rstrip/2 18057 0.32 2029 [ 0.11]
re:inspect/2 6021 0.37 2367 [ 0.39]
'Elixir.NTriples.Parser':process_object/1 6021 0.39 2503 [ 0.42]
maps:find/2 18061 0.40 2561 [ 0.14]
'Elixir.Access':get/2 18061 0.43 2771 [ 0.15]
'Elixir.Access':fetch/2 18061 0.49 3170 [ 0.18]
'Elixir.NTriples.Parser':process_subject/1 18057 0.51 3313 [ 0.18]
maps:merge/2 12040 0.66 4259 [ 0.35]
binary:matches/3 1 0.67 4305 [ 4305.00]
'Elixir.Enum':do_zip/2 42147 0.87 5593 [ 0.13]
'Elixir.String':replace_trailing/6 36114 1.35 8676 [ 0.24]
re:run/3 6021 2.78 17915 [ 2.98]
erlang:'++'/2 6008 12.55 80808 [ 13.45]
'Elixir.String.Graphemes':do_length/2 1456345 17.98 115788 [ 0.08]
'Elixir.String.Graphemes':next_extend_size/2 1450323 18.96 122101 [ 0.08]
'Elixir.String.Graphemes':next_grapheme_size/1 1456345 36.99 238230 [ 0.16]
------------------------------------------------- ------- ------- ------ [----------]
Total: 4778436 100.00% 644072 [ 0.13]
Evaluating Results
The functions which took the largest total amount of time show up at the bottom. In our case the easy win is here:
'Elixir.String.Graphemes':do_length/2 1456345 17.98 115788 [ 0.08]
'Elixir.String.Graphemes':next_extend_size/2 1450323 18.96 122101 [ 0.08]
'Elixir.String.Graphemes':next_grapheme_size/1 1456345 36.99 238230 [ 0.16]
Each “character” in an Elixir String is a grapheme, so String.length/1 has to walk the entire string to count them, and the parser is spending a LONG time doing that. The only place we call it is here:
|> Enum.filter(fn(str) -> String.length(str) > 0 end)
so that we don’t run Regex over empty strings. Let’s try just removing it and letting the regex run.
New Results:
ex_fedora master % mix bench
Compiled lib/ntriples/parser.ex
Settings:
duration: 1.0 s
## NTriplesBench
[12:41:05] 1/1: parse large file
Finished in 1.57 seconds
## NTriplesBench
parse large file 10 143116.70 µs/op
Down to 143 ms! (and the tests pass)
Repeat
Profiling again:
ex_fedora master % iex -S mix
Erlang/OTP 18 [erts-7.2.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Interactive Elixir (1.2.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Profilers.NTriples.profile
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- ------- ---- [----------]
'Elixir.NTriples':parse/1 1 0.00 0 [ 0.00]
code:ensure_loaded/1 2 0.00 0 [ 0.00]
binary:split/3 1 0.00 0 [ 0.00]
erlang:send/2 1 0.00 0 [ 0.00]
error_handler:undefined_function/3 2 0.00 1 [ 0.50]
code:call/1 2 0.00 1 [ 0.50]
'Elixir.String':split/3 1 0.00 1 [ 1.00]
'Elixir.String':split/2 1 0.00 1 [ 1.00]
'Elixir.Enum':reduce/3 1 0.00 1 [ 1.00]
'Elixir.NTriples.Parser':parse/1 1 0.00 1 [ 1.00]
'Elixir.Profilers.NTriples':run/0 1 0.00 1 [ 1.00]
erlang:function_exported/3 2 0.00 1 [ 0.50]
error_handler:ensure_loaded/1 2 0.00 2 [ 1.00]
binary:get_opts_split/2 2 0.00 2 [ 1.00]
'Elixir.Enum':map/2 1 0.00 2 [ 2.00]
code_server:call/2 2 0.00 3 [ 1.50]
erlang:whereis/1 2 0.00 3 [ 1.50]
'Elixir.Profilers.NTriples':'-profile/0-fun-0-'/0 1 0.00 4 [ 4.00]
'Elixir.Regex':named_captures/2 6021 0.22 443 [ 0.07]
'Elixir.Keyword':'-delete/2-lists^filter/1-0-'/2 6021 0.22 453 [ 0.08]
'Elixir.Regex':named_captures/3 6021 0.27 555 [ 0.09]
'Elixir.NTriples.Parser':capture_triple_map/1 6022 0.29 585 [ 0.10]
'Elixir.Enum':into/2 6021 0.30 607 [ 0.10]
'Elixir.NTriples.Parser':process_capture/2 6022 0.34 681 [ 0.11]
'Elixir.Enum':'-reduce/3-lists^foldl/2-0-'/3 6023 0.35 704 [ 0.12]
binary:part/2 6022 0.37 754 [ 0.13]
maps:put/3 6021 0.41 824 [ 0.14]
'Elixir.Enum':'-map/2-lists^map/1-0-'/2 6023 0.45 903 [ 0.15]
'Elixir.NTriples.Parser':'-parse/1-fun-0-'/1 6022 0.45 913 [ 0.15]
'Elixir.Keyword':put/3 6021 0.46 931 [ 0.15]
'Elixir.Enum':zip/2 6021 0.47 953 [ 0.16]
lists:keyfind/3 12042 0.49 986 [ 0.08]
'Elixir.Regex':names/1 6021 0.50 1006 [ 0.17]
binary:do_split/5 6022 0.50 1021 [ 0.17]
'Elixir.NTriples.Parser':'-parse/1-fun-1-'/2 6022 0.56 1131 [ 0.19]
'Elixir.Regex':run/3 6021 0.56 1141 [ 0.19]
'Elixir.Keyword':delete/2 6021 0.58 1181 [ 0.20]
'Elixir.Access':get/3 18061 0.79 1609 [ 0.09]
'Elixir.NTriples.Parser':append_triple/2 6021 0.87 1769 [ 0.29]
'Elixir.Keyword':get/3 12042 0.94 1900 [ 0.16]
re:inspect/2 6021 0.96 1949 [ 0.32]
'Elixir.String':replace_trailing/3 18057 1.05 2119 [ 0.12]
'Elixir.String':rstrip/2 18057 1.14 2315 [ 0.13]
'Elixir.NTriples.Parser':process_object/1 6021 1.26 2559 [ 0.43]
maps:from_list/1 6021 1.29 2611 [ 0.43]
maps:find/2 18061 1.37 2780 [ 0.15]
'Elixir.Access':get/2 18061 1.58 3204 [ 0.18]
'Elixir.Access':fetch/2 18061 1.72 3485 [ 0.19]
maps:merge/2 12040 1.84 3731 [ 0.31]
'Elixir.NTriples.Parser':process_subject/1 18057 2.01 4066 [ 0.23]
binary:matches/3 1 2.52 5102 [ 5102.00]
'Elixir.Enum':do_zip/2 42147 3.35 6781 [ 0.16]
'Elixir.String':replace_trailing/6 36114 4.62 9355 [ 0.26]
re:run/3 6021 9.00 18222 [ 3.03]
erlang:'++'/2 6008 55.89 113202 [ 18.84]
------------------------------------------------- ------ ------- ------ [----------]
Total: 385328 100.00% 202555 [ 0.53]
Now the slowest piece is the ++ operator for combining two lists, as called here:
new_value = existing_value ++ [object]
Lists are linked lists, so ++ has to copy the whole left-hand list every time it runs. PREPENDING a node should be VERY fast, much quicker than adding two lists together. Let’s change that to
new_value = [object | existing_value]
(NOTE: This switches the order of the objects in the parsing from the order they’re in the file, line by line. However, those terms are explicitly unordered, so it’s technically okay for them to be reversed. If we want to, we can reverse them later.)
Results:
ex_fedora master % mix bench
Compiled lib/ntriples/parser.ex
Settings:
duration: 1.0 s
## NTriplesBench
[12:44:49] 1/1: parse large file
Finished in 3.31 seconds
## NTriplesBench
parse large file 50 52658.56 µs/op
52 ms!
The End
That’s all there is to it. At this point if you run the profiler again, the biggest thing taking up time is running the regular expressions. Speeding it up at that point probably means building a real syntax parser - but for now, this solution works.
Hope this helps!
ActiveFedora Relationships
In a recent sprint on ActiveFedora Aggregations I was working on relationships in ActiveFedora and felt like I should catalogue my understanding of how they work.
Entry Point
Let’s start with the has_many method, as used below:
class Collection < ActiveFedora::Base
end
class MyObject < ActiveFedora::Base
has_many :collections, :class_name => "Collection"
end
That has_many method is defined in the associations module which is included into ActiveFedora::Base here: lib/active_fedora/associations.rb#L142-L144
The relevant code looks like this:
def has_many(name, options={})
Builder::HasMany.build(self, name, options)
end
As you can see, when you call has_many on the ActiveFedora class it delegates down to the HasMany builder.
Builder
Builders are responsible for setting up a Reflection (a registry of the metadata for an association) and defining readers/writers on the class it was called on. The class for HasMany is defined here: lib/active_fedora/associations/builder/has_many.rb
The relevant code looks like this:
def self.build(model, name, options)
reflection = new(model, name, options).build
define_accessors(model, reflection)
define_callbacks(model, reflection)
reflection
end
def build
reflection = super
configure_dependency
reflection
end
When the super call is inlined it looks like this:
def self.build(model, name, options)
reflection = new(model, name, options).build
define_accessors(model, reflection)
define_callbacks(model, reflection)
reflection
end
def build
configure_dependency if options[:dependent] # see https://github.com/rails/rails/commit/9da52a5e55cc665a539afb45783f84d9f3607282
reflection = model.create_reflection(self.class.macro, name, options, model)
configure_dependency
reflection
end
A reflection is created from the calling model (MyObject) which allows you to
obtain all relevant information about the association by calling
MyObject.reflections
After that it defines accessors and callbacks. Let’s look at the accessors created.
Accessors
Accessors are defined for has_many in a function that looks like this:
def self.define_readers(mixin, name)
super
mixin.redefine_method("#{name.to_s.singularize}_ids") do
association(name).ids_reader
end
end
If you inline the super call:
def self.define_readers(mixin, name)
mixin.send(:define_method, name) do |*params|
association(name).reader(*params)
end
mixin.redefine_method("#{name.to_s.singularize}_ids") do
association(name).ids_reader
end
end
This makes it so if you do has_many :plums it will define an instance method
called plums on the object which then calls the association’s reader method.
The reader method:
# Implements the reader method, e.g. foo.items for Foo.has_many :items
# @param opts [Boolean, Hash] if true, force a reload
# @option opts [Symbol] :response_format can be ':solr' to return a solr result.
def reader(opts = false)
if opts.kind_of?(Hash)
if opts.delete(:response_format) == :solr
return load_from_solr(opts)
end
raise ArgumentError, "Hash parameter must include :response_format=>:solr (#{opts.inspect})"
else
force_reload = opts
end
reload if force_reload || stale_target?
@proxy ||= CollectionProxy.new(self)
end
Effectively what happens is when you call #plums you get back a
CollectionProxy object, which is defined here:
lib/active_fedora/associations/collection_proxy.rb
On this object are methods you might expect - #find, #first, #last, etc.
If you wanted to change what methods are available on a set of related objects,
that’s where you’d change things.
And that’s it
That’s all there is to it. A builder gets called when you do things like
has_many or belongs_to, that builder stores the metadata about the
association, it defines a reader which delegates down to an association object
(such as
lib/active_fedora/associations/has_many_association.rb),
and the reader often defines a proxy object to handle actions on the associated
items as a whole.
For me the biggest trouble I had was tracing the path. To do so, just start at the associations module and work your way down.
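To make that path easier to follow, here is a toy sketch of the same chain in plain Ruby: a has_many macro hands off to a builder, the builder records a reflection and defines a reader, and the reader returns a proxy. Every class name here (ToyHasManyBuilder, ToyHasManyAssociation, ToyCollectionProxy, Basket) is made up for illustration, and none of this is ActiveFedora’s actual code.

```ruby
# Toy versions of the pieces described above. None of these classes are
# ActiveFedora's; the names are hypothetical and only mirror the roles.
Reflection = Struct.new(:macro, :name, :options)

# Stands in for CollectionProxy: wraps the association's target.
class ToyCollectionProxy
  include Enumerable

  def initialize(association)
    @association = association
  end

  def each(&block)
    @association.target.each(&block)
  end
end

# Stands in for the association object that the reader delegates to.
class ToyHasManyAssociation
  attr_reader :owner, :reflection, :target

  def initialize(owner, reflection)
    @owner = owner
    @reflection = reflection
    @target = []
  end

  def reader
    @proxy ||= ToyCollectionProxy.new(self)
  end
end

# Stands in for the builder: records a reflection, then defines the reader.
class ToyHasManyBuilder
  def self.build(model, name, options)
    reflection = Reflection.new(:has_many, name, options)
    model.reflections[name] = reflection
    model.send(:define_method, name) { association(name).reader }
    reflection
  end
end

module ToyAssociations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def reflections
      @reflections ||= {}
    end

    def has_many(name, options = {})
      ToyHasManyBuilder.build(self, name, options)
    end
  end

  # One association object per name, cached on the instance.
  def association(name)
    @associations ||= {}
    @associations[name] ||=
      ToyHasManyAssociation.new(self, self.class.reflections.fetch(name))
  end
end

class Basket
  include ToyAssociations
  has_many :plums, class_name: "Plum"
end

basket = Basket.new
basket.association(:plums).target << "a plum"
puts Basket.reflections[:plums].macro # has_many
puts basket.plums.first               # a plum
```

The real library does far more (loading, Solr, callbacks), but the shape of the call path is the part worth keeping in your head.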
Abusing Decorators
The Decorator Pattern
The Decorator Pattern’s goal is to dynamically add responsibilities to an object instance during runtime. This lets an individual object instance behave differently depending on the circumstances and requirements it may have at the time. This flexibility is powerful, but with power comes danger - so let’s abuse the pattern until it hurts.
Decorators in Ruby
When I implement a new pattern in Ruby I like to define its interface ahead of time. That way I can easily swap out individual instances, and as long as an object conforms to that interface I can call it an “X”.
The Interface
A Decorator is any object which responds to new, takes one argument (the instance to be decorated), and returns an object that responds to at least the same interface as the previous object and includes additional behavior.
The Code
I like to use SimpleDelegator (from Ruby’s standard delegate library) to accomplish this. Let’s take a simple ActiveRecord class.
class Person < ActiveRecord::Base
end
Now let’s say we want to keep a close eye on all people whose names are “Bob.” Whenever those records are saved, log a message.
class LogsBobSaves < SimpleDelegator
def save(*args)
with_logger do
super(*args)
end
end
private
def with_logger
yield.tap do |result|
if name == "Bob"
logger.info "Bob was saved"
end
end
end
def logger
Rails.logger
end
end
This works, but one of the benefits of using the decorator pattern is you can use dependency injection to make even more flexible objects which can use a variety of collaborators.
class LogsBobSaves < SimpleDelegator
attr_reader :logger
def initialize(obj, logger)
@logger = logger
super(obj)
end
def save(*args)
with_logger do
super(*args)
end
end
private
def with_logger
yield.tap do |result|
if name == "Bob"
logger.info "Bob was saved"
end
end
end
end
Now you can initialize a Person which saves logs about Bob by doing the following:
person = LogsBobSaves.new(Person.new(:name => "Bob"), Rails.logger)
person.save # => Logs the message "Bob was saved"
But wait! We said earlier the interface was that there was only one argument to
#new, and this has two. Now begins the abuse: Let’s use the Adapter pattern to
maintain the interface.
class DecoratorWithArguments
attr_reader :decorator, :args
def initialize(decorator, *args)
@decorator = decorator
@args = args
end
def new(obj)
decorator.new(obj, *args)
end
end
person = Person.new(:name => "Bob")
decorator = DecoratorWithArguments.new(LogsBobSaves, Rails.logger)
person = decorator.new(person) # One argument
person.save # => Logs the message "Bob was saved"
Interface Importance
The adapter above is clearly more complicated than just passing the logger in. Why maintain an arbitrary (and simple) interface?
Let’s say I want to also log when the name is Fred.
class LogsNameSaves < SimpleDelegator
attr_reader :logger, :name_check
def initialize(obj, name_check, logger)
@logger = logger
@name_check = name_check
super(obj)
end
def save(*args)
with_logger do
super(*args)
end
end
private
def with_logger
yield.tap do |result|
if name == name_check
logger.info "#{name} was saved"
end
end
end
end
person = Person.new(:name => "Fred")
decorator1 = DecoratorWithArguments.new(LogsNameSaves, "Bob", Rails.logger)
decorator2 = DecoratorWithArguments.new(LogsNameSaves, "Fred", Rails.logger)
person = decorator1.new(person)
person = decorator2.new(person) # Decorate a second time.
person.save # => Logs
person.name = "Bob"
person.save # => Logs
To apply multiple decorators we have to call #new on them multiple times. We
have a consistent interface, and we could end up decorating an arbitrary number
of times, so let’s use the Composite pattern to encapsulate that.
class DecoratorList
attr_reader :decorators
def initialize(*decorators)
@decorators = decorators
end
def new(undecorated_object)
decorators.inject(undecorated_object) do |obj, decorator|
decorator.new(obj)
end
end
end
person = Person.new(:name => "Bob")
decorator = DecoratorList.new(
DecoratorWithArguments.new(LogsNameSaves, "Bob", Rails.logger),
DecoratorWithArguments.new(LogsNameSaves, "Fred", Rails.logger),
)
person = decorator.new(person) # One decoration call results in two decorations.
person.save # => Logs
person.name = "Fred"
person.save # => Logs
Now how about adding Susan when we already have two decorators?
decorator = DecoratorList.new(
DecoratorWithArguments.new(LogsNameSaves, "Bob", Rails.logger),
DecoratorWithArguments.new(LogsNameSaves, "Fred", Rails.logger),
)
decorator = DecoratorList.new(
decorator,
DecoratorWithArguments.new(LogsNameSaves, "Susan", Rails.logger),
)
person = decorator.new(person) # One decoration call results in three decorations.
The beauty of Composites - you can build composites of composites, because they all maintain the same simple interface.
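If you want to try the nesting outside of a Rails app, the sketch below is one way to do it. It assumes a Struct in place of the ActiveRecord Person and a small ListLogger class (hypothetical) in place of Rails.logger, and it inlines the with_logger helper; the decorator, adapter, and composite are otherwise the ones from above.

```ruby
require "delegate"

# Stand-ins so the sketch runs without Rails: a Struct instead of an
# ActiveRecord model and an in-memory logger instead of Rails.logger.
# Both are assumptions made for illustration.
Person = Struct.new(:name) do
  def save
    true
  end
end

class ListLogger
  attr_reader :messages

  def initialize
    @messages = []
  end

  def info(message)
    @messages << message
  end
end

class LogsNameSaves < SimpleDelegator
  def initialize(obj, name_check, logger)
    @name_check = name_check
    @logger = logger
    super(obj)
  end

  # Save through the wrapped object, then log if the name matches.
  def save(*args)
    super(*args).tap do
      @logger.info("#{name} was saved") if name == @name_check
    end
  end
end

# Adapter: turns a multi-argument decorator into a one-argument #new.
class DecoratorWithArguments
  def initialize(decorator, *args)
    @decorator = decorator
    @args = args
  end

  def new(obj)
    @decorator.new(obj, *@args)
  end
end

# Composite: applies each decorator in turn through the same #new.
class DecoratorList
  def initialize(*decorators)
    @decorators = decorators
  end

  def new(undecorated_object)
    @decorators.inject(undecorated_object) do |obj, decorator|
      decorator.new(obj)
    end
  end
end

logger = ListLogger.new
bob_and_fred = DecoratorList.new(
  DecoratorWithArguments.new(LogsNameSaves, "Bob", logger),
  DecoratorWithArguments.new(LogsNameSaves, "Fred", logger)
)
# A composite nested inside another composite.
everyone = DecoratorList.new(
  bob_and_fred,
  DecoratorWithArguments.new(LogsNameSaves, "Susan", logger)
)

person = everyone.new(Person.new("Bob"))
person.save
person.name = "Susan"
person.save
puts logger.messages.inspect # ["Bob was saved", "Susan was saved"]
```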
It was a bit of a bother having to decorate at all. Why not encapsulate which decorations go into creating a loggable person?
class LoggablePersonFactory
def new(*args)
decorator.new(Person.new(*args))
end
private
def decorator
DecoratorList.new(*decorators)
end
def decorators
logging_names.map do |name|
DecoratorWithArguments.new(LogsNameSaves, name, logger)
end
end
def logging_names
["Bob", "Fred", "Susan"]
end
def logger
Rails.logger
end
end
person = LoggablePersonFactory.new.new(:name => "Fred") # #new is an instance method on the factory
person.save # => Logs
person.name = "Susan"
person.save # => Logs
The Pros
- Decorators result in immensely configurable behavior enhancement.
- One object can have different behavior from another despite sharing a class.
- Composing behaviors allows for single responsibility and yet powerful objects.
- Dependencies can be injected, but only have to be done at the decorator level.
- You never have to ask “how can I get an object that doesn’t do X?” Just don’t decorate the object with that behavior.
The Cons
- Deciding when and where to apply decorators is difficult.
- SimpleDelegators can sometimes lose context (in the above, if you use “save!” instead of “save” it’s not defined on the decorator, so it won’t log even if the decorated object calls “save” in its “save!” implementation.)
- Instantiating all the decorators is expensive.
- As you abuse the pattern it becomes difficult to keep track of which behaviors are available to you.
To solve the first, third, and fourth problems I like to define concrete factories whose responsibility is generating objects with a specific subset of decorators. If you need a combination of factories, just compose them.
The second problem is more difficult - all you can do is make sure that your calling objects are using interfaces you’ve defined on the decorator. Ruby’s implementation is a “Delegator”, and lacks the more context-aware decoration that’s available in other languages.
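The lost-context problem is easy to reproduce in isolation. This is a minimal sketch with a hypothetical Record class standing in for a model whose save! calls save internally, and a CountsSaves decorator (also made up) that counts calls instead of logging.

```ruby
require "delegate"

# Hypothetical stand-in for a model whose save! calls save internally.
class Record
  def save
    :saved
  end

  def save!
    save # self is the Record here, never the decorator
  end
end

class CountsSaves < SimpleDelegator
  attr_reader :count

  def initialize(obj)
    @count = 0
    super(obj)
  end

  def save
    @count += 1
    super
  end
end

record = CountsSaves.new(Record.new)
record.save
puts record.count # 1
record.save!
puts record.count # still 1: the inner save bypassed the decorator
```

The call to save! is forwarded to the Record, and from then on self is the undecorated object, so the decorator never sees the inner save.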
Should I Decorate?
I’m not sure you should, but I do. It gives me the strengths of something like multiple inheritance with the flexibility to change my mind. Further, unlike mixins, decoration doesn’t allow for two-way interactions - my underlying object will never come to rely on outside code, so I can be sure that it works independently of my decorators, which gives me a smaller surface to debug.
However, as above, it can be abused. I’ve been leaning on the side of decorating lately, and it’s becoming time to take a step back and see where it doesn’t work. When I have that boundary clearly defined, I’ll post again.
Programming is all about figuring out when and where to apply a solution.
Response to Nothing is Something
Nothing is Something
For RailsConf 2015 Sandi Metz gave a fantastic talk with two parts: the first discussed the null object pattern and the second talked about composition over inheritance. This post is some of my thoughts on the second half.
However, before I get started, please go watch her talk: Nothing is Something
Composition Over Inheritance
That talk is one of the best and most clearly communicated examples of the benefits of composition I’ve seen to date. However, I’m writing this because I feel like things could have gone just a step further.
At the end of the talk we’re left with code that looks like this:
class House
attr_reader :data, :formatter
DATA = [
"the horse and the hound and the horn that belonged to",
"the malt that lay in",
"the house that Jack built"
]
def initialize(orderer: DefaultOrder.new, formatter: DefaultFormatter.new)
@formatter = formatter
@data = orderer.order(DATA)
end
def recite
(1..data.length).map {|i| line(i)}.join("\n")
end
def line(number)
"This is #{phrase(number)}.\n"
end
def phrase(number)
parts(number).join(" ")
end
def parts(number)
formatter.format(data.last(number))
end
end
class DefaultFormatter
def format(parts)
parts
end
end
class EchoFormatter
def format(parts)
parts.zip(parts).flatten
end
end
class RandomOrder
def order(data)
data.shuffle
end
end
class DefaultOrder
def order(data)
data
end
end
Define Responsibilities
This is great code, but in a step towards refactoring even further it’s important to define responsibilities.
House should have one responsibility: recite. Take a dataset, use something to order
and format it, and then output it. However, in order to fulfill that
responsibility it must also know how to join an array of terms so that it’s
recitable, and how to extract a certain piece of those terms for recitation.
Formatters take a phrase represented as an array of terms and format it to
be joined by a space.
Orderers take a full dataset and order it.
Analyze Dependencies
Formatters have one dependency: an array of terms.
Orderers have one dependency: an array of terms to order.
House has two dependencies: a formatter and an orderer.
Reduce Responsibilities
The Single Responsibility Principle says that each object should only have one
reason to change, and House has three right now. Let’s see if we can get it
down.
First up is to extract the format piece.
class HouseData
attr_reader :data, :formatter
def initialize(data:, formatter: DefaultFormatter.new)
@data = data
@formatter = formatter
end
def length
data.length
end
def phrase(number)
parts(number).join(" ")
end
private
def parts(number)
formatter.format(data.last(number))
end
end
class House
attr_reader :data, :formatter
DATA = [
"the horse and the hound and the horn that belonged to",
"the malt that lay in",
"the house that Jack built"
]
def initialize(orderer: DefaultOrder.new, formatter: DefaultFormatter.new)
@formatter = formatter
@data = HouseData.new(data: orderer.order(DATA), formatter: formatter)
end
def recite
(1..data.length).map {|i| line(i)}.join("\n")
end
def line(number)
"This is #{data.phrase(number)}.\n"
end
end
Now House just knows how to take something that responds to #phrase and #length and recite them. HouseData knows what it takes to turn an array of terms into phrases.
Analyze Dependency Usage
If you look at the constructor for House you’ll notice that orderer is only
used to initialize data and formatter is only used to be passed off into
HouseData. A good rule is that if dependencies are only used in constructors,
extract them and pass the value they produce in as the parameter instead. This
makes your class just that much more flexible.
class HouseData
attr_reader :data, :formatter
def initialize(data:, formatter: DefaultFormatter.new)
@data = data
@formatter = formatter
end
def length
data.length
end
def phrase(number)
parts(number).join(" ")
end
private
def parts(number)
formatter.format(data.last(number))
end
end
class House
attr_reader :data
DATA = [
"the horse and the hound and the horn that belonged to",
"the malt that lay in",
"the house that Jack built"
]
def initialize(data:)
@data = data
end
def recite
(1..data.length).map {|i| line(i)}.join("\n")
end
def line(number)
"This is #{data.phrase(number)}.\n"
end
end
# Initialization
house_data = HouseData.new(data: House::DATA)
House.new(data: house_data).recite
You’ll notice that since you’re passing in the data, there’s no need for a “default orderer” - just pass in the data.
Finale
Now you’re left with this:
class HouseData
attr_reader :data, :formatter
def initialize(data:, formatter: DefaultFormatter.new)
@data = data
@formatter = formatter
end
def length
data.length
end
def phrase(number)
parts(number).join(" ")
end
private
def parts(number)
formatter.format(data.last(number))
end
end
class House
attr_reader :data
DATA = [
"the horse and the hound and the horn that belonged to",
"the malt that lay in",
"the house that Jack built"
]
def initialize(data:)
@data = data
end
def recite
(1..data.length).map {|i| line(i)}.join("\n")
end
def line(number)
"This is #{data.phrase(number)}.\n"
end
end
class DefaultFormatter
def format(parts)
parts
end
end
class EchoFormatter
def format(parts)
parts.zip(parts).flatten
end
end
class RandomOrder
def order(data)
data.shuffle
end
end
# Initialization
house_data = HouseData.new(data: House::DATA)
random_house_data = HouseData.new(data: RandomOrder.new.order(house_data.data))
echo_house_data = HouseData.new(data: house_data.data, formatter: EchoFormatter.new)
random_echo_house_data = HouseData.new(data: random_house_data.data, formatter: echo_house_data.formatter)
# Recitation
House.new(data: house_data).recite
Everything has one responsibility and is completely flexible. The only other thing I might do is simplify the interface for Formatter and Order. Something like
class EchoFormatter
def self.call(parts)
parts.zip(parts).flatten
end
end
class DefaultFormatter
def self.call(data)
data
end
end
class RandomOrder
def self.call(data)
data.shuffle
end
end
would make it easier to remember what to do to use the dependencies.
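One side effect of a #call interface is that any lambda can act as a formatter. The sketch below is my own variation rather than code from the talk: it assumes HouseData is changed to invoke formatter.call, uses a lambda as the default formatter, and uses a shortened, hypothetical TERMS list.

```ruby
# HouseData rewritten against a #call interface. This variant is an
# illustration, not code from the talk.
class HouseData
  attr_reader :data, :formatter

  def initialize(data:, formatter: ->(parts) { parts })
    @data = data
    @formatter = formatter
  end

  def length
    data.length
  end

  def phrase(number)
    formatter.call(data.last(number)).join(" ")
  end
end

class EchoFormatter
  def self.call(parts)
    parts.zip(parts).flatten
  end
end

# Shortened term list, just for the example.
TERMS = [
  "the malt that lay in",
  "the house that Jack built"
]

plain = HouseData.new(data: TERMS)
echo = HouseData.new(data: TERMS, formatter: EchoFormatter)
shout = HouseData.new(data: TERMS, formatter: ->(parts) { parts.map(&:upcase) })

puts plain.phrase(1) # the house that Jack built
puts echo.phrase(1)  # the house that Jack built the house that Jack built
puts shout.phrase(2) # THE MALT THAT LAY IN THE HOUSE THAT JACK BUILT
```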
Compliments
I just want to leave a final note of thanks to Sandi Metz for an excellent talk. It provided a much better example of composition over inheritance for my team than I’ve seen before. Highly recommended.
Isolated Testing
To see the previous post in this series please check out Avoiding Integration Tests
Test Driven Development
When I left off I had fast, predictable, and reliable unit tests which, if I was careful, would ensure that my application worked. However - I’d been promised something: if I wrote the tests first, they would tell me if I was architecting the application well.
I didn’t feel that way yet.
Finding Complexity
Some experience and a lot of reading later, I decided that a good application architecture is one which fulfills the use cases with the smallest possible amount of complexity and the lowest amount of coupling. Complexity in applications comes, primarily, from two places:
- Branches
- Dependencies
My tests were showing me the branches - if I had too many, I would have too many tests over a web of possibilities. +1 to unit tests for this.
However, writing the tests was not showing me the dependencies the code would have to call for them to pass. If the tests could do that, they could show me whether my architecture could be better.
Surfacing Dependencies
RSpec-Mocks has some great tools for mocking out dependencies. I won’t go into
detail here, but I suggest reading its documentation.
Let’s go through a quick example of the two methods of testing. This is a test to make sure that an object is valid if its name is “Bob” and it has “banana” set to true.
require 'rails_helper'
RSpec.describe MyObject do
describe "#valid?" do
context "when name is Bob" do
it "should be valid" do
obj = MyObject.new
obj.name = "Bob"
expect(obj).to be_valid
end
end
end
end
No dependencies have been mocked out and there’s no clear interface about what’s going on in the background. However, it was -very- easy to write.
Imagine the class getting bigger and there being more validations. Do you test them with MyObject? Do you extract validators? What should the interface be? What was the interface you committed to in the first place?
The power of showing the dependencies can be seen when the pattern you’ve chosen is less than ideal:
# app/models/my_object.rb
class MyObject
def valid?
BobValidator.new.validate(self) && BananaValidator.new.validate(self)
end
end
# spec/models/my_object_spec.rb
require 'rails_helper'
RSpec.describe MyObject do
subject { MyObject.new }
describe "#valid?" do
context "when name is Bob and banana is true" do
it "should be valid" do
subject.name = "Bob"
subject.banana = true
expect(subject).to be_valid
end
end
context "when name is not Bob and banana is true" do
it "should be invalid" do
subject.name = "Joe"
subject.banana = true
expect(subject).not_to be_valid
end
end
context "when name is not Bob and banana is false" do
it "should be invalid" do
subject.name = "Joe"
subject.banana = false
expect(subject).not_to be_valid
end
end
# etc..
end
end
It’s pretty easy to keep this up, and seems to make sense. Set the parameters, check the result.
If you’d mocked dependencies it would have looked like this:
require 'rails_helper'
RSpec.describe MyObject do
subject { MyObject.new }
context "when name is not Bob and banana is false" do
it "should be invalid" do
bob_validator = instance_double(BobValidator)
allow(BobValidator).to receive(:new).and_return(bob_validator)
allow(bob_validator).to receive(:validate).with(subject).and_return(false)
banana_validator = instance_double(BananaValidator)
allow(BananaValidator).to receive(:new).and_return(banana_validator)
allow(banana_validator).to receive(:validate).with(subject).and_return(false)
expect(subject).not_to be_valid
end
end
end
This right here is ridiculous. Six lines of setup? For one test with two validations? That hurt to write.
Something must be wrong.
The test said there was something wrong with the architecture, and now there’s an easy place to iterate. Maybe we should pass the validators in via dependency injection?
require 'rails_helper'
RSpec.describe MyObject do
context "when name is not Bob and banana is false" do
it "should be invalid" do
bob_validator = instance_double(BobValidator)
banana_validator = instance_double(BananaValidator)
obj = MyObject.new(bob_validator, banana_validator)
allow(bob_validator).to receive(:validate).with(obj).and_return(false)
allow(banana_validator).to receive(:validate).with(obj).and_return(false)
expect(obj).not_to be_valid
end
end
end
That’s better. Now we don’t have to stub the creation of the validators. I only want to inject one thing, though, and this could easily get out of control. What if there was a validator which took validators as an argument and returned true if both of ITS validators were true? Then you just pass that one validator in, and only have one dependency. Then you just test each collaborator, and the behavior of the dependency which takes dependencies, and you’re good!
And now you have the composite pattern, and a clean set of dependencies, because you could easily see the interfaces and dependency graph.
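Here is roughly what that composite could look like. It is a sketch under a few assumptions: CompositeValidator is a name I made up, the validator bodies are guesses at what BobValidator and BananaValidator check, MyObject is a plain Ruby object rather than a Rails model, and all? generalizes “both” to any number of validators.

```ruby
class BobValidator
  def validate(obj)
    obj.name == "Bob"
  end
end

class BananaValidator
  def validate(obj)
    obj.banana == true
  end
end

# Composite: exposes the same #validate interface as the validators it wraps.
class CompositeValidator
  def initialize(*validators)
    @validators = validators
  end

  def validate(obj)
    @validators.all? { |validator| validator.validate(obj) }
  end
end

# MyObject now has exactly one injected dependency.
class MyObject
  attr_accessor :name, :banana

  def initialize(validator)
    @validator = validator
  end

  def valid?
    @validator.validate(self)
  end
end

validator = CompositeValidator.new(BobValidator.new, BananaValidator.new)
obj = MyObject.new(validator)
obj.name = "Bob"
obj.banana = true
puts obj.valid? # true
obj.banana = false
puts obj.valid? # false
```

In a spec for MyObject you would then stub a single validator double, and the composite gets its own small spec with two doubles.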
Why I Do This
I’ve only been a professional programmer for a few years. Before that I had very little formal training, little understanding of patterns and practices, and had a flawed idea of what “good” architecture was. I’d always learned via one simple method: I beat my head against something until it finally works.
That’s great for hacking, but not for architecture. The wall to hit against was too far away, required too much experience to find, and was often hazy - determining why something was a good practice took a tremendous time investment.
Bringing the interface and dependencies to the front in my tests DEFINES the wall and brings it closer. I get nearly immediate feedback on whether or not I’ve chosen correctly. In this way I can beat my way towards better software.
Isolated testing enables me to develop better applications than my experience says I
should be able to.