Profiling Elixir

Posted on Feb 28, 2016. Tagged: elixir, profiling, benchmarking, n-triples

I just started playing with Elixir, and I’ve already been very impressed with the tooling that exists for it. One of my first projects was getting a very basic N-Triples parser working, and I wanted to document how I went about improving its performance.

Initial Implementation

I’m going to skip through to the first implementation, and not talk much about how the code works. It basically just splits n-triples by newlines, runs regex on each line, and then organizes all the statements into a big Map.

defmodule NTriples.Parser do
  alias RDF.Literal
  @ntriples_regex ~r/(?<subject><[^\s]+>|_:([A-Za-z][A-Za-z0-9\-_]*))[
  ]*(?<predicate><[^\s]+>)[
  ]*(?<object><[^\s]+>|_:([A-Za-z][A-Za-z0-9\-_]*)|"(?<literal_string>(?:\\"|[^"])*)"(@(?<literal_language>[a-z]+[\-A-Za-z0-9]*)|\^\^<(?<literal_type>[^>]+)>)?)[ ]*./i
  def parse(content) do
    content
    |> String.split(".\n")
    |> Enum.filter(fn(str) -> String.length(str) > 0 end)
    |> Enum.map(&capture_triple_map/1)
    |> Enum.reduce(%{}, &process_capture/2)
  end

  defp capture_triple_map("") do
    %{}
  end

  defp capture_triple_map(string) do
    Regex.named_captures(@ntriples_regex, string)
  end

  # Lines the regex could not match produce nil; skip them.
  defp process_capture(nil, accumulator) do
    accumulator
  end

  # Empty lines produce an empty map; skip them too.
  defp process_capture(map, accumulator) when map == %{} do
    accumulator
  end

  defp process_capture(capture_map, accumulator) do
    subject = process_subject(capture_map["subject"])
    predicate = process_subject(capture_map["predicate"])
    object = process_object(capture_map)
    append_triple(accumulator, {subject, predicate, object})
  end

  defp append_triple(map, {subject, predicate, object}) do
    case map do
      %{^subject => %{^predicate => existing_value}} when is_list(existing_value) ->
        new_value = existing_value ++ [object]
        Map.put(map, subject, Map.merge(map[subject], %{predicate => new_value}))
      %{^subject => %{^predicate => existing_value}} ->
        new_value = [existing_value, object]
        Map.put(map, subject, Map.merge(map[subject], %{predicate => new_value}))
      %{^subject => existing_subject_graph} ->
        Map.put(map, subject, Map.merge(existing_subject_graph, %{predicate => object}))
      _ ->
        Map.put(map, subject, %{predicate => object})
    end
  end

  defp process_subject("<" <> subject) do
    subject
    |> String.rstrip(?>)
  end

  defp process_object(%{"object" => "<" <> object_uri}) do
    uri = process_subject("<" <> object_uri)
    %{"@id" => uri}
  end

  defp process_object(%{"literal_string" => value, "literal_language" => language}) do
    %Literal{value: value, language: language}
  end
end

Initial Benchmark

Let’s get an initial benchmark, using the Benchfella tool.

Add the dependency to your mix.exs file

  defp deps do
    [
      {:benchfella, "~> 0.3.0", only: [:dev, :test]}
    ]
  end

mix deps.get

Create a benchmark file at bench/ntriples_bench.exs

  defmodule NTriplesBench do
    use Benchfella
    # File.read! raises if the fixture is missing instead of silently storing an error atom.
    @content File.read!("test/fixtures/content.nt")
     
    bench "parse large file" do
      NTriples.parse(@content)
    end
  end

Run first benchmark via mix bench

  Settings:
    duration:      1.0 s
     
  ## NTriplesBench
  [12:28:22] 1/1: parse large file
     
  Finished in 1.47 seconds
     
  ## NTriplesBench
  parse large file           5   223998.60 µs/op

So there’s our first data point: 224 ms. That’s not awful, but in the context of a web application it’s not great.

Profiling Slowness

Let’s use ExProf to profile where the slow bits are.

Add the dependency to mix.exs

  defp deps do
    [
      {:benchfella, "~> 0.3.0", only: [:dev, :test]},
      {:exprof, "~> 0.2.0", only: [:dev, :test]}
    ]
  end

mix deps.get

Make a little profiler at lib/profilers/ntriples.ex

  defmodule Profilers.NTriples do
    import ExProf.Macro
    # File.read! raises if the fixture is missing instead of silently storing an error atom.
    @content File.read!("test/fixtures/content.nt")
     
    def profile do
      profile do
        run
      end
    end
     
    def run do
      NTriples.parse(@content)
    end
  end

Launch a shell with iex -S mix

Run it

iex(2)> Profilers.NTriples.profile
FUNCTION                                             CALLS        %    TIME  [uS / CALLS]
--------                                             -----  -------    ----  [----------]
'Elixir.NTriples.Parser':parse/1                         1     0.00       0  [      0.00]
'Elixir.Profilers.NTriples':run/0                        1     0.00       0  [      0.00]
erlang:send/2                                            1     0.00       0  [      0.00]
'Elixir.NTriples':parse/1                                1     0.00       1  [      1.00]
binary:split/3                                           1     0.00       1  [      1.00]
'Elixir.String':split/3                                  1     0.00       1  [      1.00]
'Elixir.Enum':map/2                                      1     0.00       1  [      1.00]
'Elixir.Enum':reduce/3                                   2     0.00       1  [      0.50]
lists:reverse/1                                          1     0.00       2  [      2.00]
'Elixir.String':split/2                                  1     0.00       2  [      2.00]
'Elixir.Enum':filter/2                                   1     0.00       3  [      3.00]
'Elixir.Profilers.NTriples':'-profile/0-fun-0-'/0        1     0.00       4  [      4.00]
binary:get_opts_split/2                                  2     0.00       7  [      3.50]
lists:reverse/2                                          1     0.01      60  [     60.00]
'Elixir.Regex':named_captures/2                       6021     0.06     403  [      0.07]
'Elixir.Keyword':'-delete/2-lists^filter/1-0-'/2      6021     0.07     449  [      0.07]
'Elixir.Regex':named_captures/3                       6021     0.07     461  [      0.08]
'Elixir.String.Graphemes':length/1                    6022     0.07     464  [      0.08]
'Elixir.NTriples.Parser':capture_triple_map/1         6021     0.09     573  [      0.10]
'Elixir.Enum':into/2                                  6021     0.10     617  [      0.10]
'Elixir.NTriples.Parser':process_capture/2            6021     0.10     654  [      0.11]
binary:part/2                                         6022     0.11     684  [      0.11]
maps:put/3                                            6021     0.12     763  [      0.13]
'Elixir.Enum':zip/2                                   6021     0.14     872  [      0.14]
'Elixir.Enum':'-map/2-lists^map/1-0-'/2               6022     0.14     896  [      0.15]
'Elixir.NTriples.Parser':'-parse/1-fun-1-'/1          6021     0.14     902  [      0.15]
'Elixir.String':length/1                              6022     0.15     941  [      0.16]
'Elixir.NTriples.Parser':'-parse/1-fun-2-'/2          6021     0.15     960  [      0.16]
'Elixir.Enum':'-filter/2-fun-0-'/3                    6022     0.15     969  [      0.16]
binary:do_split/5                                     6022     0.15     970  [      0.16]
lists:keyfind/3                                      12042     0.15     988  [      0.08]
'Elixir.NTriples.Parser':'-parse/1-fun-0-'/1          6022     0.15     993  [      0.16]
'Elixir.Regex':names/1                                6021     0.16    1015  [      0.17]
'Elixir.Keyword':put/3                                6021     0.16    1022  [      0.17]
'Elixir.Regex':run/3                                  6021     0.16    1042  [      0.17]
'Elixir.Keyword':delete/2                             6021     0.20    1286  [      0.21]
'Elixir.Enum':'-reduce/3-lists^foldl/2-0-'/3         12045     0.21    1356  [      0.11]
'Elixir.Access':get/3                                18061     0.22    1389  [      0.08]
maps:from_list/1                                      6021     0.22    1432  [      0.24]
'Elixir.NTriples.Parser':append_triple/2              6021     0.26    1668  [      0.28]
'Elixir.String':replace_trailing/3                   18057     0.30    1910  [      0.11]
'Elixir.Keyword':get/3                               12042     0.30    1921  [      0.16]
'Elixir.String':rstrip/2                             18057     0.32    2029  [      0.11]
re:inspect/2                                          6021     0.37    2367  [      0.39]
'Elixir.NTriples.Parser':process_object/1             6021     0.39    2503  [      0.42]
maps:find/2                                          18061     0.40    2561  [      0.14]
'Elixir.Access':get/2                                18061     0.43    2771  [      0.15]
'Elixir.Access':fetch/2                              18061     0.49    3170  [      0.18]
'Elixir.NTriples.Parser':process_subject/1           18057     0.51    3313  [      0.18]
maps:merge/2                                         12040     0.66    4259  [      0.35]
binary:matches/3                                         1     0.67    4305  [   4305.00]
'Elixir.Enum':do_zip/2                               42147     0.87    5593  [      0.13]
'Elixir.String':replace_trailing/6                   36114     1.35    8676  [      0.24]
re:run/3                                              6021     2.78   17915  [      2.98]
erlang:'++'/2                                         6008    12.55   80808  [     13.45]
'Elixir.String.Graphemes':do_length/2              1456345    17.98  115788  [      0.08]
'Elixir.String.Graphemes':next_extend_size/2       1450323    18.96  122101  [      0.08]
'Elixir.String.Graphemes':next_grapheme_size/1     1456345    36.99  238230  [      0.16]
-------------------------------------------------  -------  -------  ------  [----------]
Total:                                             4778436  100.00%  644072  [      0.13]

Evaluating Results

The functions which took the largest total amount of time show up at the bottom. In our case the easy win is here:

   'Elixir.String.Graphemes':do_length/2              1456345    17.98  115788  [      0.08]
   'Elixir.String.Graphemes':next_extend_size/2       1450323    18.96  122101  [      0.08]
   'Elixir.String.Graphemes':next_grapheme_size/1     1456345    36.99  238230  [      0.16]

Each “character” in a String is a grapheme, so String.length has to walk the entire string to count them, and the parser is spending a LONG time doing exactly that. The only place we call it is here:

    |> Enum.filter(fn(str) -> String.length(str) > 0 end)

so that we don’t run Regex over empty strings. Let’s try just removing it and letting the regex run.

New Results:

ex_fedora master % mix bench
Compiled lib/ntriples/parser.ex
Settings:
  duration:      1.0 s

## NTriplesBench
[12:41:05] 1/1: parse large file

Finished in 1.57 seconds

## NTriplesBench
parse large file          10   143116.70 µs/op

Down to 143 ms! (and the tests pass)

Repeat

Profiling again:

ex_fedora master % iex -S mix
Erlang/OTP 18 [erts-7.2.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Interactive Elixir (1.2.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Profilers.NTriples.profile
FUNCTION                                            CALLS        %    TIME  [uS / CALLS]
--------                                            -----  -------    ----  [----------]
'Elixir.NTriples':parse/1                               1     0.00       0  [      0.00]
code:ensure_loaded/1                                    2     0.00       0  [      0.00]
binary:split/3                                          1     0.00       0  [      0.00]
erlang:send/2                                           1     0.00       0  [      0.00]
error_handler:undefined_function/3                      2     0.00       1  [      0.50]
code:call/1                                             2     0.00       1  [      0.50]
'Elixir.String':split/3                                 1     0.00       1  [      1.00]
'Elixir.String':split/2                                 1     0.00       1  [      1.00]
'Elixir.Enum':reduce/3                                  1     0.00       1  [      1.00]
'Elixir.NTriples.Parser':parse/1                        1     0.00       1  [      1.00]
'Elixir.Profilers.NTriples':run/0                       1     0.00       1  [      1.00]
erlang:function_exported/3                              2     0.00       1  [      0.50]
error_handler:ensure_loaded/1                           2     0.00       2  [      1.00]
binary:get_opts_split/2                                 2     0.00       2  [      1.00]
'Elixir.Enum':map/2                                     1     0.00       2  [      2.00]
code_server:call/2                                      2     0.00       3  [      1.50]
erlang:whereis/1                                        2     0.00       3  [      1.50]
'Elixir.Profilers.NTriples':'-profile/0-fun-0-'/0       1     0.00       4  [      4.00]
'Elixir.Regex':named_captures/2                      6021     0.22     443  [      0.07]
'Elixir.Keyword':'-delete/2-lists^filter/1-0-'/2     6021     0.22     453  [      0.08]
'Elixir.Regex':named_captures/3                      6021     0.27     555  [      0.09]
'Elixir.NTriples.Parser':capture_triple_map/1        6022     0.29     585  [      0.10]
'Elixir.Enum':into/2                                 6021     0.30     607  [      0.10]
'Elixir.NTriples.Parser':process_capture/2           6022     0.34     681  [      0.11]
'Elixir.Enum':'-reduce/3-lists^foldl/2-0-'/3         6023     0.35     704  [      0.12]
binary:part/2                                        6022     0.37     754  [      0.13]
maps:put/3                                           6021     0.41     824  [      0.14]
'Elixir.Enum':'-map/2-lists^map/1-0-'/2              6023     0.45     903  [      0.15]
'Elixir.NTriples.Parser':'-parse/1-fun-0-'/1         6022     0.45     913  [      0.15]
'Elixir.Keyword':put/3                               6021     0.46     931  [      0.15]
'Elixir.Enum':zip/2                                  6021     0.47     953  [      0.16]
lists:keyfind/3                                     12042     0.49     986  [      0.08]
'Elixir.Regex':names/1                               6021     0.50    1006  [      0.17]
binary:do_split/5                                    6022     0.50    1021  [      0.17]
'Elixir.NTriples.Parser':'-parse/1-fun-1-'/2         6022     0.56    1131  [      0.19]
'Elixir.Regex':run/3                                 6021     0.56    1141  [      0.19]
'Elixir.Keyword':delete/2                            6021     0.58    1181  [      0.20]
'Elixir.Access':get/3                               18061     0.79    1609  [      0.09]
'Elixir.NTriples.Parser':append_triple/2             6021     0.87    1769  [      0.29]
'Elixir.Keyword':get/3                              12042     0.94    1900  [      0.16]
re:inspect/2                                         6021     0.96    1949  [      0.32]
'Elixir.String':replace_trailing/3                  18057     1.05    2119  [      0.12]
'Elixir.String':rstrip/2                            18057     1.14    2315  [      0.13]
'Elixir.NTriples.Parser':process_object/1            6021     1.26    2559  [      0.43]
maps:from_list/1                                     6021     1.29    2611  [      0.43]
maps:find/2                                         18061     1.37    2780  [      0.15]
'Elixir.Access':get/2                               18061     1.58    3204  [      0.18]
'Elixir.Access':fetch/2                             18061     1.72    3485  [      0.19]
maps:merge/2                                        12040     1.84    3731  [      0.31]
'Elixir.NTriples.Parser':process_subject/1          18057     2.01    4066  [      0.23]
binary:matches/3                                        1     2.52    5102  [   5102.00]
'Elixir.Enum':do_zip/2                              42147     3.35    6781  [      0.16]
'Elixir.String':replace_trailing/6                  36114     4.62    9355  [      0.26]
re:run/3                                             6021     9.00   18222  [      3.03]
erlang:'++'/2                                        6008    55.89  113202  [     18.84]
-------------------------------------------------  ------  -------  ------  [----------]
Total:                                             385328  100.00%  202555  [      0.53]

Now the slowest piece is the ++ operator for combining two lists, as called here:

        new_value = existing_value ++ [object]

Lists are linked lists. The ++ operator has to copy the entire left-hand list, so appending gets slower as the list grows, while PREPENDING a node takes constant time. Let’s change that to

        new_value = [object | existing_value]

(NOTE: This switches the order of the objects in the parsing from the order they’re in the file, line by line. However, those terms are explicitly unordered, so it’s technically okay for them to be reversed. If we want to, we can reverse them later.)

Results:

ex_fedora master % mix bench
Compiled lib/ntriples/parser.ex
Settings:
  duration:      1.0 s

## NTriplesBench
[12:44:49] 1/1: parse large file

Finished in 3.31 seconds

## NTriplesBench
parse large file          50   52658.56 µs/op

About 53 ms!

The End

That’s all there is to it. At this point if you run the profiler again, the biggest thing taking up time is running the regular expressions. Speeding it up at that point probably means building a real syntax parser - but for now, this solution works.

Hope this helps!

ActiveFedora Relationships

Posted on Jun 1, 2015. Tagged: ruby, rails, active-fedora

In a recent sprint on ActiveFedora Aggregations I was working on relationships in ActiveFedora and felt like I should catalogue my understanding of how they work.

Entry Point

Let’s start with the has_many method, as used below:

class Collection < ActiveFedora::Base
end
class MyObject < ActiveFedora::Base
  has_many :collections, :class_name => "Collection"
end

That has_many method is defined in the associations module which is included into ActiveFedora::Base here: lib/active_fedora/associations.rb#L142-L144

The relevant code looks like this:

def has_many(name, options={})
  Builder::HasMany.build(self, name, options)
end

As you can see, when you call has_many on the ActiveFedora class it delegates down to the HasMany builder.

Builder

Builders are responsible for setting up a Reflection (a registry of the metadata for an association) and defining readers/writers on the class it was called on. The class for HasMany is defined here: lib/active_fedora/associations/builder/has_many.rb

The relevant code looks like this:

def self.build(model, name, options)
  reflection = new(model, name, options).build
  define_accessors(model, reflection)
  define_callbacks(model, reflection)
  reflection
end

def build
  reflection = super
  configure_dependency
  reflection
end

When the super call is inlined it looks like this:

def self.build(model, name, options)
  reflection = new(model, name, options).build
  define_accessors(model, reflection)
  define_callbacks(model, reflection)
  reflection
end

def build
  configure_dependency if options[:dependent] # see https://github.com/rails/rails/commit/9da52a5e55cc665a539afb45783f84d9f3607282
  reflection = model.create_reflection(self.class.macro, name, options, model)
  configure_dependency
  reflection
end

A reflection is created from the calling model (MyObject), which allows you to obtain all relevant information about the association by calling MyObject.reflections.

After that it defines accessors and callbacks. Let’s look at the accessors created.

Accessors

Accessors are defined for has_many in a function that looks like this:

def self.define_readers(mixin, name)
  super

  mixin.redefine_method("#{name.to_s.singularize}_ids") do
    association(name).ids_reader
  end
end

If you inline the super call:

def self.define_readers(mixin, name)
  mixin.send(:define_method, name) do |*params|
    association(name).reader(*params)
  end

  mixin.redefine_method("#{name.to_s.singularize}_ids") do
    association(name).ids_reader
  end
end

This makes it so if you do has_many :plums it will define an instance method called plums on the object which then calls the association’s reader method.

The reader method:

# Implements the reader method, e.g. foo.items for Foo.has_many :items
# @param opts [Boolean, Hash] if true, force a reload
# @option opts [Symbol] :response_format can be ':solr' to return a solr result.
def reader(opts = false)
  if opts.kind_of?(Hash)
    if opts.delete(:response_format) == :solr
      return load_from_solr(opts)
    end
    raise ArgumentError, "Hash parameter must include :response_format=>:solr (#{opts.inspect})"
  else
    force_reload = opts
  end
  reload if force_reload || stale_target?
  @proxy ||= CollectionProxy.new(self)
end

Effectively what happens is when you call #plums you get back a CollectionProxy object, which is defined here: lib/active_fedora/associations/collection_proxy.rb

On this object are methods you might expect - #find, #first, #last, etc. If you wanted to change what methods are available on a set of related objects, that’s where you’d change things.

And that’s it

That’s all there is to it. A builder gets called when you do things like has_many or belongs_to. That builder stores the metadata about the association and defines a reader, which delegates down to an association object (such as lib/active_fedora/associations/has_many_association.rb). The reader often creates a proxy object to handle actions on the associated items as a whole.
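
To make that chain concrete, here is a toy model of it in plain Ruby. This is only a sketch of the shape described above, not ActiveFedora's code: ToyBase, ToyAssociations, ToyAssociation, ToyProxy and Basket are all made-up names, and the "builder" is collapsed into the has_many method itself.

```ruby
# Toy model of the builder -> reflection -> reader -> proxy chain.
# None of these classes are ActiveFedora's; every name is hypothetical.
class ToyProxy
  include Enumerable

  def initialize(association)
    @association = association
  end

  # Collection-style methods (first, to_a, ...) come from Enumerable.
  def each(&block)
    @association.target.each(&block)
  end
end

class ToyAssociation
  attr_reader :target

  def initialize(owner, reflection)
    @owner = owner
    @reflection = reflection
    @target = []
  end

  # Same shape as the reader above: hand back a memoized proxy.
  def reader
    @proxy ||= ToyProxy.new(self)
  end
end

module ToyAssociations
  def reflections
    @reflections ||= {}
  end

  def has_many(name, options = {})
    # "Builder" step 1: record the metadata about the association.
    reflections[name] = { macro: :has_many, options: options }
    # "Builder" step 2: define the reader on the calling class.
    define_method(name) { association(name).reader }
  end
end

class ToyBase
  extend ToyAssociations

  def association(name)
    @associations ||= {}
    @associations[name] ||= ToyAssociation.new(self, self.class.reflections.fetch(name))
  end
end

class Basket < ToyBase
  has_many :plums, class_name: "Plum"
end

basket = Basket.new
basket.association(:plums).target << "a plum"
puts Basket.reflections.inspect
puts basket.plums.first
```

Calling basket.plums twice returns the same proxy object, which mirrors the @proxy ||= line in the real reader.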

For me, the hardest part was tracing the path. To do so, just start at the associations module and work your way down.

Abusing Decorators

Posted on May 14, 2015. Tagged: ruby, patterns, decorators

The Decorator Pattern

The Decorator Pattern’s goal is to dynamically add responsibilities to an object instance during runtime. This lets an individual object instance behave differently depending on the circumstances and requirements it may have at the time. This flexibility is powerful, but with power comes danger - so let’s abuse the pattern until it hurts.

Decorators in Ruby

When I implement a new pattern in Ruby I like to attach it to an interface I define ahead of time. That way I can easily swap out individual instances, and as long as they conform to that interface I can call each of them an “X”.

The Interface

A Decorator is any object which responds to new, takes one argument (the instance to be decorated), and returns an object that responds to at least the same interface as the previous object and includes additional behavior.

The Code

I like to use SimpleDelegator to accomplish this. Let’s take a simple ActiveRecord class.

class Person < ActiveRecord::Base
end

Now let’s say we want to keep a close eye on all people whose names are “Bob.” Whenever those records are saved, log a message.

class LogsBobSaves < SimpleDelegator
  def save(*args)
    with_logger do
      super(*args)
    end
  end

  private

  def with_logger
    yield.tap do |result|
      if name == "Bob"
        logger.info "Bob was saved"
      end
    end
  end

  def logger
    Rails.logger
  end
end

This works, but one of the benefits of using the decorator pattern is you can use dependency injection to make even more flexible objects which can use a variety of collaborators.

class LogsBobSaves < SimpleDelegator
  attr_reader :logger
  def initialize(obj, logger)
    @logger = logger
    super(obj)
  end

  def save(*args)
    with_logger do
      super(*args)
    end
  end

  private

  def with_logger
    yield.tap do |result|
      if name == "Bob"
        logger.info "Bob was saved"
      end
    end
  end
end

Now you can initialize a Person which saves logs about Bob by doing the following:

person = LogsBobSaves.new(Person.new(:name => "Bob"), Rails.logger)
person.save # => Logs the message "Bob was saved"

But wait! We said earlier the interface was that there was only one argument to #new, and this has two. Now begins the abuse: Let’s use the Adapter pattern to maintain the interface.

class DecoratorWithArguments
  attr_reader :decorator, :args
  def initialize(decorator, *args)
    @decorator = decorator
    @args = args
  end

  def new(obj)
    decorator.new(obj, *args)
  end
end

person = Person.new(:name => "Bob")
decorator = DecoratorWithArguments.new(LogsBobSaves, Rails.logger)
person = decorator.new(person) # One argument
person.save # => Logs the message "Bob was saved"

Interface Importance

The adapter above is clearly more complicated than just passing the logger in. Why maintain an arbitrary (and simple) interface?

Let’s say I want to also log when the name is Fred.

class LogsNameSaves < SimpleDelegator
  attr_reader :logger, :name_check
  def initialize(obj, name_check, logger)
    @logger = logger
    @name_check = name_check
    super(obj)
  end

  def save(*args)
    with_logger do
      super(*args)
    end
  end

  private

  def with_logger
    yield.tap do |result|
      if name == name_check
        logger.info "#{name} was saved"
      end
    end
  end
end

person = Person.new(:name => "Fred")
decorator1 = DecoratorWithArguments.new(LogsNameSaves, "Bob", Rails.logger)
decorator2 = DecoratorWithArguments.new(LogsNameSaves, "Fred", Rails.logger)
person = decorator1.new(person)
person = decorator2.new(person) # Decorate a second time.
person.save # => Logs
person.name = "Bob"
person.save # => Logs

To apply multiple decorators we have to call #new on them multiple times. We have a consistent interface, and we could end up decorating an arbitrary number of times, so let’s use the Composite pattern to encapsulate that.

class DecoratorList
  attr_reader :decorators

  def initialize(*decorators)
    @decorators = decorators
  end

  def new(undecorated_object)
    decorators.inject(undecorated_object) do |obj, decorator|
      decorator.new(obj)
    end
  end
end

person = Person.new(:name => "Bob")
decorator = DecoratorList.new(
  DecoratorWithArguments.new(LogsNameSaves, "Bob", Rails.logger),
  DecoratorWithArguments.new(LogsNameSaves, "Fred", Rails.logger),
)
person = decorator.new(person) # One decoration call results in two decorations.
person.save # => Logs
person.name = "Fred"
person.save # => Logs

Now how about adding Susan when we already have two decorators?

decorator = DecoratorList.new(
  DecoratorWithArguments.new(LogsNameSaves, "Bob", Rails.logger),
  DecoratorWithArguments.new(LogsNameSaves, "Fred", Rails.logger),
)
decorator = DecoratorList.new(
  decorator,
  DecoratorWithArguments.new(LogsNameSaves, "Susan", Rails.logger),
)
person = decorator.new(person) # One decoration call results in three decorations.

The beauty of Composites - you can build composites of composites, because they all maintain the same simple interface.
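
To try the composite outside of a Rails app, here is a minimal, self-contained sketch. Person (a Struct) and ListLogger are hypothetical stand-ins for the ActiveRecord model and Rails.logger, and the decorator classes are condensed versions of the ones above.

```ruby
require "delegate"

# Stand-ins so the example runs without Rails; both are hypothetical.
Person = Struct.new(:name) do
  def save(*)
    true
  end
end

class ListLogger
  attr_reader :messages

  def initialize
    @messages = []
  end

  def info(message)
    @messages << message
  end
end

class LogsNameSaves < SimpleDelegator
  def initialize(obj, name_check, logger)
    @name_check = name_check
    @logger = logger
    super(obj)
  end

  # Delegate the save, then log if the name matches.
  def save(*args)
    super(*args).tap do
      @logger.info("#{name} was saved") if name == @name_check
    end
  end
end

# Adapter: turns a multi-argument decorator into a one-argument one.
class DecoratorWithArguments
  def initialize(decorator, *args)
    @decorator = decorator
    @args = args
  end

  def new(obj)
    @decorator.new(obj, *@args)
  end
end

# Composite: applies each decorator in turn.
class DecoratorList
  def initialize(*decorators)
    @decorators = decorators
  end

  def new(undecorated_object)
    @decorators.inject(undecorated_object) { |obj, decorator| decorator.new(obj) }
  end
end

logger = ListLogger.new
bob_and_fred = DecoratorList.new(
  DecoratorWithArguments.new(LogsNameSaves, "Bob", logger),
  DecoratorWithArguments.new(LogsNameSaves, "Fred", logger)
)
everyone = DecoratorList.new(
  bob_and_fred, # a composite nested inside a composite
  DecoratorWithArguments.new(LogsNameSaves, "Susan", logger)
)

person = everyone.new(Person.new("Susan"))
person.save
person.name = "Bob"
person.save
puts logger.messages.inspect
```

After both saves the logger should hold one message for Susan and one for Bob, even though the Bob decorator lives inside the nested list.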

It was a bit of a bother having to decorate at all. Why not encapsulate which decorations go into creating a loggable person?

class LoggablePersonFactory
  def new(*args)
    decorator.new(Person.new(*args))
  end

  private

  def decorator
    DecoratorList.new(*decorators)
  end

  def decorators
    logging_names.map do |name|
      DecoratorWithArguments.new(LogsNameSaves, name, logger)
    end
  end

  def logging_names
    ["Bob", "Fred", "Susan"]
  end

  def logger
    Rails.logger
  end
end
# #new is an instance method here, so instantiate the factory first.
person = LoggablePersonFactory.new.new(:name => "Fred")
person.save # => Logs
person.name = "Susan"
person.save # => Logs

The Pros

Decorators result in immensely configurable behavior enhancement.
One object can have different behavior from another despite sharing a class.
Composing behaviors allows for single responsibility and yet powerful objects.
Dependencies can be injected, but only have to be done at the decorator level.
You never have to ask “how can I get an object that doesn’t do X?” Just don’t decorate the object with that behavior.

The Cons

1. Deciding when and where to apply decorators is difficult.
2. SimpleDelegators can sometimes lose context (in the above, if you use “save!” instead of “save” it’s not defined on the decorator, so it won’t log even if the decorated object calls “save” in its “save!” implementation.)
3. Instantiating all the decorators is expensive.
4. As you abuse the pattern it becomes difficult to keep track of which behaviors are available to you.

To solve 1, 3, and 4 I like to define concrete factories whose responsibility is generating objects with a specific subset of decorators. If you need a combination of factories, just compose them.

Problem 2 is more difficult - all you can do is make sure that your calling objects are using interfaces you’ve defined on the decorator. Ruby’s implementation is a “Delegator”, and lacks more context-aware decoration that’s available in other languages.
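
Problem 2 is easy to reproduce in isolation. In this minimal sketch, Record is a hypothetical stand-in for a model whose save! calls save internally, and LogsSaves is a cut-down decorator that records a line in an array instead of writing to a logger.

```ruby
require "delegate"

# Hypothetical stand-in for a model whose save! calls save internally.
class Record
  def save
    true
  end

  def save!
    save || raise("could not save")
  end
end

class LogsSaves < SimpleDelegator
  attr_reader :log

  def initialize(obj)
    @log = []
    super(obj)
  end

  def save(*args)
    super(*args).tap { @log << "saved" }
  end
end

record = LogsSaves.new(Record.new)
record.save   # goes through the decorator, so it is logged
record.save!  # forwarded to Record; its internal save call never sees the decorator
puts record.log.length
```

Inside Record#save!, self is the undecorated Record, so the decorator's save is never reached and only the first call is logged.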

Should I Decorate?

I’m not sure, but I do. It gives me the strengths of something like multiple inheritance with the flexibility to change my mind. Further, unlike mixins, decoration doesn’t allow for two-way interactions: my underlying object will never come to rely on outside code, so I can be sure that it works independently of my decorators, which gives me a smaller surface to debug.

However, as above, it can be abused. I’ve been leaning on the side of decorating lately, and it’s becoming time to take a step back and see where it doesn’t work. When I have that boundary clearly defined, I’ll post again.

Programming is all about figuring out when and where to apply a solution.

Response to Nothing is Something

Posted on May 6, 2015. Tagged: ruby, rails, composition

Nothing is Something

For RailsConf 2015 Sandi Metz gave a fantastic talk with two parts: the first discussed the null object pattern and the second talked about composition over inheritance. This post is some of my thoughts on the second half.

However, before I get started, please go watch her talk: Nothing is Something

Composition Over Inheritance

That talk is one of the best and most clearly communicated examples of the benefits of composition I’ve seen to date. However, I’m writing this because I feel like things could have gone just a step further.

At the end of the talk we’re left with code that looks like this:

class House
  attr_reader :data, :formatter
  DATA = [
    "the horse and the hound and the horn that belonged to",
    "the malt that lay in",
    "the house that Jack built"
  ]
  def initialize(orderer: DefaultOrder.new, formatter: DefaultFormatter.new)
    @formatter = formatter
    @data = orderer.order(DATA)
  end

  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{phrase(number)}.\n"
  end

  def phrase(number)
    parts(number).join(" ")
  end

  def parts(number)
    formatter.format(data.last(number))
  end
end

class DefaultFormatter
  def format(parts)
    parts
  end
end

class EchoFormatter
  def format(parts)
    parts.zip(parts).flatten
  end
end

class RandomOrder
  def order(data)
    data.shuffle
  end
end

class DefaultOrder
  def order(data)
    data
  end
end

Define Responsibilities

This is great code, but in a step towards refactoring even further it’s important to define responsibilities.

House should have one responsibility: recite. Take a dataset, use something to order and format it, and then output it. However, in order to fulfill that responsibility it must also know how to join an array of terms so that it’s recitable, and how to extract a certain piece of those terms for recitation.

Formatters take a phrase represented as an array of terms and format it to be joined by a space.

Orderers take a full dataset and order it.

Analyze Dependencies

Formatters have one dependency: an array of terms.

Orderers have one dependency: an array of terms to order.

House has two dependencies: a formatter and an orderer.

Reduce Responsibilities

The Single Responsibility Principle says that each object should only have one reason to change, and House has three right now. Let’s see if we can get it down.

First up is to extract the format piece.

class HouseData
  attr_reader :data, :formatter
  def initialize(data:, formatter: DefaultFormatter.new)
    @data = data
    @formatter = formatter
  end

  def length
    data.length
  end

  def phrase(number)
    parts(number).join(" ")
  end

  private

  def parts(number)
    formatter.format(data.last(number))
  end
end

class House
  attr_reader :data, :formatter
  DATA = [
    "the horse and the hound and the horn that belonged to",
    "the malt that lay in",
    "the house that Jack built"
  ]
  def initialize(orderer: DefaultOrder.new, formatter: DefaultFormatter.new)
    @formatter = formatter
    @data = HouseData.new(data: orderer.order(DATA), formatter: formatter)
  end

  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{data.phrase(number)}.\n"
  end

end

Now House just knows how to take something that responds to #phrase and #length and recite them. HouseData knows what it takes to turn an array of terms into phrases.

Analyze Dependency Usage

If you look at the constructor for House you’ll notice that orderer is only used to initialize data, and formatter is only used to be passed off into HouseData. A good rule is that if dependencies are only used in the constructor, extract them and pass the finished value in as the parameter instead. This makes your class just that much more flexible.

class HouseData
  attr_reader :data, :formatter
  def initialize(data:, formatter: DefaultFormatter.new)
    @data = data
    @formatter = formatter
  end

  def length
    data.length
  end

  def phrase(number)
    parts(number).join(" ")
  end

  private

  def parts(number)
    formatter.format(data.last(number))
  end
end

class House
  attr_reader :data
  DATA = [
    "the horse and the hound and the horn that belonged to",
    "the malt that lay in",
    "the house that Jack built"
  ]
  def initialize(data:)
    @data = data
  end

  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{data.phrase(number)}.\n"
  end
end

# Initialization
house_data = HouseData.new(data: House::DATA)
House.new(data: house_data).recite

You’ll notice that since you’re passing in the data, there’s no need for a “default orderer” - just pass in the data.
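One way to see the extra flexibility: House only needs something that responds to #length and #phrase, so any stand-in will do. Here’s a minimal sketch, where `CannedPhrases` is a hypothetical class I made up for illustration and is not part of the talk’s code.

```ruby
class House
  attr_reader :data

  def initialize(data:)
    @data = data
  end

  def recite
    (1..data.length).map { |i| line(i) }.join("\n")
  end

  def line(number)
    "This is #{data.phrase(number)}.\n"
  end
end

# Hypothetical stand-in: it knows nothing about formatters or orderers,
# it just answers #length and #phrase.
class CannedPhrases
  PHRASES = [
    "the house that Jack built",
    "the malt that lay in the house that Jack built"
  ]

  def length
    PHRASES.length
  end

  def phrase(number)
    PHRASES[number - 1]
  end
end

recitation = House.new(data: CannedPhrases.new).recite
```

House recites it without caring where the phrases came from.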

Finale

Now you’re left with this:

class HouseData
  attr_reader :data, :formatter
  def initialize(data:, formatter: DefaultFormatter.new)
    @data = data
    @formatter = formatter
  end

  def length
    data.length
  end

  def phrase(number)
    parts(number).join(" ")
  end

  private

  def parts(number)
    formatter.format(data.last(number))
  end
end

class House
  attr_reader :data
  DATA = [
    "the horse and the hound and the horn that belonged to",
    "the malt that lay in",
    "the house that Jack built"
  ]
  def initialize(data:)
    @data = data
  end

  def recite
    (1..data.length).map {|i| line(i)}.join("\n")
  end

  def line(number)
    "This is #{data.phrase(number)}.\n"
  end
end

class DefaultFormatter
  def format(parts)
    parts
  end
end

class EchoFormatter
  def format(parts)
    parts.zip(parts).flatten
  end
end

class RandomOrder
  def order(data)
    data.shuffle
  end
end

# Initialization
house_data = HouseData.new(data: House::DATA)

random_house_data = HouseData.new(data: RandomOrder.new.order(house_data.data))

echo_house_data = HouseData.new(data: house_data.data, formatter: EchoFormatter.new)

random_echo_house_data = HouseData.new(data: random_house_data.data, formatter: echo_house_data.formatter)

# Recitation

House.new(data: house_data).recite

Everything has one responsibility and is completely flexible. The only other thing I might do is simplify the interface for Formatter and Order. Something like

class EchoFormatter
  def self.call(parts)
    parts.zip(parts).flatten
  end
end

class DefaultFormatter
  def self.call(data)
    data
  end
end

class RandomOrder
  def self.call(data)
    data.shuffle
  end
end

would make it easier to remember what to do to use the dependencies.
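A side benefit, as far as I can tell, is that anything callable can then act as a formatter, including a plain lambda. The sketch below assumes a hypothetical `CallableHouseData`, a variant of HouseData that invokes its formatter with #call instead of #format; the name and the lambda formatters are mine.

```ruby
class EchoFormatter
  def self.call(parts)
    parts.zip(parts).flatten
  end
end

# Hypothetical variant of HouseData that talks to its formatter via #call.
class CallableHouseData
  attr_reader :data, :formatter

  # The default formatter is a lambda that returns the parts unchanged.
  def initialize(data:, formatter: ->(parts) { parts })
    @data = data
    @formatter = formatter
  end

  def length
    data.length
  end

  def phrase(number)
    formatter.call(data.last(number)).join(" ")
  end
end

# Shortened dataset, assumed purely for illustration.
data = ["the malt that lay in", "the house that Jack built"]

plain = CallableHouseData.new(data: data)
echo  = CallableHouseData.new(data: data, formatter: EchoFormatter)
shout = CallableHouseData.new(data: data, formatter: ->(parts) { parts.map(&:upcase) })
```

Classes with a `self.call` and one-off lambdas end up interchangeable, so there’s only one method name to remember.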

Compliments

I just want to leave a final note of thanks to Sandi Metz for an excellent talk. It provided a much better example of composition over inheritance for my team than I’ve seen before. Highly recommended.

Isolated Testing

Posted in and tagged ruby , rails , rspec , capybara , unit-tests , mocks on Apr 30, 2015

To see the previous post in this series please check out Avoiding Integration Tests

Test Driven Development

When I left off I had fast, predictable, and reliable unit tests which, if I was careful, would ensure that my application worked. However - I’d been promised something: if I wrote the tests first, they would tell me if I was architecting the application well.

I didn’t feel that way yet.

Finding Complexity

Some experience and a lot of reading later, I decided that a good application architecture is one which fulfills the use cases with the smallest possible amount of complexity and the lowest amount of coupling. Complexity in applications comes, primarily, from two places:

Branches
Dependencies

My tests were showing me the branches - if I had too many, I would have too many tests over a web of possibilities. +1 to unit tests for this.

However, writing the tests was not showing me the dependencies the code would have to call for them to pass. If the tests could do that, then they could show me whether my architecture could be better.

Surfacing Dependencies

RSpec-Mocks has some great tools for mocking out dependencies. I won’t go into detail here, but I suggest reading some documentation.

Let’s go through a quick example of the two methods of testing. Say an object should be valid only if its name is “Bob” and it has “banana” set to true. A first test, covering just the name, might look like this:

require 'rails_helper'

RSpec.describe MyObject do
  describe "#valid?" do
    context "when name is Bob" do
      it "should be valid" do
        obj = MyObject.new
        obj.name = "Bob"

        expect(obj).to be_valid
      end
    end
  end
end

No dependencies have been mocked out and there’s no clear interface about what’s going on in the background. However, it was -very- easy to write.

Imagine the class getting bigger and there being more validations. Do you test them with MyObject? Do you extract validators? What should the interface be? What was the interface you committed to in the first place?

The power of showing the dependencies can be seen when the pattern you’ve chosen is less than ideal:

# app/models/my_object.rb
class MyObject
  def valid?
    BobValidator.new.validate(self) && BananaValidator.new.validate(self)
  end
end

# spec/models/my_object_spec.rb
require 'rails_helper'

RSpec.describe MyObject do
  subject { MyObject.new }
  describe "#valid?" do
    context "when name is Bob and banana is true" do
      it "should be valid" do
        subject.name = "Bob"
        subject.banana = true

        expect(subject).to be_valid
      end
    end
    context "when name is not Bob and banana is true" do
      it "should be invalid" do
        subject.name = "Joe"
        subject.banana = true

        expect(subject).not_to be_valid
      end
    end
    context "when name is not Bob and banana is false" do
      it "should be invalid" do
        subject.name = "Joe"
        subject.banana = false

        expect(subject).not_to be_valid
      end
    end
    # etc..
  end
end

It’s pretty easy to keep this up, and seems to make sense. Set the parameters, check the result.

If you’d mocked dependencies it would have looked like this:

require 'rails_helper'

RSpec.describe MyObject do
  subject { MyObject.new }
  context "when name is not Bob and banana is false" do
    it "should be invalid" do
      bob_validator = instance_double(BobValidator)
      allow(BobValidator).to receive(:new).and_return(bob_validator)
      allow(bob_validator).to receive(:validate).with(subject).and_return(false)

      banana_validator = instance_double(BananaValidator)
      allow(BananaValidator).to receive(:new).and_return(banana_validator)
      allow(banana_validator).to receive(:validate).with(subject).and_return(false)

      expect(subject).not_to be_valid
    end
  end
end

This right here is ridiculous. Six lines of setup? For one test with two validations? That hurt to write.

Something must be wrong.

The test said there was something wrong with the architecture, and now there’s an easy place to iterate. Maybe we should pass the validators in via dependency injection?

require 'rails_helper'

RSpec.describe MyObject do
  context "when name is not Bob and banana is false" do
    it "should be invalid" do
      bob_validator = instance_double(BobValidator)
      banana_validator = instance_double(BananaValidator)
      obj = MyObject.new(bob_validator, banana_validator)

      allow(bob_validator).to receive(:validate).with(obj).and_return(false)
      allow(banana_validator).to receive(:validate).with(obj).and_return(false)

      expect(obj).not_to be_valid
    end
  end
end
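
The class that spec implies might look something like the sketch below. The bodies of BobValidator and BananaValidator are my assumptions (they aren’t shown anywhere above), as are the `name` and `banana` accessors.

```ruby
# Assumed validator implementations, for illustration only.
class BobValidator
  def validate(obj)
    obj.name == "Bob"
  end
end

class BananaValidator
  def validate(obj)
    obj.banana == true
  end
end

class MyObject
  attr_accessor :name, :banana

  # The validators are handed in rather than built inside #valid?,
  # so a test can pass doubles without stubbing .new.
  def initialize(bob_validator, banana_validator)
    @bob_validator = bob_validator
    @banana_validator = banana_validator
  end

  def valid?
    @bob_validator.validate(self) && @banana_validator.validate(self)
  end
end

obj = MyObject.new(BobValidator.new, BananaValidator.new)
obj.name = "Bob"
obj.banana = true
```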

That’s better. Now we don’t have to stub the creation of the validators. I only want to inject one thing, though; this could easily get out of control. What if there was a validator which took validators as arguments and returned true only if all of ITS validators returned true? Then you just pass that one validator in, and only have one dependency. Then you just test each collaborator, and the behavior of the dependency which takes dependencies, and you’re good!

And now you have the composite pattern, and a clean set of dependencies, because you could easily see the interfaces and dependency graph.
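
As a rough sketch of where that ends up: `CompositeValidator` is a name I’m making up here, and the two leaf validators are assumed implementations, since neither is shown above.

```ruby
# Hypothetical composite: passes only when every child validator passes.
# It exposes the same #validate interface as its children.
class CompositeValidator
  def initialize(*validators)
    @validators = validators
  end

  def validate(obj)
    @validators.all? { |validator| validator.validate(obj) }
  end
end

# Assumed leaf validators, for illustration only.
class BobValidator
  def validate(obj)
    obj.name == "Bob"
  end
end

class BananaValidator
  def validate(obj)
    obj.banana == true
  end
end

class MyObject
  attr_accessor :name, :banana

  # One injected dependency, no matter how many rules exist.
  def initialize(validator)
    @validator = validator
  end

  def valid?
    @validator.validate(self)
  end
end

validator = CompositeValidator.new(BobValidator.new, BananaValidator.new)
obj = MyObject.new(validator)
obj.name = "Bob"
obj.banana = true
```

Since the composite looks like any other validator, composites can also be nested inside other composites, and the spec for MyObject only ever needs a single double.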

Why I Do This

I’ve only been a professional programmer for a few years. Before that I had very little formal training, little understanding of patterns and practices, and had a flawed idea of what “good” architecture was. I’d always learned via one simple method: I beat my head against something until it finally works.

That’s great for hacking, but not for architecture. The wall to hit against was too far away, required too much experience to find, and was often hazy - determining why something was a good practice took a tremendous time investment.

Bringing the interface and dependencies to the front in my tests DEFINES the wall and brings it closer. I get nearly immediate feedback on whether or not I’ve chosen correctly. In this way I can beat my way towards better software.

Isolated testing enables me to develop better applications than my experience says I should be able to.