In a previous blogpost I argued for the use of the Shake build system, and for writing Shake rules like they are recipes. I also recommended the use of newCache
, addOracle
, and addOracleCache
in situations where it's tricky to do recipes. newCache
, addOracle
, and addOracleCache
are super useful, but they have imposing types, not the most helpful names, and they look similar, making it hard to know which one we need. In this post we're going to look at each in a little more detail. Let's start by briefly introducing these functions.
Three functions with different uses
newCache
allows us to memoize a function. A memoized function stores all the results it calculates. When a memoized function gets called with arguments it has seen before, it returns an earlier calculated result rather than repeating work. We can use newCache
to prevent duplicate work during a build but not across builds, because Shake throws away memoized results after each run.
addOracle
allows us to define dependencies that aren't files. Say that we want a rule to rebuild when the date changes. We can write some Haskell to get the current date and make it available as a dependency using addOracle
. These oracles run on every build because Shake needs to know if the values they return have changed, in which case the rules that depend on them need to be re-evaluated.
addOracleCache
, whatever its name implies, has a use case different from either newCache
or addOracle
. We can use it to create a rule that produces a Haskell value instead of a file. Suppose we have an expensive calculation, like recursively finding all the dependencies of a source file. We could define a regular build rule using %>
that performs the calculation and stores the result in a sources.json
file. Other rules could need ["sources.json"]
, decode its contents and use the result. addOracleCache
allows us to do the same thing without the encoding and decoding steps. Like other build rules and unlike addOracle
, a rule defined using addOracleCache
reruns only when one of its dependencies changes.
To summarize what these three functions do before looking at each in more detail:
- newCache memoizes a function call within the current build.
- addOracle defines dependencies that aren't files.
- addOracleCache creates a rule that produces a Haskell value instead of a file.
Now let's look at these in more detail.
How to use newCache
With newCache
we can create a memoized function. This is a function that when called several times with the same argument will run just once. The memoized function stores the result of the first run for use in future calls.
newCache
has the following type.
newCache :: (Eq k, Hashable k) =>
    (k -> Action v) -> Rules (k -> Action v)
     ^^^^^^^^^^^^^            ^^^^^^^^^^^^^
     the function             the memoized
     to memoize               function
If we skip over the type constraints before the =>
first (we'll get to them in a moment!), then we can see that newCache
takes a function and then returns a memoized version of it. Wherever we were using the original function we can use the memoized version too, because they have the same type.
To add memoization behavior Shake needs to know whether we called the function with a particular argument before. That requires comparing the input argument with input arguments from previous calls, and so we need to insist the input argument is of a type that allows such comparisons. Shake does this by requiring the input type to meet the Eq
and Hashable
constraints.
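To make the idea of memoization concrete, here is a minimal sketch in plain Haskell. It is not how Shake implements newCache: it works in IO rather than Action, uses Ord and Data.Map where Shake asks for Eq and Hashable, and isn't safe to call from several threads at once. The names memoizeIO and demo are made up for this illustration.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- | Wrap an IO function so that each distinct argument is computed at
-- most once. Results live in an in-memory map, so they are forgotten
-- when the program exits (much like newCache forgets between builds).
memoizeIO :: Ord k => (k -> IO v) -> IO (k -> IO v)
memoizeIO f = do
  cache <- newIORef Map.empty
  pure $ \key -> do
    seen <- readIORef cache
    case Map.lookup key seen of
      -- Seen this argument before: reuse the stored result.
      Just value -> pure value
      -- First call with this argument: run the function and remember it.
      Nothing -> do
        value <- f key
        modifyIORef' cache (Map.insert key value)
        pure value

-- | Memoize a squaring function that counts its own runs, call it with
-- the given keys, and return the results plus the number of real runs.
demo :: [Int] -> IO ([Int], Int)
demo keys = do
  runs <- newIORef (0 :: Int)
  square <- memoizeIO $ \n -> do
    modifyIORef' runs (+ 1)
    pure (n * n)
  results <- mapM square keys
  count <- readIORef runs
  pure (results, count)
```

The lookup in the map is where the comparison of arguments happens, which is why the argument type needs a constraint at all.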
One tricky situation where newCache
can help us out is when writing a rule for a command that produces more than one file, but where we don't want to hardcode which files it produces. Suppose we have a script generate-schemas.sh
that generates JSON schemas for common types.
rules :: Rules ()
rules = do
  "schemas/*.schema.json" %> \_ -> do
    schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
    need ("generate-schemas.sh" : schemaSrcs)
    cmd_ "generate-schemas.sh" schemaSrcs
The rule above works but is pretty inefficient. generate-schemas.sh
creates all the schema files in a single run, but the rule above will rerun it every time we need
a different schema. Let's use newCache
to remove this duplication of work.
rules :: Rules ()
rules = do
  generateSchemasCached <- newCache $ \() -> do
    schemaSrcs <- getDirectoryFiles "" ["elm/src/ApiTypes//*.elm"]
    need ("generate-schemas.sh" : schemaSrcs)
    cmd_ "generate-schemas.sh" schemaSrcs
  "schemas/*.schema.json" %> \_ -> generateSchemasCached ()
Now if more than one schema file is need
ed during a run, generate-schemas.sh
is run just once. Later runs might run generate-schemas.sh
again, because newCache
doesn't save results across builds. That's good though, because we might have deleted schema files in the interim.
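If you'd like to see the once-per-build behavior for yourself, a small experiment along these lines should do. It's a sketch rather than part of the schema example: the counter stands in for the expensive script, and countRuns and the _newcache-demo database directory are names I made up.

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Development.Shake

-- | Run a single Shake build in which two separate actions call the same
-- newCache-wrapped function, then report how often its body really ran.
countRuns :: IO Int
countRuns = do
  runs <- newIORef (0 :: Int)
  shake shakeOptions {shakeFiles = "_newcache-demo"} $ do
    -- The counter increment stands in for an expensive step.
    generateCached <- newCache $ \() -> liftIO (modifyIORef' runs (+ 1))
    -- Two independent top-level actions both ask for the cached result.
    action (generateCached ())
    action (generateCached ())
  readIORef runs
```

Calling countRuns a second time starts a fresh build, so the counter body runs again, which matches newCache not saving anything across builds.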
How to use addOracle
Rules can signal they have a dependency on one or more files using need
. This causes the rule to rebuild if the file they depend on changes. But what if you'd like your rule to depend on the day of the week, or the version of a tool? Using addOracle
you can define such dependencies.
Say we have a rule that creates a letter from a template.
rules :: Rules ()
rules = do
  "letter.txt" %> \out -> do
    date <- liftIO Date.current
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]
Imagine we're working on this until late in the evening. The next day we want to post it, so we run Shake one more time to get today's date on there. Shake skips the update though because it doesn't know it should rebuild the letter when the date changes.
If we had a file somewhere on our computer that always contained the current date then we could need
that. As it stands the current date is not a file, it's the result of a call to the (made up) Date.current
function. We can use addOracle
to turn it into a dependency.
Let's start again by looking at addOracle
's type. Notice how apart from the functions' constraints (the part of the type before the =>
) addOracle
has the exact same type as newCache
.
addOracle :: (RuleResult q ~ a, ShakeValue q, ShakeValue a) =>
    (q -> Action a) -> Rules (q -> Action a)
     ^^^^^^^^^^^^^            ^^^^^^^^^^^^^

type ShakeValue a = (Show a, Typeable a, Eq a, Hashable a, Binary a, NFData a)
Like newCache
, addOracle
takes a function and returns a function of the same type. This new function adds a dependency in any rule it gets called in. Shake requires a bunch of constraints on the function's argument type q
and return type a
to pull this off. Check out the ShakeValue
documentation if you're interested in learning what these constraints are for. We'll see in a moment what RuleResult q ~ a
is about.
We can use addOracle
to fix our letter templating rule like so.
rules :: Rules ()
rules = do
  currentDate <- addOracle $ \CurrentDate -> liftIO Date.current
  "letter.txt" %> \out -> do
    date <- currentDate CurrentDate
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

data CurrentDate = CurrentDate
  deriving (Show, Eq, Generic)

instance Hashable CurrentDate
instance Binary CurrentDate
instance NFData CurrentDate

type instance RuleResult CurrentDate = String
We made a similar change when we introduced newCache
: We extract into a function the lines we want to add special behavior to (memoization behavior in case of newCache
, dependency-tracking behavior now). We wrap our extracted function using the right helper, and use the function this returns in our rule.
What's different this time is that Shake requires each oracle created using addOracle
or addOracleCache
to have a unique type (we'll see why in a moment). We create one called CurrentDate
and use Generic
to generate all the instances Shake requires of the type. We also have to tell Shake that the oracle associated with the CurrentDate
input type always returns a String
result type (the return value of our imaginary Date.current
function).
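The other dependency mentioned earlier, the version of a tool, follows the same pattern. Here's a sketch that tracks the GHC version, assuming ghc is on the PATH. The GhcVersion type and the ghcVersion helper are names chosen for this example, not something the letter example needs.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Monad (void)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)

-- | The unique input type for this oracle.
data GhcVersion = GhcVersion
  deriving (Show, Eq, Generic)

instance Hashable GhcVersion
instance Binary GhcVersion
instance NFData GhcVersion

-- | Asking the GhcVersion oracle always yields a String.
type instance RuleResult GhcVersion = String

-- | Register the oracle. Shake reruns it on every build and rebuilds the
-- rules that asked for it only when the reported version has changed.
rules :: Rules ()
rules = void . addOracle $ \GhcVersion -> do
  Stdout out <- cmd "ghc" ["--numeric-version"]
  -- Keep only the first line so the trailing newline isn't part of the value.
  pure (takeWhile (/= '\n') out)

-- | Call this inside any rule that should rebuild when GHC is upgraded.
ghcVersion :: Action String
ghcVersion = askOracle GhcVersion
```

A rule that compiles Haskell code could call ghcVersion at the top of its body and ignore the result: the call alone is enough to record the dependency.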
Intermezzo: What's up with these boilerplate types?
The reason oracle argument types need to be unique, and the reason for the RuleResult q ~ a
constraint, is to support the askOracle
function in Shake's API. Using it, our letter templating example looks like this:
rules :: Rules ()
rules = do
  void . addOracle $ \CurrentDate -> liftIO Date.current
  "letter.txt" %> \out -> do
    date <- askOracle CurrentDate
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]

-- The type needs a constructor so we have a value to pass to askOracle.
data CurrentDate = CurrentDate
  deriving (Show, Eq, Generic)

instance Hashable CurrentDate
instance Binary CurrentDate
instance NFData CurrentDate

type instance RuleResult CurrentDate = String
We can pass askOracle
any of the types we have defined oracles for. Because all our oracles have unique input types askOracle
can figure out which one to call. And because we have explicitly defined the return types for each oracle input type using the RuleResult
type family the type checker knows the type of askOracle SomeOracleType
.
The whole thing is pretty magical, so I like to wrap up oracles in an API that exposes regular functions. As an example, we could wrap up the date oracle like this:
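{-# LANGUAGE DeriveGeneric, TypeFamilies #-}
module Rules.Date (rules, current) where

import Control.Monad (void)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)
import GHC.Generics (Generic)
import qualified Date -- the made-up date module from earlier

current :: Action String
current = askOracle Current

rules :: Rules ()
rules =
  void . addOracle $ \Current -> currentOracle

currentOracle :: Action String
currentOracle = liftIO Date.current

data Current = Current
  deriving (Show, Eq, Generic)

instance Hashable Current
instance Binary Current
instance NFData Current

type instance RuleResult Current = String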
Our letter templating example could use this module like so.
rules :: Rules ()
rules = do
  Rules.Date.rules
  "letter.txt" %> \out -> do
    date <- Rules.Date.current
    need ["letter.template"]
    cmd_
      "templateer letter.template"
      "--out" [out]
      "--options" ["date=" <> date]
How to use addOracleCache
Suppose we have a project that contains several Elm applications. The Elm applications share some modules between them. If we change an Elm module we'd like Shake to recompile just those projects that use the module. We could write a rule like this:
rules :: Rules ()
rules = do
  "assets/*.elm.js" %> \out -> do
    let (Just [name]) = filePattern "assets/*.elm.js" out
    let main = name <.> "elm"
    -- recursiveDependencies is an Action, so bind its result with <-.
    srcFiles <- recursiveDependencies main
    need ("elm.json" : srcFiles)
    cmd_ "elm make --output" [out] [main]

recursiveDependencies :: FilePath -> Action [FilePath]
recursiveDependencies src = do
  direct <- directDependencies src
  recursive <- traverse recursiveDependencies direct
  pure (src : mconcat recursive)

directDependencies :: FilePath -> Action [FilePath]
directDependencies src = do
  need [src]
  contents <- liftIO (readFile src)
  pure $ Elm.imports (Elm.parse contents)
This works, but it's not super efficient. For each Elm entrypoint it recalculates the full dependency tree. It would be nice if after calculating the dependency for a particular Elm module we could reuse that result in future builds, until any of the recursive dependencies of a module change.
The function recursiveDependencies
looks a lot like a rule. The result it produces is a list of file paths corresponding to Elm modules. Its dependencies are the contents of those Elm modules, because a change in an Elm module might mean that its imports have changed, requiring a recalculation of the dependency tree. Let's use addOracleCache
to write it as a rule.
rules :: Rules ()
rules = do
  void $ addOracleCache recursiveDependencies
  "assets/*.elm.js" %> \out -> do
    let (Just [name]) = filePattern "assets/*.elm.js" out
    let main = name <.> "elm"
    -- askOracle is an Action, so bind its result with <-.
    srcFiles <- askOracle (RecursiveDependenciesFor main)
    need ("elm.json" : srcFiles)
    cmd_ "elm make --output" [out] [main]

recursiveDependencies :: RecursiveDependencies -> Action [FilePath]
recursiveDependencies (RecursiveDependenciesFor src) = do
  direct <- directDependencies src
  recursive <- traverse (askOracle . RecursiveDependenciesFor) direct
  pure (src : mconcat recursive)

directDependencies :: FilePath -> Action [FilePath]
directDependencies src = do
  contents <- readFile' src -- This takes a dependency on `src`.
  pure $ Elm.imports (Elm.parse contents)

newtype RecursiveDependencies
  = RecursiveDependenciesFor FilePath
  deriving (Show, Eq, Hashable, Binary, NFData)

-- The instance is for the type, not for its constructor.
type instance RuleResult RecursiveDependencies = [FilePath]
Done! Now we'll cache the calculation of module dependencies between builds. One further possible optimization would be to use addOracleCache
to turn directDependencies
into a build rule as well. That way changes to Elm modules that don't touch imports won't trigger recalculation of a module's dependencies. Give it a try!
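If you want something to compare your attempt against, here is one possible sketch. Since Elm.imports and Elm.parse are made up in this post, the sketch takes the import extractor as an argument called importsOf. That parameter and the DirectDependencies type are my own naming, so treat this as a starting point rather than the definitive solution.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Monad (void)
import Development.Shake
import Development.Shake.Classes (Binary, Hashable, NFData)

-- | Oracle input type: "which modules does this source file import?"
newtype DirectDependencies
  = DirectDependenciesFor FilePath
  deriving (Show, Eq, Hashable, Binary, NFData)

type instance RuleResult DirectDependencies = [FilePath]

-- | Register the cached oracle. importsOf is whatever function turns the
-- contents of a source file into the paths of the modules it imports.
rules :: (String -> [FilePath]) -> Rules ()
rules importsOf =
  void . addOracleCache $ \(DirectDependenciesFor src) ->
    -- readFile' takes a dependency on src, so the oracle reruns when the
    -- file changes.
    importsOf <$> readFile' src

-- | Drop-in replacement for the plain directDependencies function.
directDependencies :: FilePath -> Action [FilePath]
directDependencies = askOracle . DirectDependenciesFor
```

The file is still reread after every edit, but Shake compares the new list of imports with the stored one, and rules that asked for it are only re-evaluated when the list itself is different.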
It's worth emphasizing that although addOracleCache
has an identical type to addOracle
, it behaves quite differently. Remember that addOracle
is for defining dependencies. Shake runs addOracle
functions pre-emptively to check if their return values have changed. Had we used addOracle
here performance would be worse than the non-oracle-based version of the code we started with, because Shake would rerun it even if none of the Elm source files in the entire project had changed.
Closing thoughts
I hope this post has been helpful in understanding when and how to use Shake's newCache
, addOracle
, and addOracleCache
functions. One final tip: make your oracle types nice and verbose because Shake uses them in its logs. It will make debugging oracles easier.
That's it. Happy shaking!