Getting Started
Haetae is an incremental task runner.
The task can be test, lint, build, or anything.
It can be used in any project, no matter what language, framework, test runner, linter/formatter, build system, or CI you use.
For now, in this 'Getting Started' article, we start with an example of incremental testing.
Why?
Let's say you're building a calculator project, named 'my-calculator'.
my-calculator
├── package.json
├── src
│ ├── add.js
│ ├── exponent.js
│ ├── multiply.js
│ └── subtract.js
└── test
├── add.test.js
├── exponent.test.js
├── multiply.test.js
└── subtract.test.js
The dependency graph is like this:
exponent.js depends on multiply.js, which depends on add.js, and so on.
When testing, we should take the dependency graph into account.
We do NOT have to test all files (*.test.js) for every single tiny change, which wastes your CI resources and time.
Rather, we should do it incrementally, which means testing only files affected by the changes.
For example, when multiply.js is changed, test only exponent.test.js and multiply.test.js.
When add.js is changed, test all files (exponent.test.js, multiply.test.js, subtract.test.js and add.test.js).
When a test file (e.g. add.test.js) is changed, execute only that test file itself (e.g. add.test.js).
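To make the selection rule above concrete, here is a minimal hand-written sketch in plain JavaScript. It does not use Haetae at all. The graph object is an assumption inferred from the examples above (in particular, that subtract.js depends on add.js), and the names calculatorGraph, isAffected, and affectedTests are hypothetical.

```javascript
// Hypothetical, hand-written dependency graph for 'my-calculator'.
// Each key depends directly on the files listed in its array.
// (That subtract.js depends on add.js is an assumption inferred from the text.)
const calculatorGraph = {
  'src/add.js': [],
  'src/subtract.js': ['src/add.js'],
  'src/multiply.js': ['src/add.js'],
  'src/exponent.js': ['src/multiply.js'],
  'test/add.test.js': ['src/add.js'],
  'test/subtract.test.js': ['src/subtract.js'],
  'test/multiply.test.js': ['src/multiply.js'],
  'test/exponent.test.js': ['src/exponent.js'],
}

// True if `file` is one of `changed`, or transitively depends on one of them.
function isAffected(file, changed, seen = new Set()) {
  if (changed.includes(file)) return true
  if (seen.has(file)) return false
  seen.add(file)
  return (calculatorGraph[file] ?? []).some((dep) =>
    isAffected(dep, changed, seen),
  )
}

// Test files that should run for the given changed files.
function affectedTests(changed) {
  return Object.keys(calculatorGraph)
    .filter((file) => file.endsWith('.test.js'))
    .filter((file) => isAffected(file, changed))
    .sort()
}

console.log(affectedTests(['src/multiply.js']))
// [ 'test/exponent.test.js', 'test/multiply.test.js' ]
```

Maintaining such a graph by hand does not scale, which is the problem the rest of this article addresses.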
Then how can we do it, automatically?
Here's where Haetae comes in.
By just a simple config, Haetae can automatically detect the dependency graph and test only affected files.
(You do not have to change your test runner. In this article, Jest is used just as an example.)
Installation
So, let's install Haetae. (Node 16 or higher is required.)
It doesn't matter whether your project is new or existing (Haetae can be adopted incrementally).
It works well for monorepos too. (This is covered in another part of the docs.)
Literally any project is suitable.
npm install --save-dev haetae
Are you developing a library (e.g. plugin) for Haetae?
You can depend on @haetae/core, @haetae/utils, @haetae/git, @haetae/javascript, and @haetae/cli independently. Note that the package haetae includes all of them.
Basic configuration
Now, we are ready to configure Haetae.
Let's create a config file haetae.config.js.
my-calculator
├── haetae.config.js # <--- Haetae config file
├── package.json
├── src # contents are omitted for brevity
└── test # contents are omitted for brevity
TypeScript Support
If you use TypeScript, you can name it haetae.config.ts.
Then install ts-node as a peer dependency (peerDependencies).
You need ts-node whether or not you use it directly.
The peer dependency is marked as optional, which means non-TypeScript users don't have to install it.
CJS/ESM
Haetae supports both CJS and ESM projects.
Haetae is written in ESM, but it can be used in CJS projects as well, as long as the config file is ESM.
If your project is CJS, name the config file haetae.config.mjs or haetae.config.mts.
If your project is ESM, name the config file haetae.config.js or haetae.config.ts.
We can write it down like this.
Make sure you have initialized git. Haetae can be used with any other version control system, but this article assumes git.
import { $, core, git, utils, js } from 'haetae'
export default core.configure({
  // Other options are omitted for brevity.
  commands: {
    myTest: {
      run: async () => {
        // An array of changed files
        const changedFiles = await git.changedFiles()
        /* An array of test files.
        ['/path/to/my-calculator/test/exponent.test.js',
         '/path/to/my-calculator/test/multiply.test.js',
         '/path/to/my-calculator/test/subtract.test.js',
         '/path/to/my-calculator/test/add.test.js'] */
        const testFiles = await utils.glob(['**/*.test.js'])
        // An array of test files which (transitively) depend on changed files
        const affectedTestFiles = testFiles.filter((testFile) =>
          js.dependsOn({
            dependent: testFile,
            dependencies: changedFiles,
          })
        )
        if (affectedTestFiles.length > 0) {
          // Equals to "pnpm jest /path/to/foo.test.ts /path/to/bar.test.ts ..."
          // Change 'pnpm' and 'jest' to your package manager and test runner.
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})
Multiple APIs are used in the config file above.
They all have various options (Check out API docs).
But we are going to use their sensible defaults for now.
The tagged template literal $ can be used to run arbitrary shell commands.
If it receives a placeholder (${...}) whose value is an array, it automatically joins the elements with a whitespace (' ').
It has other traits and options as well. Check out the API docs for more detail.
import { $, utils } from 'haetae'
// The following three statements have the same effect
await $`pnpm jest ${affectedTestFiles}`
await $`pnpm jest ${affectedTestFiles.join(' ')}`
// $ is a wrapper of utils.exec.
// Use utils.exec if you need a function.
// utils.exec may be easier to pass non-default options
await utils.exec(`pnpm jest ${affectedTestFiles.join(' ')}`)
In the above config, pnpm jest is used in $.
Just change them to your package manager and test runner.
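If tagged template literals are new to you, the array-joining behavior described above can be imitated in a few lines of plain JavaScript. This is only a sketch of the idea, not Haetae's implementation. The function name command is hypothetical, and it only builds a string instead of executing anything.

```javascript
// Hypothetical helper, NOT Haetae's `$`: it only builds the command string.
// A tag function receives the literal chunks and the placeholder values.
function command(strings, ...values) {
  return strings.reduce((result, chunk, i) => {
    // The last chunk has no placeholder after it.
    if (i >= values.length) return result + chunk
    const value = values[i]
    // Arrays are joined with a single whitespace, as described above.
    const text = Array.isArray(value) ? value.join(' ') : String(value)
    return result + chunk + text
  }, '')
}

const exampleTestFiles = ['test/exponent.test.js', 'test/multiply.test.js']
const viaArray = command`pnpm jest ${exampleTestFiles}`
const viaJoin = command`pnpm jest ${exampleTestFiles.join(' ')}`

console.log(viaArray)
// pnpm jest test/exponent.test.js test/multiply.test.js
console.log(viaArray === viaJoin)
// true
```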
Credit to google/zx
$ as a Tagged Template Literal is inspired by google/zx. Thanks!
Then run haetae like below.
$ haetae myTest
(Unless you installed haetae globally, you should execute it through your package manager, e.g. pnpm haetae myTest.)
Note that myTest in the command above is the name of the command we defined in the config file.
You can name it whatever you want. And as you might guess, you can define multiple commands (e.g. myLint, myBuild, myIntegrationTest, etc.) in the config file.
It will print the result like this.
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 May 28 11:06:06 (timestamp: 1685239566483)
⎜ 🌱 env: {}
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 979f3c6bcafe9f0b81611139823382d615f415fd
⎜ branch: main
⎣ pkgVersion: 0.0.12
As this is the first time the command haetae myTest is run, git.changedFiles() in the config returns every file tracked by git in your project as a changed file
(There are options. Check out the API docs after reading this article).
This behavior results in running all of the tests.
js.dependsOn() understands direct or transitive dependencies between files, by parsing import or require(), etc.
So it can be used to detect which test files (transitively) depend on at least one of the changed files.
js.dependsOn can detect multiple formats
ES6 (.js, .mjs), CJS (.js, .cjs), AMD, TypeScript (.ts, .mts, .cts), JSX (.jsx, .tsx), Webpack Loaders, CSS Preprocessors (Sass, Scss, Stylus, Less), PostCSS, and RequireJS are all supported.
For Node.js, Subpath Imports and Subpath Exports are supported.
For TypeScript, Path Mapping is also supported.
Check out the API docs and pass additional option(s) if you use TypeScript or Webpack.
utils.dependsOn
There is also utils.dependsOn.
While js.dependsOn is mainly for the JavaScript ecosystem, utils.dependsOn is a general-purpose transitive dependency detector.
Check out the API docs after reading this article.
Note that js.dependsOn cannot parse dynamic imports.
Dynamic or extra dependencies can be specified with the additionalGraph option, which is explained later in this article.
As you may have noticed, the store file .haetae/store.json is generated.
It stores the history of Haetae executions, which makes incremental tasks possible.
For example, the commit ID 979f3c6 printed in the above output is the git HEAD that haetae myTest ran on.
This information is logged in the store file to be used later.
my-calculator
├── .haetae/store.json # <--- Generated
├── haetae.config.js
├── package.json
├── src
└── test
Detecting the last commit Haetae ran on successfully
Let's say we made some changes and added 2 commits.
979f3c6 is the last commit Haetae ran on successfully.
What will happen when we run Haetae again?
$ haetae myTest
This time, only exponent.test.js and multiply.test.js are executed, assuming the two new commits changed only multiply.js.
That's because git.changedFiles() automatically returns only the files changed since the last successful execution of Haetae.
For another example, if you modify add.js, then all tests will be executed, because js.dependsOn() detects dependencies transitively.
If you modify add.test.js, only the test file itself (add.test.js) will be executed, as every file is treated as depending on itself.
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 May 28 19:03:25 (timestamp: 1685268205443)
⎜ 🌱 env: {}
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 1d17a2f2d75e2ac94f31e53376c549751dca85fb
⎜ branch: main
⎣ pkgVersion: 0.0.12
Accordingly, the new commit 1d17a2f is logged in the store file.
The output above is an example of a successful task.
Conversely, if the test fails, pnpm jest <...>, which we gave to $ in the config, exits with a non-zero exit code.
This makes $ throw an error.
So myTest.run() does not complete successfully, and the store file is not renewed.
This behavior is useful for incremental tasks. The failed test (or any incremental task) will be re-executed on later runs until the problem is fixed.
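The "record only on success" rule can be pictured with a small sketch in plain JavaScript. Everything here is hypothetical: memoryStore stands in for .haetae/store.json, and runAndRecord is a synchronous simplification of how a command run might be wrapped. It is not Haetae's actual code.

```javascript
// Hypothetical in-memory stand-in for .haetae/store.json.
const memoryStore = { lastCommit: null }

// Runs `task` and records `commit` only if the task did not throw.
// (Synchronous for brevity; a real run function would be async.)
function runAndRecord(commit, task) {
  try {
    task()
  } catch (error) {
    // The store is left untouched, so the next run starts
    // from the last successful commit again.
    return false
  }
  memoryStore.lastCommit = commit
  return true
}

const firstOk = runAndRecord('979f3c6', () => {})
const afterFirst = memoryStore.lastCommit

const secondOk = runAndRecord('1d17a2f', () => {
  // Simulates a failing test run.
  throw new Error('test runner exited with a non-zero code')
})
const afterSecond = memoryStore.lastCommit

console.log(firstOk, afterFirst)
// true 979f3c6
console.log(secondOk, afterSecond)
// false 979f3c6
```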
env configuration
Sometimes we need to separate several environments.
Simple environment variable example
For example, the logic of your project might act differently depending on the environment variable $NODE_ENV.
So the history of an incremental task should also be recorded separately for each environment.
Let's add env to the config file to achieve this.
import { $, core, git, utils, js } from 'haetae'
export default core.configure({
commands: {
myTest: {
env: { // <--- Add this
NODE_ENV: process.env.NODE_ENV,
},
run: async () => { /* ... */ },
},
},
})
The key name NODE_ENV is just an example. You can name it as you want.
From now on, the store file will manage the history of each environment separately.
For example, if $NODE_ENV can have two values, 'development' or 'production', then Haetae will manage two incremental histories, one for each environment.
You don't have to care about the past history of myTest executed without env.
When a command is configured without env, it's treated as if configured with env: {}, which is totally fine.
So there will be 3 envs recorded (or to be recorded) in the store file:
{}
{ NODE_ENV: 'production' }
{ NODE_ENV: 'development' }
Though we changed the schema of env in the config from {} to { NODE_ENV: 'development' | 'production' }, the history of env: {} already recorded in the store file is NOT automatically deleted.
It just stays in the store file.
This behavior is completely safe, so don't worry about the leftover record.
If you care about disk space, configuring the auto-removal of obsolete history is explained later in this article.
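One way to picture "one history per env" is a map keyed by a canonical form of the env object. The following plain JavaScript sketch is an illustration only. The names envKey and RecordStore are hypothetical, the sketch only handles flat env objects, and the real layout of the store file may differ.

```javascript
// Canonical string for a flat env object, so that key order does not matter.
function envKey(env) {
  return JSON.stringify(
    Object.keys(env)
      .sort()
      .map((key) => [key, env[key]]),
  )
}

// Hypothetical store that keeps the latest record for each distinct env.
class RecordStore {
  constructor() {
    this.records = new Map()
  }

  add(env, record) {
    this.records.set(envKey(env), { env, ...record })
  }

  latest(env) {
    return this.records.get(envKey(env))
  }
}

const envRecords = new RecordStore()
envRecords.add({}, { commit: '979f3c6' })
envRecords.add({ NODE_ENV: 'production' }, { commit: 'a4f4e7e' })
envRecords.add({ NODE_ENV: 'development' }, { commit: '442fefc' })

console.log(envRecords.records.size)
// 3
console.log(envRecords.latest({ NODE_ENV: 'production' }).commit)
// a4f4e7e
console.log(envRecords.latest({ NODE_ENV: 'staging' }))
// undefined
```

An env that was never seen before has no record, which corresponds to a first run for that environment.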
Multiple keys
You can add more keys to the env object.
For instance, let's change the config to this.
import assert from 'node:assert/strict'
import { $, core, git, utils, js, pkg } from 'haetae'
import semver from 'semver'
export default core.configure({
commands: {
myTest: {
env: async () => { // <--- Changed to async function from object
assert(['development', 'production'].includes(process.env.NODE_ENV))
return {
NODE_ENV: process.env.NODE_ENV,
jestConfig: await utils.hash(['jest.config.js']),
jest: (await js.version('jest')).major,
branch: await git.branch(),
os: process.platform,
node: semver.major(process.version),
haetae: pkg.version.major,
}
},
run: async () => { /* ... */ },
},
},
})
The object has more keys than before, named jestConfig, jest, branch and so on.
If any of $NODE_ENV, the Jest config file, the major version of Jest, the git branch, the OS platform, the major version of Node.js, or the major version of the package haetae changes, it's treated as a different environment.
And now env becomes a function. You can freely write any additional code in it, like the assertion (assert()) at the top of the function above. myTest.env() is executed before myTest.run().
Just like myTest.run(), when an error is thrown in myTest.env(), the store file is not renewed, which is the intended design for incremental tasks.
If you just want to check the value the env function returns, you can use the -e, --env option.
This does not write to the store file, but just prints the value.
$ haetae myTest --env
✔ success Current environment is successfully executed for the command myTest
⎡ NODE_ENV: development
⎜ jestConfig: 642645d6bc72ab14a26eeae881a0fc58e0fb4a25af31e55aa9b0d134160436eb
⎜ jest: 29
⎜ branch: main
⎜ os: darwin
⎜ node: 18
⎣ haetae: 0
Additional dependency graph
Until now, js.dependsOn() has been used for automatic detection of the dependency graph.
But sometimes, you need to specify some dependencies manually.
Simple integration test
For example, let's say you're developing a project communicating with a database.
your-project
├── haetae.config.js
├── package.json
├── src
│ ├── external.js
│ ├── logic.js
│ └── index.js
└── test
├── data.sql
├── external.test.js
├── logic.test.js
└── index.test.js
The explicit dependency graph is like this.
logic.js contains business logic, including communicating with a database.
external.js communicates with a certain external service, independent of the database.
But there is a SQL file named data.sql for an integration test.
It's not (can't be) imported (e.g. import, require()) by any source code file.
Let Haetae think logic.js depends on data.sql, by utils.graph() and options.additionalGraph of js.dependsOn().
import { $, core, git, utils, js } from 'haetae'
export default core.configure({
commands: {
myTest: {
env: { /* ... */ },
run: async () => {
const changedFiles = await git.changedFiles()
const testFiles = await utils.glob(['**/*.test.js'])
// A graph of additional dependencies specified manually
const additionalGraph = utils.graph({
edges: [
{
dependents: ['src/logic.js'],
dependencies: ['test/data.sql'],
},
],
})
const affectedTestFiles = testFiles.filter((testFile) =>
js.dependsOn({
dependent: testFile,
dependencies: changedFiles,
additionalGraph, // <--- New option
}),
)
if (affectedTestFiles.length > 0) {
await $`pnpm jest ${affectedTestFiles}`
}
},
},
},
})
Then the implicit dependency graph becomes explicit.
From now on, when the file data.sql is changed, index.test.js and logic.test.js are executed.
As external.test.js doesn't transitively depend on data.sql, it's not executed.
Contrary to this general and natural flow, if you decide that index.test.js should never be affected by data.sql, you can change the config.
// Other content is omitted for brevity
const additionalGraph = utils.graph({
edges: [
{
dependents: ['test/logic.test.js'], // 'src/logic.js' to 'test/logic.test.js'
dependencies: ['test/data.sql'],
},
],
})
With this change, data.sql doesn't affect index.test.js anymore.
But I recommend this practice only when you're firmly sure that index.test.js will not be related to data.sql.
Otherwise, you have to update the config again whenever the relation changes.
env vs additionalGraph
The effect of additionalGraph is different from env.
env is like defining parallel universes, where history is recorded separately.
If you place data.sql in env (e.g. with utils.hash()) instead of additionalGraph, every test file will be executed when data.sql changes, unless the change is a rollback to past content that matches a past value of env logged in the store file (.haetae/store.json).
external.js and external.test.js are unrelated to the database.
That's why data.sql is applied as additionalGraph, not as env.
But that's case by case. In many situations, env is beneficial.
- If data.sql affects 'most' of your integration test files, or
- If it is not clear which test files do and don't depend on data.sql, or the relation changes frequently, or
- If data.sql is not frequently changed,
then env is a good place.
import { $, core, git, utils, js } from 'haetae'
export default core.configure({
commands: {
myTest: {
env: async () => ({
testData: await utils.hash(['test/data.sql']),
}),
run: async () => { /* ... */ }, // without additionalGraph
},
},
})
Cartesian product
You can specify dependencies from one group of files to another group.
// Other content is omitted for brevity
const additionalGraph = utils.graph({
edges: [
{
dependents: await utils.glob(['test/db/*.test.js']),
dependencies: [
'test/docker-compose.yml',
...(await utils.glob(['test/db/*.sql'])),
],
},
],
})
This means that every test file under test/db/ depends on every SQL file under test/db/ and on test/docker-compose.yml.
Distributed notation
You don't have to specify a dependent's dependencies all at once. It can be done in a distributed manner.
// Other content is omitted for brevity
const additionalGraph = utils.graph({
edges: [
{
dependents: ['foo', 'bar'],
dependencies: ['one', 'two'],
},
{
dependents: ['foo', 'qux'], // 'foo' appears again, and it's fine
dependencies: ['two', 'three', 'bar'], // 'two' and 'bar' appear again, and it's fine
},
{
dependents: ['one', 'two', 'three'],
dependencies: ['two'], // 'two' depends on itself, and it's fine
},
{
dependents: ['foo'],
dependencies: ['one'], // 'foo' -> 'one' appears again, and it's fine
},
],
})
In the third edge above, we marked two as depending on two itself.
That's OK, as every file is treated as depending on itself.
So foo depends on foo, bar also depends on bar, and so on.
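To see why repeated and overlapping edges are harmless, it may help to expand the edges into an adjacency map by hand. The sketch below is plain JavaScript with a hypothetical buildGraph function. It is not utils.graph(), whose return value and internals may differ. Every dependent in an edge is paired with every dependency in that edge (the Cartesian product), and a Set removes duplicates.

```javascript
// Hypothetical stand-in for utils.graph(): expands edges into an adjacency map
// from each node to the Set of nodes it directly depends on.
function buildGraph(edges) {
  const graph = new Map()
  const ensure = (node) => {
    if (!graph.has(node)) graph.set(node, new Set())
    return graph.get(node)
  }
  for (const { dependents, dependencies } of edges) {
    // Make sure pure dependencies also appear as nodes.
    for (const dependency of dependencies) ensure(dependency)
    // Cartesian product: every dependent depends on every dependency.
    for (const dependent of dependents) {
      const targets = ensure(dependent)
      for (const dependency of dependencies) targets.add(dependency)
    }
  }
  return graph
}

const mergedGraph = buildGraph([
  { dependents: ['foo', 'bar'], dependencies: ['one', 'two'] },
  { dependents: ['foo', 'qux'], dependencies: ['two', 'three', 'bar'] },
  { dependents: ['one', 'two', 'three'], dependencies: ['two'] },
  { dependents: ['foo'], dependencies: ['one'] },
])

console.log([...mergedGraph.get('foo')].sort())
// [ 'bar', 'one', 'three', 'two' ]
console.log([...mergedGraph.get('two')])
// [ 'two' ]
```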
Circular dependency
Haetae supports circular dependency as well. Although circular dependency is, in general, considered not a good practice, it's fully up to you to decide whether to define it. Haetae does not prevent you from defining it.
// Other content is omitted for brevity
const additionalGraph = utils.graph({
edges: [
{
dependents: ['index.js'],
dependencies: ['foo'],
},
{
dependents: ['foo'],
dependencies: ['bar'],
},
{
dependents: ['bar'],
dependencies: ['index.js'],
},
],
})
Assume the relations between index.js, foo, and bar are given by additionalGraph, and the rest are automatically detected.
In this situation, index.test.js is executed when any of these files is changed, including foo and bar.
On the other hand, utils.test.js is executed only when utils.js or utils.test.js itself is changed.
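A traversal over a cyclic graph terminates as long as it remembers which nodes it has already visited. The following plain JavaScript sketch illustrates that idea. The graph object and the reachableFrom function are hypothetical and hand-written for this example, and they do not show how Haetae walks the graph internally.

```javascript
// Hypothetical graph: each key depends directly on the listed files.
// index.js -> foo -> bar -> index.js forms a cycle.
const cyclicGraph = {
  'index.test.js': ['index.js'],
  'index.js': ['foo'],
  foo: ['bar'],
  bar: ['index.js'],
  'utils.test.js': ['utils.js'],
  'utils.js': [],
}

// Everything `start` depends on, directly or transitively, including itself.
// The `visited` set guarantees termination even when the graph has cycles.
function reachableFrom(start) {
  const visited = new Set([start])
  const stack = [start]
  while (stack.length > 0) {
    const node = stack.pop()
    for (const next of cyclicGraph[node] ?? []) {
      if (!visited.has(next)) {
        visited.add(next)
        stack.push(next)
      }
    }
  }
  return visited
}

const indexTestDeps = [...reachableFrom('index.test.js')].sort()
const utilsTestDeps = [...reachableFrom('utils.test.js')].sort()

console.log(indexTestDeps)
// [ 'bar', 'foo', 'index.js', 'index.test.js' ]
console.log(utilsTestDeps)
// [ 'utils.js', 'utils.test.js' ]
```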
Record Data
Haetae has the concepts of 'Record' (type: HaetaeRecord) and 'Record Data' (type: HaetaeRecord.data).
In the previous sections, we've already seen terminal outputs like this.
$ haetae myTest
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 May 28 11:06:06 (timestamp: 1685239566483)
⎜ 🌱 env: {}
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: 979f3c6bcafe9f0b81611139823382d615f415fd
⎜ branch: main
⎣ pkgVersion: 0.0.12
This information is logged in the store file (.haetae/store.json), and is called a 'Record'.
The data field is called 'Record Data'. To query 'Records', you can use the CLI option -r, --record.
$ haetae myTest --record --json
-j, --json
The option -j, --json is purely optional.
It lets the CLI print the result in JSON format.
Check out the CLI docs for more details.
The output is like this.
{
"status": "success",
"message": "5 records are found for the command myTest",
"result": [
{
"data": {
"@haetae/git": {
"commit": "979f3c6bcafe9f0b81611139823382d615f415fd",
"branch": "main",
"pkgVersion": "0.0.12"
}
},
"env": {},
"time": 1685239566483
},
{
"data": {
"@haetae/git": {
"commit": "a4f4e7e83eedbf2269fbf29d91f08289bdeece91",
"branch": "main",
"pkgVersion": "0.0.12"
}
},
"env": {
"NODE_ENV": "production"
},
"time": 1685458529856
},
{
"data": {
"@haetae/git": {
"commit": "442fefc582889bdaee5ec2bd8b74804680fc30ee",
"branch": "main",
"pkgVersion": "0.0.12"
}
},
"env": {
"NODE_ENV": "development"
},
"time": 1685452061199
},
{
"data": {
"@haetae/git": {
"commit": "ef3fdf88e9fad90396080335096a88633fbe893f",
"branch": "main",
"pkgVersion": "0.0.12"
}
},
"env": {
"jestConfig": "642645d6bc72ab14a26eeae881a0fc58e0fb4a25af31e55aa9b0d134160436eb",
"jest": 29,
"branch": "main",
"os": "darwin",
"node": 18,
"haetae": 0
},
"time": 1685455507556
},
{
"data": {
"@haetae/git": {
"commit": "7e3b332f0657272cb277c312ff25d4e1145f895c",
"branch": "main",
"pkgVersion": "0.0.12"
}
},
"env": {
"testData": "b87b8be8df58976ee7da391635a7f45d8dc808357ff63fdcda699df937910227"
},
"time": 1685451151035
}
]
}
5 Records are found in total.
They come from what we've done in this article so far.
Each of them is the latest Record for its env.
For example, the command myTest was executed with env: {} on several commits, and 979f3c6 is the last commit.
Custom Record Data
Configuration files for your application are a good example showing the usefulness of Record Data.
I mean a config file not for Haetae, but for your project itself,
for example dotenv (.env), .yaml, .properties, .json, etc.
Usually, an application config file satisfies these 2 conditions.
- It's not explicitly imported (e.g. import, require()) in the source code. Rather, the source code 'reads' it at runtime. ---> options.additionalGraph of js.dependsOn() or env are useful.
- It's ignored by git. ---> 'Record Data' is useful.
Let's see how it works, with a simple example project using .env as the application config.
dotenv
.env is a configuration file for environment variables, and is NOT related to Haetae's env at all.
your-project
├── .env # <--- dotenv file
├── .gitignore # <--- ignores '.env' file
├── haetae.config.js
├── package.json
├── src
│ ├── config.js
│ ├── utils.js
│ ├── logic.js
│ └── index.js
└── test
├── utils.test.js
├── logic.test.js
└── index.test.js
src/config.js reads the file .env, using the dotenv library for example.
import { config } from 'dotenv'
config()
export default {
port: process.env.PORT,
secretKey: process.env.SECRET_KEY,
env: process.env.ENV // e.g. 'development', 'staging', 'production', etc
}
Let's assume logic.js gets the value of environment variables through config.js, not directly reading from .env or process.env.
The explicit source code dependency graph is like this.
Let Haetae think config.js depends on .env.
import { $, core, git, utils, js } from 'haetae'
export default core.configure({
commands: {
myTest: {
env: { /* ... */ },
run: async () => {
const changedFiles = await git.changedFiles()
const testFiles = await utils.glob(['test/*.test.js'])
const additionalGraph = utils.graph({
edges: [
{
dependents: ['src/config.js'],
dependencies: ['.env'],
},
],
})
const affectedTestFiles = testFiles.filter((testFile) =>
js.dependsOn({
dependent: testFile,
dependencies: changedFiles,
additionalGraph,
}),
)
if (affectedTestFiles.length > 0) {
await $`pnpm jest ${affectedTestFiles}`
}
},
},
},
})
Then the implicit dependency graph becomes explicit.
But that's not enough, because .env is ignored by git.
git.changedFiles() cannot detect whether .env changed or not.
Let's use 'Record Data' to solve this problem. Add the following to the config file.
import { $, core, git, utils, js } from 'haetae'
export default core.configure({
commands: {
myTest: {
env: { /* ... */ },
run: async () => {
const changedFiles = await git.changedFiles()
const previousRecord = await core.getRecord()
const dotenvHash = await utils.hash(['.env'])
if (previousRecord?.data?.dotenv !== dotenvHash) {
changedFiles.push('.env')
}
const testFiles = await utils.glob(['**/*.test.js'])
const additionalGraph = utils.graph({
edges: [
{
dependents: ['src/config.js'],
dependencies: ['.env'],
},
],
})
const affectedTestFiles = testFiles.filter((testFile) =>
js.dependsOn({
dependent: testFile,
dependencies: changedFiles,
additionalGraph,
}),
)
if (affectedTestFiles.length > 0) {
await $`pnpm jest ${affectedTestFiles}`
}
return {
dotenv: dotenvHash
}
},
},
},
})
Now, we return an object from myTest.run.
Let's execute it.
$ haetae myTest
✔ success Command myTest is successfully executed.
⎡ 🕗 time: 2023 Jun 08 09:23:07 (timestamp: 1686183787453)
⎜ 🌱 env: {}
⎜ 💾 data:
⎜ "@haetae/git":
⎜ commit: ac127da6531efa487b8ee35451f24a70dc58aeea
⎜ branch: main
⎜ pkgVersion: 0.0.12
⎣ dotenv: 7f39224e335994886c26ba8c241fcbe1d474aadaa2bd0a8e842983b098cea894
Do you see the last line?
The value we returned from myTest.run is recorded in the store file, as part of Record Data.
Hash credentials
utils.hash() is good for credentials like a dotenv file.
By default, it hashes with SHA-256, and you can change the cryptographic hash algorithm through its options, for example to SHA-512.
Thus, you do not need to worry about the store file being leaked.
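For readers unfamiliar with content hashing, here is a minimal sketch using Node's built-in crypto module. The helper name digest is hypothetical, and this is not how utils.hash() is implemented. It only shows that the same content always yields the same fixed-length digest, which is what makes a hash usable as a change detector.

```javascript
// CommonJS `require` is used so the sketch runs as a standalone script.
const { createHash } = require('node:crypto')

// Hypothetical helper: hex digest of a string or Buffer.
function digest(content, algorithm = 'sha256') {
  return createHash(algorithm).update(content).digest('hex')
}

const sha256OfAbc = digest('abc')
const sha512OfAbc = digest('abc', 'sha512')

console.log(sha256OfAbc)
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
console.log(sha256OfAbc.length, sha512OfAbc.length)
// 64 128
console.log(digest('abc') === digest('abd'))
// false
```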
This time, .env was treated as a changed file, as the key dotenv did not exist in previousRecord.
// Other content is omitted for brevity
if (previousRecord?.data?.dotenv !== dotenvHash) {
changedFiles.push('.env')
}
Therefore, index.test.js and logic.test.js, which transitively depend on .env, are executed.
If you run Haetae again immediately,
$ haetae myTest
This time, no test is executed, as nothing is considered changed. .env is treated as not changed, thanks to the Record Data.
From now on, though the file .env is ignored by git, changes to it are recorded by custom Record Data.
So it can be used in incremental tasks.
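The comparison logic from the config can be isolated and tried without Haetae. In this plain JavaScript sketch, detectChangedFiles is a hypothetical function, the record objects only mimic the shape of what core.getRecord() is shown to return in this article, and the hash strings are shortened, made-up values.

```javascript
// Hypothetical helper mirroring the check in the config above.
// `previousRecord` may be undefined on the very first run.
function detectChangedFiles(gitChangedFiles, previousRecord, dotenvHash) {
  const changedFiles = [...gitChangedFiles]
  if (previousRecord?.data?.dotenv !== dotenvHash) {
    // git cannot see '.env', so we add it ourselves when its hash differs.
    changedFiles.push('.env')
  }
  return changedFiles
}

// First run: no previous record, so '.env' counts as changed.
const firstRun = detectChangedFiles([], undefined, '7f39224e')

// Second run: same hash as recorded, so nothing is changed.
const savedRecord = { data: { dotenv: '7f39224e' } }
const secondRun = detectChangedFiles([], savedRecord, '7f39224e')

// Later run: '.env' was edited (made-up hash) along with a tracked file.
const thirdRun = detectChangedFiles(['src/logic.js'], savedRecord, '0000aaaa')

console.log(firstRun)
// [ '.env' ]
console.log(secondRun)
// []
console.log(thirdRun)
// [ 'src/logic.js', '.env' ]
```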