36 changes: 36 additions & 0 deletions .cursor/rules/meaningful-identifiers.mdc
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
---
name: meaningful-identifiers
description: Naming guidance for callback parameters and local variables in TypeScript. Use when writing or refactoring loops, array callbacks (map/filter/find/reduce), or local accumulators.
license: AGPL-3.0
metadata:
triggers:
type: domain
enforcement: suggest
priority: low
keywords:
- naming
- rename
- identifier
- variable name
---

# Meaningful identifiers

Callback parameters and local variables must describe the value they hold.

## Rules

- Name array callback params after the element's domain type:
`batches.map((batch) => …)`, `bins.filter((bin) => …)`,
`checkpoints.find((checkpoint) => …)`. Never reuse `right`/`left`/`column`
for values that are not sort operands or table columns.
- Name `reduce` accumulators after what they accumulate: `sum`/`total` for a
running total, not `step`.
- Name percentile/ratio arguments for the quantity they represent
(`percentileRank`, `multiplier`), not an unrelated domain noun.

## Leave correct uses untouched

- Sort comparators: `(left, right) => left - right`.
- `(column) => column.key` over a real `.columns` array.
- `for (const row of …Rows)` where the items are genuinely rows.
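
The rules above can be illustrated with a short sketch. The `Batch` and `Checkpoint` types, the sample data, and the `nearestRankPercentile` helper are hypothetical and exist only for this example; they are not taken from the codebase.

```typescript
// Hypothetical domain types, used only to illustrate the naming rules.
interface Batch {
  id: string;
  weightKg: number;
}

interface Checkpoint {
  id: string;
  passed: boolean;
}

const batches: Batch[] = [
  { id: "batch-a", weightKg: 12 },
  { id: "batch-b", weightKg: 7.5 },
  { id: "batch-c", weightKg: 20 },
];

const checkpoints: Checkpoint[] = [
  { id: "intake", passed: true },
  { id: "inspection", passed: false },
];

// Callback parameters are named after the element's domain type.
const batchIds = batches.map((batch) => batch.id);
const firstFailedCheckpoint = checkpoints.find(
  (checkpoint) => !checkpoint.passed,
);

// The reduce accumulator is named after what it accumulates.
const totalWeightKg = batches.reduce(
  (total, batch) => total + batch.weightKg,
  0,
);

// Sort comparators are a correct use of `left`/`right` and stay as they are.
const sortedWeightsKg = batches
  .map((batch) => batch.weightKg)
  .sort((left, right) => left - right);

// The percentile argument is named for the quantity it represents.
const nearestRankPercentile = (
  sortedValues: number[],
  percentileRank: number,
): number => {
  if (sortedValues.length === 0) {
    throw new Error("Cannot take a percentile of an empty list");
  }
  const index = Math.max(
    0,
    Math.ceil((percentileRank / 100) * sortedValues.length) - 1,
  );
  const value = sortedValues[index];
  if (value === undefined) {
    throw new Error("percentileRank must not exceed 100");
  }
  return value;
};

const medianWeightKg = nearestRankPercentile(sortedWeightsKg, 50);
```

A reader can tell what each callback receives without looking up the array's type, which is the point of the rule.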
10 changes: 2 additions & 8 deletions .env
@@ -69,14 +69,8 @@ API_ORIGIN=http://localhost:5001
# @todo H-124 implement a new change subscription service
ENABLE_REALTIME_SYNC=false

# Optional usage telemetry for HASH
HASH_TELEMETRY_ENABLED=false
# Currently our endpoint doesn't have HTTPS so this is set to false
HASH_TELEMETRY_HTTPS=false
# DNS collector endpoint
HASH_TELEMETRY_DESTINATION=REPLACE_ME.aws.com
# Is used for differentiating different apps, can be any value
HASH_TELEMETRY_APP_ID=hash-app
# Optional product analytics via Rudderstack. Leave the write key blank to disable telemetry.
HASH_API_RUDDERSTACK_KEY=

###########################################
# Disable telemetry from third-party dependencies who transmit IP addresses
2 changes: 2 additions & 0 deletions .env.development
@@ -16,3 +16,5 @@ FILE_UPLOAD_PROVIDER="AWS_S3"

# Feature flags
SHOW_WORKER_COST=true

ENVIRONMENT=development
7 changes: 2 additions & 5 deletions README.md
@@ -294,12 +294,9 @@ The Postgres information for the graph query layer is configured through:
- `HASH_REDIS_HOST` (default: `localhost`)
- `HASH_REDIS_PORT` (default: `6379`)

#### Snowplow telemetry
#### Rudderstack telemetry

- `HASH_TELEMETRY_ENABLED`: whether Snowplow is used or not. `true` or `false`. (default: `false`)
- `HASH_TELEMETRY_HTTPS`: set to "1" to connect to the Snowplow over an HTTPS connection. `true` or `false`. (default: `false`)
- `HASH_TELEMETRY_DESTINATION`: the hostname of the Snowplow tracker endpoint to connect to. (required)
- `HASH_TELEMETRY_APP_ID`: ID used to differentiate application by. Can be any string. (default: `hash-workspace-app`)
- `HASH_API_RUDDERSTACK_KEY`: Rudderstack write key for product analytics. Leave blank to disable telemetry. The `environment` label on events is derived from `ENVIRONMENT`. (optional)

#### Others

81 changes: 6 additions & 75 deletions apps/hash-api/README.md
@@ -47,11 +47,7 @@ The HASH Backend API service is configured using the following environment varia
- `STATSD_HOST`: the hostname of the StatsD server.
- `STATSD_PORT`: (default: 8125) the port number the StatsD server is listening on.
- `HASH_INTEGRATION_QUEUE_NAME`: the name of the Redis queue to which entity updates are published. It is used to decide which changes should be written to connected applications (for two-way sync between them and HASH).
- Snowplow telemetry:
- `HASH_TELEMETRY_ENABLED`: (default: `false`) whether Snowplow is used or not. `true` or `false`.
- `HASH_TELEMETRY_HTTPS`: (default: `false`) set to "1" to connect to the Snowplow over an HTTPS connection. `true` or `false`.
- `HASH_TELEMETRY_DESTINATION`: (required) the hostname of the Snowplow tracker endpoint to connect to.
- `HASH_TELEMETRY_APP_ID`: ID used to differentiate application by. Can be any string.
- `HASH_API_RUDDERSTACK_KEY`: (optional) Rudderstack write key for product analytics. Leave blank to disable telemetry. The environment label sent with events is derived from `ENVIRONMENT`.

## Metrics

@@ -65,75 +61,10 @@ this enabled, from the root of the repo, execute:
yarn serve:hash-backend-statsd
```

## Snowplow
## Telemetry

The API may use Snowplow to collect structured behavioural data. In order to use Snowplow
the `HASH_TELEMETRY_*` environment values should be specified, most importantly
`HASH_TELEMETRY_DESTINATION` should point to a snowplow tracker endpoint and
`HASH_TELEMETRY_ENABLED=true`.
The API sends product-analytics events to [Rudderstack](https://www.rudderstack.com/).
Telemetry is enabled only when `HASH_API_RUDDERSTACK_KEY` is set.

To set up a local Snowplow deployment, [Snowplow mini](https://github.com/snowplow/snowplow-mini) can be used. This requires [Vagrant](https://www.vagrantup.com/) and [VirtualBox](https://www.virtualbox.org/) to be installed.
By default, the Snowplow mini instance uses a fair bit of RAM and CPU.
Snowplow mini exposes a lot of ports to the host through the default [Vagrantfile](https://github.com/snowplow/snowplow-mini/blob/f7dbf73f1e3ba589d2dd1d8b94589c4f610dba1f/Vagrantfile). To make the instance play well with HASH, comment out the following in the Snowplow mini Vagrantfile:

```ruby
#...
config.vm.network "forwarded_port", guest: 80, host: 2000
# config.vm.network "forwarded_port", guest: 3000, host: 3000
config.vm.network "forwarded_port", guest: 4171, host: 4171
config.vm.network "forwarded_port", guest: 8080, host: 8080
config.vm.network "forwarded_port", guest: 8093, host: 8093
config.vm.network "forwarded_port", guest: 9200, host: 9200
config.vm.network "forwarded_port", guest: 5601, host: 5601
# config.vm.network "forwarded_port", guest: 8081, host: 8081
config.vm.network "forwarded_port", guest: 10000, host: 10000
#...
```

The default credentials for the basic auth at [http://localhost:2000/home](http://localhost:2000/home) for the local Snowplow mini instance are:

- username: USERNAME_PLACEHOLDER
- password: PASSWORD_PLACEHOLDER

With a local Snowplow deployment, the following destination can be used in the HASH env:
`HASH_TELEMETRY_DESTINATION=localhost:2000`

### Troubleshooting Vagrant/VirtualBox

On a fresh install of Vagrant/VirtualBox, some kernel modules might be unloaded for VirtualBox
which makes the `vagrant up` command error out on various steps along the way.
It might be necessary to load the following kernel modules to prevent errors:

Linux:

```sh
sudo modprobe vboxnetflt
sudo modprobe vboxnetadp
sudo modprobe vboxdrv
```

macOS:

```sh
sudo kmutil load -b org.virtualbox.kext.VBoxNetFlt
sudo kmutil load -b org.virtualbox.kext.VBoxNetAdp
sudo kmutil load -b org.virtualbox.kext.VBoxDrv
```

If you encounter an error such as
`mount.nfs: access denied by server while mounting ...`
It might be necessary to make the following changes to the Snowplow mini Vagrantfile to disable NFS.

```ruby
# Use NFS for shared folders for better performance
# config.vm.network :private_network, ip: '192.168.56.56' # Uncomment to use NFS
config.vm.synced_folder '.', '/vagrant' #, nfs: true # Uncomment to use NFS
```

Here the `config.vm.network` line and the `, nfs: true` argument are commented out.

If you see `uninitialized constant VagrantPlugins::HostDarwin::Cap::Version` on MacOS, see [this issue](https://github.com/hashicorp/vagrant/issues/12583).

If you encounter an issue on the `npm install` step (5), try commenting out lines 8 & 9 of the Vagrantfile (i.e. disable using NFS for shared folders).

You may need to run `vagrant reload --provision` after applying fixes.
The `environment` property attached to every event is derived from `ENVIRONMENT` (`production` / `staging`,
otherwise `development`).
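
The behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the API's actual implementation: the function names `resolveTelemetryEnvironment` and `isTelemetryEnabled` are hypothetical, and only the mapping (`production` / `staging`, otherwise `development`) and the "blank key disables telemetry" rule come from this README.

```typescript
type TelemetryEnvironment = "production" | "staging" | "development";

// Hypothetical helper: maps the raw ENVIRONMENT value to the label attached
// to events. Anything other than "production" or "staging" becomes
// "development".
const resolveTelemetryEnvironment = (
  rawEnvironment: string | undefined,
): TelemetryEnvironment => {
  if (rawEnvironment === "production") {
    return "production";
  }
  if (rawEnvironment === "staging") {
    return "staging";
  }
  return "development";
};

// Hypothetical helper: telemetry is treated as enabled only when a non-blank
// write key is provided.
const isTelemetryEnabled = (writeKey: string | undefined): boolean =>
  writeKey !== undefined && writeKey.trim() !== "";
```

In a Node.js process, these could be called with `process.env.ENVIRONMENT` and `process.env.HASH_API_RUDDERSTACK_KEY` respectively.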
4 changes: 2 additions & 2 deletions apps/hash-api/package.json
@@ -9,10 +9,11 @@
"build:docker": "docker buildx build --tag hash-api --file docker/Dockerfile ../../ --load",
"codegen": "rimraf './src/**/*.gen.*'; graphql-codegen --config codegen.config.ts",
"dev": "NODE_ENV=development NODE_OPTIONS=--max-old-space-size=2048 tsx watch --clear-screen=false --import ./src/instrument.mjs ./src/index.ts",
"dev:import-supply-chain-data": "NODE_ENV=development tsx ./src/seed-data/import-supply-chain-dataset.ts",
"dev:seed-crm-data": "NODE_ENV=development tsx ./src/seed-data/seed-crm-data.ts",
"fix:eslint": "eslint --fix .",
"generate-hash-gpt-schema": "tsx ./src/ai/gpt/generate-hashgpt-schema.ts",
"generate-ontology-type-ids": "tsx ./src/generate-ontology-type-ids.ts; oxfmt --write --ignore-path /dev/null ../../libs/@local/hash-isomorphic-utils/src/ontology-type-ids.ts",
"generate-ontology-type-ids": "tsx ./src/generate-ontology-type-ids.ts; cd ../../libs/@local/hash-isomorphic-utils && oxfmt --write --ignore-path /dev/null src/ontology-type-ids.ts",
"lint:eslint": "eslint --report-unused-disable-directives .",
"lint:tsc": "tsc --noEmit",
"setup:admin": "NODE_ENV=production OTEL_SERVICE_NAME=\"Setup Admin\" tsx --import ./src/instrument.mjs ./src/setup-admin.ts",
@@ -65,7 +66,6 @@
"@ory/kratos-client": "25.4.0",
"@rudderstack/rudder-sdk-node": "3.0.8",
"@sentry/node": "10.54.0",
"@snowplow/node-tracker": "4.8.2",
"@temporalio/client": "1.18.1",
"@temporalio/proto": "1.18.1",
"@types/ws": "8.18.1",
128 changes: 128 additions & 0 deletions apps/hash-api/src/analysis/analyses/supply-chain.ts
@@ -0,0 +1,128 @@
import { AnalysisNotFoundError } from "../shared/errors";
import { requireSlugArg } from "../shared/storage-key";
import { resolveDataset } from "./supply-chain/dataset";

import type { NamedAnalysis } from "../shared/analysis-registry";

/**
* Named analyses backing the supply chain views. Every analysis is scoped to
* a single web (the dataset owner) and resolves to one or more JSON artifacts.
*
* Storage layout (under `analysis/{webId}/supply-chain/{version}/`):
* products.json
* sites.json
* {productId}/graph.json
* {productId}/steps/{stepId}.json
* _global/supplier_performance.json
* _global/supplier-lines.json
* site/{siteId}/summary.json
*/

const listProducts: NamedAnalysis = {
name: "listProducts",
resolve: async (ctx) => {
const { base } = await resolveDataset(ctx);
return {
status: "ready",
artifacts: [{ name: "products", key: `${base}/products.json` }],
};
},
};

const listSites: NamedAnalysis = {
name: "listSites",
resolve: async (ctx) => {
const { base } = await resolveDataset(ctx);
return {
status: "ready",
artifacts: [{ name: "sites", key: `${base}/sites.json` }],
};
},
};

const productGraph: NamedAnalysis = {
name: "productGraph",
resolve: async (ctx) => {
const productId = requireSlugArg(ctx.args, "productId");
const { base, manifest } = await resolveDataset(ctx);

if (!manifest.products.includes(productId)) {
throw new AnalysisNotFoundError(`Unknown product "${productId}"`);
}

return {
status: "ready",
artifacts: [{ name: "graph", key: `${base}/${productId}/graph.json` }],
};
},
};

const stepDetail: NamedAnalysis = {
name: "stepDetail",
resolve: async (ctx) => {
const productId = requireSlugArg(ctx.args, "productId");
const stepId = requireSlugArg(ctx.args, "stepId");
const { base, manifest } = await resolveDataset(ctx);

if (!manifest.products.includes(productId)) {
throw new AnalysisNotFoundError(`Unknown product "${productId}"`);
}
if (!(manifest.steps[productId] ?? []).includes(stepId)) {
throw new AnalysisNotFoundError(
`Unknown step "${stepId}" for product "${productId}"`,
);
}

return {
status: "ready",
artifacts: [
{ name: "step", key: `${base}/${productId}/steps/${stepId}.json` },
],
};
},
};

const supplierPerformance: NamedAnalysis = {
name: "supplierPerformance",
resolve: async (ctx) => {
const { base } = await resolveDataset(ctx);
return {
status: "ready",
artifacts: [
{
name: "performance",
key: `${base}/_global/supplier_performance.json`,
},
{ name: "lines", key: `${base}/_global/supplier-lines.json` },
],
};
},
};

const siteSummary: NamedAnalysis = {
name: "siteSummary",
resolve: async (ctx) => {
const siteId = requireSlugArg(ctx.args, "siteId");
const { base, manifest } = await resolveDataset(ctx);

if (!manifest.sites.includes(siteId)) {
throw new AnalysisNotFoundError(`Unknown site "${siteId}"`);
}

return {
status: "ready",
artifacts: [
{ name: "summary", key: `${base}/site/${siteId}/summary.json` },
],
};
},
};

export const supplyChainAnalyses: readonly NamedAnalysis[] = [
listProducts,
listSites,
productGraph,
stepDetail,
supplierPerformance,
siteSummary,
];
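
The resolvers above interpolate caller-supplied IDs into storage keys, so the slug check matters. The implementation of `requireSlugArg` is not part of this diff; the sketch below is an assumption about what such a helper might look like. `requireSlugArgSketch`, `InvalidArgumentError`, `SLUG_PATTERN`, and `buildStepKey` are hypothetical names, and the example `base` value is made up.

```typescript
// Hypothetical error type; the real module may use a different one.
class InvalidArgumentError extends Error {}

// Assumed slug shape: letters, digits, "_" and "-", starting with a letter
// or digit. This rejects "/" and "..", so a value cannot escape its
// storage prefix.
const SLUG_PATTERN = /^[A-Za-z0-9][A-Za-z0-9_-]*$/;

// Sketch of a slug-validating argument reader (not the actual helper).
const requireSlugArgSketch = (
  args: Record<string, unknown>,
  argName: string,
): string => {
  const value = args[argName];
  if (typeof value !== "string" || !SLUG_PATTERN.test(value)) {
    throw new InvalidArgumentError(`Argument "${argName}" must be a slug`);
  }
  return value;
};

// Mirrors the key layout used by `stepDetail`:
// {base}/{productId}/steps/{stepId}.json
const buildStepKey = (
  base: string,
  productId: string,
  stepId: string,
): string => `${base}/${productId}/steps/${stepId}.json`;

const exampleArgs = { productId: "widget-1", stepId: "assembly" };
const exampleKey = buildStepKey(
  "analysis/example-web/supply-chain/v1", // made-up base for illustration
  requireSlugArgSketch(exampleArgs, "productId"),
  requireSlugArgSketch(exampleArgs, "stepId"),
);
```

Validating the shape first and then checking membership in the manifest, as the resolvers do, means an unknown or malformed ID never reaches the storage layer.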