Merged
8 changes: 8 additions & 0 deletions .gitignore
@@ -2,3 +2,11 @@ data-in/
data-out/
node_modules/
.DS_Store/
tsconfig.tsbuildinfo
dist
.nx/cache
*.d.ts
storybook-static/
packages/cli-web/
packages/cli-sec/
packages/ngraph/
4 changes: 3 additions & 1 deletion .vscode/settings.json
@@ -10,6 +10,7 @@
"search.exclude": {
"data-files/**": true,
"data-out/**": true,
"dist/**": true,
"**/package-lock.json": true
},
"editor.codeActionsOnSave": {
@@ -27,5 +28,6 @@
"html.format.enable": true,
"json.format.enable": true,
"javascript.format.enable": true,
"editor.wordWrapColumn": 100
"editor.wordWrapColumn": 100,
"prettier.printWidth": 100
}
Binary file modified bun.lockb
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/01_motivations.md
@@ -16,4 +16,4 @@ I have two very different side projects that I am working on, where intelligent
This project is an attempt to build a framework that can be used to build intelligent retrieval systems. The main requirements are:

- I need the ability to test and iterate quickly on ideas, models, and data using both local and cloud resources. And compare the results of different approaches.
- Be able to put into production and decide to change approaches later without having to rewrite everything.
- Be able to put into production and decide to change approaches later without having to rewrite everything. So switching embedding models part way through a project should be easy.
4 changes: 4 additions & 0 deletions docs/03_matrix_operations.md
@@ -27,6 +27,10 @@ We need to lookup data based on the query.
- We will use the same embedding model as the data embeddings
- Different pre-processing methods of the query to embed. We will call this a query rewriter. This does not need to use the same generative model as the data rewriter, though it likely would.

## Reranking

We need to re-rank the results based on the query.

## Storage and Retrieval

We can start with these options:
27 changes: 7 additions & 20 deletions docs/05_task_chains.md
@@ -1,4 +1,4 @@
# Tasks, Task Lists, and Strategies
# Tasks and Task Graphs

## Requirements

@@ -76,12 +76,12 @@ Uses:
- ApplyPromptTask
- TextGenerationTask

### TextRewriterTaskList
### TextRewriter with Multiple Models

Inputs

- content
- model
- model[]
- parameters
- prompt

@@ -93,7 +93,7 @@ Uses:

- TextRewriterTask

### TextEmbeddingStrategy
### TextEmbedding Strategy

Inputs

@@ -118,14 +118,7 @@ Example:
```ts
new TextEmbeddingStrategy({
content: "This is a test",
embedding_model: [
{
name: "Xenova/distilbert-base-uncased",
model_parameters: {
temperature: 0.7,
},
},
],
embedding_model: name: "Xenova/distilbert-base-uncased"
rewriter: [
{
prompt_model: "Xenova/gpt2",
@@ -145,12 +138,6 @@ A task is a single step in the chain where most tasks output will be input for t

Tasks get posted to a job queue and are run by a job queue runner.

## TaskList

A strategy is a list of tasks that are chained together to look like a single task.

## Strategy

A strategy is a list of tasks that are chained together to look like a single task. Parts can be run in series or in parallel. It orchestrates variations of the same task.
## CompoundTask

Strategies get a name and are saved in the database, both as a parent and as all the variations. The variation names are based on the specific parameters used rather than the parent name.
A compound task is a group of tasks (in DAG format) that are chained together to look like a single task.
59 changes: 53 additions & 6 deletions docs/06_run_graph_orchestration.md
@@ -14,19 +14,66 @@ The pipline DAG is defined by the end user and saved in the database (nodes and

The graph is a DAG. It is a list of nodes and a list of edges. The nodes are the tasks and the edges are the inputs and outputs of the tasks plus some other instrumentation data.

We might want to have events based on what happens in the graph (and a suspend/resume for bulk creation, etc.). This will be needed to keep the UI in sync with the graph as it runs.

### Node

- Task
- TaskList
- Strategy
- SimpleTask
- CompoundTask (has a sub-graph)

Notes about requirements for the nodes:

- Must have input list and output list
- the input or output will have a type object that JS can read, and not a TS type (though that should get derived from the type object)
- We need to convert the inputs/outputs to a TypeScript type
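The requirement that inputs and outputs carry a type object readable by JS, with the TypeScript type derived from it, could be sketched roughly as follows. This is only an illustration under assumptions: `ValueTypeName`, `ValueTypeMap`, `IOSpec`, `FromSpec`, and `matchesSpec` are hypothetical names, not identifiers from this repository.

```typescript
// Hypothetical runtime type tags; the real project may use richer descriptors.
type ValueTypeName = "text" | "number" | "vector";

// Maps each runtime tag to the TypeScript type it stands for.
interface ValueTypeMap {
  text: string;
  number: number;
  vector: number[];
}

// A plain object that exists at runtime and describes named inputs or outputs.
type IOSpec = Record<string, ValueTypeName>;

// Derives the compile-time object type from a runtime spec.
type FromSpec<S extends IOSpec> = { -readonly [K in keyof S]: ValueTypeMap[S[K]] };

// Example spec for an embedding task's inputs (illustrative only).
const embeddingInputs = { content: "text", model: "text" } as const;

// Resolves to { content: string; model: string }.
type EmbeddingInputs = FromSpec<typeof embeddingInputs>;

const sample: EmbeddingInputs = {
  content: "This is a test",
  model: "Xenova/distilbert-base-uncased",
};

// Runtime check that a value matches a spec, using only the JS-readable object.
function matchesSpec(spec: IOSpec, value: Record<string, unknown>): boolean {
  return Object.entries(spec).every(([key, typeName]) => {
    const v = value[key];
    switch (typeName) {
      case "text":
        return typeof v === "string";
      case "number":
        return typeof v === "number";
      case "vector":
        return Array.isArray(v) && v.every((x) => typeof x === "number");
      default:
        return false;
    }
  });
}
```

The spec object is the single source of truth here: the UI or graph runner can inspect it at runtime, while the compiler derives the static type from the same object.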

### Edge

- Input
- Output
- DataFlow
- Instrumentation
- Events

Notes about requirements for the edges:

- There can be multiple outputs that go to multiple inputs
- I.e., there can and will be multiple edges between two nodes

### Graph Runner

The graph runner is a simple recursive function that takes a graph and a node and runs the node. If the node is a task, it runs the task. If the node is a TaskList or Strategy, it runs the subgraph.
The graph runner is a simple recursive function that takes a graph and a node and runs the node. If the node is a task, it runs the task. If the node is a CompoundTask, it runs the subgraph.
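A minimal sketch of such a recursive runner, under assumptions: all type and function names here (`SimpleNode`, `CompoundNode`, `runNode`, `runGraph`) are hypothetical, edges are reduced to `[sourceId, targetId]` pairs, and running upstream nodes before a node is an assumed ordering policy that the text does not specify.

```typescript
// Hypothetical node shapes: a simple task runs a function, a compound task owns a sub-graph.
interface SimpleNode {
  kind: "task";
  id: string;
  run: () => void;
}
interface CompoundNode {
  kind: "compound";
  id: string;
  subGraph: Graph;
}
type GraphNode = SimpleNode | CompoundNode;

interface Graph {
  nodes: GraphNode[];
  edges: Array<[string, string]>; // [sourceId, targetId]
}

// Runs one node, first recursing into any upstream nodes that feed it.
function runNode(graph: Graph, node: GraphNode, done: Set<string>): void {
  if (done.has(node.id)) return;
  for (const [sourceId, targetId] of graph.edges) {
    if (targetId !== node.id) continue;
    const upstream = graph.nodes.find((n) => n.id === sourceId);
    if (upstream) runNode(graph, upstream, done);
  }
  if (node.kind === "task") {
    node.run();
  } else {
    // A compound task looks like a single node but runs its whole sub-graph.
    runGraph(node.subGraph);
  }
  done.add(node.id);
}

// Runs every node of a graph exactly once, respecting the edges.
function runGraph(graph: Graph): void {
  const done = new Set<string>();
  for (const node of graph.nodes) runNode(graph, node, done);
}
```

The sketch assumes the graph is acyclic, as a DAG must be; it does no cycle detection, and it ignores the job queue, data passing, and events mentioned elsewhere in these docs.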

# User Task Graph

```mermaid
erDiagram
TaskGraph ||--o{ Task : nodes
TaskGraph ||--o{ DataFlow : edges
Task ||--o{ TaskInput : inputs
Task ||--o{ TaskOutput: outputs
TaskInput ||--|| ValueType : valueType
TaskOutput ||--|| ValueType : valueType
DataFlow ||--|| TaskInput : handle
DataFlow ||--|| TaskOutput : handle
DataFlow ||--|| Task : source
DataFlow ||--|| Task : target

TaskGraph{
Task[] nodes
DataFlow[] edges
}

Task{
string name
string id
TaskInput[] inputs
TaskOutput[] outputs
}

DataFlow{
string id
Task sourceTaskId
Task targetTaskId
TaskInput sourceTaskInput
TaskOutput targetTaskOutput
}
```
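One way to read the diagram above as TypeScript types is sketched below. It is an assumption-laden illustration, not the project's actual model: the handle fields are named `sourceOutputId` and `targetInputId` on the assumption that data flows from an output of the source task to an input of the target task, and `findFlowErrors` is a hypothetical helper.

```typescript
interface ValueType {
  name: string;
}
interface TaskInput {
  id: string;
  valueType: ValueType;
}
interface TaskOutput {
  id: string;
  valueType: ValueType;
}
interface Task {
  id: string;
  name: string;
  inputs: TaskInput[];
  outputs: TaskOutput[];
}
// One edge: an output handle of the source task feeds an input handle of the target task.
interface DataFlow {
  id: string;
  sourceTaskId: string;
  sourceOutputId: string; // assumed field name
  targetTaskId: string;
  targetInputId: string; // assumed field name
}
interface TaskGraph {
  nodes: Task[];
  edges: DataFlow[];
}

// Reports data flows that point at unknown tasks or handles, or that connect mismatched value types.
function findFlowErrors(graph: TaskGraph): string[] {
  const byId = new Map<string, Task>();
  for (const t of graph.nodes) byId.set(t.id, t);

  const errors: string[] = [];
  for (const flow of graph.edges) {
    const output = byId.get(flow.sourceTaskId)?.outputs.find((o) => o.id === flow.sourceOutputId);
    const input = byId.get(flow.targetTaskId)?.inputs.find((i) => i.id === flow.targetInputId);
    if (!output || !input) {
      errors.push(`${flow.id}: unknown task or handle`);
    } else if (output.valueType.name !== input.valueType.name) {
      errors.push(`${flow.id}: ${output.valueType.name} -> ${input.valueType.name}`);
    }
  }
  return errors;
}
```

Because each `DataFlow` names its own handles, several edges can connect the same pair of tasks, which matches the note that there can be multiple edges between two nodes.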
5 changes: 5 additions & 0 deletions lerna.json
@@ -0,0 +1,5 @@
{
"$schema": "node_modules/lerna/schemas/lerna-schema.json",
"version": "0.0.0",
"packages": ["packages/*"]
}
13 changes: 13 additions & 0 deletions nx.json
@@ -0,0 +1,13 @@
{
"targetDefaults": {
"build": {
"cache": true,
"dependsOn": [],
"outputs": ["{projectRoot}/dist"]
},
"test": {
"cache": true,
"dependsOn": []
}
}
}
49 changes: 38 additions & 11 deletions package.json
@@ -1,23 +1,50 @@
{
"name": "ellmers",
"module": "ellmers",
"type": "module",
"devDependencies": {
"@types/bun": "^1.0.4",
"@types/uuid": "^9.0.7"
},
"peerDependencies": {
"typescript": "^5.3.3"
"version": "0.0.1",
"description": "Ellmers is a tool for building and running DAG pipelines of AI tasks.",
"workspaces": [
"./packages/*"
],
"scripts": {
"build": "lerna run build",
"clean": "rm -rf node_modules packages/*/node_modules packages/*/dist",
"watch": "lerna run watch --parallel --stream",
"docs": "typedoc",
"format": "eslint \"{packages}/*/src/**/*.{js,ts,tsx,json}\" --fix && prettier \"{packages}/*/src/**/*.{js,ts,tsx,json}\" --check --write",
"release": "npm run build && npm publish",
"test": "jest"
},
"dependencies": {
"@mediapipe/tasks-text": "^0.10.9",
"@sroussey/transformers": "^2.14.3",
"@sroussey/typescript-graph": "^0.3.6",
"@sroussey/typescript-graph": "^0.3.12",
"@xyflow/react": "12.0.0-next.9",
"chalk": "^5.3.0",
"commander": "^11.1.0",
"eventemitter3": "^5.0.1",
"listr2": "^8.0.1",
"listr2": "^8.0.2",
"nanoid": "^5.0.6",
"postcss": "^8.4.35",
"react-hotkeys-hook": "^4.5.0",
"react-icons": "^5.0.1",
"rxjs": "^7.8.1",
"uuid": "^9.0.1"
},
"devDependencies": {
"@types/bun": "^1.0.6",
"@types/uuid": "^9.0.8",
"autoprefixer": "^10.4.17",
"lerna": "^8.1.2",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"tailwindcss": "^3.4.1",
"typescript": "^5.3.3",
"vite": "^5.1.4"
},
"peerDependencies": {
"@mediapipe/tasks-text": "^0.10.9",
"@sroussey/transformers": "^2.15.1"
},
"engines": {
"bun": "^1.0.5"
}
}
18 changes: 18 additions & 0 deletions packages/cli/package.json
@@ -0,0 +1,18 @@
{
"name": "ellmers-cli",
"type": "module",
"version": "0.0.1",
"description": "Ellmers is a tool for building and running DAG pipelines of AI tasks.",
"scripts": {
"watch": "",
"build": "",
"test": "echo \"Error: no test specified\" && exit 1"
},
"bin": "src/elmers.js",
"files": [
"src"
],
"dependencies": {
"ellmers-core": "workspace:*"
}
}