workglow-dev · sroussey · Feb 22, 2024 · Feb 22, 2024
diff --git a/.gitignore b/.gitignore
@@ -2,3 +2,11 @@ data-in/
 data-out/
 node_modules/
 .DS_Store/
+tsconfig.tsbuildinfo
+dist
+.nx/cache
+*.d.ts
+storybook-static/
+packages/cli-web/
+packages/cli-sec/
+packages/ngraph/
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -10,6 +10,7 @@
   "search.exclude": {
     "data-files/**": true,
     "data-out/**": true,
+    "dist/**": true,
     "**/package-lock.json": true
   },
   "editor.codeActionsOnSave": {
@@ -27,5 +28,6 @@
   "html.format.enable": true,
   "json.format.enable": true,
   "javascript.format.enable": true,
-  "editor.wordWrapColumn": 100
+  "editor.wordWrapColumn": 100,
+  "prettier.printWidth": 100
 }
diff --git a/bun.lockb b/bun.lockb
diff --git a/docs/01_motivations.md b/docs/01_motivations.md
@@ -16,4 +16,4 @@ I have two very different side projects that I am working on, where intelligent
 This project is an attempt to build a framework that can be used to build intelligent retrieval systems. The main requirements are:
 
 - I need the ability to test and iterate quickly on ideas, models, and data using both local and cloud resources. And compare the results of different approaches.
-- Be able to put into production and decide to change approaches later without having to rewrite everything.
+- Be able to put into production and decide to change approaches later without having to rewrite everything. So switching embedding models part way through a project should be easy.
diff --git a/docs/03_matrix_operations.md b/docs/03_matrix_operations.md
@@ -27,6 +27,10 @@ We need to lookup data based on the query.
 - We will use the same embedding model as the data embeddings
 - Different pre-processing methods of the query to embed. We will call this a query rewriter. This does not need to use the same generative model as the data rewriter, though it likely would.
 
+## Reranking
+
+We need to re-rank the results based on the query.
+
 ## Storage and Retrieval
 
 We can start with these options:

diff --git a/docs/05_task_chains.md b/docs/05_task_chains.md
@@ -1,4 +1,4 @@
-# Tasks, Task Lists, and Strategies
+# Tasks and Task Graphs
 
 ## Requirements
 
@@ -76,12 +76,12 @@ Uses:
 - ApplyPromptTask
 - TextGenerationTask
 
-### TextRewriterTaskList
+### TextRewriter with Multiple Models
 
 Inputs
 
 - content
-- model
+- model[]
 - parameters
 - prompt
 
@@ -93,7 +93,7 @@ Uses:
 
 - TextRewriterTask
 
-### TextEmbeddingStrategy
+### TextEmbedding Strategy
 
 Inputs
 
@@ -118,14 +118,7 @@ Example:
 ```ts
 new TextEmbeddingStrategy({
   content: "This is a test",
-  embedding_model: [
-    {
-      name: "Xenova/distilbert-base-uncased",
-      model_parameters: {
-        temperature: 0.7,
-      },
-    },
-  ],
+  embedding_model: { name: "Xenova/distilbert-base-uncased" },
   rewriter: [
     {
       prompt_model: "Xenova/gpt2",
@@ -145,12 +138,6 @@ A task is a single step in the chain where most tasks output will be input for t
 
 Tasks get posted to a job queue and are run by a job queue runner.
 
-## TaskList
-
-A strategy is a list of tasks that are chained together to look like a single task.
-
-## Strategy
-
-A strategy is a list of tasks that are chained together to look like a single task. Parts can be run in series or in parallel. It orchestrates variations of the same task.
+## CompoundTask
 
-Strategies get a name and are saved in the database, both as a parent all the variations. The variation names are based on the spefic parameters used rather than the parent name.
+A compound task is a group of tasks (in DAG format) that are chained together to look like a single task.
diff --git a/docs/06_run_graph_orchestration.md b/docs/06_run_graph_orchestration.md
@@ -14,19 +14,66 @@ The pipline DAG is defined by the end user and saved in the database (nodes and
 
 The graph is a DAG. It is a list of nodes and a list of edges. The nodes are the tasks and the edges are the inputs and outputs of the tasks plus some other instrumetation data.
 
+We might want to have events based on what happens in the graph (and a suspend/resume for bulk creation, etc.). This will be needed to keep the UI in sync with the graph as it runs.
+
 ### Node
 
 - Task
-- TaskList
-- Strategy
+- SimpleTask
+- CompoundTask (has a sub-graph)
+
+Notes about requirements for the nodes:
+
+- Must have input list and output list
+  - the input or output will have a type object that JS can read, and not a TS type (though that should get derived from the type object)
+- We need to convert the inputs/outputs to a TypeScript type
 
 ### Edge
 
-- Input
-- Output
+- DataFlow
 - Instrumentation
-- Events
+
+Notes about requirements for the edges:
+
+- There can be multiple outputs that go to multiple inputs
+  - I.e., there can and will be multiple edges between two nodes
 
 ### Graph Runner
 
-The graph runner is a simple recursive function that takes a graph and a node and runs the node. If the node is a task, it runs the task. If the node is a TaskList or Strategy, it runs the subgraph.
+The graph runner is a simple recursive function that takes a graph and a node and runs the node. If the node is a task, it runs the task. If the node is a CompoundTask, it runs the subgraph.
+
+# User Task Graph
+
+```mermaid
+erDiagram
+    TaskGraph ||--o{ Task : nodes
+    TaskGraph ||--o{ DataFlow : edges
+    Task ||--o{ TaskInput : inputs
+    Task ||--o{ TaskOutput: outputs
+    TaskInput ||--|| ValueType : valueType
+    TaskOutput ||--|| ValueType : valueType
+    DataFlow ||--|| TaskInput : handle
+    DataFlow ||--|| TaskOutput : handle
+    DataFlow ||--|| Task : source
+    DataFlow ||--|| Task : target
+
+    TaskGraph{
+      Task[] nodes
+      DataFlow[] edges
+    }
+
+    Task{
+      string name
+      string id
+      TaskInput[] inputs
+      TaskOutput[] outputs
+    }
+
+    DataFlow{
+      string id
+      Task sourceTaskId
+      Task targetTaskId
+      TaskInput sourceTaskInput
+      TaskOutput targetTaskOutput
+    }
+```
diff --git a/lerna.json b/lerna.json
@@ -0,0 +1,5 @@
+{
+  "$schema": "node_modules/lerna/schemas/lerna-schema.json",
+  "version": "0.0.0",
+  "packages": ["packages/*"]
+}
diff --git a/nx.json b/nx.json
@@ -0,0 +1,13 @@
+{
+  "targetDefaults": {
+    "build": {
+      "cache": true,
+      "dependsOn": [],
+      "outputs": ["{projectRoot}/dist"]
+    },
+    "test": {
+      "cache": true,
+      "dependsOn": []
+    }
+  }
+}
diff --git a/package.json b/package.json
@@ -1,23 +1,50 @@
 {
   "name": "ellmers",
-  "module": "ellmers",
   "type": "module",
-  "devDependencies": {
-    "@types/bun": "^1.0.4",
-    "@types/uuid": "^9.0.7"
-  },
-  "peerDependencies": {
-    "typescript": "^5.3.3"
+  "version": "0.0.1",
+  "description": "Ellmers is a tool for building and running DAG pipelines of AI tasks.",
+  "workspaces": [
+    "./packages/*"
+  ],
+  "scripts": {
+    "build": "lerna run build",
+    "clean": "rm -rf node_modules packages/*/node_modules packages/*/dist",
+    "watch": "lerna run watch --parallel --stream",
+    "docs": "typedoc",
+    "format": "eslint \"{packages}/*/src/**/*.{js,ts,tsx,json}\" --fix && prettier \"{packages}/*/src/**/*.{js,ts,tsx,json}\" --check --write",
+    "release": "npm run build && npm publish",
+    "test": "jest"
   },
   "dependencies": {
-    "@mediapipe/tasks-text": "^0.10.9",
-    "@sroussey/transformers": "^2.14.3",
-    "@sroussey/typescript-graph": "^0.3.6",
+    "@sroussey/typescript-graph": "^0.3.12",
+    "@xyflow/react": "12.0.0-next.9",
     "chalk": "^5.3.0",
     "commander": "^11.1.0",
     "eventemitter3": "^5.0.1",
-    "listr2": "^8.0.1",
+    "listr2": "^8.0.2",
+    "nanoid": "^5.0.6",
+    "postcss": "^8.4.35",
+    "react-hotkeys-hook": "^4.5.0",
+    "react-icons": "^5.0.1",
     "rxjs": "^7.8.1",
     "uuid": "^9.0.1"
+  },
+  "devDependencies": {
+    "@types/bun": "^1.0.6",
+    "@types/uuid": "^9.0.8",
+    "autoprefixer": "^10.4.17",
+    "lerna": "^8.1.2",
+    "react": "^18.2.0",
+    "react-dom": "^18.2.0",
+    "tailwindcss": "^3.4.1",
+    "typescript": "^5.3.3",
+    "vite": "^5.1.4"
+  },
+  "peerDependencies": {
+    "@mediapipe/tasks-text": "^0.10.9",
+    "@sroussey/transformers": "^2.15.1"
+  },
+  "engines": {
+    "bun": "^1.0.5"
   }
 }
diff --git a/packages/cli/package.json b/packages/cli/package.json
@@ -0,0 +1,18 @@
+{
+  "name": "ellmers-cli",
+  "type": "module",
+  "version": "0.0.1",
+  "description": "Ellmers is a tool for building and running DAG pipelines of AI tasks.",
+  "scripts": {
+    "watch": "",
+    "build": "",
+    "test": "echo \"Error: no test specified\" && exit 1"
+  },
+  "bin": "src/elmers.js",
+  "files": [
+    "src"
+  ],
+  "dependencies": {
+    "ellmers-core": "workspace:*"
+  }
+}