Developer API

OpenCypher Query Engine

KGL includes an in-memory OpenCypher query engine. There is no external database — the .kgl file system is the store. Queries run directly against a Graph loaded from disk.

Python API

python
from kgl.model import Graph
from kgl.query.engine import QueryEngine

graph = Graph.load("./")
engine = QueryEngine(graph)

results = engine.run("""
    MATCH (p:Person)-[r:mentors]->(q:Person)
    WHERE p.name = "Alice Nguyen"
    RETURN p.name, q.name, r.started
""")

for row in results:
    print(row)
# {"p.name": "Alice Nguyen", "q.name": "Diana Park", "r.started": "2023-06"}

engine.run(query_str) returns a list[dict[str, Any]] with one dict per result row. Each dict maps the returned expressions (or their aliases), such as "p.name", to their values.

Supported syntax

Phase 1: core (implemented)

Clause                           Example or meaning
───────────────────────────────  ────────────────────────
MATCH (n:Type)                   MATCH (n:Person)
MATCH (n)-[r:REL]->(m)           Directed edge with type
MATCH (n)-[r]->(m)               Directed edge, any type
MATCH (n)-[r]-(m)                Undirected edge
WHERE n.field = "value"          Equality filter
WHERE n.field != "value"         Inequality filter
WHERE n.field CONTAINS "val"     Substring filter
WHERE n.field STARTS WITH "val"  Prefix filter
WHERE expr AND expr              Logical AND
WHERE expr OR expr               Logical OR
RETURN n                         Return full node binding
RETURN n.field                   Return field value
ORDER BY n.field ASC|DESC        Sort results
LIMIT n                          Truncate results
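
The Phase 1 filter operators correspond closely to plain Python comparisons. The following is a minimal sketch of that correspondence, not KGL's implementation: the function name `evaluate_predicate`, the `OPERATORS` table, and the treatment of a missing field as `None` are all assumptions made for illustration.

```python
from typing import Any, Callable

# Hypothetical mapping from the Phase 1 WHERE operators to Python checks.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda left, right: left == right,
    "!=": lambda left, right: left != right,
    "CONTAINS": lambda left, right: isinstance(left, str) and right in left,
    "STARTS WITH": lambda left, right: isinstance(left, str) and left.startswith(right),
}


def evaluate_predicate(properties: dict[str, Any], field: str, op: str, value: Any) -> bool:
    """Check one `n.field <op> value` condition against a node's properties."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    # Assumption: a missing field behaves like None.
    return OPERATORS[op](properties.get(field), value)


person = {"name": "Alice Nguyen"}
print(evaluate_predicate(person, "name", "=", "Alice Nguyen"))     # True
print(evaluate_predicate(person, "name", "STARTS WITH", "Alice"))  # True
print(evaluate_predicate(person, "name", "CONTAINS", "Park"))      # False
```

AND and OR then reduce to combining several such checks with Python's `and` and `or`.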

Phase 2: extended (planned)

Phase 3: aggregations (planned)

CLI usage

bash
kgl query ./ "MATCH (p:Person)-[r:mentors]->(q:Person) RETURN p.name, q.name, r.started"

Output (aligned table):
p.name        q.name      r.started
────────────  ──────────  ──────────
Alice Nguyen  Diana Park  2023-06

JSON output:
kgl query ./ "MATCH (n:Person) RETURN n.name" --format json

CSV output:
kgl query ./ "MATCH (n:Person) RETURN n.name" --format csv
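
All three output formats can be derived from the list-of-dicts shape that engine.run returns. The sketch below uses only the Python standard library. The function `format_rows` is hypothetical and is not the CLI's actual code, so the exact spacing and JSON layout of the real `kgl query` output may differ.

```python
import csv
import io
import json
from typing import Any


def format_rows(rows: list[dict[str, Any]], fmt: str = "table") -> str:
    """Render query rows as an aligned table, JSON, or CSV (illustrative only)."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    # Column order follows the key order of the first row.
    columns = list(rows[0]) if rows else []
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt != "table":
        raise ValueError(f"unknown format: {fmt}")
    if not rows:
        return ""
    # Each column is as wide as its longest cell or its header.
    widths = {c: max([len(c)] + [len(str(r[c])) for r in rows]) for c in columns}

    def line(cells: list[Any]) -> str:
        padded = (str(cell).ljust(widths[col]) for col, cell in zip(columns, cells))
        return "  ".join(padded).rstrip()

    rule = "  ".join("─" * widths[c] for c in columns)
    body = [line([row[c] for c in columns]) for row in rows]
    return "\n".join([line(columns), rule, *body])


rows = [{"p.name": "Alice Nguyen", "q.name": "Diana Park", "r.started": "2023-06"}]
print(format_rows(rows))
print(format_rows(rows, "csv"))
```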

Architecture

The query engine has three independently testable components:

  1. Lexer (kgl/query/lexer.py) — tokenises an OpenCypher string into typed tokens
  2. Parser (kgl/query/parser.py) — recursive-descent parser producing an AST
  3. Engine (kgl/query/engine.py) — executes a parsed query against a Graph
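
To illustrate the first stage, here is a minimal regex-based tokeniser for the Phase 1 syntax. It is a sketch under stated assumptions and does not reproduce kgl/query/lexer.py: the `Token` type, the token kinds, and the `tokenize` function are hypothetical, and keywords are matched case-sensitively for brevity.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token kinds. Order matters: earlier alternatives win.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:MATCH|WHERE|RETURN|ORDER|BY|LIMIT|AND|OR|CONTAINS|STARTS|WITH|ASC|DESC)\b"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ARROW", r"->"),
    ("OP", r"!=|="),
    ("PUNCT", r"[()\[\]:.,\-]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(query: str) -> list[Token]:
    """Split a query string into typed tokens, dropping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r} at offset {match.start()}")
        tokens.append(Token(kind, match.group()))
    return tokens


for token in tokenize('MATCH (n:Person) WHERE n.name = "Alice Nguyen" RETURN n.name'):
    print(token.kind, token.text)
```

Keeping the lexer free of any Graph dependency, as in this sketch, is what makes it testable on its own: a test only needs a query string and the expected token list.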

Execution strategy

For v0.1, the engine uses brute-force enumeration. This is acceptable for human-scale graphs (thousands of nodes, not millions).

  1. Seed — enumerate all candidate nodes/edges matching the first MATCH pattern
  2. Extend — for each candidate, extend the pattern along each additional path segment
  3. Filter — apply WHERE clause to each binding
  4. Project — evaluate RETURN expressions
  5. Post-process — apply ORDER BY and LIMIT
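
The five steps above can be sketched by hand-executing one query over a toy graph. Everything here is an assumption made for illustration: the `NODES` and `EDGES` layout is not KGL's Graph model, the third person is made-up sample data, and an ORDER BY q.name is added to the earlier example query so that the post-processing step has something to do.

```python
from typing import Any, Optional

# Toy in-memory graph; this layout is an assumption, not KGL's Graph model.
NODES = {
    "alice": {"type": "Person", "name": "Alice Nguyen"},
    "diana": {"type": "Person", "name": "Diana Park"},
    "bob": {"type": "Person", "name": "Bob Osei"},  # made-up sample data
}
EDGES = [
    {"src": "alice", "dst": "diana", "type": "mentors", "started": "2023-06"},
    {"src": "bob", "dst": "alice", "type": "mentors", "started": "2021-01"},
]


def run_mentors_query(limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Hand-execute:
    MATCH (p:Person)-[r:mentors]->(q:Person)
    WHERE p.name = "Alice Nguyen"
    RETURN p.name, q.name, r.started ORDER BY q.name [LIMIT limit]
    """
    # 1. Seed: every node matching the first pattern element (p:Person).
    seeds = [{"p": node_id} for node_id, node in NODES.items() if node["type"] == "Person"]

    # 2. Extend: follow each outgoing `mentors` edge to a Person node.
    extended = []
    for binding in seeds:
        for edge in EDGES:
            if (
                edge["src"] == binding["p"]
                and edge["type"] == "mentors"
                and NODES[edge["dst"]]["type"] == "Person"
            ):
                extended.append({**binding, "r": edge, "q": edge["dst"]})

    # 3. Filter: apply the WHERE clause to each complete binding.
    kept = [b for b in extended if NODES[b["p"]]["name"] == "Alice Nguyen"]

    # 4. Project: evaluate the RETURN expressions.
    rows = [
        {
            "p.name": NODES[b["p"]]["name"],
            "q.name": NODES[b["q"]]["name"],
            "r.started": b["r"]["started"],
        }
        for b in kept
    ]

    # 5. Post-process: ORDER BY, then LIMIT.
    rows.sort(key=lambda row: row["q.name"])
    return rows if limit is None else rows[:limit]


print(run_mentors_query())
# [{'p.name': 'Alice Nguyen', 'q.name': 'Diana Park', 'r.started': '2023-06'}]
```

The nested loop in step 2 scans every edge for every seed, which is the brute-force cost that is acceptable at thousands of nodes but would not scale to millions.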

Known limitations vs full OpenCypher

Feature                                        Status
─────────────────────────────────────────────  ──────────────────────────
Variable-length paths [*1..n]                  not supported
Aggregations (COUNT, COLLECT)                  not supported
WITH clause                                    not supported
OPTIONAL MATCH                                 not supported
MERGE, CREATE, DELETE                          not applicable (read-only)
UNWIND                                         not supported
Property filters in patterns (n {key: "val"})  not supported
IN operator                                    not supported
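
Because engine.run returns plain Python dicts, some of the unsupported features can be approximated on the client side after the query has run. The sketch below is one possible workaround, not a KGL feature. The `rows` list stands in for the result of engine.run and contains illustrative sample data.

```python
from collections import Counter, defaultdict
from typing import Any

# Stand-in for engine.run(...) output; the rows are illustrative sample data.
rows: list[dict[str, Any]] = [
    {"p.name": "Alice Nguyen", "q.name": "Diana Park"},
    {"p.name": "Alice Nguyen", "q.name": "Bob Osei"},
    {"p.name": "Bob Osei", "q.name": "Diana Park"},
]

# Rough equivalent of RETURN p.name, COUNT(q): count mentees per mentor.
mentee_counts = Counter(row["p.name"] for row in rows)

# Rough equivalent of RETURN p.name, COLLECT(q.name): group mentees per mentor.
mentees: dict[str, list[str]] = defaultdict(list)
for row in rows:
    mentees[row["p.name"]].append(row["q.name"])

# Rough equivalent of WHERE q.name IN [...]: filter against a set of values.
wanted = {"Diana Park"}
matching = [row for row in rows if row["q.name"] in wanted]

print(mentee_counts["Alice Nguyen"])  # 2
print(mentees["Alice Nguyen"])        # ['Diana Park', 'Bob Osei']
print(len(matching))                  # 2
```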