Chapter 7 of 16
4. KNOWLEDGE GRAPH ENGINEERING MODULE
(Knowledge graphs are where structured reasoning lives. RAG gives you semantic search, but knowledge graphs give you logical traversal: "find the manager's manager's direct reports who work on ML projects." You can't do that with vector similarity alone. The challenge isn't the tech; it's figuring out what your schema should look like before you've loaded a million nodes.)
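As a rough sketch, the traversal quoted above could be written in Cypher along these lines. It assumes the MANAGES relationship from the schema developed later in this chapter, plus a WORKS_ON relationship and Project properties (name, category) that are hypothetical and not defined anywhere in this chapter.

```cypher
// Sketch only: WORKS_ON, Project.name and Project.category are assumed
// names for illustration; they are not part of this chapter's schema.
// MANAGES points from manager to report: (manager)-[:MANAGES]->(report).
MATCH (me:Person {email: "alice@example.com"})<-[:MANAGES]-(:Person)<-[:MANAGES]-(skip:Person)
// A second MATCH lets the skip -> manager edge be traversed again, so the
// starting person's own manager can also appear among the direct reports.
MATCH (skip)-[:MANAGES]->(report:Person)-[:WORKS_ON]->(proj:Project)
WHERE proj.category = "ML"
RETURN DISTINCT report.name AS report_name, proj.name AS project_name;
```

The point is the shape of the query: each hop is an explicit relationship, which is something an embedding lookup has no way to express.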
Graph Schema Design
What is a Schema?
Definition: The blueprint of your graph - what types of nodes and relationships exist, and what properties they have.
Analogy: Like a relational database schema, but for nodes and relationships instead of tables and foreign keys.
(Unlike SQL schemas, graph schemas are flexible - you can add new node types and relationships without migrations. This is both a blessing and a curse. Blessing: easy to evolve. Curse: people abuse this flexibility and end up with an unmaintainable mess of ad-hoc relationships. Design your schema properly from the start.)
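One way to check whether a graph has already drifted into ad-hoc relationships is to count how often each relationship type is used. The query below is a minimal sketch; what counts as "too rare" is a judgment call for your own data.

```cypher
// List every relationship type with its usage count, rarest first.
// Types that appear only a handful of times are candidates for
// ad-hoc additions that bypassed the intended schema.
MATCH ()-[r]->()
RETURN type(r) AS relationship_type, count(*) AS uses
ORDER BY uses ASC;
```

This scans every relationship, so on a large graph it is better run occasionally as an audit than as part of regular traffic.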
Schema Design Process
Step 1: Identify Entities (Nodes)
Example Domain: Company Knowledge Base
Entities:
- Person (employees, customers)
- Company
- Product
- Project
- Document
Step 2: Identify Relationships (Edges)
Relationships:
- Person WORKS_FOR Company
- Person MANAGES Person
- Person AUTHORED Document
- Company PRODUCES Product
- Project USES Product
Step 3: Define Properties
// Node properties
Person: {name, email, role, hire_date}
Company: {name, industry, founded_year}
Product: {name, version, release_date}
Document: {title, content, created_date}
// Relationship properties
WORKS_FOR: {since, position}
MANAGES: {since}
AUTHORED: {date, contribution_type}
Complete Schema Example
// Create constraints (ensures data quality)
CREATE CONSTRAINT person_email IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE;
CREATE CONSTRAINT company_name IF NOT EXISTS
FOR (c:Company) REQUIRE c.name IS UNIQUE;
// Example data following the schema.
// The three CREATE clauses below form a single statement, so the alice
// and acme variables are still bound when the relationship is created.
CREATE (alice:Person {
  name: "Alice Smith",
  email: "alice@example.com",
  role: "Engineer",
  hire_date: date("2020-01-15")
})
CREATE (acme:Company {
  name: "Acme Corp",
  industry: "Technology",
  founded_year: 2010
})
CREATE (alice)-[:WORKS_FOR {
  since: date("2020-01-15"),
  position: "Senior Engineer"
}]->(acme);
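The example above only exercises WORKS_FOR. A similar sketch for MANAGES and AUTHORED, using the relationship properties from Step 3, might look like the following. Bob, the document, and every property value here are made-up sample data, and the statement assumes the Alice node from the example already exists.

```cypher
// Hypothetical sample data for illustration only.
// MERGE is used instead of CREATE so that re-running the statement
// does not create duplicate nodes or relationships.
MATCH (alice:Person {email: "alice@example.com"})
MERGE (bob:Person {email: "bob@example.com"})
  ON CREATE SET bob.name = "Bob Jones", bob.role = "Engineering Manager"
MERGE (bob)-[m:MANAGES]->(alice)
  ON CREATE SET m.since = date("2021-06-01")
MERGE (doc:Document {title: "Onboarding Guide"})
  ON CREATE SET doc.created_date = date("2022-03-10")
MERGE (alice)-[a:AUTHORED]->(doc)
  ON CREATE SET a.date = date("2022-03-10"), a.contribution_type = "primary";
```

Merging Person on email lines up with the uniqueness constraint defined earlier. Document has no such constraint in this schema, so merging on title is a simplification.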
Schema Best Practices
- Use Clear Labels: Person not P, WORKS_FOR not W4
- Normalize Data: Store shared properties once
- Plan for Queries: Design schema around your query patterns
- Use Constraints: Enforce uniqueness and data integrity
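To make "Plan for Queries" concrete: if the expected access patterns are looking people up by name and products by name and version (an assumption made here for illustration, since the text does not specify its query patterns), indexes on those properties are a reasonable starting point. The syntax below follows the same Neo4j style as the constraints above, and the index names are illustrative.

```cypher
// Assumed query patterns: find a Person by name, find a Product by
// name + version. Index names are hypothetical.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);

CREATE INDEX product_name_version IF NOT EXISTS
FOR (pr:Product) ON (pr.name, pr.version);

// Review what is currently enforced and indexed.
SHOW CONSTRAINTS;
SHOW INDEXES;

// PROFILE shows the execution plan, which reveals whether a lookup
// uses an index or falls back to scanning all Person nodes.
PROFILE MATCH (p:Person {name: "Alice Smith"}) RETURN p;
```

Uniqueness constraints such as person_email are already backed by an index, so Person.email does not need a separate one.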