Introduction
Welcome to the third installment of our Neo4j tutorial series! In this guide, we'll dive deep into Cypher, Neo4j's powerful query language designed specifically for working with graph databases. If you've been following along with our previous tutorials, you're now familiar with graph database concepts and the Neo4j platform. Now it's time to master the language that brings these graph databases to life.
Cypher is to Neo4j what SQL is to relational databases: a declarative query language that lets you describe what you want to retrieve from your database without specifying exactly how to retrieve it. What sets Cypher apart is its ASCII-art syntax, which visually mirrors the node-and-relationship patterns you want to match in your graph.
Let's explore Cypher through practical examples, building on real-world scenarios that demonstrate its power and flexibility.
Getting Started with Cypher Syntax
Cypher's syntax is designed to be visual and intuitive, representing nodes as parentheses () and relationships as arrows -[]->. This makes queries readable and similar to drawing the pattern you want to find on a whiteboard.
Basic Node and Relationship Patterns
// Basic pattern: Find a person named "John"
MATCH (p:Person {name: "John"})
RETURN p
// Pattern with relationship: Find John's friends
MATCH (p:Person {name: "John"})-[:FRIENDS_WITH]->(friend)
RETURN friend.name
In these examples:
- () represents a node
- :Person is a label that categorizes the node
- {name: "John"} is a property constraint
- -[:FRIENDS_WITH]-> represents a directed relationship with the type "FRIENDS_WITH"
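The same building blocks can be combined in other ways. As a small sketch, assuming the same illustrative Person nodes and FRIENDS_WITH relationships as in the examples above, the query below binds the relationship to a variable (r) and drops the arrowhead so the match ignores direction:

```cypher
// Undirected pattern: matches FRIENDS_WITH in either direction
// r is bound to the relationship so its type (or properties) can be returned
MATCH (p:Person {name: "John"})-[r:FRIENDS_WITH]-(friend:Person)
RETURN friend.name, type(r)
```

Leaving out the arrowhead is useful when a relationship is stored in one direction but is conceptually symmetric, as friendship usually is.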
Creating Data with Cypher
Let's start by creating a small social network dataset to work with throughout this tutorial.
Creating Nodes
// Create Person nodes
CREATE (alice:Person {name: "Alice", age: 32, occupation: "Data Scientist"})
CREATE (bob:Person {name: "Bob", age: 35, occupation: "Software Engineer"})
CREATE (charlie:Person {name: "Charlie", age: 28, occupation: "UX Designer"})
CREATE (diana:Person {name: "Diana", age: 41, occupation: "Project Manager"})
CREATE (edward:Person {name: "Edward", age: 25, occupation: "Data Analyst"})
// Create Interest nodes
CREATE (graphdb:Interest {name: "Graph Databases", category: "Technology"})
CREATE (cycling:Interest {name: "Cycling", category: "Sports"})
CREATE (cooking:Interest {name: "Cooking", category: "Hobby"})
CREATE (photography:Interest {name: "Photography", category: "Arts"})
CREATE (travel:Interest {name: "Travel", category: "Lifestyle"});
Creating Relationships
// Create friendship relationships
// Each MATCH ... CREATE pair is its own statement, terminated by a semicolon;
// a MATCH cannot directly follow a CREATE within a single statement.
MATCH (alice:Person {name: "Alice"}), (bob:Person {name: "Bob"})
CREATE (alice)-[:FRIENDS_WITH {since: 2018}]->(bob);
MATCH (alice:Person {name: "Alice"}), (charlie:Person {name: "Charlie"})
CREATE (alice)-[:FRIENDS_WITH {since: 2020}]->(charlie);
MATCH (bob:Person {name: "Bob"}), (diana:Person {name: "Diana"})
CREATE (bob)-[:FRIENDS_WITH {since: 2015}]->(diana);
MATCH (charlie:Person {name: "Charlie"}), (diana:Person {name: "Diana"})
CREATE (charlie)-[:FRIENDS_WITH {since: 2019}]->(diana);
MATCH (diana:Person {name: "Diana"}), (edward:Person {name: "Edward"})
CREATE (diana)-[:FRIENDS_WITH {since: 2021}]->(edward);
// Create interest relationships
MATCH (alice:Person {name: "Alice"}), (graphdb:Interest {name: "Graph Databases"})
CREATE (alice)-[:INTERESTED_IN {level: "Expert"}]->(graphdb);
MATCH (alice:Person {name: "Alice"}), (cycling:Interest {name: "Cycling"})
CREATE (alice)-[:INTERESTED_IN {level: "Intermediate"}]->(cycling);
MATCH (bob:Person {name: "Bob"}), (graphdb:Interest {name: "Graph Databases"})
CREATE (bob)-[:INTERESTED_IN {level: "Beginner"}]->(graphdb);
MATCH (bob:Person {name: "Bob"}), (cooking:Interest {name: "Cooking"})
CREATE (bob)-[:INTERESTED_IN {level: "Advanced"}]->(cooking);
MATCH (charlie:Person {name: "Charlie"}), (photography:Interest {name: "Photography"})
CREATE (charlie)-[:INTERESTED_IN {level: "Expert"}]->(photography);
MATCH (diana:Person {name: "Diana"}), (travel:Interest {name: "Travel"})
CREATE (diana)-[:INTERESTED_IN {level: "Advanced"}]->(travel);
MATCH (diana:Person {name: "Diana"}), (cooking:Interest {name: "Cooking"})
CREATE (diana)-[:INTERESTED_IN {level: "Intermediate"}]->(cooking);
MATCH (edward:Person {name: "Edward"}), (graphdb:Interest {name: "Graph Databases"})
CREATE (edward)-[:INTERESTED_IN {level: "Beginner"}]->(graphdb);
MATCH (edward:Person {name: "Edward"}), (cycling:Interest {name: "Cycling"})
CREATE (edward)-[:INTERESTED_IN {level: "Advanced"}]->(cycling);
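Before moving on, it can help to confirm that the data loaded as expected. The following is a quick sanity check rather than part of the dataset itself; it assumes the database was empty before the statements above were run, and that they were run exactly once:

```cypher
// Count nodes per label combination
MATCH (n)
RETURN labels(n) AS Labels, count(*) AS Nodes;

// Count relationships per type
MATCH ()-[r]->()
RETURN type(r) AS RelationshipType, count(*) AS Relationships;
```

Under those assumptions, the first query should report 5 Person and 5 Interest nodes, and the second 5 FRIENDS_WITH and 9 INTERESTED_IN relationships. Higher counts usually mean the creation statements were run more than once.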
Querying Data with Cypher
Now that we have our dataset, let's explore different types of queries to extract valuable insights.
Basic READ Operations
Finding Nodes by Label and Properties
// Find all Person nodes
MATCH (p:Person)
RETURN p
// Find people over 30 years old
MATCH (p:Person)
WHERE p.age > 30
RETURN p.name, p.age, p.occupation
Finding Relationships
// Find who is friends with whom
MATCH (p1:Person)-[r:FRIENDS_WITH]->(p2:Person)
RETURN p1.name, p2.name, r.since
// Find Alice's friends
MATCH (p:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend)
RETURN friend.name
Advanced Pattern Matching
Finding Friends of Friends
// Find friends of Alice's friends (who aren't directly connected to Alice)
MATCH (alice:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(fof)
WHERE NOT (alice)-[:FRIENDS_WITH]->(fof) AND alice <> fof
RETURN DISTINCT fof.name as FriendOfFriend
Finding Common Interests
// Find people who share interests with Alice
MATCH (alice:Person {name: "Alice"})-[:INTERESTED_IN]->(interest)<-[:INTERESTED_IN]-(other)
WHERE alice <> other
RETURN other.name as Person, interest.name as SharedInterest
Aggregations and Sorting
// Count outgoing friendships for each person (people with none do not appear in the result)
MATCH (p:Person)-[:FRIENDS_WITH]->(friend)
RETURN p.name as Person, COUNT(friend) as NumberOfFriends
ORDER BY NumberOfFriends DESC
// Find the most popular interests
MATCH (i:Interest)<-[:INTERESTED_IN]-(p:Person)
RETURN i.name as Interest, COUNT(p) as Popularity
ORDER BY Popularity DESC
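A plain MATCH only returns rows where the whole pattern exists, so anyone without an outgoing FRIENDS_WITH relationship is missing from the friend count above. One way to include them, sketched here with OPTIONAL MATCH, relies on the fact that count() skips null values:

```cypher
// Include people who have no outgoing FRIENDS_WITH relationships
MATCH (p:Person)
OPTIONAL MATCH (p)-[:FRIENDS_WITH]->(friend)
// friend is null when no relationship exists, and count() ignores nulls
RETURN p.name AS Person, count(friend) AS NumberOfFriends
ORDER BY NumberOfFriends DESC, Person
```

On the dataset exactly as created earlier, this should list Alice with 2 friends, Bob, Charlie, and Diana with 1 each, and Edward with 0.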
Updating Data with Cypher
Cypher is not just for querying data; it's also used for updating your graph.
Updating Node Properties
// Update Alice's age
MATCH (p:Person {name: "Alice"})
SET p.age = 33
RETURN p
// Add a new property to all Person nodes
MATCH (p:Person)
SET p.active = true
RETURN p
Creating New Relationships
// Create a new friendship between Alice and Diana
MATCH (alice:Person {name: "Alice"}), (diana:Person {name: "Diana"})
WHERE NOT (alice)-[:FRIENDS_WITH]->(diana)
CREATE (alice)-[:FRIENDS_WITH {since: 2022}]->(diana)
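As an alternative to guarding CREATE with WHERE NOT, the MERGE clause matches the pattern if it exists and creates it otherwise. The sketch below swaps in that technique for the same Alice and Diana friendship. The since property is set in ON CREATE so that an existing relationship with a different year is still matched rather than duplicated:

```cypher
// Create the friendship only if it does not already exist
MATCH (alice:Person {name: "Alice"}), (diana:Person {name: "Diana"})
MERGE (alice)-[r:FRIENDS_WITH]->(diana)
ON CREATE SET r.since = 2022   // applied only when the relationship is newly created
RETURN r
```

Running this statement repeatedly leaves a single FRIENDS_WITH relationship from Alice to Diana.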
Removing Properties and Relationships
// Remove the 'level' property from an INTERESTED_IN relationship
MATCH (alice:Person {name: "Alice"})-[r:INTERESTED_IN]->(i:Interest {name: "Cycling"})
REMOVE r.level
RETURN alice, r, i
// Delete a friendship
MATCH (bob:Person {name: "Bob"})-[r:FRIENDS_WITH]->(diana:Person {name: "Diana"})
DELETE r
Advanced Cypher Techniques
Path Variables and Functions
// Find the shortest path between Alice and Edward
MATCH path = shortestPath((alice:Person {name: "Alice"})-[:FRIENDS_WITH*]-(edward:Person {name: "Edward"}))
RETURN path
// Return the names of the people along the shortest path, plus its length
MATCH path = shortestPath((alice:Person {name: "Alice"})-[:FRIENDS_WITH*]-(edward:Person {name: "Edward"}))
RETURN [node in nodes(path) | node.name] as People, length(path) as PathLength
Working with Collections
// Collect all interests for each person
MATCH (p:Person)-[:INTERESTED_IN]->(i:Interest)
RETURN p.name as Person, COLLECT(i.name) as Interests
// Find people who are interested in ALL of these interests
MATCH (p:Person)
WHERE ALL(interest IN ["Graph Databases", "Cycling"]
          WHERE (p)-[:INTERESTED_IN]->(:Interest {name: interest}))
RETURN p.name
Using CASE Expressions
// Categorize people by age group
MATCH (p:Person)
RETURN p.name,
  CASE
    WHEN p.age < 30 THEN "Young Professional"
    WHEN p.age >= 30 AND p.age < 40 THEN "Mid-career"
    ELSE "Senior Professional"
  END AS AgeCategory
Query Optimization in Cypher
As your graph grows, optimizing queries becomes important. Here are some techniques:
Using Indexes
// Create an index on Person name
CREATE INDEX person_name FOR (p:Person) ON (p.name)
// Create an index on Interest name
CREATE INDEX interest_name FOR (i:Interest) ON (i.name)
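To check that the indexes exist and are actually picked up, something like the following can be used. This is a sketch that assumes a Neo4j version recent enough to support the SHOW INDEXES command:

```cypher
// List the indexes defined in the database, including their state
SHOW INDEXES;

// Show the planned execution without running the query
EXPLAIN MATCH (p:Person {name: "Alice"})
RETURN p;
```

If the person_name index is online, the plan would be expected to start from an index-backed operator such as NodeIndexSeek rather than a scan over every Person node.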
Query Profiling
// Profile a query to see execution plan
PROFILE MATCH (p:Person {name: "Alice"})-[:FRIENDS_WITH*1..3]-(other)
RETURN other.name
Query Optimization Tips
- Start with specific nodes: Begin your queries with the most specific node patterns to reduce the initial working set.
- Use parameters: Instead of hardcoding values, use parameters for better query plan caching.
- Limit relationship depth: Be careful with unbounded variable-length paths, as they can explore large portions of your graph.
- Filter early: Apply WHERE clauses as early as possible in your query to reduce the amount of data processed.
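The tips above can be combined in a single query. The sketch below is illustrative only: $name and $minAge are hypothetical parameters that the client application or tool would supply, and the depth bound of 2 is an arbitrary choice:

```cypher
// Start from a specific node, anchored on the (indexed) name property
MATCH (p:Person {name: $name})-[:FRIENDS_WITH*1..2]-(other:Person)  // bounded depth
// Filter as soon as the candidate nodes are available
WHERE other.age > $minAge
RETURN DISTINCT other.name AS Name
LIMIT 10
```

Because the literal values are passed as parameters, the same query text is reused across calls. Note that the bounds of a variable-length pattern are written into the query text itself.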
Practical Examples: Solving Real-World Problems
Let's apply what we've learned to solve some common graph problems:
Recommendation System
// Recommend new interests to Alice based on what her friends like
MATCH (alice:Person {name: "Alice"})-[:FRIENDS_WITH]->(friend)-[:INTERESTED_IN]->(interest)
WHERE NOT (alice)-[:INTERESTED_IN]->(interest)
RETURN interest.name as RecommendedInterest, COUNT(friend) as CommonFriends
ORDER BY CommonFriends DESC
Network Analysis
// Find the most central person (with most connections) within 2 steps from Alice
MATCH (alice:Person {name: "Alice"})-[:FRIENDS_WITH*1..2]-(person)-[:FRIENDS_WITH]-(connection)
RETURN person.name, COUNT(DISTINCT connection) as Connections
ORDER BY Connections DESC
LIMIT 1
Pattern Detection
// Find triangles of friendship (groups of 3 mutual friends)
MATCH (p1:Person)-[:FRIENDS_WITH]-(p2:Person)-[:FRIENDS_WITH]-(p3:Person)-[:FRIENDS_WITH]-(p1)
WHERE p1.name < p2.name AND p2.name < p3.name // To avoid duplicate results
RETURN p1.name, p2.name, p3.name
Best Practices for Writing Cypher Queries
- Start small and build up: Begin with simple patterns and gradually add complexity.
- Use meaningful aliases: Choose descriptive variable names that make your queries readable.
- Test with LIMIT: When developing queries for large datasets, use LIMIT to test with a smaller result set first.
- Comment your queries: Add comments to explain complex logic, especially for queries that will be reused.
- Use parameters: Instead of hardcoding values, use parameters for better security and performance:
// Using parameters
MATCH (p:Person)
WHERE p.age > $minAge
RETURN p.name
- Keep queries focused: Each query should have a single, clear purpose rather than trying to do too much at once.
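As an illustration of starting small and testing with LIMIT, a query against the tutorial's dataset might be built up in stages like this (a sketch; the stages and the final aggregation are just one possible progression):

```cypher
// Step 1: confirm the starting nodes look right
MATCH (p:Person)
RETURN p.name
LIMIT 5;

// Step 2: extend the pattern by one relationship
MATCH (p:Person)-[:INTERESTED_IN]->(i:Interest)
RETURN p.name, i.name
LIMIT 5;

// Step 3: add the aggregation once the pattern is verified
MATCH (p:Person)-[:INTERESTED_IN]->(i:Interest)
RETURN i.category AS Category, count(DISTINCT p) AS People
ORDER BY People DESC;
```

Each stage returns something you can inspect, so a mistake in the pattern shows up before the aggregation hides it.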
Conclusion
Cypher is a powerful, expressive language that makes working with graph data intuitive and efficient. By mastering Cypher, you unlock the full potential of Neo4j and graph databases, enabling you to solve complex connected data problems with elegant, readable queries.
This tutorial has covered the fundamentals of Cypher syntax as well as advanced techniques, all through practical examples. As you continue your Neo4j journey, experiment with these patterns on your own data and explore how the expressiveness of Cypher can simplify your most complex data challenges.