2 Working with Real-World GIS Data in Clojure

2.1 How We Analyzed Oakland Crash Data

This tutorial walks through the techniques we used to analyze crash data for Oakland’s Grand Ave and Telegraph Ave. The challenge: combining three datasets from different sources that weren’t designed to work together.

2.2 The Data Sources

This project combines data from three different government/civic sources:

  1. California Crash Data (CCRS): Crash records with fields like:
    • primary-road and secondary-road (street names at intersection)
    • latitude and longitude (sometimes missing!)
    • Crash details: date, injuries, type, etc.
    • Source: data.ca.gov/dataset/ccrs
  2. Alameda County Street Centerlines: GeoJSON with geometric lines representing streets
    • Each street segment has geometry and a STREET field
    • Source: data.acgov.org
    • Filtered to Oakland city limits (CITYL or CITYR = “Oakland”)
  3. Oakland Neighborhoods (CEDA 2002): Polygon boundaries
    • Used to assign crashes to specific neighborhoods
    • Source: OpenOakland

None of these datasets was designed to work with the others. The rest of this tutorial shows how we combined them despite different formats, coordinate systems, and data quality issues.

2.3 Loading the Crash Data

^{:kindly/hide-code false}
(defn load-and-combine-csvs
  "Load multiple CSV files and combine them into a single dataset.

  This is how we load the crash data from multiple years."
  [file-paths]
  (let [datasets (map #(tc/dataset % {:key-fn    csk/->kebab-case-keyword
                                      :parser-fn {:collision-id       :integer
                                                  :crash-date-time    :local-date-time
                                                  :ncic-code          :integer
                                                  :is-highway-related :boolean
                                                  :is-tow-away        :boolean
                                                  :number-injured     :integer
                                                  :number-killed      :integer}})
                      file-paths)]
    (apply tc/concat datasets)))

Let’s look at a sample from 2023:

^{:kindly/hide-code false}
(def sample-crashes-2023
  (-> ["datasets/2023crashes.csv"]
      load-and-combine-csvs
      (tc/select-columns [:collision-id
                          :crash-date-time
                          :primary-road
                          :secondary-road
                          :latitude
                          :longitude
                          :number-injured])
      ;; Filter to crashes with missing coordinates to show the problem
      (tc/select-rows (fn [row]
                        (or (nil? (:latitude row))
                            (nil? (:longitude row)))))
      (tc/head 10)))
sample-crashes-2023

datasets/2023crashes.csv [10 7]:

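| :collision-id | :crash-date-time | :primary-road | :secondary-road | :latitude | :longitude | :number-injured |
|---------------|------------------|---------------|-----------------|-----------|------------|-----------------|
| 2321157       | 2023-12-20T12:00 | EDGEWATER DR  | OAKPORT ST      |           |            | 1               |
| 2487278       | 2023-01-01T08:10 | FALLON ST     | 12TH ST         |           |            | 0               |
| 2487277       | 2023-01-09T23:36 | OUTLOOK AV    | 75TH AV         |           |            | 0               |
| 2487276       | 2023-01-04T13:36 | A ST          | 97TH AV         |           |            | 0               |
| 2487273       | 2023-01-17T16:38 | MACARTHUR BL  | 77TH AV         |           |            | 0               |
| 2487272       | 2023-01-11T22:23 | 69TH AV       | WELD ST         |           |            | 0               |
| 2487271       | 2023-01-04T17:22 | FOOTHILL BL   | 21ST AV         |           |            | 0               |
| 2487266       | 2023-01-12T05:28 | GASKILL ST    | 57TH ST         |           |            | 0               |
| 2487265       | 2023-01-17T23:52 | 82ND AV       | BANCROFT AV     |           |            | 0               |
| 2487264       | 2023-01-08T15:45 | MOUNTAIN BL   | ANTIOCH ST      |           |            | 0               |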

2.4 Problem 1: Missing Coordinates

Notice that these crashes have empty/nil values for latitude and longitude. How common is this problem?

^{:kindly/hide-code false}
(def all-2023-crashes
  (-> ["datasets/2023crashes.csv"]
      load-and-combine-csvs))
^{:kindly/hide-code false}
(let [total (tc/row-count all-2023-crashes)
      missing (-> all-2023-crashes
                  (tc/select-rows (fn [row]
                                    (or (nil? (:latitude row))
                                        (nil? (:longitude row)))))
                  tc/row-count)]
  (kind/hiccup
   [:div
    [:h4 "Missing Coordinates in 2023 Oakland Crashes"]
    [:p [:strong (str missing " out of " total " crashes (" 
                     (int (* 100 (/ missing total))) "%)")]
     " are missing latitude/longitude!"]]))

Missing Coordinates in 2023 Oakland Crashes

3842 out of 6262 crashes (61%) are missing latitude/longitude!

More than half of the crashes are missing coordinates!

But we do have the intersection street names. We can use two approaches:

  1. Simple Approach: If we know the area well, manually create a lookup of intersection coordinates
  2. Automated Approach: Use street centerline geometry to calculate intersections

2.5 Approach 1: Manual Intersection Lookup (Grand Ave)

For Grand Ave, we knew the specific intersections we wanted to analyze (Harrison to Mandana). This is the approach we used in index.clj:

^{:kindly/hide-code false}
(def grand-intersections-of-interest
  "Hand-picked intersections along Grand Ave with manually looked-up coordinates.
  
  This is the real map from index.clj - we looked these up because we knew
  the stretch of Grand Ave we cared about."
  {"HARRISON"    {:lat 37.810923 :lng -122.262360}
   "BAY PL"      {:lat 37.810590 :lng -122.260507}
   "PARK VIEW"   {:lat 37.809881 :lng -122.259373}
   "BELLEVUE"    {:lat 37.809713 :lng -122.259452}
   "LENOX"       {:lat 37.809358 :lng -122.258479}
   "LEE"         {:lat 37.809068 :lng -122.257263}
   "PERKINS"     {:lat 37.808994 :lng -122.256149}
   "ELLITA"      {:lat 37.808864 :lng -122.255016}
   "STATEN"      {:lat 37.808784 :lng -122.253832}
   "EUCLID"      {:lat 37.808608 :lng -122.251686}
   "EMBARCADERO" {:lat 37.809342 :lng -122.249697}
   "MACARTHUR"   {:lat 37.810195 :lng -122.248825}
   "LAKE PARK"   {:lat 37.811454 :lng -122.247977}
   "SANTA CLARA" {:lat 37.811797 :lng -122.247833}
   "ELWOOD"      {:lat 37.813721 :lng -122.246586}
   "MANDANA"     {:lat 37.814243 :lng -122.246230}})

Now we can filter crashes to just Grand Ave intersections and add coordinates:

^{:kindly/hide-code false}
(defn add-grand-ave-coordinates
  "Add intersection coordinates to Grand Ave crashes using manual lookup.
  
  This is the approach from index.clj."
  [crashes]
  (-> crashes
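      ;; Filter to Grand Ave crashes. `or` returns the first non-nil value,
      ;; so :secondary-road is only checked when :primary-road is nil.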
      (ds/filter #(str/includes? (or (:primary-road %)
                                      (:secondary-road %) "")
                                 "GRAND"))
      ;; Filter to our specific intersections
      (ds/filter (fn [row]
                   (some #(str/includes? (or (:secondary-road row) "") %)
                         (keys grand-intersections-of-interest))))
      ;; Add lat/long from our lookup
      (tc/map-columns :intersection-lat
                      [:secondary-road]
                      (fn [secondary-road]
                        (let [match (some (fn [[k v]]
                                            (when (str/includes? (or secondary-road "") k)
                                              v))
                                          grand-intersections-of-interest)]
                          (:lat match))))
      (tc/map-columns :intersection-lng
                      [:secondary-road]
                      (fn [secondary-road]
                        (let [match (some (fn [[k v]]
                                            (when (str/includes? (or secondary-road "") k)
                                              v))
                                          grand-intersections-of-interest)]
                          (:lng match))))))

Example of how this works:

^{:kindly/hide-code false}
(def grand-ave-crashes-2023
  "Grand Ave crashes from 2023 - note many are missing lat/long coordinates!"
  (-> ["datasets/2023crashes.csv"]
      load-and-combine-csvs
      (tc/select-columns [:collision-id
                          :primary-road
                          :secondary-road
                          :latitude
                          :longitude])
      (ds/filter #(str/includes? (or (:primary-road %)
                                      (:secondary-road %) "")
                                 "GRAND"))
      (ds/filter (fn [row]
                   (some #(str/includes? (or (:secondary-road row) "") %)
                         (keys grand-intersections-of-interest))))
      (tc/head 10)))
grand-ave-crashes-2023

datasets/2023crashes.csv [10 5]:

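| :collision-id | :primary-road | :secondary-road | :latitude | :longitude |
|---------------|---------------|-----------------|-----------|------------|
| 2484619       | GRAND AV      | PARK VIEW TER   |           |            |
| 2484602       | GRAND AV      | BAY PL          |           |            |
| 2484054       | GRAND AV      | HARRISON ST     |           |            |
| 2481965       | GRAND AV      | MACARTHUR BL    |           |            |
| 2495402       | GRAND AV      | PARK VIEW AV    |           |            |
| 2502906       | GRAND AV      | HARRISON ST     |           |            |
| 2514953       | GRAND AV      | LAKE PARK AV    |           |            |
| 2528176       | GRAND AV      | BELLEVUE AV     |           |            |
| 2527936       | GRAND AV      | LAKE PARK AV    |           |            |
| 2523837       | GRAND AV      | BELLEVUE AV     |           |            |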

Notice: latitude and longitude are empty! But we can fix this with our lookup:

^{:kindly/hide-code false}
(def grand-ave-sample
  (-> grand-ave-crashes-2023
      add-grand-ave-coordinates
      (tc/select-columns [:collision-id :secondary-road :latitude :longitude 
                         :intersection-lat :intersection-lng])
      (tc/head 5)))
grand-ave-sample

datasets/2023crashes.csv [5 6]:

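| :collision-id | :secondary-road | :latitude | :longitude | :intersection-lat | :intersection-lng |
|---------------|-----------------|-----------|------------|-------------------|-------------------|
| 2484619       | PARK VIEW TER   |           |            | 37.809881         | -122.259373       |
| 2484602       | BAY PL          |           |            | 37.810590         | -122.260507       |
| 2484054       | HARRISON ST     |           |            | 37.810923         | -122.262360       |
| 2481965       | MACARTHUR BL    |           |            | 37.810195         | -122.248825       |
| 2495402       | PARK VIEW AV    |           |            | 37.809881         | -122.259373       |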
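
The substring lookup behind these filled-in columns can be tried on plain strings. The sketch below is an illustrative variant, not the code from index.clj: the helper name match-intersection and the "prefer the longest matching key" rule are assumptions added here, because `some` over a Clojure map returns whichever matching entry happens to come first in the map's iteration order.

```clojure
(require '[clojure.string :as str])

;; A three-entry excerpt of grand-intersections-of-interest.
(def intersections
  {"HARRISON"  {:lat 37.810923 :lng -122.262360}
   "BAY PL"    {:lat 37.810590 :lng -122.260507}
   "PARK VIEW" {:lat 37.809881 :lng -122.259373}})

(defn match-intersection
  "Hypothetical helper: return the coordinates whose key occurs in
  secondary-road, preferring the longest key when several match.
  Returns nil when nothing matches or the road name is nil."
  [lookup secondary-road]
  (let [road (or secondary-road "")]
    (->> lookup
         (filter (fn [[k _]] (str/includes? road k)))
         (sort-by (fn [[k _]] (- (count k))))
         first
         second)))

(match-intersection intersections "PARK VIEW TER")
;; => {:lat 37.809881, :lng -122.259373}
(match-intersection intersections "OAKPORT ST")
;; => nil
```

Both "PARK VIEW TER" and "PARK VIEW AV" match the same key, which is why the two rows above share one coordinate pair.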

2.5.1 When to Use Manual Lookup

Pros:

  • Simple and fast
  • You control exactly which intersections to include
  • No complex geometry calculations needed

Cons:

  • Only works if you know your area of interest in advance
  • Requires manually looking up coordinates
  • Doesn’t scale to analyzing the whole city

2.6 Approach 2: Automated Spatial Matching

The manual lookup approach works great when you know your streets, but what if you want to analyze crashes anywhere in Oakland without manually looking up every intersection?

This is what we explored in locations.clj - an automated approach that uses street centerline geometry to calculate intersection locations.

2.6.1 Step 1: Load and Filter Street Centerlines

The Alameda County dataset covers the whole county. We filtered it to just Oakland:

^{:kindly/hide-code false}
(comment
  ;; This is the filtering code from data.clj:
  (defonce filter-geojson
    (-> "data/Street_Centerlines_-8203296818607454791.geojson"
        slurp
        (charred/read-json {:key-fn keyword})
        (update :features (partial filter (fn [{:keys [geometry properties]}]
                                            (and geometry
                                                 (let [{:keys [CITYR CITYL]} properties]
                                                   (or (= CITYL "Oakland")
                                                       (= CITYR "Oakland")))))))
        (->> (charred/write-json "data/Oakland-centerlines.geojson")))))

2.6.2 Step 2: Coordinate System Transformations

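Critical concept: latitude/longitude (WGS84) coordinates are angles in degrees, not distances, and the ground length of a degree of longitude shrinks as you move away from the equator. To do accurate spatial operations (buffers, distances), we transform to a local projected coordinate system: California State Plane Zone 3 (EPSG:2227), whose units are US survey feet.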
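
To get a feel for how uneven degrees are at Oakland's latitude, the sketch below estimates the ground length of one degree of latitude and one degree of longitude. It assumes a spherical Earth with a mean radius of 6,371,008.8 m, which is a rough illustration and not the EPSG:2227 projection the project uses; the function name is made up for this example.

```clojure
;; Assumption: spherical Earth with the commonly used mean radius.
(def earth-radius-m 6371008.8)

(defn meters-per-degree
  "Approximate ground length in meters of one degree of latitude and
  one degree of longitude at the given latitude (in degrees)."
  [lat-deg]
  (let [per-degree-lat (/ (* Math/PI earth-radius-m) 180.0)]
    {:lat per-degree-lat
     :lng (* per-degree-lat (Math/cos (Math/toRadians lat-deg)))}))

;; Latitude of the Grand Ave corridor, rounded.
(meters-per-degree 37.81)
```

Under these assumptions a degree of latitude comes out at roughly 111 km and a degree of longitude at roughly 88 km, so a buffer or distance expressed in raw degrees would be stretched east-west by about a quarter.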

These transformations are defined in data.clj and used throughout the project:

^{:kindly/hide-code false}
(def crs-transform-wgs84->bay-area
  "Transform from WGS84 (lat/long degrees) to California State Plane Zone 3 (feet).
  
  This is necessary for accurate distance calculations in the Bay Area."
  (crs/create-transform
   (crs/create-crs 4326)  ; WGS84 (GPS coordinates)
   (crs/create-crs 2227)))  ; CA State Plane Zone 3

^{:kindly/hide-code false}
(def crs-transform-bay-area->wgs84
  "Transform back from local coordinates to WGS84 for mapping."
  (crs/create-transform
   (crs/create-crs 2227)
   (crs/create-crs 4326)))
^{:kindly/hide-code false}
(defn wgs84->bay-area
  "Convert a geometry from WGS84 to local coordinate system."
  [geometry]
  (jts/transform-geom geometry crs-transform-wgs84->bay-area))
^{:kindly/hide-code false}
(defn bay-area->wgs84
  "Convert a geometry from local coordinate system back to WGS84 (from data.clj)."
  [geometry]
  (jts/transform-geom geometry crs-transform-bay-area->wgs84))

2.6.3 Step 3: Process Street Centerlines with Buffers

This is the processing from data.clj:

^{:kindly/hide-code false}
(defn process-oakland-centerlines
  "Process the Oakland street centerlines dataset.

  This is the code from data.clj that:
  - Converts geometry to JTS format
  - Transforms to local coordinate system for accurate operations
  - Creates 50-foot buffers around each street segment
  - Extracts clean street names from the STREET field"
  [geojson-path]
  (let [geojson-str (slurp geojson-path)]
    (-> geojson-str
        geoio/read-geojson
        (->> (map (fn [{:keys [properties geometry]}]
                    (assoc properties :geometry geometry))))
        tc/dataset
        ;; Convert to JTS LineString
        (tc/map-columns :line-string
                        [:geometry]
                        #(spatial/to-jts % 4326))
        ;; Transform to local coordinate system
        (tc/map-columns :local-line-string
                        [:line-string]
                        wgs84->bay-area)
        ;; Create 50-foot buffer for fuzzy matching
        (tc/map-columns :local-buffer
                        [:local-line-string]
                        (fn [^Geometry g]
                          (.buffer g 50)))
        ;; Extract street names from STREET field
        ;; E.g. "75TH ON HEGENBERGER EB" -> ["75TH" "HEGENBERGER"]
        (tc/map-columns :streets
                        [:STREET]
                        (fn [STREET]
                          (some-> STREET
                                  (str/replace #" (WB|NB|EB|SB)" " ")
                                  (str/replace #" CONN" " ")
                                  (str/split #" (ON|OFF|TO|FROM) ")
                                  (->> (mapv str/trim))))))))
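
The street-name extraction at the end of that pipeline can be checked on plain strings. The sketch below copies the same regular expressions into a standalone function; the name street-names is introduced here only for illustration.

```clojure
(require '[clojure.string :as str])

(defn street-names
  "Same string steps as the :streets column in process-oakland-centerlines,
  pulled out so they can be tried without loading any geometry."
  [street-field]
  (some-> street-field
          (str/replace #" (WB|NB|EB|SB)" " ")   ; drop direction markers
          (str/replace #" CONN" " ")            ; drop connector marker
          (str/split #" (ON|OFF|TO|FROM) ")     ; split ramp descriptions
          (->> (mapv str/trim))))

(street-names "75TH ON HEGENBERGER EB") ;; => ["75TH" "HEGENBERGER"]
(street-names "TELEGRAPH AV")           ;; => ["TELEGRAPH AV"]
(street-names nil)                      ;; => nil
```

An ordinary street yields a one-element vector, while a ramp description yields one entry per street it connects, so the segment is later indexed under each of those names.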

2.6.4 Step 4: Build Street Name Index

We create a lookup from each street name to all centerline segments for that street.

^{:kindly/hide-code false}
(defn build-street-index
  "Build a map from street name to all centerline segments.
  
  This is how locations.clj creates the street->centerlines index."
  [Oakland-centerlines]
  (-> Oakland-centerlines
      (tc/rows :as-maps)
      (->> (mapcat (fn [{:keys [streets] :as centerline}]
                     (map (fn [street]
                            [street centerline])
                          streets)))
           (group-by first))
      (update-vals (partial map second))))
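
The same inversion can be seen on toy data without tablecloth. The rows and :id values below are made up for illustration; `update-vals` requires Clojure 1.11 or later.

```clojure
;; Toy rows standing in for processed centerline segments; :id is invented.
(def toy-centerlines
  [{:id 1 :streets ["75TH" "HEGENBERGER"]}
   {:id 2 :streets ["HEGENBERGER"]}
   {:id 3 :streets ["TELEGRAPH AV"]}])

(defn index-by-street
  "Map each street name to the vector of rows that mention it."
  [rows]
  (-> (for [row rows, street (:streets row)]
        [street row])                 ; one [street row] pair per name
      (->> (group-by first))          ; street -> [[street row] ...]
      (update-vals #(mapv second %)))) ; keep only the rows

(def toy-index (index-by-street toy-centerlines))

(mapv :id (toy-index "HEGENBERGER")) ;; => [1 2]
(toy-index "NO SUCH ST")             ;; => nil
```

A segment whose STREET field names two streets ends up under both keys, which is what lets a ramp segment take part in intersection matching for either street.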

Let’s build this for real and look at an example:

^{:kindly/hide-code false}
(def oakland-centerlines-full
  (process-oakland-centerlines "data/Oakland-centerlines.geojson"))
^{:kindly/hide-code false}
(def street-index
  (build-street-index oakland-centerlines-full))

How many unique street names do we have?

^{:kindly/hide-code false}
(count street-index)
2268

Let’s see what we have for “TELEGRAPH AV”:

72 centerline segments found for TELEGRAPH AV

2.6.5 Step 5: Find Intersecting Street Segments

When two streets intersect, their buffered line segments will overlap. This is more forgiving than looking for exact geometric intersections.
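
Why buffers are more forgiving can be shown with plain arithmetic: two buffers of radius r overlap when the underlying lines are no more than 2r apart, even if the lines never touch. The sketch below works on single straight segments with made-up coordinates in feet. It illustrates the idea only and is not the JTS implementation, whose buffers are polygon approximations of real multi-vertex line strings; all function names here are hypothetical.

```clojure
(defn point-segment-distance
  "Distance from point p to the segment a-b, all given as [x y] pairs."
  [[px py] [ax ay] [bx by]]
  (let [dx   (- bx ax)
        dy   (- by ay)
        len2 (+ (* dx dx) (* dy dy))
        ;; Position of the projection of p along a-b, clamped to the segment.
        t    (if (zero? len2)
               0.0
               (-> (/ (+ (* (- px ax) dx) (* (- py ay) dy)) len2)
                   (max 0.0)
                   (min 1.0)))]
    (Math/hypot (- px (+ ax (* t dx)))
                (- py (+ ay (* t dy))))))

(defn orientation
  "Sign (-1, 0 or 1) of the turn a -> b -> c."
  [[ax ay] [bx by] [cx cy]]
  (compare (- (* (- bx ax) (- cy ay))
              (* (- by ay) (- cx ax)))
           0.0))

(defn segment-distance
  "Shortest distance between segments [a b] and [c d]; 0.0 if they cross."
  [[a b] [c d]]
  (if (and (not= (orientation a b c) (orientation a b d))
           (not= (orientation c d a) (orientation c d b)))
    0.0
    (min (point-segment-distance a c d)
         (point-segment-distance b c d)
         (point-segment-distance c a b)
         (point-segment-distance d a b))))

(defn buffers-overlap?
  "True when buffers of the given radius around both segments would overlap,
  i.e. when the segments are no more than 2 * radius apart."
  [seg1 seg2 radius]
  (<= (segment-distance seg1 seg2) (* 2 radius)))

;; The cross street is drawn 30 ft short of the avenue, so they never touch.
(def avenue       [[0.0 0.0] [1000.0 0.0]])
(def cross-street [[500.0 30.0] [500.0 1000.0]])

(segment-distance avenue cross-street)      ;; => 30.0
(buffers-overlap? avenue cross-street 50.0) ;; => true
(buffers-overlap? avenue [[500.0 150.0] [500.0 1000.0]] 50.0) ;; => false
```

An exact intersection test would miss the first pair because of the 30 ft gap, while the 50 ft buffers still catch it.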

^{:kindly/hide-code false}
(defn normalize-street-for-lookup
  "Normalize street name for matching against the centerlines index.
  
  This is the normalization from locations.clj."
  [street-name]
  (some-> street-name
          str/upper-case
          (str/replace #"(-|W/B|N/B|E/B|S/B|WESTBOUND)" "")
          str/trim))
^{:kindly/hide-code false}
(defn find-intersecting-segments
  "Find all pairs of street segments where buffers overlap.
  
  This is from locations.clj - uses the buffer zones to find intersections."
  [primary-road secondary-road street-index]
  (let [centerlines1 (some-> primary-road 
                             normalize-street-for-lookup 
                             street-index)
        centerlines2 (some-> secondary-road 
                             normalize-street-for-lookup 
                             street-index)]
    (for [cl1 centerlines1
          cl2 centerlines2
          :when (.intersects ^Geometry (:local-buffer cl1)
                            ^Geometry (:local-buffer cl2))]
      [cl1 cl2])))
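
The normalization step decides whether a crash's road name finds anything in the index at all. The sketch below repeats normalize-street-for-lookup so it runs on its own and tries it against a made-up one-entry index (the keyword values stand in for centerline rows).

```clojure
(require '[clojure.string :as str])

(defn normalize-street-for-lookup
  "Copy of the tutorial's normalization, repeated so this block runs alone."
  [street-name]
  (some-> street-name
          str/upper-case
          (str/replace #"(-|W/B|N/B|E/B|S/B|WESTBOUND)" "")
          str/trim))

;; Hypothetical index with placeholder values instead of real segments.
(def toy-index {"TELEGRAPH AV" [:segment-1 :segment-2]})

(normalize-street-for-lookup "Telegraph Av W/B")   ;; => "TELEGRAPH AV"
(normalize-street-for-lookup "GRAND AV EASTBOUND") ;; => "GRAND AV EASTBOUND"
(normalize-street-for-lookup nil)                  ;; => nil

(toy-index (normalize-street-for-lookup "Telegraph Av W/B"))
;; => [:segment-1 :segment-2]
(toy-index (normalize-street-for-lookup "GRAND AV EASTBOUND"))
;; => nil
```

A name that does not normalize to an index key returns nil, and the `for` in find-intersecting-segments then produces no pairs, so unmatched names fail silently rather than raising an error. Note that the pattern as written strips WESTBOUND but not the other spelled-out directions.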

Let’s try a real example: Telegraph Ave and 19th Street

^{:kindly/hide-code false}
(def telegraph-19th-intersections
  (find-intersecting-segments "TELEGRAPH AV" "19TH ST" street-index))

Finding Telegraph Ave & 19th St Intersection

Found 6 pairs of intersecting street segments

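This means 6 pairs of segments (one from Telegraph Ave, one from 19th St) have overlapping 50-foot buffers. Several pairs can describe the same physical intersection, because each street is split into multiple centerline segments that meet there.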

2.6.6 Step 6: Calculate Intersection Center (with Tensors!)

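When we have multiple intersecting segment pairs, we need to reduce them to a single point. We show two ways to do that: a simple average of all the segment vertices, and a tensor-based version from locations.clj that finds the nearest points between the two line strings and averages those.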

^{:kindly/hide-code false}
(defn centroid-point-distance 
  "Calculate distance between a point and a centroid.
  From locations.clj."
  [x centroid]
  (fun/distance x centroid))
^{:kindly/hide-code false}
(defn nearest 
  "Find the index of the nearest centroid to point x.
  
  This uses tech.ml.dataset's tensor operations for efficient computation.
  From locations.clj."
  [x centroids]
  (-> centroids
      (tensor/reduce-axis #(centroid-point-distance x %1) 1)
      argops/argmin))
^{:kindly/hide-code false}
(defn calculate-intersection-center
  "Estimate the center of an intersection from its intersecting segment pairs.

  This simple version:
  1. Collects every vertex of both line strings in each pair
     (a segment that appears in several pairs is counted once per pair)
  2. Averages the x and y coordinates in local State Plane feet
  3. Transforms the averaged point back to WGS84 for mapping

  It does not search for nearest points; see
  calculate-intersection-centers-nearest for that variant."
  [intersecting-segments]
  (when (seq intersecting-segments)
    (let [all-coords (mapcat (fn [[seg1 seg2]]
                               ;; Get all coordinates from both segments
                               (mapcat (fn [{:keys [local-line-string]}]
                                         (->> local-line-string
                                              jts/coordinates
                                              (map (fn [^Coordinate c]
                                                     [(.getX c) (.getY c)]))))
                                       [seg1 seg2]))
                             intersecting-segments)
          ;; Calculate average coordinate
          n     (count all-coords)
          avg-x (/ (reduce + (map first all-coords)) n)
          avg-y (/ (reduce + (map second all-coords)) n)]
      ;; Create point and transform back to WGS84
      (-> (jts/coordinate avg-x avg-y)
          jts/point
          bay-area->wgs84))))

Let’s calculate the center for Telegraph & 19th:

^{:kindly/hide-code false}
(def telegraph-19th-center
  (calculate-intersection-center telegraph-19th-intersections))

Telegraph Ave & 19th St Intersection Center

Calculated coordinates: (37.80821592657344, -122.2700673329544)

Compare this to manual lookup: 37.808247, -122.269923

Pretty close!
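
To make the averaging step concrete, here is the same vertex average on two made-up streets in a local, feet-based frame. The coordinates and the name mean-vertex are invented for illustration.

```clojure
(defn mean-vertex
  "Average a collection of [x y] vertices into a single [x y] point."
  [vertices]
  (let [n (count vertices)]
    [(/ (reduce + (map first vertices)) n)
     (/ (reduce + (map second vertices)) n)]))

;; Two made-up streets that cross at [500 0].
(def street-a [[0.0 0.0] [500.0 0.0] [1000.0 0.0]])
(def street-b [[500.0 -200.0] [500.0 0.0] [500.0 800.0]])

(mean-vertex (concat street-a street-b))
;; => [500.0 100.0]
```

The average lands 100 ft north of the true crossing because street-b extends further north than south. Vertex averaging is pulled toward wherever the segments happen to have more or farther vertices, which may be one reason the automated estimate differs slightly from a hand-picked coordinate.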

Alternative calculation using the nearest-point approach from locations.clj:

^{:kindly/hide-code false}
(defn calculate-intersection-centers-nearest
  "Calculate intersection center by finding nearest points on each segment.

  This is the more sophisticated version from locations.clj that finds
  the nearest point between each pair of line strings."
  [intersecting-segments]
  (mapv (fn [[seg1 seg2]]
          (let [;; Convert line strings to tensors
                t1 (-> (->> (:local-line-string seg1)
                            jts/coordinates
                            (map (fn [^Coordinate c]
                                   [(.getX c) (.getY c)])))
                       tensor/->tensor)
                t2 (-> (->> (:local-line-string seg2)
                            jts/coordinates
                            (map (fn [^Coordinate c]
                                   [(.getX c) (.getY c)])))
                       tensor/->tensor)
                ;; Find nearest points
                nearest-pair (->> t1
                                  (map (fn [row1]
                                         (let [row2 (t2 (nearest row1 t2))]
                                           [(fun/distance row1 row2) row1 row2])))
                                  (apply min-key first)
                                  rest)
                ;; Average the nearest points
                center (-> nearest-pair
                           tensor/->tensor
                           (tensor/reduce-axis fun/mean 0)
                           (->> (apply jts/coordinate))
                           jts/point
                           bay-area->wgs84)]
            center))
        intersecting-segments))
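
The nearest-point idea can also be sketched without tensors. The version below is a plain-Clojure approximation under the same assumption the tensor code makes, namely that it compares vertices rather than arbitrary points along the lines; the coordinates and function names are made up.

```clojure
(defn distance
  "Euclidean distance between two [x y] points."
  [[x1 y1] [x2 y2]]
  (Math/hypot (- x2 x1) (- y2 y1)))

(defn nearest-vertex-midpoint
  "Find the closest pair of vertices, one from each polyline, and return
  the midpoint between them."
  [line-a line-b]
  (let [[[ax ay] [bx by]] (apply min-key
                                 (fn [[a b]] (distance a b))
                                 (for [a line-a, b line-b] [a b]))]
    [(/ (+ ax bx) 2.0) (/ (+ ay by) 2.0)]))

;; Two made-up streets whose lines cross at [500 0], with no shared vertex.
(def street-a [[0.0 0.0] [480.0 0.0] [1000.0 0.0]])
(def street-b [[500.0 -200.0] [500.0 10.0] [500.0 800.0]])

(nearest-vertex-midpoint street-a street-b)
;; => [490.0 5.0]
```

Here the estimate is about 11 ft from the true crossing, compared with a far larger offset if all six vertices were averaged. Because only vertices are compared, the result still depends on how densely the centerlines are digitized near the intersection.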

2.7 Approach 2b: Neighborhood Assignment with Spatial Index

Once we have crash locations (either from manual lookup or derived from intersections), we need to assign them to neighborhoods.

^{:kindly/hide-code false}
(defn load-neighborhoods
  "Load Oakland neighborhood polygons from CSV with WKT geometry.

  Source: OpenOakland CEDA 2002 neighborhoods dataset."
  []
  (-> "data/Features_20250425.csv.gz"
      (tc/dataset {:key-fn keyword})
      (tc/map-columns :geometry
                      [:the_geom]
                      (fn [wkt-string]
                        (geoio/read-wkt (str wkt-string))))
      (tc/select-columns [:geometry :Name])))

2.7.1 Spatial Indexing for Performance

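Testing every crash against every neighborhood polygon takes O(n*m) exact geometry tests for n crashes and m polygons. An STRtree (Sort-Tile-Recursive tree) first narrows each query to the polygons whose bounding boxes contain the point, which brings the typical cost down to roughly O(n log m).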

^{:kindly/hide-code false}
(defn make-spatial-index
  "Create an R-tree spatial index for fast intersection queries.

  This is the code from data.clj. PreparedGeometry makes
  repeated intersection tests much faster."
  [dataset & {:keys [geometry-column]
              :or   {geometry-column :geometry}}]
  (let [tree (STRtree.)]
    (doseq [row (tc/rows dataset :as-maps)]
      (let [geometry (row geometry-column)]
        (.insert tree
                 (.getEnvelopeInternal geometry)
                 (assoc row
                        :prepared-geometry
                        (PreparedGeometryFactory/prepare geometry)))))
    tree))
^{:kindly/hide-code false}
(defn intersecting-places
  "Find all neighborhoods that intersect with a given point or region.

  From data.clj - uses the spatial index for fast lookup."
  [region spatial-index]
  (->> (.query spatial-index (.getEnvelopeInternal region))
       (filter (fn [row]
                 (.intersects (:prepared-geometry row) region)))
       (map :Name)))
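
The two-phase filter in intersecting-places (a cheap bounding-box query followed by an exact geometry test) can be illustrated in plain Clojure. This sketch scans the envelopes linearly instead of using a tree, uses a basic ray-casting test that does not treat points exactly on a boundary specially, and works on made-up shapes; every name in it is hypothetical.

```clojure
(defn envelope
  "Axis-aligned bounding box of a polygon given as a vector of [x y] vertices."
  [polygon]
  (let [xs (map first polygon)
        ys (map second polygon)]
    {:min-x (apply min xs) :max-x (apply max xs)
     :min-y (apply min ys) :max-y (apply max ys)}))

(defn in-envelope?
  [{:keys [min-x max-x min-y max-y]} [x y]]
  (and (<= min-x x max-x) (<= min-y y max-y)))

(defn in-polygon?
  "Ray casting: a point is inside when a ray going right from it
  crosses an odd number of polygon edges."
  [polygon [x y]]
  (let [edges (map vector polygon (concat (rest polygon) [(first polygon)]))]
    (odd? (count (filter (fn [[[x1 y1] [x2 y2]]]
                           (and (not= (> y1 y) (> y2 y))
                                (< x (+ x1 (/ (* (- y y1) (- x2 x1))
                                              (- y2 y1))))))
                         edges)))))

;; Made-up "neighborhoods", each stored with its precomputed envelope.
(def indexed
  (mapv #(assoc % :envelope (envelope (:polygon %)))
        [{:name "Square"   :polygon [[0.0 0.0] [4.0 0.0] [4.0 4.0] [0.0 4.0]]}
         {:name "Triangle" :polygon [[5.0 0.0] [9.0 0.0] [5.0 4.0]]}]))

(defn places-containing
  [indexed point]
  (->> indexed
       (filter #(in-envelope? (:envelope %) point)) ; cheap prefilter
       (filter #(in-polygon? (:polygon %) point))   ; exact test
       (mapv :name)))

(places-containing indexed [1.0 1.0]) ;; => ["Square"]
(places-containing indexed [8.0 3.0]) ;; => []
```

The second point sits inside the triangle's bounding box but outside the triangle itself, which is why the exact test is still needed after the envelope query.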

Let’s load the neighborhoods and build the index:

^{:kindly/hide-code false}
(def oakland-neighborhoods
  (load-neighborhoods))
^{:kindly/hide-code false}
(-> oakland-neighborhoods
    (tc/select-columns [:Name])
    (tc/head 10))

data/Features_20250425.csv.gz [10 1]:

:Name
Acorn/ Acorn Industrial
Adams Point
Allendale
Arroyo Viejo
Bancroft Business/ Havenscourt
Bartlett
Bella Vista
Brookfield Village
Bushrod
Caballo Hills
^{:kindly/hide-code false}
(def neighborhoods-index
  (make-spatial-index oakland-neighborhoods))

Now test it with our Telegraph & 19th intersection:

^{:kindly/hide-code false}
(def telegraph-19th-neighborhoods
  (intersecting-places telegraph-19th-center neighborhoods-index))

Which neighborhood is Telegraph & 19th in?

Found: ("Downtown")

2.8 Putting It All Together: Real Workflow Examples

Let’s see both approaches in action with data:

2.8.1 Example 1: Manual Lookup (What we used)

We already saw this working earlier with Grand Ave. Here’s a summary showing how simple and effective it is:

^{:kindly/hide-code false}
(-> grand-ave-sample
    (tc/select-columns [:collision-id :secondary-road :latitude :longitude 
                        :intersection-lat :intersection-lng]))

datasets/2023crashes.csv [5 6]:

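| :collision-id | :secondary-road | :latitude | :longitude | :intersection-lat | :intersection-lng |
|---------------|-----------------|-----------|------------|-------------------|-------------------|
| 2484619       | PARK VIEW TER   |           |            | 37.809881         | -122.259373       |
| 2484602       | BAY PL          |           |            | 37.810590         | -122.260507       |
| 2484054       | HARRISON ST     |           |            | 37.810923         | -122.262360       |
| 2481965       | MACARTHUR BL    |           |            | 37.810195         | -122.248825       |
| 2495402       | PARK VIEW AV    |           |            | 37.809881         | -122.259373       |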

Notice:

  • latitude and longitude are empty (the original data)
  • intersection-lat and intersection-lng are filled in from our manual lookup
  • Simple, fast, and accurate!

2.8.2 Example 2: Automated Matching (Advanced approach from locations.clj)

We demonstrated this with Telegraph & 19th. Let’s compare the results:

Telegraph & 19th: Manual vs Automated

| Method                  | Latitude          | Longitude          |
|-------------------------|-------------------|--------------------|
| Manual Lookup           | 37.808247         | -122.269923        |
| Automated (Centerlines) | 37.80821592657344 | -122.2700673329544 |

Difference: 0.000031 degrees lat, 0.000144 degrees lng

That's within 16.4 meters!
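
As a cross-check on that figure, the sketch below computes the separation two ways on a spherical Earth (mean radius 6,371,008.8 m, an assumption made for illustration). The 16.4 m value appears consistent with treating a degree of longitude as being as long as a degree of latitude; the haversine formula, which accounts for longitude shrinking with latitude, gives a smaller number. Function names are made up for this example.

```clojure
;; Assumption: spherical Earth with the commonly used mean radius.
(def earth-radius-m 6371008.8)

(defn haversine-m
  "Great-circle distance in meters between two [lat lng] points in degrees."
  [[lat1 lng1] [lat2 lng2]]
  (let [phi1 (Math/toRadians lat1)
        phi2 (Math/toRadians lat2)
        dphi (Math/toRadians (- lat2 lat1))
        dlam (Math/toRadians (- lng2 lng1))
        a    (+ (Math/pow (Math/sin (/ dphi 2)) 2)
                (* (Math/cos phi1) (Math/cos phi2)
                   (Math/pow (Math/sin (/ dlam 2)) 2)))]
    (* 2 earth-radius-m (Math/asin (Math/sqrt a)))))

(defn naive-degree-distance-m
  "Distance that treats degrees of latitude and longitude as equally long."
  [[lat1 lng1] [lat2 lng2]]
  (* (/ (* Math/PI earth-radius-m) 180.0)
     (Math/hypot (- lat2 lat1) (- lng2 lng1))))

(def manual    [37.808247 -122.269923])
(def automated [37.80821592657344 -122.2700673329544])

(haversine-m manual automated)             ; roughly 13 m
(naive-degree-distance-m manual automated) ; roughly 16.4 m
```

Under these assumptions the two estimates are about 13 meters apart on the ground, so the automated result is, if anything, slightly closer to the manual lookup than the headline number suggests.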

The automated approach gets very close to the manual lookup!

2.8.3 Example 3: Complete Analysis - Grand Ave with Neighborhoods

Let’s show a complete example combining everything:

^{:kindly/hide-code false}
(def grand-ave-with-neighborhoods
  (-> grand-ave-crashes-2023
      (tc/map-columns :intersection-lat
                      [:secondary-road]
                      (fn [secondary-road]
                        (let [match (some (fn [[k v]]
                                            (when (str/includes? (or secondary-road "") k)
                                              v))
                                          grand-intersections-of-interest)]
                          (:lat match))))
      (tc/map-columns :intersection-lng
                      [:secondary-road]
                      (fn [secondary-road]
                        (let [match (some (fn [[k v]]
                                            (when (str/includes? (or secondary-road "") k)
                                              v))
                                          grand-intersections-of-interest)]
                          (:lng match))))
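      (tc/map-columns :point
                      [:intersection-lat :intersection-lng]
                      (fn [lat lng]
                        (when (and lat lng)
                          ;; The two-argument jts/point in the geo library
                          ;; takes latitude first: (jts/point lat long)
                          (jts/point lat lng))))
      (tc/map-columns :neighborhoods
                      [:point]
                      (fn [point]
                        (when point
                          ;; vec realizes the lazy seq so the column prints as names
                          (vec (intersecting-places point neighborhoods-index)))))))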
^{:kindly/hide-code false}
(-> grand-ave-with-neighborhoods
    (tc/select-columns [:collision-id :secondary-road :intersection-lat 
                        :intersection-lng :neighborhoods])
    (tc/head 5))

datasets/2023crashes.csv [5 5]:

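| :collision-id | :secondary-road | :intersection-lat | :intersection-lng | :neighborhoods         |
|---------------|-----------------|-------------------|-------------------|------------------------|
| 2484619       | PARK VIEW TER   | 37.809881         | -122.259373       | clojure.lang.LazySeq@1 |
| 2484602       | BAY PL          | 37.810590         | -122.260507       | clojure.lang.LazySeq@1 |
| 2484054       | HARRISON ST     | 37.810923         | -122.262360       | clojure.lang.LazySeq@1 |
| 2481965       | MACARTHUR BL    | 37.810195         | -122.248825       | clojure.lang.LazySeq@1 |
| 2495402       | PARK VIEW AV    | 37.809881         | -122.259373       | clojure.lang.LazySeq@1 |

Now we have:

  • Original crash data
  • Filled-in coordinates from manual lookup
  • A :neighborhoods column from the spatial index

One caveat about this output: every :neighborhoods cell prints as clojure.lang.LazySeq@1, which is the default string form of an empty lazy sequence, so no neighborhood names were matched in this run. When reproducing it, make sure the point is built with latitude first, (jts/point lat lng), since the geo library's two-argument jts/point takes (lat, long), and wrap the result of intersecting-places in vec so that matches print as names.

With that in place, the data is ready for analysis.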
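
The clojure.lang.LazySeq@1 cells in the table above come from how lazy sequences convert to strings. A short demonstration in plain Clojure:

```clojure
;; A lazy seq has no readable toString; an empty one hashes to 1,
;; so it prints as the class name followed by "@1".
(str (map inc []))        ;; => "clojure.lang.LazySeq@1"

;; Realizing into a vector gives a readable value instead.
(str (mapv inc [1 2]))    ;; => "[2 3]"
(str (vec (map inc [])))  ;; => "[]"
```

Seeing the same "@1" suffix on every row is therefore a sign that each sequence is empty, not that every crash fell into the same neighborhood.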

2.9 Code Reference: How We Did It

For reference, here’s the workflow code from our notebooks:

2.9.1 Grand Ave Workflow (Manual - from index.clj)

^{:kindly/hide-code false}
(comment
  ;; Telegraph Ave used the same manual approach!
  (def telegraph-intersections-of-interest
    (merge kono-intersections-of-interest
           pill-hill-intersections-of-interest))
  
  (def telegraph-ave-crashes
    (let [crashes (-> oakland-city-crashes
                      (ds/filter #(str/includes? (or (:primary-road %) "") "TELEGRAPH"))
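                      (ds/filter (fn [row]
                                   ;; Guard against nil road names before substring matching
                                   (let [primary   (or (:primary-road row) "")
                                         secondary (or (:secondary-road row) "")]
                                     (or (some #(str/includes? primary %)
                                               (keys telegraph-intersections-of-interest))
                                         (some #(str/includes? secondary %)
                                               (keys telegraph-intersections-of-interest)))))))]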
      ;; Same manual lookup pattern
      (-> crashes
          (tc/map-columns :intersection-lat [:secondary-road]
                          (fn [sec-road]
                            (let [match (some (fn [[k v]]
                                                (when (str/includes? (or sec-road "") k) v))
                                              telegraph-intersections-of-interest)]
                              (:lat match))))
          (tc/map-columns :intersection-lng [:secondary-road]
                          (fn [sec-road]
                            (let [match (some (fn [[k v]]
                                                (when (str/includes? (or sec-road "") k) v))
                                              telegraph-intersections-of-interest)]
                              (:lng match))))))))

2.9.2 City-wide Advanced Workflow (Automated - from locations.clj)

^{:kindly/hide-code false}
(comment
  ;; This uses data/Oakland-centerlines (preprocessed in data.clj)
  ;; and data/bay-area->wgs84 for coordinate transformations
  (def crashes-with-centerlines
    (-> crashes
        ;; Look up centerlines for each street name
        (tc/map-columns :centerlines
                        [:primary-road :secondary-road]
                        (fn [primary-road secondary-road]
                          [(some-> primary-road normalize-street-for-lookup street->centerlines)
                           (some-> secondary-road normalize-street-for-lookup street->centerlines)]))
        ;; Find intersecting segments (using buffers created in data.clj)
        (tc/map-columns :intersecting-segments
                        [:centerlines]
                        (fn [[centerlines1 centerlines2]]
                          (for [cl1 centerlines1
                                cl2 centerlines2
                                :when (.intersects (:local-buffer cl1)
                                                   (:local-buffer cl2))]
                            [cl1 cl2])))
        ;; Calculate intersection centers (uses data/bay-area->wgs84)
        (tc/map-columns :intersection-center
                        [:intersecting-segments]
                        calculate-intersection-center))))

2.10 Summary: Key Techniques Used in This Project

2.10.1 1. Manual Intersection Lookup

  • What we used: Both Grand Ave and Telegraph Ave analyses in index.clj
  • Fast and simple when you know your area
  • Requires manually looking up coordinates for specific intersections

2.10.2 2. Automated Spatial Matching

  • Explored in: locations.clj for advanced analysis
  • Would scale to whole city without manual lookup
  • Uses street centerline geometry and coordinate transformations

2.10.3 3. Street Name Normalization

  • Remove directional suffixes (WB, NB, EB, SB, WESTBOUND, etc.)
  • Extract multiple street names from complex fields (“75TH ON HEGENBERGER”)
  • Build lookup indexes for fast matching

2.10.4 4. Coordinate System Transformations

  • Transform to local projected coordinates (EPSG:2227) for accurate distance calculations
  • Use 50-foot buffers for fuzzy spatial matching
  • Transform back to WGS84 for mapping/display

2.10.5 5. Tensor Operations for Geometry

  • Use tech.ml.dataset tensor operations for efficient distance calculations
  • Find nearest points between line strings
  • Average coordinates to find intersection centers

2.10.6 6. Spatial Indexing

  • Use R-tree indexes (STRtree) for fast spatial queries
  • PreparedGeometry for efficient repeated intersection tests
  • Point-in-polygon for neighborhood assignment

2.11 When to Use Which Approach

| Scenario                                  | Approach                | Why                                                                        |
|-------------------------------------------|-------------------------|----------------------------------------------------------------------------|
| Known intersections in focused area       | Manual lookup           | Simple, fast, you control exactly what to include (this is what we used!) |
| City-wide analysis of all crashes         | Automated matching      | Can’t manually lookup thousands of intersections                           |
| Missing street names entirely             | Need source coordinates | Neither approach can derive location without street names                  |
| Very messy/inconsistent street names      | Manual lookup           | Less prone to matching errors                                              |
| Real-time/production system               | Automated               | Can’t manually lookup every new crash                                      |
| Exploratory analysis of specific corridor | Manual lookup           | Quick to set up for 10-20 intersections                                    |

2.12 What We Did in This Project

  • Grand Ave (index.clj): Manual lookup with grand-intersections-of-interest
  • Telegraph Ave (index.clj): Manual lookup with kono-intersections-of-interest and pill-hill-intersections-of-interest
  • Advanced exploration (locations.clj): Automated centerline matching for city-wide patterns

2.13 Libraries Used

2.14 See the Real Code

  • notebooks/index.clj - Grand Ave analysis with manual lookup
  • notebooks/locations.clj - Sophisticated spatial matching with tensors
  • notebooks/data.clj - Data loading and preprocessing

This tutorial documents the techniques used in the Grand Ave crash analysis project, created for the Clojure data science community to show real-world GIS data handling.

source: notebooks/tutorial.clj