more notes

tarfeef101 6 years ago
parent
commit
ddc9223746
1 changed files with 40 additions and 0 deletions
oct_4

@@ -0,0 +1,40 @@
+Everything is a graph. Except for graphs. Those are charts.
+
+We're going to talk about sparse graphs. This means the number of edges is small compared to the possible number of edges (roughly n^2 for n vertices).
+CS people are masochists. (Because we love graphs, which are hard). Why:
+  they can have weird structures (can be fun to make data structs for it)
+  weird data access patterns (can be fun to make infrastructure to adjust for that)
+  you can optimize things with graph things
+
+What do graphs have to do with MR/Spark?
+  lots of graph algos do:
+    compute stuff at each node, and continue traversing (propagate results) (often these computations have to do with parents/children)
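A rough sketch of one "compute at each node, then propagate" round, in plain Python rather than real MR/Spark. The function name `propagate_once` and the choice of summing incoming values are made up for illustration; the notes don't say which computation the lecture used.

```python
from collections import defaultdict

def propagate_once(adj, values):
    """One round: each node sends its value along every outgoing edge,
    then each node sums whatever it received.

    Assumes every node appears as a key in `adj` (possibly with an empty list).
    """
    # "map" side: emit (destination, value) for each outgoing edge
    messages = [(dst, values[src]) for src, nbrs in adj.items() for dst in nbrs]

    # "reduce" side: group messages by destination and combine them
    received = defaultdict(int)
    for dst, val in messages:
        received[dst] += val

    # nodes with no incoming edges end up with 0
    return {node: received[node] for node in adj}

adj = {"a": ["b", "c"], "b": ["c"], "c": []}
values = {"a": 1, "b": 10, "c": 100}
print(propagate_once(adj, values))  # {'a': 0, 'b': 1, 'c': 11}
```

Running this repeatedly is the general shape of an iterative graph algorithm: one map/reduce pass per round.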
+  
+How to represent a graph in MR and Spark?
+  Adjacency matrices (n * n matrix, each row/column represents a vertex, each cell is 1 if there is an edge from row to column, else 0)
+    while easy to understand, they use lots of space (n^2 cells even when the graph is sparse) and are hard to actually code/compute on
+  Adjacency Lists: basically the above but just have a list of n lists, each list is a list of outgoing edges (this is like a postings list for text processing/search)
+    compressible just like those postings lists, easy to compute things as you traverse from start -> end (compute over outgoing links). But rather hard to do the opposite without flipping the entire data structure beforehand
+  Edge Lists: Just a fucking list of edges. Why? Cause it's fucking easy. Also you can always append shit, so good for processing a stream of input. But it uses lots of space, and computing on them efficiently is hard cause it's not in traversal order or anything.
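The three representations above, sketched in plain Python for a tiny 3-vertex directed graph. All helper names (`matrix_to_adj_list`, `adj_list_to_edges`, `edges_to_matrix`) are hypothetical, and vertices are assumed to be the integers 0..n-1.

```python
def matrix_to_adj_list(matrix):
    """Adjacency matrix -> adjacency list (dict of vertex -> outgoing neighbours)."""
    return {u: [v for v, bit in enumerate(row) if bit] for u, row in enumerate(matrix)}

def adj_list_to_edges(adj):
    """Adjacency list -> flat edge list of (source, destination) pairs."""
    return [(u, v) for u, nbrs in adj.items() for v in nbrs]

def edges_to_matrix(edges, n):
    """Edge list -> n x n adjacency matrix."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
    return matrix

matrix = [
    [0, 1, 1],  # 0 -> 1, 0 -> 2
    [0, 0, 1],  # 1 -> 2
    [0, 0, 0],  # 2 has no outgoing edges
]
adj = matrix_to_adj_list(matrix)  # {0: [1, 2], 1: [2], 2: []}
edges = adj_list_to_edges(adj)    # [(0, 1), (0, 2), (1, 2)]
print(adj, edges)
```

Note the space difference: the matrix always holds n^2 cells, while the other two only store the edges that exist.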
+
+How to process them:
+  split by edges or vertices??
+
+How to deal with undirected graphs?
+  1) you could store the edge both ways, and ensure your algo handles the dupes
+  2) store one edge using a static rule (eg start is always the lower ID)
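Option 2 can be sketched like this, assuming vertex IDs are comparable (ints here). `canonical_edge` and `dedupe_undirected` are illustrative names, not anything from the course.

```python
def canonical_edge(u, v):
    """Store an undirected edge with the lower ID first."""
    return (u, v) if u <= v else (v, u)

def dedupe_undirected(edges):
    """Collapse both directions of each undirected edge into one canonical pair."""
    return sorted({canonical_edge(u, v) for u, v in edges})

raw = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 3)]
print(dedupe_undirected(raw))  # [(1, 2), (1, 3), (2, 3)]
```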
+
+Things we can do with MR/Spark:
+  Invert the graph: flatMap: emit each edge with the pair reversed, regroup as needed depending on which representation you used
+  Going from types of representations to each other is pretty self-explanatory
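A sketch of the inversion above on an adjacency list. `flat_map` and `group_by_key` here are small plain-Python stand-ins for the Spark operations of similar names (assumption: the real job would use an RDD's `flatMap` followed by a grouping step); `invert` is a made-up name.

```python
from collections import defaultdict

def flat_map(fn, records):
    """Apply fn to each record and flatten the resulting lists into one list."""
    return [out for rec in records for out in fn(rec)]

def group_by_key(pairs):
    """Collect (key, value) pairs into key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def invert(adj):
    """Reverse every edge, then regroup into an adjacency list of incoming edges."""
    reversed_edges = flat_map(
        lambda kv: [(dst, kv[0]) for dst in kv[1]],  # emit (dst, src) per edge
        adj.items(),
    )
    return group_by_key(reversed_edges)

adj = {"a": ["b", "c"], "b": ["c"], "c": []}
print(invert(adj))  # {'b': ['a'], 'c': ['a', 'b']}
```

Vertices with no incoming edges (like `a`) drop out of the result, so a real job would need to re-add them if later steps expect every vertex to be present.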
+
+  Make those cool graphical representations of data and relations!:
+    treat all vertices like "like charges" (they repel each other)
+    treat edges like springs connecting them (so they pull vertices together)
+    this ain't scalable though. 10,000+ nodes makes this useless and a giant mess to see
+
+  How to do that with say, the interwebs?
+    painfully
+
+SSSP:
+  read the slides