<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Visualization, etc.</title><atom:link href="http://cscheid.net/rss/blog.xml" rel="self" type="application/rss+xml" /><link>http://cscheid.net/blog</link><description>cscheid.net | Visualization, etc.</description><language>en-us</language><item><title>Voting in the MLB Hall of Fame</title><link>http://cscheid.net/blog/voting_in_the_mlb_hall_of_fame</link><guid>http://cscheid.net/blog/voting_in_the_mlb_hall_of_fame</guid><pubDate>Wed, 23 Jan 2013 17:43:45 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;Here&amp;rsquo;s a &lt;a href="http://bit.ly/Wgqqfz"&gt;visualization of the MLB Hall of Fame&lt;/a&gt; trajectories my friend &lt;a href="http://www2.research.att.com/~kshirley/"&gt;Kenny Shirley&lt;/a&gt; and I created using &lt;a href="http://d3js.org"&gt;D3&lt;/a&gt;, &lt;a href="http://www.r-project.org"&gt;R&lt;/a&gt; and a bunch of elbow grease.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;This was a whole lot of fun to create (it went from zero to published in under two weeks), and was a great way to teach myself some D3. The source code is on &lt;a href="http://github.com/cscheid/mlb-hall-of-fame-visualization"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>Window Seat</title><link>http://cscheid.net/blog/window_seat</link><guid>http://cscheid.net/blog/window_seat</guid><pubDate>Wed, 16 Jan 2013 00:05:00 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;What does it look like when you turn a video of a 5-hour flight into a
single image?&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;I finally put together a &lt;a href="http://cscheid.net/static/windowseat"&gt;web page&lt;/a&gt; showing the
results of a fun hack &lt;a href="http://www.cheswick.com"&gt;Bill
Cheswick&lt;/a&gt; and I worked on in 2011. Bill pointed a GoPro camera out the
window of a jetliner, packed a bunch of extra batteries and
high-capacity memory cards, and recorded the whole thing. By stitching
together a single row of pixels from every frame of the video (the
moral equivalent of &lt;a href="http://en.wikipedia.org/wiki/Slit-scan_photography"&gt;slit scan
photography&lt;/a&gt;), you get a view of the whole continental US. We got
lucky: it was a mostly sunny day.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>A trivial Python hack to tweet from the command line</title><link>http://cscheid.net/blog/a_trivial_python_hack_to_tweet_from_the_command_line</link><guid>http://cscheid.net/blog/a_trivial_python_hack_to_tweet_from_the_command_line</guid><pubDate>Wed, 14 Nov 2012 23:14:08 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;One of the annoying things about Twitter is that sometimes I want to
tweet something, but I don&amp;rsquo;t want to actually read anything from my
feed (because I&amp;rsquo;ll surely get distracted). So I wrote this trivial,
minimal piece of Python code to tweet from the command line.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;#!/usr/bin/env python&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;import tweepy&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;import sys&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="padding-left: 0.5em" /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;# Get these at dev.twitter.com&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;consumer_key="your consumer_key goes here"&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;consumer_secret="your consumer_secret goes here"&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;access_token="your access_token goes here"&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;access_token_secret="your access_token_secret goes here"&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="padding-left: 0.5em" /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;auth = tweepy.OAuthHandler(consumer_key, 
consumer_secret)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;auth.set_access_token(access_token, access_token_secret)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;api = tweepy.API(auth)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="padding-left: 0.5em" /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;tweet_content = " ".join(sys.argv[1:])&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;api.update_status(tweet_content)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;You&amp;rsquo;ll need &lt;a href="https://github.com/tweepy/tweepy"&gt;tweepy&lt;/a&gt;, which you can get with:&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;$ pip install tweepy&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;Save the script as &lt;span class="tt"&gt;tweet&lt;/span&gt; somewhere on your &lt;span class="tt"&gt;PATH&lt;/span&gt;, make it executable, and then:&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;span class="tt"&gt;&lt;span class="tt"&gt;$ tweet This is mostly useless.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>VisWeek 2012, Monday and Tuesday</title><link>http://cscheid.net/blog/visweek_2012__monday_and_tuesday</link><guid>http://cscheid.net/blog/visweek_2012__monday_and_tuesday</guid><pubDate>Thu, 18 Oct 2012 13:13:09 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;This continues the set of notes I started on the
&lt;a href="http://cscheid.net/blog/visweek_2012__sunday"&gt;previous post&lt;/a&gt;.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Monday&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;It turns out I spent most of Monday hanging out on hallways and
chatting with people, so I did not really see much of the
sessions. But I will highlight one bit about the BioVis sessions which
I thought was pretty great (Robert
&lt;a href="http://eagereyes.org/blog/2012/visweek-2012-day-one"&gt;mentioned&lt;/a&gt;
this as well in his post). &lt;a href="http://biovis.net/"&gt;BioVis&lt;/a&gt; was run as a traditional symposium,
with a set of closely-related papers on a specific area (in this case,
biological data visualization). The brilliant idea by the organizers
was to start each session with a short presentation by the session
chairs. This presentation briefly mentioned all the papers in the
session, and put them in the context of the wider area. The session chair,
having had access to the work and typically more experience in the
area than the presenter, can give the wider context. This way the
audience can connect the dots between the individual papers much more
easily. VisWeek should seriously consider doing something like this
instead of fast forwards. As much as I enjoyed Gordon&amp;rsquo;s &amp;ldquo;Tiny
Particles&amp;rdquo; banjo masterpiece, or (shame on me, I forget their name) &amp;ldquo;The devil came down to Baltimore&amp;rdquo;, I think I
would be more likely to get something out of a short presentation by
the session chairs before each set of papers.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Tuesday&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;I started the day with the Infovis session on evaluation, and with
&lt;a href="http://steveharoz.com"&gt;Steve Haroz&lt;/a&gt;&amp;rsquo;s presentation of this year&amp;rsquo;s
&lt;a href="http://visweek.org/visweek/2012/paper/how-capacity-limits-attention-influence-information-visualization-effectiveness"&gt;best
Infovis paper&lt;/a&gt;. Haroz and Whitney designed a set of simple experiments
that show very clearly how information displays are fundamentally
bounded by our limited capacity for attention, and discuss how to optimize
visualizations to avoid squandering this limited resource.
The &lt;a href="http://steveharoz.com/research/attention/papers/Haroz_Whitney_2012_InfoVis.pdf"&gt;paper&lt;/a&gt; includes
all the relevant details, and you should read it. But what really
caught my attention was Haroz&amp;rsquo;s presentation. I don&amp;rsquo;t know if VGTC is recording
these, but if you have the opportunity to see one of his talks, I urge
you to do so: he was clear, persuasive, only explained as much as
needed in a talk (because, really, you should read the paper!), and,
above all, he used the projector to &lt;span class="italic"&gt;show, not tell&lt;/span&gt;. Really
great job. There has been &lt;a href="http://dl.dropbox.com/u/14753707/index.html"&gt;some discussion&lt;/a&gt; on how to interpret the
results, and Haroz &lt;a href="http://steveharoz.com/blog/?p=203"&gt;has written a response&lt;/a&gt;. How crazy is it
that these written discussions can now happen during the course of a
conference? We truly live in the future. Also, and I&amp;rsquo;ll try to put it
mildly, dropbox user 14753707: don&amp;rsquo;t act like a spineless coward, and let us
know who you are! This is not how scholarly discourse happens.&lt;/p&gt;&lt;p&gt;The other work from the morning session I want to highlight is Hofmann, Follett, Majumder and
Cook&amp;rsquo;s
&lt;a href="http://visweek.org/visweek/2012/paper/graphical-tests-power-comparison-competing-designs"&gt;paper&lt;/a&gt;
on evaluating visualization designs by appealing to the notion of
&lt;a href="http://en.wikipedia.org/wiki/Statistical_power"&gt;statistical
power&lt;/a&gt;. This is a follow-up to, and a direct application of, the techniques
in
&lt;a href="http://stat.wharton.upenn.edu/~buja/PAPERS/Wickham-Cook-Hofmann-Buja-IEEE-TransVizCompGraphics_2010-Graphical%20Inference%20for%20Infovis.pdf"&gt;Graphical
Inference for Infovis&lt;/a&gt;, which, if you read this blog at all,
you&amp;rsquo;re probably sick of hearing about. But bear with me some more, because
this is great stuff. The original paper presented the groundbreaking
techniques for turning visual tasks into formal statistical
tests. This paper shows how this idea can be used to &lt;span class="italic"&gt;compare
different visualization designs&lt;/span&gt;. The basic idea is this: a
visualization design is good if it is hard to hide the true data among
&amp;ldquo;impostors&amp;rdquo; which are created by sampling bogus data with similar
distributions to the true dataset. This is almost unbelievably simple,
but it turns out to unlock many powerful (and widely studied!)
tools from statistical inference theory, which can then be used directly in
visualization evaluation. As this is an area that, in my opinion, sorely
lacks techniques which generalize effectively, this
development by Hofmann and co-authors is hugely exciting.&lt;/p&gt;&lt;p&gt;Other cool papers I saw today included (but were not
limited to!) the work from
&lt;a href="http://www.cs.sunysb.edu/~mueller/papers/HPU_vis_2012.pdf"&gt;Ahmed,
Zheng and Mueller&lt;/a&gt; on leveraging human computation to design better
compositing operators,
&lt;a href="http://research.microsoft.com/en-us/um/people/ycwu/"&gt;Wu, Yuan
and Ma&lt;/a&gt;&amp;rsquo;s paper on non-independence and interaction modeling for
uncertainty displays, and
&lt;a href="http://www.cs.ubc.ca/nest/imager/tr/2012/dsm/"&gt;Sedlmair, Meyer
and Munzner&lt;/a&gt;&amp;rsquo;s work on collecting best practices for design studies,
which are a powerful, popular, and monstrously hard way to do
applied visualization work.&lt;/p&gt;&lt;p&gt;Finally, the day ended with a phenomenal party, organized by
&lt;a href="http://complexdiagrams.com"&gt;Noah Illinsky&lt;/a&gt; in connection with
the
&lt;a href="http://www.linkedin.com/groups/Next-meetups-Sept-25th-Oct-4421544%2ES%2E163816724"&gt;Seattle
Data Visualization Meetup&lt;/a&gt; and graciously sponsored by
&lt;a href="www.tableausoftware.com/"&gt;Tableau Software&lt;/a&gt; (if I&amp;rsquo;m missing
anyone else, please let me know!). This was, as Noah
put it, an attempt to bring together two groups of people with very
similar interests that would otherwise probably not overlap very
much. There were five or six talks ranging from things like &amp;ldquo;why you
should do &lt;span class="italic"&gt;pro bono&lt;/span&gt; visualization work and make the world a
better place&amp;rdquo; to the intersection of art and
visualization (by
&lt;a href="http://www.francescasamsel.com/home_html/HOME.html"&gt;Francesca
Samsel&lt;/a&gt;, who is actually organizing an
&lt;a href="http://visweek.org/visweek/2012/workshop/visweek/scheherazades-toolbox-artists-meet-visualization"&gt;intriguingly
named&lt;/a&gt; workshop on Thursday), to our own
&lt;a href="http://eagereyes.org"&gt;Robert Kosara&lt;/a&gt; talking about
storytelling in visualization. This was possibly the best party I&amp;rsquo;ve ever
attended at VisWeek, and I would love for something like it to become
a tradition.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>VisWeek 2012, Sunday</title><link>http://cscheid.net/blog/visweek_2012__sunday</link><guid>http://cscheid.net/blog/visweek_2012__sunday</guid><pubDate>Wed, 17 Oct 2012 04:31:33 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;At this point, &lt;a href="http://visweek.org"&gt;VisWeek 2012&lt;/a&gt; is just about
halfway done, and there&amp;rsquo;s tons to write about, but I&amp;rsquo;ll try to keep
these short by sticking to one day at a time.
VisWeek now runs four parallel tracks for the best part
of a week, so there&amp;rsquo;s no way I can tell you about everything that
is happening out here in (today, surprisingly sunny!) Seattle. But I
will tell you about what I think is cool. The usual caveats apply:
omissions and mischaracterizations are all my fault.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Sunday&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;VisWeek started on Sunday, and I saw
&lt;a href="http://graphics.stanford.edu/~hanrahan/"&gt;Pat Hanrahan&lt;/a&gt;&amp;rsquo;s LDAV
keynote on &amp;ldquo;Divide and Recombine: an approach for analyzing large
datasets&amp;rdquo;. This was a nice tour through recent work (some from
Hanrahan and his students, and much from other folks) happening in the big data
community. Pat gave a nice overview of the
reasons behind all the buzz around NoSQL databases (performance!
flexibility! distributed computation!), and of the challenges
for visualization and data analysis researchers and practitioners:
mostly, the toolchain is slow and clunky. And &lt;a href="http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdf"&gt;computer scientists are
toolsmiths&lt;/a&gt;, so we should make &lt;a href="https://github.com/jtalbot/riposte"&gt;better&lt;/a&gt;
&lt;a href="http://www.datadr.org"&gt;tools&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;It was fun to realize that his talk was really about
&amp;ldquo;satellite issues&amp;rdquo; surrounding the topics which we traditionally
consider visualization, such as: how can we get to the big data
sources? How do we wrangle all that data into something usable for
visualization, and how do we manage (computationally and organizationally)
all the analyses we need to run to get meaningful visualizations?
Much of his talk focused
on &lt;a href="http://www.r-project.org"&gt;R&lt;/a&gt;, the current darling language
of data analysts. Interestingly, across the hall from this talk,
&lt;a href="http://visweek.org/visweek/2012/tutorial/visweek/visualizing-data-r-and-ggobi"&gt;a
tutorial on ggplot2 and ggobi&lt;/a&gt; was being presented by Di Cook and Heike Hofmann (about whom I&amp;rsquo;ll
have more to say soon). &lt;a href="http://ggplot2.org"&gt;ggplot2&lt;/a&gt; has been described as the hipster
plotting library for R. But it really does kick ass. Of course,
I&amp;rsquo;m partial to R since I work in the same building as some
&lt;a href="http://www2.research.att.com/~urbanek/"&gt;good&lt;/a&gt;
&lt;a href="http://www.research.att.com/people/Becker_Richard_A/index.html?fbid=Wto_COA2yWM"&gt;folks&lt;/a&gt;
who are deeply involved with it. Nevertheless, it&amp;rsquo;s nice to see the
VisWeek community warming up to it,
&lt;a href="https://twitter.com/FisherDanyel/status/257524247105253376"&gt;warts
and all&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The other interesting observation Pat Hanrahan made was that
problems which the (for lack of a better term) &amp;ldquo;industrial big data
community&amp;rdquo; tackles overlap a lot with the ones which we
tackle at VisWeek. Somewhat inexplicably, there is minimal overlap between the
communities: how many of us hang out at
&lt;a href="http://strataconf.com/stratany2012/?intcmp=il-strata-stny12-franchise-page"&gt;Strata&lt;/a&gt;,
or the many great
&lt;a href="https://www.facebook.com/events/152674571526368/"&gt;visualization&lt;/a&gt;
and &lt;a href="http://www.meetup.com/nyhackr/"&gt;statistical computing&lt;/a&gt;
meetups? It&amp;rsquo;s something to think about.&lt;/p&gt;&lt;p&gt;At the same time, someone mentioned to me tonight that in the user
interface community, there is an analogous situation with
&lt;a href="http://www.ux-lx.com"&gt;UX&lt;/a&gt;
vs. &lt;a href="http://chi2013.acm.org"&gt;CHI&lt;/a&gt;. So is this simply an
indication that visualization is now actually mainstream? In other
words, is this fragmentation a symptom of success or an admission
of defeat? I don&amp;rsquo;t know.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Plug: come see our panel!&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;I have a lot more to say about VisWeek 2012, but the bigger the wall
of text, the more likely you are to go look at
&lt;a href="http://www.reddit.com/r/aww"&gt;cute cats&lt;/a&gt;. So before you leave, let me plug
a panel in which (because unfortunately &lt;a href="http://vgc.poly.edu/~juliana/"&gt;Juliana Freire&lt;/a&gt; can&amp;rsquo;t make the trip) I will be presenting, about
&lt;a href="http://visweek.org/visweek/2012/panel/infovis/reproducible-visualization-research-how-do-we-get-there"&gt;reproducible
visualization research&lt;/a&gt;. This is a topic near and dear to my heart, so
I am really honored (and more than a little scared) to be sharing the
stage with &lt;a href="http://www.csse.monash.edu.au/~tdwyer/"&gt;Tim Dwyer&lt;/a&gt;,
&lt;a href="www.cs.ubc.ca/~tmm/"&gt;Tamara Munzner&lt;/a&gt; and
&lt;a href="http://cs.uchicago.edu/~glk/"&gt;Gordon Kindlmann&lt;/a&gt;. There&amp;rsquo;s few
things I like better than a good argument, so this should be great fun.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>So you want to look at a graph, part 3</title><link>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_3</link><guid>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_3</guid><pubDate>Sun, 22 Jul 2012 19:15:12 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;This series of posts is a tour of the design space of
graph visualization. I&amp;rsquo;ve written about
&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph"&gt;graphs
and their properties&lt;/a&gt;, and how the
&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1"&gt;encoding
of data into a visual representation is crucial&lt;/a&gt;. In this post, I will
use those ideas to justify the choices behind a classic algorithm for laying
out directed, mostly-acyclic graphs.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Putting what we know to work&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;In a way, this sequence of posts is an attempt to understand how we
turn the process of designing visualizations from mostly art with some
craft into mostly craft with some art. We do know a few rules which
work, but we don&amp;rsquo;t really know &lt;span class="italic"&gt;how we know them&lt;/span&gt;.&lt;/p&gt;&lt;p&gt;In part 1, I asked, &amp;ldquo;what is in a graph?&amp;rdquo; In part 2, I asked, &amp;ldquo;what
is in a sheet of paper?&amp;rdquo; It is my view that every visualization design
should start by answering those two questions: what are the structures
which we care about in our data, and what can we work with?&lt;/p&gt;&lt;p&gt;These questions have little to do with the actual design of a visualization,
but they lead right away to what in my mind is the fundamental axiom of
visualization design: &lt;span class="bold"&gt;A visualization design must match
structures in data to structures in visual encodings.&lt;/span&gt; From here
on out, I will call this &lt;span class="bold"&gt;The Axiom&lt;/span&gt; (tongue possibly in cheek).&lt;/p&gt;&lt;p&gt;Following The Axiom means we identify important structures
in the data domain, study the
medium in which we are going to encode them, and create an
encoding &lt;span class="bold"&gt;in which these match&lt;/span&gt;: the graph properties should be
represented by the closest matching properties of a sheet of paper.&lt;/p&gt;&lt;p&gt;This notion is of course central to Cleveland and McGill&amp;rsquo;s
&lt;a href="https://secure.cs.uvic.ca/twiki/pub/Research/Chisel/ComputationalAestheticsProject/cleveland.pdf"&gt;classic
paper&lt;/a&gt;, and you can also see something like it driving much of what
Bertin wrote about in his
&lt;a href="http://www.amazon.com/Semiology-Graphics-Diagrams-Networks-Maps/dp/1589482611"&gt;seminal
book&lt;/a&gt;. For example, Cleveland and McGill famously showed that
positions along a common scale are a very effective way to portray a
set of real numbers. So if you have a single set of numbers you want
your users to understand, following The Axiom means using a
&lt;a href="http://www.perceptualedge.com/articles/b-eye/dot_plots.pdf"&gt;dot
plot&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Big whoop, right?&lt;/p&gt;&lt;p&gt;But there are other types of structures we would like to portray in
our visualizations. In this post, we will see some of them.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Looking at a DAG&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;To make this exercise simpler, assume that our visualizations will be
node-link diagrams. So your mission is simply to specify positions of
the nodes on a Cartesian plane, and edges will be drawn as
straight lines (directed edges will use traditional arrowheads).&lt;/p&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120722/iframe1.html" width="220" height="220" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt; You&amp;rsquo;re given a graph $G$ which happens to have a &amp;ldquo;mostly
acyclic structure&amp;rdquo;. That is, there are either no cycles, or
relatively few of them, such that you expect that the majority of the
interesting properties of $G$ come from its acyclicity. If your
mission were to portray this graph faithfully on paper, what would you
do? As the simplest possible &amp;ldquo;null plot&amp;rdquo;, the drawing on the left places
each vertex at a position on the unit square chosen
uniformly at random. Aside from (maybe) being able to tell which
vertices appear to have more edges than others, we cannot see
much. But we didn&amp;rsquo;t really expect much to begin with.&lt;/p&gt;&lt;p&gt;Following The Axiom, we have already identified one property we want to
preserve, namely the &amp;ldquo;directionality&amp;rdquo; of the graph. This is good, but it
is not very actionable: how do we design a visualization around
that? Going back to part 1, recall that the
&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph"&gt;vertices
of acyclic graphs can be ranked&lt;/a&gt;: we can give every vertex $v$ an
integer $r(v)$ such that if there is a path from vertex $v_1$ to
$v_2$, then $r(v_1) &amp;lt; r(v_2)$.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
&lt;span style="float:right;"&gt;&lt;iframe src="http://cscheid.net/static/20120722/iframe2.html" width="220" height="220" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt; Let&amp;rsquo;s set aside the fact that there are many such possible
rankings, and accept for now that a rank ordering is a good
representation of the notion of acyclicity. The question is then: can
we design a visualization that uses it? One possibility is to use the
rank as one of the coordinates of the node positions. As a direct
consequence, all graph edges will point in the same direction. We&amp;rsquo;re
still using random numbers for the horizontal position of the
vertices, but it&amp;rsquo;s clear that the ranks are encoding a decent
amount of structure in the graph (of course, if we wanted to know for
sure, we should be using
&lt;a href="http://stat.wharton.upenn.edu/~buja/PAPERS/Wickham-Cook-Hofmann-Buja-IEEE-TransVizCompGraphics_2010-Graphical%20Inference%20for%20Infovis.pdf"&gt;formal
inference methods&lt;/a&gt;, but that&amp;rsquo;s another story).&lt;/p&gt;&lt;p&gt;There are many possible ways to decide the horizontal positions of the
nodes. To begin with, we are going to use the Axiom to
state that positions should be unique (&amp;ldquo;every node is different&amp;rdquo; becomes
&amp;ldquo;different nodes should be drawn in different places&amp;rdquo;).&lt;/p&gt;&lt;p&gt;In addition, we will need a new postulate that sounds a little like the
converse of The Axiom. Let&amp;rsquo;s call it &lt;span class="bold"&gt;The Other Axiom:
Everything shown by a visualization should exist in the data&lt;/span&gt;. That
is, if something &amp;ldquo;looks like a feature&amp;rdquo;, then it had better exist in
the data. Of course, what counts as a feature can only
be determined by psychological experiments, but let&amp;rsquo;s ignore that important
point for now.&lt;/p&gt;&lt;p&gt;This notion is obviously close to Tufte&amp;rsquo;s principle of maximizing the
data-ink ratio. From my reading, Tufte advocates maximizing data ink as
economy in service of aesthetics. I, however, want to use the Other
Axiom to try to keep a bijective mapping between data and visual
representation: if the Axiom is violated, then two different datasets
will look the same, and that&amp;rsquo;s a problem. But if the Other Axiom is
violated, then even if the visual mapping is unique, inspecting the
resulting visualization might make you think the data is different
from what it actually is. The Axiom tries to prevent blurred vision;
the Other Axiom, hallucinations.&lt;/p&gt;&lt;p&gt;Since we are using one of the coordinates for the node ranks,
only one coordinate remains, along the axis perpendicular to
the rank axis. We need to assign it, but
we want to be careful to avoid giving the impression that our
visualization is encoding information in how the edges point one way
or another (when ranks are drawn vertically, edges will generally
point left or right; we want to avoid implying that this direction is meaningful).
In addition, edge crossings are an obvious
visual feature, and since adjacent edges meet at a node, we don&amp;rsquo;t
want to give the impression of &amp;ldquo;ghost nodes&amp;rdquo; by introducing
unnecessary edge crossings. Preferring vertical edges seems to be a sensible way to
convey &amp;ldquo;no additional information&amp;rdquo;, so we will try to position nodes
so that edges are &amp;ldquo;mostly vertical&amp;rdquo;. This also tends to reduce edge crossings.&lt;/p&gt;&lt;p&gt;There are two main problems with this solution. First, the goal of vertical edges clashes directly with the goal of unique node positions. So we need to compromise somehow (and the devil is in the algorithmic details). But just as importantly, what constitutes a &amp;ldquo;natural
arrangement&amp;rdquo; with &amp;ldquo;no additional information&amp;rdquo; is a mix of cultural
and innate characteristics about which we know very little (but there has been
&lt;a href="http://www.cs.brown.edu/people/cziemki/documents/ziemkiewicz10_laws-of-attraction.pdf"&gt;recent work in the area&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
&lt;span style="float:right;"&gt;&lt;iframe src="http://cscheid.net/static/20120722/iframe3.html" width="220" height="220" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt; Leaving aside all those important details, this is what an
algorithm which (roughly) encodes the above principles
generates. Aside from one important step, this is a result from
graphviz&amp;rsquo;s classic
&lt;a href="http://www.graphviz.org/Documentation/TSE93.pdf"&gt;&amp;ldquo;dot&amp;rdquo;
algorithm&lt;/a&gt;. Visually, the main problem with this approach is in the
edges. For simplicity, I have completely ignored the problem of edge
occlusion, but that has led to a violation of The Axiom: in a subgraph
of the kind $a \to b, b \to c, a \to c$, the principles I set out above
mean that the edge $a \to c$ will necessarily pass through node $b$ and be obscured. One can
explicitly route edges around nodes, and this is precisely what
&amp;ldquo;dot&amp;rdquo; does; I left that out in my crude drawing.&lt;/p&gt;&lt;p&gt;The paper describing &amp;ldquo;dot&amp;rdquo; is worth at least skimming; I like that it sets forth a few visual principles and then designs the algorithm itself (which is, alas, fairly complicated) around those simple principles. The paper describing the classic Reingold-Tilford &lt;a href="http://emr.cs.iit.edu/~reingold/tidier-drawings.pdf"&gt;tree drawing algorithm&lt;/a&gt; follows the same pattern. I rarely see this structure in more recent visualization papers. Should we be asking ourselves why?&lt;/p&gt;&lt;p&gt;So there you have it, a very basic graph drawing algorithm distilled
to its essentials: find some structure you care about (in this case,
acyclicity); find a way to encode it visually, and make sure
your encoding is &lt;span class="bold"&gt;effective&lt;/span&gt; (the Axiom) and &lt;span class="bold"&gt;faithful&lt;/span&gt; (the
Other Axiom).&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Next, undirected graphs&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;Of course, defining the entirety of graph visualization as choosing
node positions in the plane is a gross simplification, and one
which lets us apply the Axioms straightforwardly. It will not always
be this simple, and in this series we will get to more delicate
cases. But before we go there, the next post will discuss (albeit in
less detail) a few popular graph drawing algorithms for undirected
graphs.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>A Javascript question on performance vs. convenience</title><link>http://cscheid.net/blog/a_javascript_question_on_performance_vs__convenience</link><guid>http://cscheid.net/blog/a_javascript_question_on_performance_vs__convenience</guid><pubDate>Tue, 26 Jun 2012 17:19:40 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;Here&amp;rsquo;s a Javascript-specific software engineering problem I&amp;rsquo;m
considering within &lt;a href="http://cscheid.github.com/facet"&gt;Facet&lt;/a&gt;.
I&amp;rsquo;m trying to decide how (or even whether) to approach type-checking
in the API, and I&amp;rsquo;m looking for input.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;Since Javascript is dynamically checked, if a user passes a bad value
into a function, the error might only manifest itself much further
down the code. Tracking such an error down is slow and opaque:
the error message will typically come from the innards of Facet, which
will confuse users who are not intimately familiar with the library
(at present, anyone but me).&lt;/p&gt;&lt;p&gt;The easy way to solve this problem is to add a strict layer of
type-checking into every function. This works, but carries a runtime
penalty: correct code pays the cost of the checks over and over
again. That is not a problem if the API call is not on the application&amp;rsquo;s
hot path, but some calls unavoidably are: anything that happens
per-frame in WebGL should be considered on the hot path, since spare
cycles can be spent on more features. The canonical example of this
type of call is in
&lt;a href="https://github.com/cscheid/facet/blob/master/src/shade/parameter.js"&gt;Shade.parameter&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;I&amp;rsquo;m leaning towards creating two versions of each method: the
slow, type-checked one, and the fast,
&lt;a href="http://www.hulu.com/watch/115713"&gt;you be careful with dat&lt;/a&gt;
non-type-checked one. But what&amp;rsquo;s the best way to expose this in
an API?
Is it even possible to do this robustly and effectively?
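&lt;/p&gt;&lt;p&gt;One possible answer is to hang the fast version off the checked one, so each operation keeps a single name. Below is a minimal sketch of that pattern. It is written in Python as a language-neutral illustration rather than in Javascript. The decorator checked, the unchecked attribute, is_vec3 and set_color are all hypothetical names, not part of Facet.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import functools
import numbers


def checked(*validators):
    """Hypothetical decorator: validate each positional argument before
    calling fn, and keep the raw function around as wrapper.unchecked."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            # Slow path: reject bad input here, with a message that names
            # the API function, instead of failing deep inside the library.
            if len(args) != len(validators):
                raise TypeError(
                    f"{fn.__name__}: expected {len(validators)} arguments, "
                    f"got {len(args)}")
            for i, (is_valid, value) in enumerate(zip(validators, args)):
                if not is_valid(value):
                    raise TypeError(
                        f"{fn.__name__}: bad value for argument {i}: {value!r}")
            return fn(*args)

        # Fast path: the undecorated function, for hot loops such as
        # per-frame updates. The caller is responsible for valid input.
        wrapper.unchecked = fn
        return wrapper
    return decorate


def is_vec3(v):
    """True for a list or tuple of exactly three real numbers."""
    return (isinstance(v, (list, tuple)) and len(v) == 3
            and all(isinstance(x, numbers.Real) for x in v))


@checked(is_vec3)
def set_color(rgb):
    # Hypothetical stand-in for a per-frame parameter update.
    return tuple(float(x) for x in rgb)
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Ordinary callers write set_color(v) and get an error at the call site. A per-frame hot path can opt in to set_color.unchecked(v) and skip the checks. In Javascript, the same idea would attach an unchecked property to the function object.&lt;/p&gt;&lt;p&gt;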
I have created an &lt;a href="https://github.com/cscheid/facet/issues/1"&gt;issue on GitHub&lt;/a&gt;. If you&amp;rsquo;ve run into this
type of issue in the past, want to voice an opinion or simply help
out, I&amp;rsquo;d love to hear from you.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>So you want to look at a graph, part 2</title><link>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_2</link><guid>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_2</guid><pubDate>Wed, 29 Feb 2012 10:25:46 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;This series of posts is a thorough examination of the design space of
graph visualization
(&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph"&gt;Intro&lt;/a&gt;,
&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1"&gt;part
1&lt;/a&gt;). In the previous post, we talked about graphs and their
properties.  We will now talk about constraints arising from the
process of transforming our data into a visualization.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;What is in a sheet of paper? Marks&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/marks.png" /&gt;&lt;/span&gt;
Paper is like external memory. We can make marks on it, and later we
can &lt;span class="bold"&gt;read&lt;/span&gt; marks that we made on particular spots. We will then say
that visualizations are encodings of data as particular
configurations of marks on paper. The process of &amp;ldquo;reading a
visualization&amp;rdquo;, of getting the dataset back into our heads, is simply
the decoding of the marks on paper into what they mean. I will refrain
from defining a mark precisely: it is only important that the writer
and the reader both agree as to what constitutes one, and that they&amp;rsquo;re
both capable of reading and writing marks. Let&amp;rsquo;s see how far this notion takes
us.&lt;/p&gt;&lt;p&gt;We can draw marks of different shapes, and we use the difference
between the shapes to encode aspects of our data.  Using this idea,
we could just write down a description of a graph in English prose.
If we wanted to &amp;ldquo;visualize&amp;rdquo; the data, we would then literally read
the prose, reconstruct the graph in our heads, and be done. But
this is not a visualization!&lt;/p&gt;&lt;p&gt;If we went ahead with this boneheaded idea, we would clearly be
employing our visual system to read the prose describing the graph,
even though no one in their right mind would describe that encoding
as &amp;ldquo;visual&amp;rdquo;. One reason for this is that reading prose feels
fundamentally different from looking at a scatterplot or other
abstract graphical depictions. So
let&amp;rsquo;s constrain our encodings to be &amp;ldquo;graphical&amp;rdquo; in nature. I&amp;rsquo;ll keep
this notion underspecified, but for now think of it as requiring
encodings to consist only of configurations of dots, lines, circles and their
shapes and positions. This makes the encoding more &amp;ldquo;visual&amp;rdquo;, and
that should be enough.&lt;/p&gt;&lt;p&gt;... or should it?
&lt;div style="clear:both" /&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Even the simplest possible abstract encoding can go boink&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/innocent_marks.png" /&gt;&lt;/span&gt;
When there are &lt;span class="bold"&gt;two&lt;/span&gt; dot marks on a piece of paper, we
immediately see that these two marks have some distance
between them. Given two marks written on paper, I can then read a real
number, in principle to arbitrarily large precision. Since we know how
to read and write this configuration, we know how to &lt;span class="bold"&gt;encode&lt;/span&gt; a
number as the distance between two points.&lt;/p&gt;&lt;p&gt;This two-mark encoding appears to be completely innocent and boring,
but it turns out that we can already get ourselves in deep trouble if
we&amp;rsquo;re not careful! If you&amp;rsquo;ve ever read about G&amp;ouml;del encodings, you
should have started feeling uneasy around where I said &amp;ldquo;arbitrarily
large precision&amp;rdquo;. With arbitrarily large precision, I can encode
a number with as many decimal places as I want, so I can define the
encoding of my graph to be, roughly, the number whose digits spell out the ASCII string describing the
graph&amp;rsquo;s vertices, edges and properties. Furthermore, this encoding is
lossless.&lt;/p&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/oh_oh.png" /&gt;&lt;/span&gt; Although
encoding an entire graph as a single distance between two points is
valid, it&amp;rsquo;s clearly ludicrous. What went wrong? Remember that we
decided the prose encoding was bad because &amp;ldquo;it wasn&amp;rsquo;t visual&amp;rdquo;. Now
this new encoding of ours is (superficially) visual, but it feels
as bad as the prose encoding. One thing the two encodings have
in common is that the part of the decoding process that confers
&amp;ldquo;graphitude&amp;rdquo; on the data does not seem to come from our vision
system, but from some other part of the brain. We are using
arbitrarily small differences in distances to distinguish potentially
arbitrarily large differences in graphs, so this encoding needs
&amp;ldquo;additional decoding&amp;rdquo;. Somehow, there are these visual bits (in this
case, a distance) which get sent to other parts of the brain for
further interpretation. And this indirectness is precisely what is bad
about the encoding.&lt;/p&gt;&lt;p&gt;&lt;span class="bold"&gt;A good visual encoding is &amp;ldquo;direct&amp;rdquo;&lt;/span&gt;. Such encodings get their
&amp;ldquo;meaning&amp;rdquo; straight from the vision system, without requiring an
explicit &amp;ldquo;reading&amp;rdquo;. This notion should be familiar to you if you
have read Bertin: it is related to his ideas of &amp;ldquo;perception of
correspondences&amp;rdquo; and &amp;ldquo;retinal legibility&amp;rdquo; in Semiology of
Graphics.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;How is any of this at all relevant?&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;At this point, you may be thinking that this entire discussion
is a gigantic waste of time, a treatise in picking nits. But the fact
of the matter is that arguing about the effectiveness of different
visualization techniques &lt;span class="bold"&gt;is exactly&lt;/span&gt; arguing about encoding
choices. And if we ever hope for our theoretical arguments to hold
regardless of which encoding we examine, we need to be able to
articulate why stupid encodings like the above are, in fact, stupid.&lt;/p&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/chernoff_faces.png" /&gt;&lt;/span&gt;And
even armed with the simplest of the observations above,
we can already make some nontrivial statements.
If you&amp;rsquo;ve ever heard about Chernoff faces and
wondered why they&amp;rsquo;re a terrible idea, worry no more. Remember that the idea behind
Chernoff faces is that since we&amp;rsquo;re incredibly good at face
recognition, we should encode different attributes of our data as
different aspects of a face. (If you&amp;rsquo;ve never heard about them before:
yes, they are hilariously bad. But I assure you they were proposed
completely seriously.)&lt;/p&gt;&lt;p&gt;So why are Chernoff faces bad? It&amp;rsquo;s simple: although recognizing
different faces and telling them apart is something we do thousands of
times a day, we almost never think &amp;ldquo;yes, George Clooney&amp;rsquo;s face is
different from Julia Roberts&amp;rsquo;s face because his eyes are obviously
12.3% larger, his ears are 15.7% smaller, and his nose is half as
hooked.&amp;rdquo;  In fact, Chris Morris, David Ebert and Penny Rheingans have
experimentally confirmed this:
&lt;a href="http://www.research.ibm.com/people/c/cjmorris/publications/Chernoff_990402.pdf"&gt;Chernoff faces are not pre-attentive&lt;/a&gt;.  The important point is that
even though we can, given enough time, make precise judgements
of face proportions, the values don&amp;rsquo;t jump out at us: we have to
&lt;span class="bold"&gt;read&lt;/span&gt; faces in the same way we would read numbers from a
spreadsheet. And if the encoding forces me to read, then it&amp;rsquo;s not a
visual encoding at all. If we have to read anyway, we might as well have stuck with spreadsheet rows
in the first place!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Next&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;The above argument means that, right now, settling almost every
question of which visual encodings are good and which are bad (and at
what) needs the heavy lifting of experiments in perceptual
psychology. Morris, Ebert and Rheingans&amp;rsquo;s study is important, and
appears to answer that one question definitively. But we would like to
have a theory that would explain, ahead of time, why Chernoff faces
(and many other ideas we might have) are bad, without needing to
recruit 500 people on Mechanical Turk. I might come back to this later
on in the series, when we have enough theory under our belts to
actually say something about Chernoff faces.
Still, a lot of work has been done in evaluating visual encodings in
isolation, and we will have to recap the most important ones.&lt;/p&gt;&lt;p&gt;Next: what do we know about good visual encodings, and can we use
these encodings directly for graph visualization? If not, &lt;span class="bold"&gt;why
not&lt;/span&gt;?&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>HCL color space blues</title><link>http://cscheid.net/blog/hcl_color_space_blues</link><guid>http://cscheid.net/blog/hcl_color_space_blues</guid><pubDate>Thu, 16 Feb 2012 01:20:06 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;I&amp;rsquo;ve been playing around with the HCL color space. HCL, if you&amp;rsquo;ve
never heard of it before, is a color space that tries to combine the
perceptual uniformity of Luv with the simplicity of
specification of HSV and HSL. HCL is an improvement over HSV and HSL,
but it is not exactly ideal: there is a nasty discontinuity in some
parts of the transformation! I have been trying to find a way around
this, but I&amp;rsquo;m stumped. Let me explain, and maybe you can help me.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/xyz_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;The transformation from RGB to HCL is somewhat complicated, and
involves two intermediate color spaces,
&lt;a href="http://en.wikipedia.org/wiki/CIE_1931_color_space"&gt;CIEXYZ&lt;/a&gt; and
&lt;a href="http://en.wikipedia.org/wiki/CIELUV"&gt;CIELUV&lt;/a&gt;.
Going from RGB to XYZ is a simple matrix transformation:
$(x, y, z) = M \cdot (r, g, b)$. For arcane reasons, there are many possible matrices: the
one most relevant nowadays is the
&lt;a href="http://www.brucelindbloom.com/index.html?Eqn_XYZ_to_RGB.html"&gt;sRGB/D65
matrix&lt;/a&gt;. This is a linear transformation designed to make a
&amp;ldquo;brightness&amp;rdquo; coordinate, Y, while encoding the rest of the space in
the other coordinates by roughly mapping them to &amp;ldquo;red&amp;rdquo; and &amp;ldquo;blue&amp;rdquo;
stimuli.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
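&lt;/p&gt;&lt;p&gt;Here is a minimal sketch of this first step in Python, using the sRGB/D65 matrix values from Bruce Lindbloom&amp;rsquo;s site linked above. It assumes the input is ordinary gamma-encoded sRGB with channels in [0, 1]. Because the matrix acts on linear RGB, the sketch first undoes the standard sRGB transfer curve. The function names are illustrative.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
# sRGB (D65) to CIE XYZ matrix, as listed on Bruce Lindbloom's
# RGB/XYZ matrices page. Rows produce X, Y and Z respectively.
SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    if c > 0.04045:
        return ((c + 0.055) / 1.055) ** 2.4
    return c / 12.92


def rgb_to_xyz(r, g, b):
    """Gamma-encoded sRGB in [0, 1] to XYZ, with Y = 1 for white."""
    lin = [srgb_to_linear(c) for c in (r, g, b)]
    # (x, y, z) = M . (r, g, b), applied to the linearized values.
    return tuple(sum(m * c for m, c in zip(row, lin)) for row in SRGB_TO_XYZ)
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With these values, RGB white (1, 1, 1) maps to roughly (0.9505, 1.0, 1.0889), which is the D65 white point. Y is the relative luminance of the color.&lt;/p&gt;&lt;p&gt;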
&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/luv_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;To go from XYZ to CIELUV, things are a bit more complicated: this is
the step that tries to match the physiology of a typical human vision
system, which is much better at telling shades
of yellow and green apart than it is at telling shades of blue
apart. The &lt;a href="http://en.wikipedia.org/wiki/CIELUV"&gt;full
transformation&lt;/a&gt; behaves nonlinearly, and tries to make the Euclidean
distance in CIELUV correspond roughly to perceptual differences. In
this space, L encodes the lightness of the color, or how bright it is,
and uv encodes the chromaticity portion: the particular tint or shade
of the color.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
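&lt;/p&gt;&lt;p&gt;A minimal sketch of the XYZ to CIELUV step follows. It uses the standard CIE 1976 L*u*v* formulas and assumes a D65 white point. L* is a cube root of relative luminance, with a short linear segment near black. The values u* and v* measure how far the chromaticity of the color sits from that of the white point, scaled by L*. The function names are illustrative.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
# D65 reference white in XYZ, normalized so that Y = 1 (an assumption:
# a different white point needs different values here).
WHITE = (0.95047, 1.0, 1.08883)


def uv_prime(x, y, z):
    """CIE 1976 chromaticity coordinates (u', v') of an XYZ color."""
    d = x + 15.0 * y + 3.0 * z
    if d == 0.0:
        return 0.0, 0.0  # pure black has no defined chromaticity
    return 4.0 * x / d, 9.0 * y / d


def xyz_to_luv(x, y, z, white=WHITE):
    """CIE XYZ to CIELUV (L*, u*, v*) relative to the given white point."""
    yr = y / white[1]
    # Cube-root lightness curve, with a linear segment very close to black.
    if yr > (6.0 / 29.0) ** 3:
        lum = 116.0 * yr ** (1.0 / 3.0) - 16.0
    else:
        lum = (29.0 / 3.0) ** 3 * yr
    u, v = uv_prime(x, y, z)
    un, vn = uv_prime(*white)
    # Chromaticity offset from the white point, scaled by lightness.
    return lum, 13.0 * lum * (u - un), 13.0 * lum * (v - vn)
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For the white point itself this returns L* = 100 and u* = v* = 0, and black maps to (0, 0, 0).&lt;/p&gt;&lt;p&gt;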
&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/hcl_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;Finally, HCL is obtained by simply transforming the UV
coordinates of Luv to polar coordinates. The phase is interpreted as
&lt;span class="bold"&gt;hue&lt;/span&gt;, and the length of the vector as &amp;ldquo;saturation&amp;rdquo;
(specifically, it&amp;rsquo;s then called &lt;span class="bold"&gt;chroma&lt;/span&gt;).&lt;/p&gt;&lt;p&gt;The goal of HCL is to be perceptually uniform along its axes, and so
the thing to notice is how the colors all appear roughly equally
bright for any given slider setting; and while moving
along the horizontal axis changes the hue of the color, it doesn&amp;rsquo;t
change the perceived lightness or saturation. Compare this with the
HSV colorspace.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
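&lt;/p&gt;&lt;p&gt;The polar step that turns CIELUV into HCL is short enough to sketch in full. This version assumes hue is measured in degrees, counterclockwise from the positive u* axis. It keeps L* as the luminance coordinate. The function names are illustrative.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import math


def luv_to_hcl(lum, u, v):
    """CIELUV to HCL: hue is the angle of (u*, v*) in degrees, reduced
    modulo 360; chroma is the vector's length; L* passes through."""
    hue = math.degrees(math.atan2(v, u)) % 360.0
    chroma = math.hypot(u, v)
    return hue, chroma, lum


def hcl_to_luv(hue, chroma, lum):
    """Inverse map: polar (hue, chroma) back to Cartesian (u*, v*)."""
    h = math.radians(hue)
    return lum, chroma * math.cos(h), chroma * math.sin(h)
```
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Hue is undefined for grays (chroma 0); the sketch just reports whatever angle atan2 gives there. Because hue is an angle, it also wraps around: 359 degrees sits right next to 1 degree.&lt;/p&gt;&lt;p&gt;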
&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/hsv_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;So that you can play with these color spaces, I&amp;rsquo;ve written a few little
demos using
&lt;a href="http://cscheid.github.com/facet/"&gt;Facet&lt;/a&gt;. The sliders control
the axes which resemble brightness, and the image then shows a slice
of the resulting parameter space. You will need WebGL and Chrome for
these to work (sorry!). Pay attention to the boundary of the gamut.&lt;/p&gt;&lt;p&gt;One of the great conveniences of HSV is that no matter what you do in
HSV, you will end up somewhere inside the (0,0,0)-(1,1,1) cube of
valid RGB colors. That means nothing too strange happens.&lt;/p&gt;&lt;p&gt;On the other hand, if you play a bit with the Luv and HCL color spaces
at low luminance, you will see a discontinuity in the
conversion. Although it happens outside the RGB gamut, it is still
quite annoying: some paths through HCL are cut off in RGB. The issue
happens when clamping the values from outside of the gamut back into
(0,0,0)-(1,1,1). This is what I would like to solve: is there a simple
way to create a (clamped) conversion from HCL to RGB that is
continuous and reasonable?&lt;/p&gt;&lt;p&gt;The procedure that is used in the
&lt;a href="http://cran.r-project.org/web/packages/colorspace/index.html"&gt;R
package for colorspace management&lt;/a&gt; is the one I&amp;rsquo;m currently using in
the demo above: after converting from HCL to RGB, we find the
closest point to the raw conversion that is inside the RGB cube.&lt;/p&gt;&lt;p&gt;Here&amp;rsquo;s a different approach that &lt;span class="bold"&gt;is&lt;/span&gt; continuous: instead of
converting the color $c$ directly, we search for the color $c'$ closest to $c$
in HCL space among the colors that convert to a value inside the RGB gamut. Now
the problem is: how do we actually find such a transformation
efficiently? It&amp;rsquo;s easy to see that if $c$ lies outside the RGB gamut,
then $c'$ will be on the boundary of the gamut. So this is &amp;ldquo;merely&amp;rdquo;
a two-dimensional search problem, except that the boundary of the RGB gamut in HCL or
CIELUV space is complicated. So we&amp;rsquo;re looking for the minimum
of a function constrained to a complicated 2D surface, and I don&amp;rsquo;t
think there&amp;rsquo;s any simple algorithm to do this.&lt;/p&gt;&lt;p&gt;Or is there?&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Acknowledgements&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;Thanks to &lt;a href="http://had.co.nz/"&gt;Hadley Wickham&lt;/a&gt; for teaching me
about HCL; his &lt;a href="http://had.co.nz/ggplot/"&gt;ggplot&lt;/a&gt; library
uses that color space extensively. This post grew out of trying to
make continuous HCL scales easier to specify.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>So you want to look at a graph, part 1</title><link>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1</link><guid>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1</guid><pubDate>Wed, 25 Jan 2012 08:34:01 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;This series of posts is a tour through the design space of graph
visualization. As I promised, I will do my best to objectively justify
as many visualization decisions as I can.  This means we will have to
go slow; I won&amp;rsquo;t even draw anything today!  In this post, I will only
take the very first step: all we will do is think about graphs, and
what might be interesting about them.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;What is in a graph?&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;A graph $G$ has two things: a set of &lt;span class="bold"&gt;vertices&lt;/span&gt; $V$, and a set of
&lt;span class="bold"&gt;edges&lt;/span&gt; $E$, where each edge is represented by an ordered pair of
distinct vertices (so in this definition we will not have multiple
edges or &amp;ldquo;self-edges&amp;rdquo;). To denote that $(a, b)$ is in $E$, I will
use $a \to b$.&lt;/p&gt;&lt;p&gt;Usually, we also have a mapping $v_\textrm{attr}$
from $V$ to some other space $V_A$. This gives us attributes of these
vertices (names of the people in your social network, names of the
computers in your intranet, etc.). A similar mapping $e_\textrm{attr}$
from $E$ to $E_A$ does the same for edges (is $b$ married to $c$ or
does $b$ work for $c$? How far is $h$ from $j$? etc.).&lt;/p&gt;&lt;p&gt;These define a graph, but they don&amp;rsquo;t say much about what is interesting
about it. So let&amp;rsquo;s list some properties of (these very general)
graphs. By explicitly thinking about them, we can see the impact they
will have on our choices of pictures.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Graphs are directed or undirected&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;One important characteristic of graphs is whether they are
&lt;span class="bold"&gt;directed&lt;/span&gt; or &lt;span class="bold"&gt;undirected&lt;/span&gt;. When we say that a graph is
undirected, we mean that whenever $(a,b) \in E$, it is implied
that $(b,a) \in E$: in other words, the has-edge relation is
symmetric, and $e_\textrm{attr}((a,b)) = e_\textrm{attr}((b,a))$.
(For undirected graphs, I will write $a &amp;ndash; b$ to mean
that both $(a, b) \in E$ and $(b, a) \in E$ are true).  Otherwise, we
say that $G$ is directed.&lt;/p&gt;&lt;p&gt;This distinction is important because, remember, the first rule of
visualization is &amp;ldquo;draw all there is, but no more&amp;rdquo;. If our graph is
such that $a \to b$ does not imply $b \to a$, our visualization of it
had better not make the relationship between $a$ and $b$
look symmetric. Of course, &amp;ldquo;making the relationship look symmetric&amp;rdquo;
is not a formal statement, and we might argue about what it
really says. But this is what I meant about the
difference between a formal systematization and an &amp;ldquo;informal&amp;rdquo; one:
we should not dismiss the notion simply because we don&amp;rsquo;t know how
to formalize it! And, as we will see, I believe this distinction
&lt;span class="bold"&gt;does&lt;/span&gt; guide the visualization choice.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Graphs have paths&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;An edge $a \to b$ in a graph implies some sort of connection between
$a$ and $b$, and we typically think of these connections as being
&lt;span class="bold"&gt;transitive&lt;/span&gt;. So if $a \to b$ and $b \to c$ encode some
relationship, we tend to think of there existing some relationship
between $a$ and $c$ as well (we will write $a \leadsto b$ to mean that
there exists some path $a \to \ldots \to b$).&lt;/p&gt;&lt;p&gt;This reveals another interesting property of graphs. Let&amp;rsquo;s say you group
the elements of $V$ into sets such that $a$ and $b$ go into the same set
exactly when $a \leadsto b$ and $b \leadsto a$ (counting every vertex as
reachable from itself). Since this mutual reachability is an equivalence
relation, every element of $V$ ends up in exactly
one set. These sets form a &lt;span class="bold"&gt;partition&lt;/span&gt; (into &amp;ldquo;strongly
connected components&amp;rdquo;, SCCs). Natural partitions like this
are your data&amp;rsquo;s way of telling you to consider divide-and-conquer. If
you think paths are important (implying that SCCs are important as well), then
your resulting visualization should be &amp;ldquo;partition-preserving&amp;rdquo;
too: 1) your visualization should have the ability to visually
represent a partition of vertices (call it a &amp;ldquo;visual partition&amp;rdquo;) and
2) the visualization of $G$ should put $a$ and $b$ in the same
&amp;ldquo;visual partition&amp;rdquo; if and only if $a$ and $b$ are in the same SCC.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Paths have cycles&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;We will call a path $a \leadsto a$ which does not repeat internal
vertices a &lt;span class="bold"&gt;cycle&lt;/span&gt; (and we will require that cycles in undirected
graphs have at least two internal vertices). A directed graph
with no cycles is a dag (&amp;ldquo;directed acyclic graph&amp;rdquo;) and an undirected
graph with no cycles is a forest (a tree, if it is connected).&lt;/p&gt;&lt;p&gt;The vertices of a dag can be assigned natural numbers $f(v)$ such that for every
pair of distinct vertices $a$ and $b$ with $a \leadsto b$, $f(a) &amp;lt; f(b)$. If your paths encode
dependencies, this assignment of numbers &lt;span class="bold"&gt;ranks&lt;/span&gt; the
dependencies, and is good information to have around.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Many undirected graphs have a metric structure&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;The final structure I want to mention is the &lt;span class="bold"&gt;metric
structure&lt;/span&gt;. For some undirected graphs, there is a very natural way to
come up with a distance function between pairs of vertices that
resembles the familiar distances in plain old
two- and three-dimensional space. Our eyes are reasonably good at
distance judgements (yes, that&amp;rsquo;s somewhat controversial because of
optical illusions and such. But if we are sensitive to these issues, I
believe we can use Cleveland to back the statement.)&lt;/p&gt;&lt;p&gt;Anyway, a function $d: V \times V \to R$ is a &lt;span class="bold"&gt;metric&lt;/span&gt; if:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;$d(a, b) \ge 0$, with equality iff $a = b$.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;$d(a, b) = d(b, a)$&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;$d(a, c) \le d(a, b) + d(b, c)$, for all $b$&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;(Assume that the graph is connected for now; any pair of vertices
$(a, b)$ is such that $a \leadsto b$ or $b \leadsto a$.) If one of the
attributes of undirected graph edges is a positive &lt;span class="bold"&gt;weight&lt;/span&gt;
associated with each edge, then the standard metric to assign to a
graph is the &lt;span class="bold"&gt;shortest-path metric&lt;/span&gt;, where we say that the
distance $d(a, b)$ is given by the smallest cost of a path, this cost
being the sum of the edge weights along the path.&lt;/p&gt;&lt;p&gt;&amp;ldquo;But what if my graph has negative edge attributes?&amp;rdquo;, you ask. Good
question!  Then you simply can&amp;rsquo;t use a metric to describe that
particular attribute of your graph. And slightly less trivially, if your
visualization technique implies that your graph obeys some metric when it does not, then
it is telling a lie. As a preview of the next few posts, this
&amp;ldquo;metric-friendliness&amp;rdquo; will be a crucial distinction between network
diagrams and matrix diagrams.&lt;/p&gt;&lt;p&gt;Next up, I will talk about 2D space: a sheet of blank paper on which we
get to write. Then we will put those things together, and bam,
&lt;span class="bold"&gt;visualization&lt;/span&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Previous posts&lt;/h2&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph"&gt;Series
introduction&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;</description></item></channel></rss>