Her dog was by her side; a blur of gray fur with a red bandana draped loosely at the neck. Should I ask her? She’d probably think I’m hitting on her? Or at least think it’s some sort of scam. I guess that’s what I’d think. After rattling around in my head, the question had finally taken shape:
“Would you mind taking a picture of this goose and emailing it to me?”
The goose was nestled at the edge of the cliff, mostly obscured in the long grass. It was gazing out over the ocean with an expression of pure… just kidding, I had no idea what it was thinking. Geese are notoriously aggressive; ready to hiss and stomp and chase at a moment’s notice. Canada Geese have been known to risk death to protect their offspring.
But this goose was… pensive? It looked over the strait and into the misty distance where it merged with the mottled, inky-gray sky. Far below, I could just about spot two ducks on the beach.
The woman had reached me. I opened my mouth and before I could say anything, she smiled. I thought of how I looked. Some random guy in running clothes, slightly damp from the rain, staring at a goose, staring at the ocean. My mouth, still agape, morphed into a smile and before I knew it she was past me. I had missed my chance.
When I run I don’t carry my phone. My internet-addled brain could do with some time away from screens, from notifications, from little red dots demanding my attention. I get a bit of flak for this, especially on runs where it’s getting dark and I’m a little lost and I’m back much later than I said I would be.
This does mean I miss out on taking photos of weird little scenes like the goose. Here’s my artist’s impression of the scene. I hope it brings some of that magic moment to life.
Humans are prone to spotting patterns in the noise of their day-to-day lives; it's called frequency bias. I thought back to all the goose encounters I'd had this year – I was suddenly much more aware of them than, perhaps, ever before in my life.
I had seen them blocking a patient line of traffic with an adorable gaggle of goslings. Plonked defiantly on the top of a bus as a driver struggled to chase them away. I had been surprised by one hissing at me as I ran too close on a path.
The Goose on the Cliff was an archetypal Canada Goose. You’re probably already picturing its black and white head and muddy gray body planted into the ground with stocky, scaly feet. The Provincial Museum handbook “The Birds of British Columbia: №6 Waterfowl,” first printed in 1958, reliably informs me that there are at least four subspecies of Canada Goose (Branta canadensis) – but I’d be damned if I could tell you the difference.
Known, in part, for the impact of their defecation on otherwise beautiful, swimmable lakes, the Canada Goose doesn’t always migrate. Environmental changes such as losing a nest can lead them to fly thousands of miles with surprising alacrity: some have been known to fly 1,500 miles in just 24 hours. A protected species, they can be found across all the provinces and territories in Canada.
The grass around The Goose on the Cliff was flattened. Perhaps it was nesting and set to be joined by a number of offspring in the coming days.
Canada Goose young are “precocial”, which means they are born in an advanced state, ready to forage and feed themselves on tender shoots and insects. I would soon see many goslings in the nearby park, enlarged ducklings and tufty teens.
Within a few months they will be ready to migrate with their parents.
My goose awakening of 2022 was kicked off by a surprising first appearance.
As a recent transplant to the West Coast, I still find the menagerie of ocean life novel. Seals carrying salmon home to their litters, a squid picked up by its beak by a beach bum in a baseball cap, seagulls picking up crabs and dropping them from great heights to smash their shells and access the tasty treat inside, starfish in shallow tide pools…
One species I have a slightly repulsed, distant respect for is the prehistoric barnacle I see clinging to the rocks at low tide.
Barnacles start their life as larvae floating in the ocean, seeking a spot to land and make their home. Once ready they secrete a cement to affix themselves to a rock, boat or animal. The creature then forms a calcite shell made up of a collection of interlocking plates. These are adjustable, acting as a sort of garage door that opens or closes depending on the tide or if a predator is near.
After a hike on beautiful Sandcut beach on the south-west side of Vancouver Island, I was inspired to flick through another British Columbia Provincial Museum handbook on barnacles (№7). Therein I spotted an illustration of the exact stalked barnacle I had found on the beach: Goose or gooseneck barnacles. Eaten by First Nations communities for centuries and considered a delicacy in Europe, they make a prehistoric, otherworldly impression with their flesh and scales.
I assumed the naming was simply born of a visual similarity – the barnacles are indeed reminiscent of the gray and white neck of the goose. But in 12th-century Europe the ties were deeper: the nesting and migratory patterns of the Brant Goose led observers to the dubious logical conclusion that the two had to be one and the same creature.
Page 8 of the barnacle handbook contained a similar illustration from the same era, depicting the curious growth of geese as fruit from the branches of some sort of fabled barnacle tree. This created confusion about whether geese should be classified as plant or animal: something of intense religious importance to dietary observances on holy days. If a goose was basically a tree then surely it could be eaten on days that meat was forbidden?
Charles Darwin, of “On the Origin of Species” fame (and more recently of source/target fame), originally dedicated his time to researching the varieties of our humble barnacle. His work in 1859 ultimately severed the barnacle ←→ goose connection. Thankfully, we still have lots of medieval art to show for it.
There’s a name for when barnacles and other microorganisms attach themselves to human vessels or structures: biofouling. Unsurprisingly for a human word, there’s an implicit conflict in the name. These creatures cause costly drag on boats and ruin our nice paintwork. The term leaves no room for the reality that we’re just passing through a world that barnacles have been cementing themselves to for over three hundred million years.
Ocean scientist Juli Berwald’s article The Web of Life turns a critical eye to what is popularly known as the Tree of Life, derived from Darwin’s work.
Staring at the figure, something clicked for me: every individual and every population contains many genes, each with its own evolutionary history and gene tree. All those gene trees will always result in disagreements, in fuzziness, in reticulation. Veron had been looking at the forest every time he looked at an individual.
… today’s corals are a product of Darwin’s classical natural selection when [ocean] currents are slack, and of hybridisation when they are strong. Species separate and merge, and more so over long expanses of time and space.
[What we think of as] “species” are not units, they are bits of a continua. What comprises a “bit” is arbitrary – a taxonomist’s opinion. Arbitrary, and in truth forever variable in space and time.
Some barnacles are destined for bigger things. Whale barnacles are an elusive form of acorn barnacle that manage to affix themselves to a convenient part of their namesake mammal. By alighting on the chin or forehead of whales they’re in prime position to collect plankton from their gargantuan host.
Whales have been known to accumulate 450kg of barnacles – a mass that seems significant but pales in comparison to the weight of the whale itself.
Whale barnacles can grow to be the size of a small orange and we’re unable to collect specimens without damaging the whale’s skin. Humans have therefore struggled to research whale barnacles; in the rare cases where live specimens are found on beached whales, researchers have only managed to keep them alive for a few short weeks.
“If you’re into barnacles, they’re pretty extraordinary”
— Michael Moore (veterinary scientist at Woods Hole Oceanographic Institution)
This doesn’t mean that scientists haven’t been creative. In one fascinating study, researchers took microscopic drill bits to the calcium shell of whale barnacle fossils in order to extract samples from a range of different depths.
Just like the rings of a tree trunk show the relative growth of the tree, the biological makeup of these samples reveals a faint but definitive backstory of whale migration. A “well-preserved whale barnacle is the perfect time-traveling tracking device”:
Here’s why that’s cool: exactly when in their evolutionary history baleen whales started migrating remains an open question.
One hypothesis suggests that it happened around three million years ago, when massive ice sheets started spreading across much of the northern hemisphere. The colder temperatures would have frozen whales out of some of their habitats and put more constraints on where plankton could flourish in Earth’s oceans. And that would encourage the whales to start making longer and more directed journeys to seek out shelter and food.
A fossilized whale barnacle is a unique window into this behavior in a whale that’s been dead for hundreds of thousands of years.
The Goose on the Cliff was keeping a beady eye on the passing dog, its neck contorted in an impossibly uncomfortable kink. We weren’t far from where I’d seen a pod of whales one sunrise late last year.
What leads you to find a place to settle, to put down cement roots and attach yourself to something, somewhere?
And what brings you to build your plates, your armor, your exterior shell, hardened and tessellated shut, in preparation for the oncoming… what?
Perhaps you’re the boat, untended and unmaintained, carting around extra weight you neither appreciate nor benefit from?
Or are you the whale, encrusted and entrusted to bear an undeniable burden, a secret pact in the ocean’s depths?
This week’s edition was inspired in part by Undrowned by Alexis Pauline Gumbs and Busy Doing Nothing by Rekka Bellum and Devine Lu Linvega. I recommend them both.
After two and a half years, source/target is taking a few months off for summer – I’ll see you again in September.
When you think about a process or task, how does it appear in your brain?
You’ll often have an intuitive understanding of the dependencies in a process but what happens when you want to share and – gasp – communicate those processes with others? What tools do you reach for?
In the vein of “the best camera is the one you have with you,” many opt for tools they have on hand – usually something ubiquitous like Excel. PowerPoint is another common choice, with the lure of the blank canvas, obvious shapes and flexible positioning.
Fatigued productivity pros advocate for stripping back to the basics and leaning heavily on plain text. Writing lists – perhaps even nested lists replete with ASCII arrows (-->) – is portable and even creative (if you like typing), but not suitable for sharing or collaboration.
Web-based diagramming tools reduce the friction of getting processes out of your brain and onto the page. These vary from the enterprise-leaning draw.io and the excellently sketchy (as in hand-drawn) Excalidraw to those that lean heavily on sometimes-arcane domain-specific languages like Mermaid.JS or the perennial favorite, Graphviz.
flowchart.fun is a bit different. It combines the accessibility of drag-and-drop tools with the power of text-based ones, wrapped in a shareable webapp.
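To make the “type text, get a graph” idea concrete, here’s a minimal sketch in Python of how an indented outline might be turned into a list of graph edges. To be clear, this is not flowchart.fun’s actual parser – the function name, the two-spaces-per-level convention and the sample chart are all my own assumptions for illustration:

```python
def outline_to_edges(text):
    """Turn an indented outline into (parent, child) edges.

    Each non-empty line is a node; a line indented deeper than the
    previous one becomes a child of it. Two spaces = one level.
    """
    edges = []
    stack = []  # stack[i] = most recent node seen at depth i
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // 2
        node = line.strip()
        stack[depth:] = [node]  # drop deeper levels, record this node
        if depth > 0:
            edges.append((stack[depth - 1], node))
    return edges

chart = """\
Make a flowchart
  Type some text
    See the graph appear
  Share the link
"""
print(outline_to_edges(chart))
```

Feed those edges into any graph layout library and you have the skeleton of a live diagram; re-parsing on every keystroke is what makes an immediate feedback loop possible.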
I’ve been a fan of flowchart.fun for a while, so I reached out to its creator, Rob Gordon, aka Tone Row, to see if he would be interested in answering a few questions about the project. I could tell from flowchart.fun that real care and consideration has been put into its development, and this was pleasantly confirmed by Rob’s inspirations and motivations. What follows is an email exchange between us, lightly edited for clarity. I hope you enjoy it.
Tell us a little bit about your background, how did you find yourself making tools for the web?
I don’t have a background in computer science or math. I went to art school and studied music composition. My main interest in learning to develop software was to make tools for computer-aided algorithmic composition. Only by some meandering process did I end up building tools for the web.
There’s a certain joviality to the name Tone Row and smiley face logo – to me they introduce a collection of friendly tools built with care. Is there a backstory to them?
I definitely don’t want to take myself too seriously! The smiley face was a drawing by my wife, who is a graphic designer and who is also the only reason flowchart.fun visually passes as an app (Merci Camille!); I’m an abysmal designer.
I knew I wanted a vague-sounding name (think: ACME corporation) so I wrote a small generator, fed in words I liked, and out popped Tone Row. Incidentally, tone row, despite sounding vague, has a historical meaning in music composition. Here’s the wiki, but roughly speaking it’s a technique of composing where you force yourself to use all 12 notes in the scale once before using them again. I find some pieces more listenable than others, but what resonates with me is the idea of approaching a familiar and complex problem like music composition with a surprisingly simple, repeatable solution.
That’s fascinating, I had no idea that a tone row was an actual thing. Isn’t it surprising that adding constraints can lead to greater creativity rather than limiting it?
In the world of diagramming tools, I think there’s a massive divide between drag-and-drop tools (draw.io, excalidraw, etc.) and diagram libraries for developers. Would you agree that flowchart.fun seems to occupy a sweet spot between the two worlds?
I haven’t framed it like that before, but that sums up how it came to be. One morning I needed to make a flowchart and couldn’t find the right tool for the job. I wanted to type (being developer-y and whatnot) and get a graph in return. I thought of the basic syntax immediately and already had some experience with cytoscape.js. About 30 minutes later I had a working prototype.
As to whether flowchart.fun lives up to being the “sweet spot” between the two, I would say it’s in the spot but I’m hoping to make it a lot sweeter. There are some impressive features at each end of the spectrum you mention (from draw.io to Mermaid.js). Visual diagram editors easily let you customize everything – arrowheads, edges, colors, etc. – while Mermaid.js can generate everything from swimlanes to pie charts.
Despite lacking many of these features, I think flowchart.fun distinguishes itself in a way that’s a bit difficult to describe. My best attempt is, the flowchart.fun feature-set is a bit of a “greatest common denominator”. The syntax is less capable than Mermaid.js but is (in my opinion, of course) easier to wrap one’s head around more quickly. Owing to and abetting that is the immediate feedback loop of getting a visualization in real-time as one is typing. Another way of describing it is that the syntax tries to be close to natural language, e.g.,
this
belongs to: that
Along with animated, immediate feedback (passed through the graph layout algorithms of people much smarter than me!) it really becomes a kind of thinking-aid, or a tool to help organize thoughts. I’m digressing into “second brain” rhetoric so I’ll leave it at that, but I do think that is where flowchart.fun achieves something unique in the spectrum of available tools. I’m hoping to really dig into and enhance that quality in 2022.
The immediate feedback loop is an important piece! I wrote a little about that last edition. Using flowchart.fun truly clicks the first time you write a word and see it added with a smooth animation. You’re right – that makes it a powerful aid for thinking.
I have a lot of experience with musical interfaces (midi controllers with lots of knobs and sliders). They generally have the requirement of processing input in real-time. I think the idea that fast feedback loops can help with creativity and creation is definitely part of what underpins flowchart.fun.
You might be interested in the work of information designer Duncan Geere, who often writes about the intersection of music and data visualization, including through his work with Miriam Quick on Loud Numbers.
In your words: what’s the draw to helping people build flowcharts?
If I’m honest, I don’t wake up every day thinking, “how can I help people build flowcharts today?”
Ha! You’re right, that would be a bit weird. You mean helping people make flowcharts isn’t your sole reason for existence? How reasonable!
I’m attracted to the process as a whole: putting a tool into the world, seeing how people use it and respond to it, absorbing that and channeling it back into the honing of the tool. Flowchart.fun has ended up being a great project for me to practice sharing what I create, receiving feedback, and trying to build a community around it along the way.
Through a broader lens, flowchart.fun (like everything in the spectrum of tools you mentioned) exists because of the ever-growing role of information in our lives. I think as a result, we as a species (zooming out here, haha) need ways to make information more precise and expressive – a kind of signal-to-noise effect is in play. If flowchart.fun can help people arrive at and/or express ideas more clearly and concisely then that makes me happy!
What’s remarkable to me is that even the functionality added in your first 30 minutes of work has unlocked countless hours of thinking for people around the world. Little websites can still be fun! And helpful!
What is the most inspiring use of flowchart.fun you’ve seen?
The latest use-case that struck me as a “no-brainer” is someone who uses it for decision trees for the technical support team on their product. One of the app’s biggest fans is a German with a passion for a type of chart I had never heard of, a Wardley map, that seems to be primarily used by the government in the UK.
It’s also been making the rounds in Law Student Twitter recently, though I haven’t seen an example of exactly how it’s used by people studying law. All of that to say more than any individual use-case, I’m really blown away by the diversity of users and use-cases. I’m really glad I made the decision to translate the app into other languages. I think that’s made it more successful for a global audience. (I already had a passion for looking at fonts, but now I get to look at fonts in other languages as well and I absolutely love it).
Wardley maps look interesting, a sort of graph layout where the positions of nodes are plotted against a continuous scale. When using most force-directed graph layouts, the reasoning behind the positions of nodes and links can be a little mysterious. Grounding those positions in an axis is a smart way to help communicate more information.
Open Source developers contribute an inestimable amount of value to our collective technology stack but often struggle with knotty problems like independence, longevity and self-sufficiency. What are your plans for sustainable, supported development?
This question is especially relevant for me because I recently decided to leave my job to pursue Tone Row full time. I was inspired by the success of people like Steve Ruiz (tldraw) and Matt Perry (motion.dev) in leveraging Github sponsorship to support their open source work, so I tried that first without much luck. I added the $1/month sponsorship features in November and that’s been more successful, although when I decided to charge one dollar per month I hadn’t really considered that Stripe takes 33 cents per transaction and another third will go to taxes… honestly, my severe lack of business acumen never fails to make me laugh.
My plan is to keep adding features that make sponsorship more appealing throughout 2022. There will be new themes, new layouts, some additions to the syntax that enable more complex graphs, and better ways to share and distribute charts – not to mention other apps!
What are you most excited about working on next?
I’ve had a deep appreciation for building design systems / component libraries for a long time. I’m hoping to innovate a little bit in that space and hopefully build more of a reputation for Tone Row at the same time. I also might have another flowchart-adjacent productivity tool up my sleeve. Too many ideas, not enough time.
I hear you! Thanks so much for your time, Rob, this was great.
You can try out flowchart.fun here and follow Rob’s progress on Twitter. Oh and those of you who are web-development-inclined, check out his amazing CSS gradient generator tool.
Do you want to be featured in a future edition? Perhaps you know of a project or person who would love to share their story of working with networks? Hit reply and let me know.
Until next time!
It had been a whirlwind week for Conor Ratliff. He’d flown across the country for one final meeting with the director of a new HBO show, Band of Brothers. This was unusual – he had already been cast as the character John S. Zielinski, so surely it was a done deal?
Ratliff has been in numerous popular projects, even mainstream shows like The Marvelous Mrs. Maisel, but the rejection gnawed at him enough to instigate the inevitable: starting a podcast. Rather than a bitter character assassination of the “Nicest Man in Hollywood”, Ratliff’s podcast “Dead Eyes” is a twisty, tender look at the realities of being a jobbing actor in the 21st century.
In episode 31, Ratliff met the instigator of all this angst for a beautiful conversation. For his part, Tom Hanks barely remembers the sequence of events that led to Ratliff being removed from the show. Despite this, Hanks duly takes full responsibility for the decision, providing a measured look at the “billions of tiny decisions” a director has to make, day-to-day.
One example stood out to me: a set designer might ask, "Do you want the red coffee cups or the blue?" An apparently insignificant decision like this can take on new weight on the day of the shoot, when the blue (or red) cups suddenly look out of place.
And yet, there isn’t enough time to agonize over each of these tiny decisions. Doing so would delay movie production and dilute the production’s vision.
Friends and family tell me I’m not all that great at decision making. Their words echo in my head while I agonize over choices. Even after I make a judgment, it can take a long time to shake off the nagging feeling of potential regret.
While I sometimes resort to flipping a coin, I heard a suggestion recently: faced with two options, always go for the one that’s more difficult. It’s a heuristic but it makes some sort of sad sense! In some twisted way I may be happier making a decision that prompts time and effort than a decision that could ultimately be chalked up to good luck. It builds character.
Another popular piece of advice on decisions comes from Jeff Bezos. Decisions can be classified as either Type I or Type II: those that cannot be reversed and those that can. Type II decisions should be made quickly so more time can be dedicated to the Type I decisions that will be set in stone. Sounds reasonable. But what if it’s hard to tell which camp your decision falls into? Back to coffee cups: choosing the red vessels isn’t irreversible, but there is a reversal cost. Like a movie director, if your time is spent on thousands of reversible decisions, how do you escape the maelstrom of mental fatigue they put you under?
Charles Darwin was an avid diary writer and went to great lengths to itemize a number of big decisions in his life. In “The Art of Decision-Making,” Joshua Rothman notes:
Darwin was considering proposing to his cousin Emma Wedgwood, but he worried that marriage and children might impede his scientific career. To figure out what to do, he made two lists. “Loss of time,” he wrote on the first. “Perhaps quarreling… Cannot read in the evenings… Anxiety and responsibility. Perhaps my wife won’t like London; then the sentence is banishment and degradation into indolent, idle fool.” On the second, he wrote, “Children (if it Please God). Constant companion (and friend in old age)… Home, & someone to take care of house.” He noted that it was “intolerable to think of spending one’s whole life, like a neuter bee, working, working… Only picture to yourself a nice soft wife on a sofa with good fire and books and music perhaps.”
I find it fascinating that despite writing these thoughts down, there’s no indication of how he decided on, ultimately, marrying Wedgwood.
Last week, lost notebooks of Darwin’s, last seen in 2000, were dropped off at a university library in Cambridge with an enigmatic message.
Librarian
Happy Easter
X
Encased in plastic wrap, the notebooks contain a variety of thoughts and drawings by Darwin, including his iconic “Tree of Life” sketch from 1837. I like to imagine some sort of two-column decision list by the person who held these notebooks for 20 years; a deep deliberation on doing the right thing versus the risk of punishment.
In an article on his blog, Mathieu Jacomy wrote about the thought and attention that went into a large-scale graph visualization project for Le Monde. In it, he mapped out vast swathes of Twitter to plot out discourse surrounding the ongoing French presidential election. But instead of digging into how the visualization was built, he took a magnifying glass to the many thousands of decisions he had to take – the whys that resulted in an eye-catching, rewarding visualization.
A few points were obvious: left- and right-leaning candidates should be on their respective sides of the graph.
A few were political: flags are a bad representation of nationalities, and political allegiances are hard to derive from published or self-identifying sources.
And then there’s the aesthetic and the happy accidents: not showing links between people significantly reduced the clutter in the network, and certain shades of color matched the paper’s style guide for political content.
Practical decisions that intersected with aesthetic ones were another happy coincidence. Through curvy names snaking across the landscape of political discussions, Jacomy is able to avoid overlaps whilst providing an overall effect that’s pleasing to the eye.
In typesetting there’s the concept of rivers: unless a page is calibrated just right, there will be gaps across the lines and paragraphs that draw attention away from the main event. Jacomy notes that writing text from left to right can result in undesirable troughs of ink that create a misleading effect when layered on top of the network.
Many things jump out from Jacomy’s process. For a start, there’s a healthy selection of comparison screenshots. As with any design process, I find a work log is a healthy way to keep track of a project. Liberally collecting outputs of one’s process along the way makes it easy to justify decisions and provides rich fodder for fantastic "behind the scenes" blogs such as Jacomy’s.
Feedback loops are extremely important to the process. These are the steps and time required to see the results of a change in your work – for example, the time taken to fetch data from a database, transform it and run a network layout. These steps could take many minutes, resulting in distraction and delay. Tighter feedback loops give you the opportunity to make many more decisions. A work log is a perfect complement to a tight feedback loop: a record of your progress and why certain decisions were made.
Of course, building compelling and insightful data visualizations isn’t about making the “right” decision – it can sometimes simply be about making “any” decision at all. Perfection is the enemy of finished. Embracing the millions of micro-decisions made in a project is embracing the fact that your work may never be as good as you hope or expect. By honing this ability to make decisions you’re also building your instinct, which will compound over time.
There’s an interesting asymmetry to celebrity interactions. For a so-called “normal” person, meeting someone famous is noteworthy, perhaps even life-defining. But for the celebrity, these encounters are extremely commonplace and fatiguing.
When discussing the infamous dismissal, Tom Hanks recounts his coffee cup decision by way of example, but he’s quick to avoid diminishing such a life-defining moment for Conor Ratliff to that of an inanimate set-piece.
As he insists, “You are not a coffee cup.”
But in that moment, on that day, Conor Ratliff was a coffee cup; one caught up in the mass of decisions Hanks had to make to achieve his goal of producing a hit TV show.
When making decisions on the presentation of data we don’t usually come face-to-face with the people represented and connected. Yet, we owe it to them to acknowledge and document decisions that are made in their name. We won’t always get it right but with time we can calibrate our process to understand the true impact of our choices.
The ancient Egyptians were the first known civilization to write with pens, which allowed them to record and share ideas with others.
The invention of the printing press in 1440 helped start the Age of Enlightenment. The cost of books was driven down, making them much more accessible to the working class. And in 1996, Macromedia Flash enabled a creative renaissance for a generation of artists, creators and technologists.
To be a Flash developer in the late 90s was to use software directly inspired by decades of attempts at accessible development environments. Creativity enabled by Flash was perfectly paired with the potent power of mass dissemination via the burgeoning World Wide Web.
Jer Thorp was one of these visionaries. After blagging a job as a Flash developer he spent his evenings learning all he needed to know to deliver on his projects when clocking in at work.
Flash was the tooling Thorp needed to kick-start a career working with data. His approach can be summarized by his simple, aspirational thought:
lots of people do normal things, fewer people do weird things
A rare podcast appearance gives the impression Thorp is a shy, charismatic nerd; someone who obsesses over the details in service of some greater whole. He has a wry humor that glints as he recounts stories.
But Jer Thorp’s resume and experience belie this casual demeanor: National Geographic Explorer, data artist in residence at The New York Times and innovator in residence at the Library of Congress. His book, Living in Data, was published last year. It’s a feat of a book about working with data, producing data art and creating compelling visualizations.
Thorp talks about data in a way that crackles and zips. Without shying from technical specifics, Thorp provides just enough narrative to keep you hooked. Living in Data is a 21st century take on decades of data analysis and visualization; taking a celebratory yet critical look behind the scenes. I marveled at the threads brought together by someone with a rich background across industries.
I attribute some of the book’s success to Thorp’s keen eye for design and analysis; appearing, surprisingly, even in the book’s dense appendices. Here there are deep considerations on the inclusion and exclusion of certain works, fascinating backstories and extra discussions that illuminate the core text. Not to be overlooked, the appendices provide the complex context behind the research, connections, doubts and triumphs of three years of writing a book and the life lived to enable it.
Thorp recognizes that our use of data is still fresh and evolves at a higher rate than other technical ideas. The book explores some linguistic flights of fancy which allow us to consider data from fresh angles.
We live and breathe metadata – data about data gives us context clues and is as important as the data it describes. The metadata is the message.
To complement metadata, Thorp defines “interdata” as the bits of data that link other data, crucial keys that draw connections across sources. This is something we talk a lot about in source/target. Links are intrinsic to our network thinking.
The power of interdata can be surprising. Jarring adverts for products apparently only discussed verbally prompt evergreen concerns that Facebook is listening in on private conversations. The reality is likely to be more mundane: you’re sharing a Wi-Fi network with others who have probably surfed for that exact topic. The cookie-shaped interdata here is all that’s needed to target someone with supposedly-spooky ads.
I respect and sometimes allow myself to treat data as a plural noun (“data are being collected”) but can’t help feeling a little uncomfortable about it. If it was “pompous” back in 2010, I’m not sure what that makes it in 2022. Perhaps informed by my experience wrangling datasets, I struggle not to think of tables and files as a homogenous blob of data (“the data is messy”).
To make things even more uncomfortable, Thorp suggests we could elevate data from a noun into a form of verb:
I data you. You data me. We data you. You data us. They data me. They data us. We data them.
His argument is that data is far from a passive observance of attributes and statistics, it’s an active process of extraction and transformation with countless little, imperceptible decisions. To collect data is to bring bias to the table. Even if unintentional, the impact is important.
Data is not inert, yet its perceived passivity is one of its most dangerous properties. When we are warned that a government is collecting data about its citizens, we may be underwhelmed specifically because this act of collection seems to be so harmless, so indifferent.
But of course data is not collected and then left alone: it is used as a substrate for decision making; and as an instrument for differentiation, discrimination and damage. Putting an active form of the word data into common parlance could serve as a reminder that the systems of data collection and uses are humming with capacity for influence, action and violence.
I’m not so sure we’re going to be talking about dataing people any time soon but I admire the sentiment.
Back in 1931, a young woman called Helen Hall Jennings asked a class of seventh-grade students in Brooklyn, New York a simple question:
Given the choice, who in this classroom would you sit beside?
Along with her collaborator J.L. Moreno, Jennings took the answers and drew what are considered the first depictions of social networks. Nearly one hundred years later these “sociograms” look familiar to us: lines drawn between boxes on a page. And the value of the data in this form still shines through. Reciprocal friendships, unrequited preferences and – in the case of students on the periphery – the wrench of isolation from the wider network.
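Those pencil-and-paper sociograms map cleanly onto a few lines of code. Here’s a toy sketch – the names and choices are invented for illustration, not Jennings’s actual survey data – that surfaces the same two patterns: mutual friendships and students left on the periphery.

```python
# Each student names who they'd sit beside (hypothetical data).
choices = {
    "Ada": ["Bea", "Cy"],
    "Bea": ["Ada"],
    "Cy": ["Bea"],
    "Dot": [],  # Dot names no one, and no one names Dot
}

def reciprocal_pairs(choices):
    """Mutual choices: both students picked each other."""
    pairs = {
        tuple(sorted((a, b)))
        for a, picks in choices.items()
        for b in picks
        if a in choices.get(b, ())
    }
    return sorted(pairs)

def isolated(choices):
    """Students chosen by nobody: the periphery of the sociogram."""
    chosen = {b for picks in choices.values() for b in picks}
    return sorted(set(choices) - chosen)

print(reciprocal_pairs(choices))  # [('Ada', 'Bea')]
print(isolated(choices))          # ['Dot']
```

Even with four students the asymmetries stand out; drawn as lines between boxes, the same structure is what made the originals so legible.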
But as Thorp notes there’s a core consideration that isn’t obvious: these social networks aren’t depictions of a world that is, rather they are an aspirational sketch of the world we want. Knotty, complex familial desires are laid bare in stark pencil-and-paper plots.
As reflected by notes in the appendix, it’s hard to parse fact from fiction when researching Moreno. He takes up most of the oxygen in discussions about sociograms and the wider field of sociometry, and Jennings’s contribution has been minimized. Thorp suggests she was the one to draw the original sociograms.
Three years ago the inimitable Martin Grandjean revisited the sociograms for a blog post and released a dataset for easy consumption. His intent was to compare the hand-drawn graphs with the modern equivalent to understand the visual bias that may have been introduced.
Like me, you might faintly remember a news story about a humanitarian project bankrolled by a famous actor. It was an impressive yet mad libs-worthy project – George Clooney enlists human rights investigator to use satellite imagery and machine learning to detect and predict human atrocities – but what happened to it?
The aims of this “Satellite Sentinel Project” were quickly realized in 2011 once the team uncovered images of “recent grave sites in the state of South Sudan.”
Later they documented the razing of Maker Abior, Todach, and Tajalei, villages in a region where the Sudanese military had been targeting ethnic minorities.
Other imagery showed construction and preparation that indicated impending military action.
The project suffered a blow in 2012 when a report highlighting road construction appeared to directly result in casualties and hostages being taken from the very same groups highlighted in the imagery. Efforts to anonymize the data were insufficient compared to the context on the ground – in this case overlooked and unique landscape features.
It turns out there was no guarantee that the very technology enabling relief to thousands couldn’t also be used to further the atrocities it was intended to prevent.
How do we know if we’re helping if we showed this? How may we be mutating the battle space in ways that could harm the very people we’re trying to help? — Nathaniel Raymond, human rights investigator
Faced with this realization the project was disbanded, and its lessons inspired the definition of five human rights that apply in crisis situations.
It’s all shockingly relevant as we traverse a very 21st-century war unfolding over Telegram and the wider internet. The fine line between productive and destructive appears ever finer with artificial intelligence in the picture.
At times in Living in Data, Thorp sounds weary. He’s been burned by incessant outside forces seeking to market and monetize his work and it shows on the page.
In a quest to avoid the daunting specter of bias, data visualization practitioners too often adhere rigidly to best practice, scrubbing and scraping at the excesses of “decoration” until, they hope, there’s nothing but the clean white bone of truth.
The result of all this is that there’s a kind of meal-replacement logic at work—a conviction that a story might be blended down into a neat, easily consumed slurry, with all the essential vitamins and absent the pesky nuance. That none of us should miss the crisp snap of an apple’s skin.
Yet he still strives to create, to discover and to learn. He is generous with his praise and support of others and shares his platform with specialists who have even more interesting things to say.
One of the most striking projects highlighted in the book is Mimi Ọnụọha’s Library of Missing Datasets – an inspired attempt to catalogue the uncatalogable.
Since 2015, Onuoha has been assembling a collection of data sets that aren’t. People excluded from public housing because of criminal records, undocumented immigrants currently incarcerated, trans people killed or injured in hate crimes, sales and prices in the art world, how much Spotify pays each of its artists per play of song, publicly available gun trace data. These are all data sets that for some reason don’t exist or are not acknowledged to exist by those who hold them.
In the book, Onuoha outlines four reasons for this, which align with Thorp’s insistence on the paradoxical: truly good data work has to consider the very data that wasn’t collected or included.
“We talk about marginalization a lot, to the point it becomes meaningless. But data is one of those places where marginalization is a real thing: if you’re not in the dataset you’re in the margins, and we don’t compute in the margins.”
At the recent Outlier conference, keynote speaker Andy Kirk zoomed out to show a perspective on the history, present and future of data visualization. It reminded me that what’s old to many is likely new to others, especially those of us (all of us!) who are still learning. Disagreements over the “correct” or “appropriate” representation for some data are rehashing the same party lines drawn a decade or more ago.
This cycle is reflected by Thorp’s trajectory. He made the conscious decision to avoid “traditional data visualization” in favor of hybrid, multidisciplinary data projects, grounded in reality. Living in Data provides a blueprint for an approach to data work that’s always in search of that Flash of inspiration that crystallizes our experience of being oh so very alive, in data.
Last week I took a long weekend to go surfing in beautiful Tofino. It’s still winter so there’s no way to avoid a thick wetsuit, hood and gloves. It’s been a few years since I last tried it and while I’m only marginally better than I remember, there’s something about the randomness and persistence of the waves that makes it addictive. Falling asleep at night I can still feel the waves rushing towards me (just before another mouthful of saltwater.)
I tracked my sessions in the water this time and was intrigued by the organic fronds that appear as I attempted to ride the waves. I think they tell a story of someone who doesn’t quite know what he’s doing. I can’t seem to find examples of “good” surf tracks online but one would expect more straight lines (representing actual surfing) and fewer scrunchy bits (minimizing all the splashing around and getting pummelled by the waves as I did.)
It’s been just over two years since I started this newsletter and, while my output has slowed somewhat, I’m still interested in keeping it going.
The Pudding’s fantastic interactive article on plain writing reminded me of an edition back from May of 2020. Back then I was finding some takeaways from a virtual conference on Knowledge Graphs. Nearly two years later the use of the term “knowledge graphs” has exploded – in the industry we’re really just speaking about “graphs”. It’s like the crypto/cryptography/cryptocurrency conflation – something cryptography researchers have begrudgingly embraced if it means increased funding and visibility for their work.
Perhaps it’s that adding the word “knowledge” to whatever you’re working on makes it sound more definitive. Geospatial visualization behemoth Esri have co-opted the term with their new ArcGIS offering, “Knowledge.” Other terms are ripe for the picking: take the new decentralized finance company “The Graph” and blockchain “Ontology” – both translucent attempts to sound weighty, absolute.
I just finished Jer Thorp’s “Living in Data” – a thoughtful, lyrical book that goes into great detail on the responsibilities and realities of transforming data into something meaningful.
Almost all of my visualization work has taken the form of exploratory tools. Even in the case where the result is a static image (like the PopSci piece), I build my own vehicles, to make it easy for me to range widely across a data set’s terrain. More often, I let others drive.
The Pudding article articulates its point so well simply by toggling between a “traditional” and plain version of the same text. The toggle is a simple steering wheel. Combined with animations, this direct, visual comparison goes a long way to highlighting the different approaches.
The article also delves into some of the techniques used to model the complexity of text. These include some “black box” proprietary algorithms where there’s limited public information of the approach and weightings of the algorithm.
It reminds me of “what3words” – touted as a potentially life-saving method of defining specific coordinates on a map with simple English words, it’s a flawed proprietary algorithm that could cause further confusion from the use of similar or pluralized words for locations in close proximity.
Tom MacWright didn’t hold back in a recent book review aside:
But seriously, there’s no validity to this idea of what3words. If you’re in the middle of the woods, and you have internet access, and you open the app and memorize the words and tell the ambulance them and spend 30 seconds explaining what what3words is, well, are you real, or a marketing story. Wait, no, even in that situation it’s way to easy to say a word wrong and send the ambulance to the wrong place.
That link, by the way, uses a “hairball” – an overwhelming mass of graph nodes and links in a visualization – to great effect. It shows just how many points in an area could be confused.
Any sufficiently hard problem results in a market for commercial solutions. And those commercial offerings will compete to solve the problem in opaque, proprietary ways to attempt to gain an advantage.
Parsing language is hard enough on a syntactic level but once semantics are introduced it can seem insurmountable. Take the sentence:
More people have been to Berlin than I have
This sentence seems to make perfect sense. It has the right mouthfeel, the correct form, the appropriate notion. But it falls apart with the faintest bit of scrutiny. “Of course more people have been to Berlin than me, there are so many more people than me!” Here’s another one:
While I was surfing the internet went down
These are sometimes known as Escher sentences, reminiscent of the twisty, inconsistent drawings from the 20th century artist.
Garden-path sentences are a similar construct – the reader is “tricked” into parsing the sentence incorrectly. One succinct example is “the old man the boat.” Your brain follows the parsing path it expects to when reading and there’s a moment of disorientation when everything suddenly seems nonsensical and non-grammatical. Is the old man IN the boat? ON the boat?
Speaking of Escher and mind-bending writing, Douglas Hofstadter’s influential “Gödel, Escher, Bach”, describes a curious device and names it a “quine” – as per Wikipedia:
A quine is a computer program that takes no input and produces only its own source code as output
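If you’ve never seen one, the textbook construction is surprisingly short. Here’s the classic two-line Python version: the string contains a template of the program, and printing the string formatted with itself reproduces the source exactly.

```python
# A minimal Python quine: running this prints its own source, byte for byte.
# %r inserts the repr of the string (quotes and escapes included),
# and %% is an escaped literal percent sign.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The trick is the self-application: the program’s data is a description of the program, so substituting the string into itself closes the loop.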
I stumbled upon this fantastic quine on Twitter, blending code golf, the visualization abomination technique “metaballs”, and fog of war into a remarkable tweet-sized snippet.
But it’s the unfurled version of the code that tipped it into inclusion this week – I love the snaking indented view with expository comments for the casual reader. It’s not like I understand the code any better without extreme close examination but it goes a long way to help me parse it.
As graph technology reaches more mainstream interest there’s a blurring of semantic lines between concepts in the space.
For example, here’s an article about a project to track tens of thousands of objects in space. Out of 26,000 objects orbiting the Earth, only around 3,500 – that’s around 13% – are actually used for a purpose. It’s an illuminating project that enables some mind-blowing visualizations.
Pan around the 3D globe here and you get a much greater understanding of just how severe this problem is. At points you can barely see the earth for the dots floating in mid-space.
But in describing the project, concepts like knowledge graphs, graph databases and visualization are smooshed together into a homogenous blob.
A family tree is a simple graph database.
I mean, sure, this isn’t wrong, per se, but a family tree is as much of a graph database as sketching a route to the nearest train station is a simple atlas of the world.
Even experts in traditional semantic graphs get confused by fundamental ideas in graph visualization – in this seminar summary, Juan Sequeda feels around the domain trying to massage some SparkNotes, but misses the creative, ephemeral angle of why we visualize data at all.
Over in the intelligence trap that is the Personal Knowledge Management space – where the pursuit of knowledge is put on hold while the vessel for the knowledge is polished and tweaked for optimal polishing and tweaking – the terms “graph” and “knowledge graph” are co-opted with grandiose soundbites like “second brain” to give some faux structure to what is, essentially, writing shit down.
As one of many who strives toward accessible communication in the graph technology space, it’s tempting to declare inconsistent language and definitions as outright incorrect. But there’s a path of least resistance when it comes to the adoption and acceptance of technical techniques. I’m a strong advocate for real-world applications over academic navel-gazing; if people are inspired by technology to create interesting projects what does it matter what they call the technology? If the space is accessible and adaptable, then it is more conducive to creativity.
Many thanks to Asaf Sharif for the mention of source/target on his Social Network Analytics podcast, NETfrix. I’ve been going back through the archive but particularly enjoyed his latest episode: a skeptical review of Dunbar’s Number. In it he traces the sources of some bold, eye-catching summaries of human nature and pulls a little at the threads. The podcast is well worth a listen for a thorough yet friendly look at some of the topics we touch upon in source/target.
Off the Charts is The Economist’s weekly newsletter on the process behind their data journalism – it’s a must-subscribe. This week they walk through charting air traffic over eastern Europe after Russia launched its attack last week.
Hey, thanks for reading. I hope you’re good – see you next time!
One small but inconvenient pandemic shift is that I’ve found it to be much harder to leave the house without forgetting something. I’m not leaving the house as much, so it’s a novelty to do the usual wallet / phone / keys check. I stepped out to run some errands the other day under an overcast sky. I didn’t get very far before my phone vibrated to tell me that I’d left my keys behind.
I picked up one of Apple’s new AirTags at Christmas and put it on my keychain. AirTags are a way of tracking keys, bags, wallets, phones – really anything you want to track. It’s an eerie technology that takes advantage of the wider network of ambient connections from unassuming bluetooth bystanders. You don’t opt-in to the network, the mere existence of your compatible phone in your pocket is enough.
Flying home from the UK after Christmas I popped one in my luggage and watched as my bags made their own way across the airport and ultimately the world. I imagined a baggage handler being the hub of thousands of AirTag pings against the phone in her back pocket.
Amber Norsworthy of Mississippi was driving home when her phone pinged a notification she hadn’t heard before. Up flashed the message
“AirTag Found Moving With You”
This notification was noteworthy because Norsworthy didn’t own an AirTag. The alert was a safety feature from Apple to alert potential victims of stalking activity. Once home she searched the car and bags but didn’t find anything, but by then it was too late, the AirTag (wherever it was) had recorded and shared the location of her home and the route there. Horrifying.
Tracking devices aren’t new but having Apple’s vast network of devices to leverage has made the technology much more accessible. Instead of the device requiring the capability to phone home via satellites, it can piggy-back on other, more powerful devices, just passing by.
A company called Tile produced an early consumer model. I owned one of their original generation of trackers before Apple muscled into the space. My Tile tracker was a small plastic square with a hole for a keyring. Without the Apple network ecosystem the Tile mainly just told you where your own phone had seen the Tile last.
Unlike the Apple AirTag the Tile was single-use. Once the battery expired there was no way to replace it and it had to be shipped back to Tile for a discounted replacement.
Tile was in the news recently as they were bought by Life360, a company with the reputation of selling precise location data of tens of millions of customers. Location data is big business; just this week Google was accused of deceiving users about location tracking in fresh lawsuits by three states in the US.
The ubiquity of Apple devices has created a massive network, and it is this ecosystem that has made the AirTag so successful. But in a strange twist (that’s also convenient for the growth of Apple’s ubiquity), devices from other manufacturers won’t automatically notify their owners of suspicious AirTags.
With or without an Apple device, someone can still be targeted using more traditional tracking technology, it’s just that the AirTag is dramatically cheaper and therefore more accessible than those options.
Apple has released an Android app to allow detection of unauthorized AirTags traveling with you, but the number of people that will download these apps will likely be a lot smaller than those who have Apple devices with it on by default.
Back in October, a camera presumed lost for three years was found on the ocean floor in Nova Scotia. The footage it contained had an unlikely camera operator: a male grey seal. Ocean researchers use cameras in tandem with GPS monitoring and accelerometers to understand the migration, distribution, reproduction, and feeding patterns of seals and other sea creatures.
Watching the footage back is the closest you’ll get to being a seal. It’s like a video game where you swim around, forage and snap at passing fish, all from a seal’s-eye-view. For researchers, it unlocks insights that simply aren’t possible through plain observation. One fun fact learned is that grey seals will often dive down to the ocean floor and fall asleep where they’ll be rolled by the current until they wake. Without the camera and sensors to confirm this behavior the dive pattern was assumed to be a form of foraging.
Until this camera was found the footage was lost with no ability to track it. In a bid to make the recording and retrieval more resilient, researchers leverage Bluetooth technology (as in the AirTag) to keep tabs on these devices and reduce the cost of the analysis.
If you’ve ever watched a seal acrobatically glide through the water, you’ll know these static visualizations created from the tracking data don’t match the wonder of these beautiful creatures. Here’s one that could do with a #MarineMakeoverMonday.
Sticking with seals, here’s a plot of the tracking pattern of five grey seals from July 2015 to May 2016 tagged on the Magdalen Islands in 2015.
There’s a wider application for this sort of tracking in the ocean – how else is the U.S. military supposed to track their, I kid you not, militarized dolphins trained to protect nuclear weapons?
Here in Victoria, BC, we’re close to the Salish Sea and I’ve taken more than a passing interest in the ships and boats that come and go from the shipyards nearby. With the cruise ships paused until this summer, each visiting vessel has been interesting to see towering over the marina. One such ship, the Zim Kingston, never made it much closer than 8km to the shore. And for good reason: it was on fire.
After losing over a hundred containers when traveling in rough seas, a fire broke out on deck that blazed for several days. It was easy to spot the smoke and flames of the ship in person but VesselFinder offers live tracking and details of any vessel you can cast your eye on. (You may be familiar with that site after a surge in popularity during the Ever Given saga). Four months later, only four of the lost containers have been found.
A pair of ships have been docked near my apartment periodically over the past year with a distinctive pale blue base and cream top. They’re owned by nonprofit The Ocean Cleanup, and were fortuitously nearby and able to assist when the Zim Kingston was on fire.
On first pass I assumed an Ocean Cleanup was a pretty uncontroversial project – of course we want to clear the oceans of plastic, how noble for this charity to dedicate themselves to this cause. But there’s some suspicion that the environmental cost of the project isn’t enough to offset the amount of waste collected.
The Verge quotes Alexander Bond, senior curator in charge of the Bird Group at The Natural History Museum in the UK with the disappointing assessment:
if a gadget is going to clean up our plastic waste, then we can keep producing it. “It moves the proposed solution to ‘out there,’ where the trash is, rather than into our own lives, where the trash is being generated,”
A few months ago I featured a fantastic project that allowed you to follow a drop of rain to the ocean from anywhere in the US and it’s recently been updated to support rain drops anywhere on Earth. In a horrible contrast, a site from The Ocean Cleanup simulates how a single bit of plastic will ultimately make its way to the large patches of plastic floating in the ocean – like the Great Pacific Garbage Patch.
In “How Bad Are Plastics, Really?”, Environmental Sociologist Rebecca Altman argues that plastic waste is intrinsically twist-tied to the climate crisis
It can be hard to visualize the web that connects commonplace cups to the interlocking global crises of toxics, environmental injustice, and climate change, and even harder to locate where to intervene.
…
Should U.S. plastics production continue to grow as the industry projects, by 2030, it will eclipse the climate contributions of coal-fired power plants, […] Or, by another measure, the current growth trajectory means that by 2050, the industry’s emissions could eat up 15 percent, and potentially more, of the global carbon budget.
Unless it’s been incinerated and turned into an alternative form of toxic waste, the vast majority of plastic you’ve ever used still exists somewhere and will continue to exist for centuries. Think of the Tiles, containers and toothbrushes.
Remember carcinization from edition #11? The tendency for species, given a sufficiently long time period, to evolve into a crab? What about the undersea cable maps I’ve linked in different forms over the years?
I’ve resigned myself to the fact that over time this newsletter will just carcinize to pure networked crab content. Here are three very on-brand source/target crab links. Thanks to Rusty at Today in Crabs Tabs for perhaps all of these?
Brown crabs find underwater power cables ‘difficult to resist’
This Tiny Crustacean Trapped in Amber Tells a Different Story About Crab Evolution - Science
Includes this fantastic “crab tree of life” – the newly found 100+ million year old crab preserved in amber is crab F in the hit parade of top crabs.
Back on the tracking theme for this week:
At a time when absolutely everyone is writing about NFTs, I’d like to just point you to a parallel exchange in the replies to that tweet:
What’s the point? Just wastage of fuel.
Because it looks cool in a flight tracker
One last thing. Rabbitholing crab content led me to this “motivational poster” of a man just trying to get somewhere despite the thousands of crabs also trying to get somewhere.
In honor of their persistence I’d like to announce the first ever source/target caption contest – hit reply and let me know what you think. Best response gets a prize later this year.
This week’s edition is about zooming out, finding commonality and the surprising benefits of having a foggy outlook.
When I submitted my application to become a Canadian citizen I imagined the eventual outcome to be a trip to city hall in a tie and suit, surrounded by friends, family and others who were also becoming citizens of Canada.
But after a myriad of delays due to COVID and bureaucracy I logged into yet another Zoom meeting and recited an oath in English, then French in a laggy call spanning six timezones.
Then in November I found myself back in the UK due to a family emergency for an unknown stretch of time. It had been two years since I had seen my family and it was, circumstances notwithstanding, a shock to be back. Still without my new Canadian passport in hand my plans for the future turned foggy.
When I left Canada the main news story was the devastating weather and resulting mud-slides in British Columbia. In the UK the mud-slinging was political; concerning Government officials hosting parties during a strict lockdown over a year before. Back home in B.C. it was an accepted inconvenience to flash a vaccine passport to go to pubs and restaurants. For many MPs in the UK, their potential implementation was an unacceptable overreach.
My surprise transplant back to the UK highlighted differences between two largely similar countries. With limited travel due to COVID and an over-saturation in pandemic news it’s hard to zoom out and recognize whether differences are micro (in a family or household), macro (country-wide) or somewhere in between (community differences, like the north-south divide or supposed social class).
Perhaps it’s deference to the mythical “Blitz spirit” I’ve been told about my whole life but I felt surrounded by the underlying camaraderie in England. A sort of “we’re all in this together” tough love where people band together and “get on with it.” Unfortunately, this leads to selfish contrasts, like people in shops complaining about slow queues loudly to seek allies from others in the queue whilst being the only person not wearing a facemask.
I was running in the fields near my family home and became aware of a woman up ahead waving her arms in the air. I naturally slowed as her tall, unleashed dog ambled towards me. I took my earbuds out and asked if I should be worried about the dog as she seemed concerned. She shouted back “he might attack if you run towards us” and with her headphones still in she added, “it’s *my* stranger danger, you know?”
One core theme of source/target is virality. I’ve considered renaming the newsletter to “network effects.” Another theme is nostalgia, that intoxicating feeling of seeking comfort from a time, place or activity that you fondly remember.
In a vain attempt to find some kernels of universal truth I read far too many non-fiction business books last year. I shouldn’t have been surprised to find references to the same studies and, likely apocryphal, stories throughout. Take a study that demonstrates the willingness of people to allow others to cut the queue to a photocopier – even if the reason they give is completely unassuming and unremarkable “Oh I just need to use this photocopier to do some copies.” You know you need to broaden your reading horizons when a third book uses the same reference.
One common reference that I actually don’t mind has to do with taking things one step at a time. Here’s a quote from Anne Lamott’s Bird by Bird:
“E.L. Doctorow once said that ‘Writing a novel is like driving a car at night. You can see only as far as your headlights, but you can make the whole trip that way.’ You don’t have to see where you’re going, you don’t have to see your destination or everything you will pass along the way. You just have to see two or three feet ahead of you. This is right up there with the best advice on writing, or life, I have ever heard.”
Here Lamott is quoting Doctorow but I also read it in On Writing by Stephen King – it’s a solid idea, even if it now feels a small step away from being a meme on Facebook.
It would be hard to build – a lot of these books don’t have standard citations – but imagine the invisible network of such ideas snaking through books and literature.
Nowadays I build websites with a refreshingly simple tool called Eleventy. It’s inspiriting due to its open, inclusive community of individuals building and creating and sharing. Unlike other web technologies, the leaders and promoters seem to actually care about building inclusive sites and empowering others to leverage the flawed but extensive JavaScript ecosystem.
Here Zach Leatherman’s surprise echoes my other advice-givers. Just take it step by step, edition by edition, mile by mile, commit by commit.
Upon announcing his resignation as CEO of GitHub, Nat Friedman did a savvy burst of cross-marketing by saying he would be spending a lot of time with the new Age of Empires game from GitHub owner, Microsoft.
Taking the opportunity to clear out my stuff at my Dad’s home I ploughed through reels of CDs that would never be read, or ripped. Unmarked CD-Rs, scratched-up DVDs, symbols of a bygone era of ephemera. I stumbled upon a picture disc of Age of Empires and it all came flooding back. The colours, the sounds, the maps, the frantic rush to build your empire from a couple of villagers into a creaking, jerky mass of pixels; reaching bottleneck after bottleneck, limited only by your ability to plan ahead and have just enough grain for them to thrive.
In Age of Empires, you start at a random spot on a map with a small radius of visibility. You can’t see into the dark mass that is the rest of the map and – when playing with others – it’s impossible to see just how quickly they’re amassing their army and extending their encampments.
This fog of war, as it is usually described, is a game mechanic that mimics the uncertainty of warfare. Of course you can’t monitor something that you or others can’t see. Once a scout or similar unit has explored an area and left again, a “shroud” covers the area, concealing new enemy movements unless the unit returns.
Digitally concealing parts of gameplay is easier with video games than board games, although a variant of chess where the locations of pieces are hidden from players was first played (with a facilitating umpire as computer) in 1899. Modern games like Battleship use it as a core meatspace mechanic via obstruction of view from the board itself. In Minesweeper the lack of visibility is the point – it’s unusual to play a game you could lose on the first click.
We often talk about the desire for network graph visualizations to provide an illuminating global view of a complex cluster of connections. I respect the appeal of such global views but I’m quick to point out that the micro, local connections are where the real insights lie. What would happen if we applied this technique to the display and exploration of network graphs?
Guided search-based exploration of networks is typically via “land-and-expand” – start with a node of interest (a person, device, incident) and request any connecting details. Putting the user in control feels liberating compared to observing, say, a static bar chart but it’s like fumbling behind the back of a TV to plug in an HDMI cable without looking.
Constantly polling for fresh data and new connections is disorienting. If instead we process a network and simply peer at it through the fog we’re discovering the network as it exists.
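To make the idea concrete, here’s a toy sketch of peering at a pre-processed network through a fog of war (all node names are hypothetical): everything starts hidden, visiting a node reveals it, and its unvisited neighbours become “shrouded” – we know they exist, but not what lies beyond them.

```python
HIDDEN, SHROUDED, VISIBLE = "hidden", "shrouded", "visible"

def explore(adjacency, start):
    """Reveal a network node by node, like a scout crossing the map."""
    state = {node: HIDDEN for node in adjacency}

    def visit(node):
        state[node] = VISIBLE
        for neighbour in adjacency[node]:
            if state[neighbour] == HIDDEN:
                state[neighbour] = SHROUDED  # we know it exists, not what's beyond it

    visit(start)
    return state, visit

# A tiny hypothetical network: an incident linked to a device, linked to people
network = {
    "incident-42": ["device-a"],
    "device-a": ["incident-42", "alice", "bob"],
    "alice": ["device-a"],
    "bob": ["device-a", "carol"],
    "carol": ["bob"],
}

state, visit = explore(network, "incident-42")
visit("device-a")  # push the fog a little further; "carol" is still hidden
```

The point of the sketch is that we never poll for the whole graph: we only ever widen the visible radius, so each new node is noticed in the context of where we already are.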
In the online game GeoGuessr the user is dropped somewhere in the world and has to guess their location just by exploring the region via Google Street View. A fog of war network is similar – we limit our vantage point and notice the details. What is it about this node and where it exists in the wider network that’s interesting?
When the pandemic started I was overloaded by keeping track of cases, restrictions and uncertainty across Canada, England, the US and Malaysia. Refreshing the news and Twitter delivers an overburden of updates, leading to a sort of negative information – it’s not actionable or empowering, it’s overwhelming.
I often find myself leaning towards the comfort of what I call internal actions – reading, writing, coding or thinking that will likely never see the light of day. With internal actions there’s a fog around me and I hope that no one will intrude.
In contrast, external actions like calling a friend, posting, publishing and requesting feedback help me push through the fog from where I am but also from elsewhere on the map. For me, 2022 is the year of external actions.
This time last week I was halfway through a six hour, round-trip hike to an alpine lake. My partner had taken the week off and we’d climbed high enough to have hit some light snow about an hour ago. Our phones had been without signal for two hours, and we hadn’t seen another soul for three. Suddenly we came across a dank, magical grove filled with the most beautiful mushrooms I have ever seen.
They were straight out of a video game: bigger than my fist with a bulbous, deep scarlet cap and large, knobbled, creamy-white spots. But these toadstools were more realistic and downright wild than Mario’s. It must be the season here on Vancouver Island as there were plenty of them on our hikes – many of which I had never seen before in my life. I suspect that’s because I’m a relative newcomer to the Pacific Northwest, but not all fungi lie in plain sight; many were under logs and hidden in the bracken and brush around the trails. You had to pause and peek to see many of them.
It made me wonder: perhaps I’ve been missing out on all kinds of amazing fungi my whole life because I failed to stop and take a closer look?
Some things are common knowledge or easy to find. I could leave my apartment right now and find some button mushrooms within ten minutes (humble brag). Sure, they would be at a grocery store, but the point is I know what they look like and I know how to find them.
Other things are almost impossible to find. Many dedicate their lives to searching for objects, ideas, proofs and results.
Professor Matt Might of the University of Alabama at Birmingham put together an illustrated guide that provides a broad understanding of what, exactly, people are doing when they work on a Ph.D. or similar. It’s a visualization that has inspired many as they push the limits of human knowledge through academic research.
I first saw Might’s work years ago and think about it often. Researching it again for this edition I found that the story of his family since then has emphasized his “keep pushing” mantra. When their son Bertrand was six months old they discovered he had an “ultra-rare” genetic disease. It was estimated that only 500 other living patients had the disorder.
Re-orienting his work in academia he helped forge connections with other parents of children with the rare disability and expanded the limit of human knowledge to help countless others in the future.
Might’s improbable and inspiring story is the true embodiment of his own motivating rally-cry of “keep pushing.”
I’m the sort of person who finds it hard to enjoy something if it reaches a certain level of popularity. It’s a self-sabotaging tendency toward the unique – a fascination with the fringe. That’s not to say I don’t enjoy popular things. I just approach everything with fresh eyes and trepidation, waiting for the switch to be flipped before I become a fan.
In a soft quest to understand why I’m like this I’ve been thinking a lot about the homogeneity of popular culture and how it relates back to knowledge. Take TV: The Ted Lassos and Squid Games of 2021 are at the center of a circle of popular culture right now, joining perma-favourites like The Office.
And then there’s the internet. Think of the eternal reposts of the same articles across the web. Popular news sites and aggregators like Reddit are at the center of a maelstrom of content. Everyone has their own neighbourhood they enjoy. Many are satisfied with the center and others will prefer to explore the niche newsletters striving to share something a little different.
Inspired by Might’s “Keep Pushing” circle, we could visualize popular culture at the center with more narrow interests towards the circumference.
Sometimes culture surprises us and the fringe shifts towards the center. In a rare link between the popular and academic, Ted Lasso name-dropped a Professor of Ecology from the University of British Columbia last week.
You know, we used to believe that trees competed with each other for light. Suzanne Simard’s field work challenged that perception, and we now realize that the forest is a socialist community. Trees work in harmony to share the sunlight.
Prof. Suzanne Simard’s bestselling book, “Finding the Mother Tree,” gives a full treatise of her decades of fieldwork in the forest, researching and confirming the role of mycorrhizal networks – the symbiotic association between green plants and fungi.
By analyzing the DNA in root tips and tracing the movement of molecules through underground conduits, Simard has discovered that fungal threads link nearly every tree in a forest — even trees of different species. Carbon, water, nutrients, alarm signals and hormones can pass from tree to tree through these subterranean circuits.
This symbiotic association is suspected to also be found in “prairies, grasslands, chaparral and Arctic tundra — essentially everywhere there is life on land.”
Together, these symbiotic partners knit Earth’s soils into nearly contiguous living networks of unfathomable scale and complexity.
These quotes are from an article in the New York Times I linked back in source/target ~ #23 – I think you’ll like it.
In a 2018 research article with the delightful title “Towards Fungal Computer,” Andrew Adamatzky posits a future where networks of mushrooms with the ability to conduct electricity could be used as a form of semiconductor. These networks are known as “mycelia” and are found underground and elsewhere, most notably in rotting tree stumps, like the ones I hiked past last week.
Sourcerers will know that researchers have been looking at the use of protoplasmic slime mold networks, but this novel fungal approach appears to be more flexible. According to Adamatzky, we can “reprogram a geometry and graph-theoretical structure of the mycelium networks and then use electrical activity … to realize computing circuits.”
On first pass it may sound like the author may have consumed a few too many electrifying, magic mushrooms but there are some smart proposed applications of these networks thanks to the incredible capabilities of our fungal friends:
Fungi “possess almost all the senses used by humans” and can sense “light, chemicals, gases, gravity and electric fields”
Likely application domains of fungal devices could be large-scale networks of mycelium which collect and analyze information about environment of soil and, possibly, air and execute some decision-making procedures.
One interest I picked up in the pandemic was monitoring weather and air quality using a little Raspberry Pi sensor made by a company in England. It’s cool/weird to think that some funky fungus network could do the same thing with a spark.
I’m inspired by the work of Might, Adamatzky and Simard and reminded how much there is to know and understand of the world around us.
The next time you’re in a forest or woods let your curiosity lead you a little – you never know what networks may be connected under the leaves.
You probably heard that Facebook and related properties were offline for most of the day the other week. As a story with a wider impact than you might expect, many were quick to attempt to explain what had happened. This article from Julia Evans gives a great summary from a visualization perspective using one of the jankiest-yet-productive online tools I’ve ever seen – BGPlay. It’s a good reminder that tools don’t have to be beautiful to be helpful. There’s no excuse for lack of discoverability though.
Here’s a neat look at summarizing writing by stripping out everything that isn’t punctuation. With practice, you could likely identify someone’s writing by the punctuation they used, as it highlights their tendency to use particular clauses and phrases. I’d love to take this a step further and find a way to compare changes to writing over time by deriving a graph of typical clause flow.
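The core of the technique fits in a few lines. Here’s a minimal sketch (function name and sample sentence are mine) that keeps only the punctuation of a passage:

```python
import string

def punctuation_fingerprint(text):
    """Keep only the punctuation marks, preserving their order."""
    return "".join(ch for ch in text if ch in string.punctuation)

print(punctuation_fingerprint("Well, it depends - doesn't it?"))  # -> ,-'?
```

Run over a whole essay, the surviving commas, dashes and question marks form a sort of rhythmic signature of the author’s clause habits.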
This article argues that humanity’s collective rush to get satellites into space is a double-edged sword: it’s helping connect remote communities to the internet,
One of the stories that I like to tell is that we had an elder that was excited once we connected his phone to it,” Ashue said. “He was calling people and saying, ‘I’m calling you from my couch. I don’t have to stand outside in the rain,’” because previously “in order for his cell phone to work, he had to go out toward the road.
but robbing indigenous communities and others of the constellations that have guided and inspired for literally thousands and thousands of years.
We’ve talked about the night-sky-as-a-graph in previous editions but I’d never heard of “dark constellations”:
They are shapes in the sky made out of the dark spaces, opposite to Western constellations that make shapes out of the starlight.
With the increase in light pollution due to these reflective objects in space, we can no longer access these dark constellations,” she added. “That means we can no longer monitor our cultural signals that tell us about the seasons or ceremony timing, or even access our knowledge, as much of it is stored in the sky. If we cannot access our skies, we cannot practice our culture.
Similar to “It’s not the notes you play, it’s the notes you don’t play,” perhaps it’s the stars we can’t see that are making the largest impression.
I’m excited to be part of a panel on writing about data visualization next week. It’s part of the VisInPractice satellite for IEEEVIS 2021. I’m in great company with participants from Market Cafe Magazine, Nightingale and Multiple Views: Visualization Research Explained.
More business this week. My talk for B-Sides Vancouver earlier this year is now on YouTube. It’s called Attack of the Graph: Visual Tools for Cyber Analysis and gives a brief coverage of a range of applications of graphs in information security use-cases. There was so much more I’d like to cover in the future but I was thrilled to be part of such a well-organized, domain-specific conference.
I’ll leave you with a mélange of mushrooms to close this one out. Until next time!
A muffled voice behind the door: Hello?
* There are some shuffling noises before the heavy wooden door creaks open. Inside stands an old woman whose blank expression wrinkles into mild recognition. *
Teacher: Oh! Hello my dear. It’s Christian, right?
Me: Yeah, that’s right… Christian – I write that newsletter about graphs and networks, you probably signed up at sourcetarget.email a while ago. It’s really great to have you as a subsc…
Teacher: You still write that thing? Interesting.
Me: …Yeah, it’s been a sluggish source/target summer but I’m ba…
Teacher: You know, a lot of the newsletters I subscribe to have given up and stopped a while ago… I figured you had done the same? But tell me, what’s a newsletter without any new editions? Without subscribers? Without an author?
Me: Er, I’m not sure. I’m just here to deliver the latest edition. Sorry it’s been a while. You could always unsubscribe if you don’t think it’s worth it.
Teacher: Please do not be alarmed, your newsletter is fine! I must admit I sometimes fall asleep by the fire before I finish it but I do that with most books. Now, tell me: Do you know what a Socratic dialogue is?
Me: I think so? Isn’t it a conversation made up of questions to help illustrate a point?
Teacher: Indeed. Can you imagine why that might be helpful?
I’m caught in a vicious cycle with the library down the road.
It starts with a book recommendation online or through a friend. I love hearing these as there’s such a strong signal-to-noise ratio: I’m likely to enjoy a book recommended by a friend or someone I respect.
First thing I do is check to see if it’s in the library catalogue and put a hold on it. Almost without fail I forget I’ve done this until I receive an email notification that it’s ready to collect. Once checked out it sits on my shelf as I work hard to clear the backlog of all the other books that have followed the same path. As the return date approaches I’m hit with an anxiety that I won’t finish it in time. Meanwhile, non-library books have a much longer shelf life so they sit abandoned as I struggle to plough through the more pressing borrows.
Despite the anxiety, I am grateful that my library loop has led to completing far more books this year than perhaps any other in my life. It means I can make my own hearty recommendations like the following heartiest recommendation of The Ministry for the Future by Kim Stanley Robinson.
It’s speculative fiction crammed full of non-fictional content and a profound, heartening look at climate change. It explores what humans could do in the future to handle things a fair bit better than we seem to now. It’s a touch idealistic, sure, but I was taken by the idea of a carbon coin: a cryptocurrency backed by central banks to reward and promote ecological actions.
It’s taken me a long time to finish this podcast interview with cognitive scientist Joscha Bach. I have to listen in chunks as it’s dense; stuffed with ideas and non-sequiturs that take me by surprise. I sometimes consider ratcheting podcasts and audiobooks up to 1.25× speed but 0.5× may be more appropriate to take in the full breadth of Bach’s thinking.
Joscha Bach’s suggestions are aligned with the reality presented in Ministry for the Future: humans will persevere via the technology that will afford us life in the face of climate change.
In the podcast Bach describes cold cooling chains, an existing real-world network that will only become more prevalent. This is the idea that you pass through bubbles of air conditioning in your transit from A to B. Home to the car to work to the grocery store.
The global supply chain already takes advantage of cold cooling chains. Think of produce leaving factories in air-conditioned trucks, delivered to supermarkets directly into their refrigerated depots. As temperatures rise air conditioning use will increase, which, ironically, will contribute to more emissions from the use of air conditioning, requiring humans to adapt in other ways.
Supply chains and manufacturing are an infinitely complex chain of dependencies. Billions of components are manufactured, transported and combined into products every single day. Any minor disruption of that chain can result in surprising impacts down the line.
And supply chains have never been more visible than in the last two years. We’re already reminiscing about the toilet paper panic buying prompted by fears of impending supply chain collapse in early 2020. This visibility became a topic of global interest as we followed the sorry state of the Ever Given, stuck in the Suez Canal – an evergreen talking point.
And now, right on time for the return of source/target, my home country is hit by (at least two) major disruptions.
Petrol prices in the UK this week have reached their highest level since 2013 and a shortage has taken hold. Drivers are being begged not to fill water bottles with petrol at gas stations. These shortages, combined with the rumors of them, create a vicious cycle of panic buying and finger-pointing at the possibility that it’s a result of the B-word (Brexit).
Meanwhile the price of natural gas has grown 420% on an annual basis in September. This has led to the indefinite closure of two US-owned plants providing over half of the UK’s food-grade carbon dioxide. Aside from its use in carbonated beverages, it turns out this is the main way animals are slaughtered for British meat markets.
As succinctly put by Rusty Foster at Today in Tabs:
Each link in this chain is a distinct potential sub-disaster, involving risks to the U.K.’s overall heating and energy supplies, fertilizer supplies, and food supplies.
Just like cold cooling chains there’s something clinically compelling about a network made up of sub-disasters.
There’s likely to be an element of the bullwhip effect at play here. In short, the further down the supply chain you go the more likely you are to see larger variability in inventory levels. The result? Wild oscillations of availability and fulfillment.
In the studies conducted with a simple four-step supply chain, small variations in initial demand could lead to order amplitudes of 900 percent only four steps down the supply chain.
These studies demonstrated that the main cause for erratic fluctuations of order sizes and inventory levels in supply chains was actually not the amount of uncertainty of the demand function but the characteristic of the supply chain and the behavior of the supply chain managers.
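The amplification those studies describe is easy to see in a toy model (my own sketch, not the cited study’s): four stages each pass on the orders they receive plus a safety margin proportional to the change they just observed, so a small 10% bump in consumer demand swells dramatically upstream.

```python
def propagate(orders, safety_factor=0.5):
    """One stage over-reacts: it passes on each order plus a share of the change."""
    passed, previous = [], orders[0]
    for order in orders:
        passed.append(order + safety_factor * (order - previous))
        previous = order
    return passed

consumer_demand = [100, 100, 110, 100, 100, 100]  # one small 10% bump
stage_orders = consumer_demand
for stage in range(4):  # retailer -> wholesaler -> distributor -> factory
    stage_orders = propagate(stage_orders)

print(max(consumer_demand), max(stage_orders))  # the bump has ballooned upstream
```

With these made-up parameters a 10-unit blip at the checkout becomes a swing of over 50 units at the factory – and the oscillation undershoots too, which is exactly the erratic inventory behavior the research describes.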
Regular readers will recall the many submarine cable maps I’ve shared around here. These colourful spaghetti lines are popular as they are the closest thing to a depiction of the internet we’re likely to see. They remind us of the precarious invisible wires and cables that bind us across the globe.
The latest popular map render was created by Tyler Morgan-Wall using a technique called “ray tracing” and has been shared across the very same network it depicts. It’s a fantastic piece of work, all the more impressive as it’s leveraging R, a programming language typically reserved for academic and data science number crunching.
Many were dismayed to see criticism of the work from the venerable Edward Tufte this week. His bizarre response is in contrast to the teachings that he himself has espoused for decades.
I’ve seen this before. Data visualization feedback can be dismissive and hurt the feelings of the creators. In a recent article Cole Nussbaumer Knaflic notes that this is a distinctly asymmetric arrangement: it might take days to build a compelling visualization but mere seconds for an observer to dismiss it over an apparently misused chart type or color.
As in supply chains, one could think of rash critique by respected community figures as the start of a bullwhip effect of criticism. A small snide comment can be magnified by the characteristics of the medium delivered (in this case Twitter and other echo chambers). Think of an early career developer with limited confidence; noticing the full force of the bullwhip on others at the receiving end of ill-judged criticism. Perhaps they’ll decide it’s not worth putting their work out there.
Ben Jones has written about the Love Fest vs. Shark Tank scale of community feedback. On one end everyone responds with platitudes and positivity, but on the other extreme, observers are tripping over each other to dunk on the piece, or worse, the creator. A common take is to insult the author’s fundamental choice of chart type – choosing from a vast taxonomy of potential approaches – thereby falling into the trap that certain charts are inherently “bad” or never useful.
(I think source/target’s rule of “be nice” pushes it firmly into Love Fest territory.)
Both Jones and Knaflic advocate for a thoughtful, measured approach to providing feedback. A core tenet of each is asking whether the creator is prepared to receive notes at all – this simple step is a hallmark of good management so it makes sense that it’s a good idea when sharing feedback with peers. Another suggested approach is to engage in a Socratic dialogue. By asking questions and actually listening to the response you may uncover consideration or intent that may not have been obvious.
So, what about those “you used a pie chart therefore your visualization is bad” arguments? In his new article, “Data Visualization Has a Taxonomy Problem,” Elijah Meeks encourages data visualization practitioners to break out of their “taxonomic thinking” and consider the ingredients that make up popular chart types with their value and strengths rather than the classification of the chart type itself:
Avoid casual critiques based on chart types – Look more at the whole meal rather than the individual dishes.
When I scan my list of books read this year a few jump out with unbearable titles for reading in public. “The Courage to be Disliked” leads the pack with “How Not to be Wrong” as an insufferable close second.
Nevertheless, both are eye-opening explorations of psychology and mathematical thinking, respectively.
The Courage to be Disliked takes the form of a Socratic dialogue between a student and a teacher. It summarizes the psychology and philosophy of Alfred Adler, an Austrian medical doctor and psychotherapist working in the early 1900s.
One notable suggestion of Adlerism is that when speaking to others one shouldn’t “praise or rebuke” – the idea is that these actions promote “vertical” relationships. By praising you are implicitly defining yourself as superior to others. You’re emphasizing that you’re in a position to be judging them. Instead, Adlerism promotes the cultivation of “horizontal” relationships through the expression of gratitude.
A community of peers expressing nothing but gratitude is the purest form of “Love Fest” I can think of. It also sounds impossible – I don’t think it’s reasonable to avoid praise or criticism.
I think the following progression of feedback has merit:
public display of gratitude
private discussion of visualization decisions and intent
suggestions for improvement, if creator is willing to receive them
Like the intricate supply chains that shape our daily lives, our communities are webs of influence: Adler recommends recognizing that each action we take affects others in our community. This is applicable to data visualization communities but with a twist:
When Adler refers to community, he goes beyond the household, school, workplace, and local society, and treats it as all inclusive, covering not only nations and all of humanity but also the entire axis of time from the past to the future.
Adler’s community even includes inanimate objects, like the ones shuttled through our supply chains. We talk about a wide range of interconnected processes in this newsletter but with this definition Adler took network thinking to the extreme.
A butterfly effect of community behavior leads to supply chain shortages and unpredictable impacts. When a respected industry figure delivers an ill-conceived critique it starts a chain of criticism that descends into bullying and belittling. Snark begets snark.
Thanks for reading, see you in a few weeks!
I used to work with someone who would take weeks off work to do nothing but watch the Olympics. Summer or winter he would close the blinds, order takeout and multi-screen his way through a slew of sporting excellence. He was predictable like that.
Predictability in sport is a big deal. Everyone wants to know who will win various qualifiers, heats and tournaments – and that includes the coaches and athletes in the middle of the action.
Unpredictability has its own allure. A few nights ago a commentator noted:
nobody in the world following table tennis would have expected Germany leading China 8-7 at this point
Surprise events like this are novel and fascinating to audiences. When you’re caught up in the action, it feels like anything could happen.
I’m drawn into the Olympics in a way that surprises me. This year is no different. I was glued to the new skateboarding and surfing events, taken by the frantic energy of badminton mixed doubles and the frenzied paddling of the canoe slalom, and stayed up way too late watching “artistic swimming” - of all things.
My Olympic habits have dovetailed nicely with Wikipedia deep dives. Did you know synchronized swimming was renamed artistic swimming in 2017? And that athletes risk serious concussion injuries, wearing helmets during practice to avoid any mishaps? There were three sets of twins competing on Tuesday night, and one pair were two of a set of triplets. I wonder what that family dynamic is like?
Glued to heat after heat of incredible athletes my mind wanders… Putting aside the complete impracticality (and potential offence caused by even suggesting it), what sport do you reckon you could qualify at Olympic level in time for Paris, 2024? I’m probably too short for basketball, not great with heights (no diving) and tap out at about 20 push-ups. That rules out the vast majority of events. No offence to badminton players but I suspect that may be my best bet. Of course, I’d have to quit my job to dedicate my life to the way of the birdie – and perhaps move to a different country?
It’s 5am and I’m watching the women’s football gold medal game. It’s Canada vs. Sweden and there have been a lot of games to get us to this point. Looking back at the various qualifiers and knockouts we see a pattern that resembles a graph – this makes sense: in graph theory a round-robin, where every team plays every other team once, is classified as a “tournament.” It’s a directed graph where each link’s direction records who won the game.
One reason why these networks are interesting is their graph-theoretical properties. First there’s transitivity: if Sweden beats Canada and Canada beats the USA then it’s tempting to say that Sweden is a better team than the USA – though upsets mean real tournaments often contain cycles that break this logic. Another fact: it can be proven that every tournament has what is called a Hamiltonian path. This means it’s possible to trace a route along the links that visits every node in the network exactly once.
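The Hamiltonian path fact even has a constructive proof that fits in a few lines: insert each team into a growing path just before the first team it beat. A sketch with hypothetical match results:

```python
def hamiltonian_path(teams, beats):
    """Insertion construction: every tournament admits a Hamiltonian path."""
    path = []
    for team in teams:
        for i, other in enumerate(path):
            if other in beats[team]:  # slot in just before the first team we beat
                path.insert(i, team)
                break
        else:
            path.append(team)  # lost to everyone placed so far
    return path

# Hypothetical round-robin with a cycle: Sweden > Canada > USA > Sweden
beats = {
    "Sweden": {"Canada", "Japan"},
    "Canada": {"USA", "Japan"},
    "USA": {"Sweden", "Japan"},
    "Japan": set(),
}
path = hamiltonian_path(list(beats), beats)
# every consecutive pair along the path is a recorded win
assert all(b in beats[a] for a, b in zip(path, path[1:]))
```

The insertion works because in a tournament every pair has a result: if the new team lost to everyone before position i and beat the team at position i, slotting it in keeps every consecutive pair a genuine win.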
In the social network analysis context we can think of these tournaments as “competition networks” and look to graph measures for more insight. PageRank is perhaps the most famous graph measure and is similar to a much older technique called eigenvector centrality. Google famously used PageRank on networks of hyperlinks between web pages to approximate the relative importance of pages for their search results.
Taking this metric and applying it to tournaments makes a lot of sense. Instead of treating the existence of a link as a vote of credibility, we can take a win against another team as the transfer of a tiny bit of reputability. It’s very natural; you get kudos for beating a reputable, strong team. Football-loving researchers have undertaken a network-science informed comparison between this predictive model and betting markets. Their results?
Relying on large-scale historical records of 11 major football leagues, we have shown that, throughout time, football is dramatically changing; the sport is becoming more predictable; teams are becoming increasingly unequal; and home field advantage is steadily and consistently decreasing for all the leagues in the sample.
Or, to put it as succinctly as the title of the paper: football is becoming boring.
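The reputability-transfer idea can be sketched in plain Python (a power-iteration PageRank over made-up results – not the researchers’ actual model): each edge points from loser to winner, so beating a strong team passes along more rank.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)  # dangling nodes spread rank evenly
            for target in targets:
                new[target] += damping * rank[n] / len(targets)
        rank = new
    return rank

# Each edge points loser -> winner, so a win transfers a little reputability
results = [("Canada", "Sweden"), ("USA", "Sweden"), ("USA", "Canada")]
ranks = pagerank(results)
print(max(ranks, key=ranks.get))  # the undefeated team tops the ranking
```

Note the direction flip compared to the web: on web pages a link is a vote for its target, while here a loss is a vote for the victor.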
Dressage has been a part of the Olympic games for over 100 years. Prior to 1952 only officers in the military were allowed to compete at the games and it was common to see riders compete across decades of games. This rich history gives us a dense network of interactions to explore.
The use of colours in this visualization of the riders across the decades shows that those who competed immediately after the “demilitarization” of the dressage event didn’t last very long. Twinning colour with various time metrics is a great way to emphasize patterns in networks like this.
(Mildly unrelated but I read this week that all horses in the Southern Hemisphere have the same official birthday of the 1st of August. When I first heard this I thought it was an incredible fact but it’s not like all these horses were actually born on the same day. Imagine the logistics.)
Thanks to recorded championships and Olympic games it’s easy to find data relating to competing athletes. This slickly interactive site does for swimmers what PM_ME_YOUR_SADDLEBREDS did for horse riders. Graphs like this with inferred co-occurrence links usually grow to be very densely connected, but the strict cadence of the games across many years means the resulting network is quite restrained.
I recommend Tanyoung Kim’s three part series for background on the data preparation, visualization design decisions and insights the graphs reveal.
Another sport making its debut in the Olympics is rock climbing. Back in 2017 it felt like everyone was talking about the documentary Dawn Wall. It’s an electric flick showcasing the incredible grit and determination of climbers Tommy Caldwell and Kevin Jorgeson as they plot to conquer the film’s namesake southeast face of El Capitan in Yosemite National Park.
(I heartily recommend two more movies with some overlap in subjects and directors: Meru and Free Solo make a fantastic triplet of climbing movies.)
One fascinating facet of the Dawn Wall is the painstaking lengths Caldwell and Jorgeson took to map out their route in excruciating detail. Rock climbing difficulty grades and rankings are serious business. I find it remarkable that the first ascent took decades of painstaking effort, yet once the route was mapped it was completed again just a year later by Adam Ondra. Showing it was possible was somewhat sufficient to allow others to do the same.
(Ondra was featured in a wonderful piece in the New York Times this week – worth a look if you can conquer the paywall.)
The climb route is, of course, just another directed graph. Nodes could be individual holds or, zooming out, the individual sections, or pitches. The links are the transitions between holds. It’s a multi-graph: there are often multiple ways to complete the climb. And of course, it’s directed because, for a climber, the only true way is up.
Depictions of routes are known as topos, short for topographic diagrams, and often have a hand-carved feel to them.
Watch the Olympic diving and it won’t take long for the camera to pan to a person with an iPad – it’s always an iPad – filming the athlete as they take their dive. It’s amusing to see. Don’t they know the dives are being recorded by state-of-the-art cameras and simultaneously broadcast to millions of people? But training teams and athletes use as many modern techniques as possible to get a competitive edge in their discipline. It’s a key practice for Karate Kata athletes or “karatekas”: recording repeated runs of a routine for review. Over in the diving center, amateur iPad filming provides alternative angles and can be used between heats for last-minute tweaking and tuning.
Network science is used to gain a greater understanding of the mechanics of successful matches, teams and athletes. At its heart badminton, my new sole purpose in life, can be modelled as a bipartite graph. Every time the shuttlecock passes the net a new link is created between the positions of players at that point in time. The overall outcome is that of an “interaction network” between players as a complement to the competition networks of many games. The word “bipartite” means the passes are always between players on opposing teams (passing between teammates is not allowed in badminton, even for doubles).
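To make the bipartite structure concrete, here’s a minimal sketch in Python with an invented doubles rally; real studies work from annotated match footage, but the bookkeeping is the same.

```python
from collections import Counter

# An invented doubles rally, listed in touch order. The shuttle always
# crosses the net, so consecutive touches come from opposing teams -- which
# is exactly what makes the interaction network bipartite.
team_a = {"Ana", "Bea"}
rally = ["Ana", "Cy", "Bea", "Dee", "Ana", "Cy"]

edges = Counter()
for hitter, receiver in zip(rally, rally[1:]):
    # Bipartite invariant: the endpoints sit on opposite sides of the net.
    assert (hitter in team_a) != (receiver in team_a)
    edges[frozenset((hitter, receiver))] += 1

print(edges[frozenset(("Ana", "Cy"))])  # 2 -- the most frequent interaction
```

Accumulating many rallies this way yields the weighted “interaction network” the study describes.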
One published study [pdf] aimed to quantify the performance of elite Olympic badminton players.
One pattern uncovered was predictability across medal lines:
The Badminton Stroke Network identified different playing patterns for medallists with the Silver medallist categorised with the less predictable and defined style of play, the Bronze medallist exhibiting the most defined style; and the Gold medallist exhibiting the greatest predictability, but only when losing points (self-networks).
It’s remarkable that medal classification can match such a clear split across play styles. This idea matches other studies, such as this research from the Carlson School of Management:
Hedgcock and his team compiled the largest dataset ever used to study this phenomenon and are the first to use facial expression software. Altogether, they studied medal stand photographs of 413 athletes from 142 sporting events, representing 67 countries, across five Olympic Summer Games between 2000 and 2016.
The results replicated and expanded on what was previously found: bronze medalists were more likely to exhibit a smile than silver medalists, while gold medalists were happier than other medalists.
Interaction network analysis can be applied to other sports. Volleyball game networks extracted from video footage have been used to question a common assumption: that there’s a direct relationship between particular patterns of game activity and whether a team wins or loses.
Using eigenvector centrality it appears that individual attack behaviours are actually more impactful to the game outcome than groups of actions taken by a team. Social network analysis techniques will lead us to stronger predictions across all disciplines.
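Eigenvector centrality itself is easy to sketch with power iteration. The matrix below is invented co-occurrence data, not anything from the study: repeatedly multiplying by the adjacency matrix and normalising converges to the leading eigenvector.

```python
# Invented symmetric adjacency matrix for five players: a 1 means the pair
# co-occurred in an attack sequence.
A = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def eigenvector_centrality(adj, iters=100):
    """Power iteration: multiply by the adjacency matrix, normalise, repeat."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        x = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(x)
        x = [v / top for v in x]
    return x

scores = eigenvector_centrality(A)
# Player 2 co-occurs with everyone, so it gets the highest centrality.
print(max(range(5), key=scores.__getitem__))  # 2
```

Libraries compute this more robustly, but the principle is just this loop.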
You don’t have to look too closely to find networks in your favorite Olympic sport. Historical data puts today’s athletic achievements in perspective, visual modelling is paramount to beat the world’s toughest peaks and cutting-edge network science helps coaches and athletes reach the upper limit of their chosen sport.
Now, if you’ll excuse me, I’m off to practice my backhand.
I don’t really want to draw too much attention to this article entitled “Why Young Developers Don’t Get Knowledge Graphs” but I am interested to hear what you think of it. Drop me a line and let me know!
VOSviewer is a new Open Source project released this week with an exceptional take on building co-citation networks. I’ve previously highlighted Connected Papers as a good application of this use-case, but VOS takes things one step further and applies a FlowMap.blue-like approach where you’re able to provide your own data through a URL. The resulting app can be embedded anywhere online and has some unique features and functionality. I’m not familiar with the installable application that inspired this work but there’s a custom layout algorithm included that seems to do a great job with most of the networks.
For all those horses in the northern hemisphere I hope you are having a great summer. We all seem to be in a tentative state of uncertainty but I hope there is truly that ray of light at the end of the tunnel. As you may be able to tell by the tardiness of this email, source/target is still on summer time. I have a couple of projects I’m excited to share in the coming months.
As always, thanks for reading! I’m going to leave you with my top three surprising moments from the Olympics:
The terrifying basketball robot shooting hoops and taking names.
The time a cameraman panned to a cockroach instead of the final minutes of a women’s field hockey match. (Remarkably not the most notable insect of Tokyo 2020).
The adorable tiny cars “Field Support Robots” which exist for the sole purpose of shuttling equipment around the games.
See you in a few weeks!
Edward Tufte has informed and inspired data practitioners for, literally, decades. A lot has been said about his forthright opinions on information design and one can draw a direct link between his work and the attention to detail many aspire to when they create visualizations for consumption.
His original triumph “The Visual Display of Quantitative Information” was self-published in 1983 after potential publishers wouldn’t grant the creative control Tufte required. It’s a stellar introduction to a meticulous craft, even with diminishing returns from his later books.
(Tufte is often referred to by his initials, ET. I can’t bring myself to do this as I can’t shake the image of the famous globular alien claiming credit as a lauded design professional.)
This week I’m going to apply Tuftean thinking to what I see as a vital visualization and thought device – interactive network graphs.
The most viral ideas from Tufte are those with catchy titles like “data-ink ratios”, “lie factors” and “sparklines” – these are all valuable concepts but minimize the value of artistic license. To take a purely Tuftean view, creative license is unacceptable. All design decisions should be made in service of the data and scientific process. This is a noble aim but somewhat rigid.
I found the discussion in the latest edition of Data Stories to be extremely enlightening. Hosts Enrico Bertini and Moritz Stefaner are joined by guest Sandra Rendgen to discuss how Tufte’s work is a direct response to the informational reality of the late 20th century – one where historical data visualizations were compiled through extensive travel and painstaking curation of rare manuscripts, a far cry from the Google-powered hive mind of today. They also described a prevalence of underwhelming visuals found in newspapers and media limited by analogue typewriters and printing presses.
(As a mid-millennial, referring to the 1980s as “late 20th century” chills my bones.)
Interactive, immersive data visualization experiences are common today and stand in contrast to traditional, flat typesetting and printed materials. Ben Shneiderman’s pithy mantra of “overview first, details on demand” is something we take for granted in most interactive data visualizations we see today.
We don’t have to look too hard to find relevant material in Tufte’s work. The chapter “Links & Causal Arrows” in “Beautiful Evidence” is a goldmine for network thinking nuggets.
We’re very comfortable drawing circles or boxes around the nodes in our networks but is this really necessary? As Tufte notes “maps don’t put boxes around city names” – why not use the space taken up by the shapes to display extra information or draw attention to the content? I’ve used this to great effect by letting typography and the words themselves be the star players in a visualization – see my lyrics and typography pairing graphs for examples of this.
Leaning heavily on cartographic standards with a philosophical angle, Tufte goes to great lengths to discuss the core question posed by network graphs: “what exactly do the arrows mean?”
A common suggestion from Tufte is to abolish any use of “chart junk”: elements that occlude data and distract from the point you’re trying to make. So what is network graph junk?
On the last point I advocate for extreme restraint with network animations. Ideally they’re constrained to a few hundred milliseconds – slow enough to allow the user to understand what’s changing but fast enough to get out of the way. There should also be a logical flow from a user interaction to an animation:
Find a node with a particular name -> grow and shrink it.
Run a new layout -> animate nodes to position.
A strict reading of Tufte implies that the use of semi-decorative icons and symbols is inappropriate in network visualization. This ignores the need for visualizations to be memorable and attractive. There’s a hope that if you follow recommendations your graph will be attractively minimalist but I don’t think there’s an issue with some colors and icons.
Another famous term used for chart junk is that of The Duck – “When a graphic is taken over by decorative forms or computer debris.” The term brings to mind some of the infographics we discussed in source/target #33.
The biggest duck of them all in graphs is that of the hairball, an ornament of chaos that intends to impress rather than reveal.
One of my favorite takeaways from “The Visual Display of Quantitative Information” was the idea of showing additional variables, not just by adding, but by removing ink from the page. The instrumental example is that of truncating grid lines to denote the range and median values in your data (find a good explanation of that here.) I had never considered this as an option!
A more extreme example in a network graph could be to completely forgo the display of links in a network and let the (now invisible) connections pull the network together. This places severe emphasis on the network structure in a way that may be beyond comfort. Nevertheless, depending on the structure of the data this could be a valuable approach to reduce clutter.
With interactive, dynamic network views we can afford to make bold design decisions. Mouse clicks or touches on a network can reveal insights on demand.
On the topic of layouts, network graphs have the unusual property of allowing absolute flexibility in their presentation. You can literally place nodes anywhere and those positions may tell us something! It doesn’t always illuminate, but the possibility is significant. In practice we need to be careful to choose a complementary layout – sometimes a challenge if we don’t have the hardware or patience to tune them.
Tufte argues that graphical excellence is nearly always multivariate, that is, it shows measurements across multiple axes. In our layouts this is second nature – the output of a graph layout algorithm is the ultimate multivariate calculation: forces and constraints define the network structure and inform our understanding of the underlying data.
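As an illustration of that multivariate calculation, here’s a toy force-directed layout in pure Python – arbitrary constants and a four-node graph, not any particular published algorithm: springs pull linked nodes together while a weak repulsion pushes every pair apart.

```python
import math
import random

# A four-node graph: a triangle (0-1-2) with a pendant node (3) hung off 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4

random.seed(1)  # deterministic starting positions
pos = [(random.random(), random.random()) for _ in range(n)]

for _ in range(200):
    forces = [[0.0, 0.0] for _ in range(n)]
    # Weak repulsion between every pair of nodes.
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                forces[i][0] += 0.01 * dx / d2
                forces[i][1] += 0.01 * dy / d2
    # Spring attraction along links.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        for node, sign in ((a, 1), (b, -1)):
            forces[node][0] += sign * 0.05 * dx
            forces[node][1] += sign * 0.05 * dy
    pos = [(x + fx, y + fy) for (x, y), (fx, fy) in zip(pos, forces)]

# Linked nodes should settle closer together than distant pairs.
print(round(math.dist(pos[0], pos[1]), 2), round(math.dist(pos[0], pos[3]), 2))
```

The final positions encode degree, clustering and path structure all at once – which is exactly why tuning these constants on real data is such a challenge.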
For extreme multi-variation we also have a number of well-used tools at our disposal: the size and color of nodes and links.
Perhaps we could do better. Why not introduce more rigid constraints in service of an incisive presentation? Krzywinski et al. present a persuasive argument that Hive Plots are a particularly good example of a network view with rigid constraints that maximize utility.
Again, interaction affords us some flexibility. Like Tufte’s picture book flaps, we can introduce comparisons through a quick touch or hover over a network area. We can also avoid references to distant footnotes and figures by embedding visuals right there rather than in a separate page/widget.
In the Data Stories podcast I was interested to hear the argument that the influential nature of Tufte’s work likely obstructed other voices and opinions on information design. It’s true that we are very much in debt to Tufte as a leading voice advocating for truthful and concise information design but his work has a surprising lack of reference to psychological literature and actual studies. As noted on the podcast:
All about science apart from his science about data viz!
After ploughing through each of Tufte’s volumes I look forward to learning from those he inspired. As we embrace new mediums, technologies and studies it’s helpful to seek out the best practices that will bubble up throughout the 21st century.
I’m going to lean into my newfound summer cycle of writing and stick to a 3-weekly (triweekly?) cadence for a bit. As you’ve probably seen on the news it’s been pretty hot here in British Columbia. I’m safe and sound (despite being a no-A/C-hold-out) but it’s awful to hear of the wildfires spreading throughout the province and US states.
A couple of the links above are from Nightingale, the journal from the Data Visualization Society. They just launched an optional membership program and I’m happy to support the organization as they continue to grow and do great work. I’m also pleased to see that the fabulous writing output will be freed from the Medium paywall.
I was on a call with someone last week who told me how they were juggling a full-time job with the final stages of their ten-year-long PhD journey. I asked that fateful question: “what is your thesis on?” Over Zoom I heard the appropriate resigned pause followed by a weary intake of breath.
Their description reminded me of “cartographic generalization”, mentioned in the last source/target. “That’s exactly it,” he responded, “why were you reading about that?”
It was my turn to breathe before a bashful “well, I write this little newsletter…”
After listening politely and another short pause he said “Oh huh, so this stuff is, like, your passion? Wow.”
I was surprised by this response. Despite starting this newsletter to reflect on the passion on display in the world of network visualization I had never considered that it was also my passion. That’s something else I’m going to lean into.
Thanks for reading source/target, my triweekly newsletter on my passion: graph visualization.
If you were feeling charitable I suspect I could persuade you that everything is an infographic.
Your standard internet infographic of 2021 aims to be a number of things. It’s an eye-catching, sharable, bite-sized piece of content, typically confined to a tall—very scrollable—rectangle. It offers a glimpse into a topic through some core elements, helpfully dragged-and-dropped into the Canva page of life.
To put it simply an infographic is a mash-up of information and graphics. Let’s take this broad classification and run with it.
“Infographic” is a bloviated term for diagram – a visual depiction of a process or situation. If infographics are diagrams then diagrams are infographics. Consider the instructional manuals from IKEA that aid DIY-ers with their modest construction projects.
There’s an argument that if you can remove the graphical elements of an infographic without creating any confusion for the viewer then it was never an infographic in the first place. You could build a Billy without the labels in an IKEA guide because the visuals are so informative, though it may be a challenge without any numbers, arrows or exclamation marks.
Paul Ford’s fantastic article on military infographics highlights a number of “incredibly cool and simultaneously insane” maps and networks used in US military training materials. The complexity in some of these is inspirational and terrifying – there’s no doubt they’re infographics though.
Road signs are infographics. Often distilled to a simple pictograph, their presence by a road or intersection is both graphical and informative. You could quibble that road signs are such an atomic piece of information that they don’t count as an infographic. There’s an assumption that solitary graphics can’t reach a threshold of information to deserve that term. “A pie chart isn’t an infographic” they’ll sniff. But frankly, who cares? It’s a graphic and it’s informative. Case closed. Furthermore how would one set that theoretical information density threshold to qualify? One datum per pixel? It’s preposterous.
In fact, a lot of consideration went into the design of the humble stop sign over time. This included some intriguing thoughts on the shape of the sign itself, as illuminated by the podcast 99% Invisible:
“The recommendations were based on a simple, albeit not exactly intuitive, idea: the more sides a sign has, the higher the danger level it invokes. By the engineers’ reckoning, the circle, which has an infinite number of sides, screamed danger and was recommended for railroad crossings. The octagon, with its eight sides, was used to denote the second-highest level. The diamond shape was for warning signs. And the rectangle and square shapes were used for informational signs.”
While we may take this for granted today, even the number of sides is yet another piece of information encoded in the road-sign-as-infographic.
For most websites the use of infographics is intended to prompt visitors to share a watermarked image on social media, with the aim of attracting more eyeballs.
The vast majority of infographics are static which promotes screenshotting and bit rot as various image compression algorithms take over and start the process I can’t be the first to coin as “dankification.”
If infographics are basically treated as memes that probably means memes are infographics.
What about art? Paintings? Photos? That’s a tough one. For the petty purposes of my argument I’ll say these are definitely infographics but the true answer is probably the fateful “it depends”. An observer with an awareness of materials or medium may find a painting to be as informative as your basic viral infographic. Content and composition in a photo may betray facts (aka information) about the subject in focus.
In the Renaissance, patrons used art to show off particular knowledge associated with wealth or solidify their social status. Commissioned artwork often depicted stories understood only to those with education or a religious upbringing. During Europe’s “Age of Enlightenment,” a neoclassical style became hugely popular after the discovery of Pompeii and subsequent popularity of the Grand Tour, which saw wealthy tourists return home with “souvenirs” from Antiquity. Greek and Roman aesthetics (think columns, vases, togas, and weird dolphins) conveyed carefully curated imagery that said a lot about the contemporary ruling classes’ ideals and interests.
Andrew Bird’s recent album – with the firmly-tongue-in-cheek title “My Finest Work Yet” – has cover art of the artist in a photo-perfect match of a Jacques-Louis David painting from 1793. The original painting depicts Jean-Paul Marat,
a radical journalist during the French Revolution and one of the leaders of the insurgency against the Crown. He took frequent medicinal baths to soothe painful skin infections, and he wrote most of his most famous works while soaking in his tub. That’s where he was assassinated by the conservative royalist Charlotte Corday; shortly after, David painted him as a martyr, a stab wound to the chest stained his bathwater red. (source)
If you know what you’re looking for there’s a lot of information in the painting – it’s an infographic of sorts, but a subtle one. When an observer has knowledge of a work’s context – the artist, contemporary events, popular taste – this infographic can convey whole new levels of meaning. By invoking the original, Bird takes all this information and applies a fresh layer of intent through its use.
This stirs up memories of our discussion of medium vs. message back in #16. Just like the placement of our roadsign transforms a simple symbol into vital communication, the aesthetic reference in a painting or photo can tell us a lot.
(At this point I’ll refrain from the argument that “cave paintings are infographics” and point you to this wonderful Twitter thread. )
Okay well what about books? Are they infographics? Is that a resounding “no” I hear? I assume I’ve started to lose you here. It’s hard to argue that books are infographics. Their covers perhaps, with an on-trend cover design, title, author and blurb (perhaps even some notable testimonials) – now that’s an infographic. But why not the content of the novels themselves?
Adding more information takes us further from that summarized snippet of content. Infographics don’t have a minimum requirement for information shared; it’s more accurate to say there’s an upper bound. Too much information and it’s no longer digestible with ease. This definition is easier to formalize through something like Kolmogorov complexity – if we can’t describe an object with a simple-enough algorithm then it’s probably not an infographic.
The argument for mass over-classification aside, when I first mentioned the word infographics there was probably a particular image you had in mind. There’s a significant downside to these colorful bite-sized morsels of information.
The vast majority of infographics are supplied in image form, which renders screen-reading technology inert without unreliable OCR. Infographics are far from accessible.
It’s common to see small-print references at the bottom of infographics, but this is often a few short words naming a data source, or a long URL that no one is going to take the time to type out.
Infographics often prioritize form over function – for many their accuracy is an afterthought. Online their context is at risk with the very first share. In geospatial mapping there’s a concept of cartographic generalization where aspects of maps are summarized to reflect the current zoom level. This can lead to misrepresentation or confusion.
(An interesting topical, semi-related recent story.)
As we’ve learned through a quick dive into art (and road-sign) history, context is not always very obvious to a current observer.
Ultimately you’ll probably win the argument – my broad acceptance of most visual media as “infographics” is a bit of a stretch. Nevertheless it’s fun to think about!
As the media’s use of information visualization has become more commonplace, our bar for what counts as informative and insightful is lowered. Too often shorthand for “researched and accurate,” infographics – whatever form they take – mimic detailed data analysis and fast-forward us to the takeaways, at a significant cost.
Oh hey, it’s been a while eh? Due to various, mostly boring, reasons it’s been a challenge to get source/target out the door. It doesn’t help that I had the phrase “everything is an infographic” stuck in my head and couldn’t move on until I had turned it into, well, something. I hope to stick to my usual biweekly cadence from here on out.
If you wish to support this little project I’ve opened up a “Pay-Whatever-You-Want” subscription option with my newsletter provider Buttondown. Just stick your email address in this box again and you’ll make me very happy.
Ever look around and consider the vast number of connected devices sharing bits and bytes around and passing through your home? I’m a staunch defender of the humble light switch but even without wifi-enabled lightbulbs there’s a glut of constant, invisible signals and traffic all around us.
Invisible Roommates is a delightful exploration of that idea: what if you could see the invisible networks of communication as they ping and pong? There’s unfortunately no production version but it’s a neat use of Augmented Reality tech. For more background check out the commentary here.
Since I learned the word “rhizome” to describe sprawling, organic network structures I’ve struggled not to drop it into casual conversations as if I’ve been using it my whole life.
There’s no such thing as a tree (phylogenetically) explores what makes a tree a tree and provides a rich library of fancy biology words to add to my collection.
See also TimeTree, a public knowledge-base of the evolutionary timescale of life.
I had some great responses to the last newsletter!
On tracing the process of writing a novel, Duncan Geere shared this ace Chrome Extension for doing exactly that.
(Duncan and Miriam Quick just launched a new podcast on the topic of data sonification that’s worth a listen – it’s called Loud Numbers. Naturally I’m curious what a sonified network would sound like.)
Another timely extension I stumbled upon was Vandal – a wonderful little tool you can use to quickly jump to archived versions of websites from the Wayback Machine. I’ve already used this countless times.
The venerable Dr Albert-László Barabási wrote a piece for the New York Times on the invisible network of NFT transactions. He and his team minted and ultimately sold an NFT of the NFT network itself. I probably missed the boat on that one. Is it too late to do the same with one of my CryptoPunk networks? Probably.
Thanks for reading, see you in a few weeks!
I’ve never asked a question on Yahoo! Answers and now it’s too late. The site was put into read-only mode and was wiped clean off the internet on Tuesday.
It’s strange when sites that you consider to be “part of the furniture” of the internet disappear. I wasn’t an active user, only an occasional viewer, but the mere existence of Yahoo! Answers in my search engine results was comfortable and familiar.
Yahoo! is notorious for killing off the internet’s darlings. Back in 2009, Geocities – the formative free website host and third most visited site on the internet in 1999 – was dumped. Approximately seven terabytes of hand-carved websites were lost.
How are we supposed to feel when a vast array of individual creativity, dating back 16 years, is deleted from the internet without care or ceremony? Should we accept it as inevitable or seek to preserve and protect?
Manuel Lima is a renowned speaker and author who specializes in information design with a clear passion for network graphs.
Lima’s site Visual Complexity is a compendium of visualizations going as far back as 2005. I finally took a closer look a few weeks ago. It’s a smorgasbord of color and networks with hundreds of thumbnails to draw you in.
Over on the “links” page there’s a dense list of hyperlinks to related blogs and topics. It’s reminiscent of the original, human-curated Yahoo! directory of links. Faced with a wall of potential gems I did what I always do: ran down the page, and opened a new tab for each link. I was ready to find my new favorite projects and sites.
But I was dismayed to find that many of these links were dead. Defunct. Kaputt. I was repeatedly 404’d and, in a few instances, redirected to domain squatted sites and, in one case, an extremely NSFW site.
It’s not Lima’s fault: internet links are brittle and prone to breaking. It’s common for me to find a link to a promising page on network analysis from around 2010 only to discover it has been eradicated from the ’net.
The Wayback Machine from the Internet Archive is a valuable resource in these circumstances but firing it up and pecking through the years to find the right version of the site sucks the joy out of that which had caught my eye.
Though it’s fun to check out website design of the 90s and 00s, after a while of waiting for pages to load and crossing fingers I’m prone to getting distracted and losing interest.
Bit rot is a term that describes the degradation of digital media over time. You and I have first-hand awareness of bit rot: the .DOC file that wouldn’t open after conversion to .DOCX, the 32Kbps MP3 that you can’t believe you ever listened to, the snarky meme that’s been repeatedly screenshotted to oblivion…
Canadian author and artist Douglas Coupland introduced me to the term in his essay of the same name. He notes:
A friend of mine works as an archivist at a large university that collects rare documents of all sorts. She tells me that a major issue with collecting documents that were created after about 1990 is that the really desirable “papers” don’t physically exist–or rather, they do exist, but they’re lying comatose inside a 1995-ish laptop. Not only that, but the structured electrons that constitute any given file inside that 1995 laptop are drifting away as electrons apparently do. Depending on a laptop’s architecture, its drive will erase itself at a half-life rate of about fifteen years.
As Coupland notes, bit rot isn’t unique to purely online, digital media. I sometimes wonder about the CDs and DVDs burned in the late 90s. Back then the humble CD was a futuristic disc of dense bits and bytes that we assumed would last forever. We now know this to be foolish and risk losing a generation of media to the whims of a ring of plastic and a shiny layer of aluminum.
Paul Gordon, Senior Film Conservator at Library and Archives Canada, featured in a recent documentary, “Four Days at the National Preservation Centre”, laments the challenges of bit rot:
Hundred year old hard drive? I don’t think it would spin up. And would you be able to read the files on it? I don’t know. We have trouble reading the files from the 90s and 80s as it is.
Bit rot on the World Wide Web is the link rot we endure when clicking on old hyperlinks to dead destinations. This in turn leads to graph rot: what was once a densely-connected ecosystem of interests starts to lose its kinship and connectivity.
Many have studied this behavior in network science: we can explore what happens to graph properties when nodes are deleted and their links disappear. This so-called “node deletion” has wide-reaching applications.
In the case of the broken links on Visual Complexity the result is the diminished spread of ideas coupled with the impression that our community isn’t as tight-knit and active as it once was.
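Graph rot is easy to simulate: delete nodes from a toy network and watch the largest connected component respond. The ring-and-hub topology below is invented for illustration.

```python
# A toy 'web': ten sites in a ring, each also linked to a central hub.
adj = {i: {(i + 1) % 10, (i - 1) % 10, "hub"} for i in range(10)}
adj["hub"] = set(range(10))

def largest_component(graph):
    """Size of the biggest connected component, via depth-first search."""
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best

def delete_node(graph, node):
    """Remove a node and every link that touches it."""
    for nbr in graph.pop(node):
        graph[nbr].discard(node)

print(largest_component(adj))  # 11
delete_node(adj, "hub")
print(largest_component(adj))  # 10 -- the ring still holds together
```

How gracefully the component degrades depends entirely on the topology: delete a ring node next and the remainder stays connected as a path, but a hub-and-spoke network would have shattered immediately.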
It’s not a coincidence that the most interesting and dynamic graph projects are driven by data scraped from websites and resources while they’re still active.
Take the podcast example I shared a few weeks ago: it’s scraped from iTunes to create podcast neighbourhoods of interest.
Clubhouse Social Graph, a new project from Travis Fischer, Tim Saval, and Tokyo does something similar: it draws from the official APIs of the buzzy new audio-only social network app to build up interactive graphs of users and their followers.
Profile headshots in an interactive, animated graph like this are a shorthand for the people and connections that make the platform as compelling as it is.
Another coincidence: both of the above examples use the same core library to show their graphs. It’s easier than ever for developers to build semi-scalable applications for online consumption, I’ve come to build these again and again. It’s ridiculous but I’m going to coin a new term for these apps: “scraphs” – a blend of scrapes and graphs.
As side projects and guerrilla marketing campaigns these scraphs are unlikely to be around forever. Interestingly, though, the scraped data is more likely to live on and has a higher chance of being archived, because it has already been saved privately.
When using sites and services we have an awareness that there’s data lurking in the periphery (in site databases, APIs, storage, etc). In the case of Geocities and Yahoo! Answers all that data representing interactions with and between real humans is gone.
Funny story: without thinking I went to scrape Yahoo! Answers to build up a network graph of related questions for this newsletter, only to find it was already gone. The Archive Team planned to archive as much of it as possible but with little warning or support from Yahoo! this was set to be a tall order.
In his essay Bit Rot, Douglas Coupland describes a world where each draft of a novel is retained and archived for posterity:
Well, here’s a thought: many writers email themselves a copy of their novels at the end of every day, using the cloud as a backup mechanism. Imagine if one were able to take all of those daily backups and then place them into a sort of stop-motion animation. One could see how an author constructs their work by looking at words per day, words cut and pasted, paragraphs deleted, items shuffled about, typos, notes to self, and then, when the editing process begins, one could watch how a novel is hacked and pruned and reshaped—an organic process displayed in a dynamic organic mode. This would be a fascinating new way of appreciating a book’s creation—a visual language to describe a verbal process.
I’d love to see that scraph.
Trails of Wind: the architecture of airport runways
New proof reveals that graphs with no pentagons are fundamentally different
SkyKnit: when knitters teamed up with a neural network
“Did someone say Emoji?” from Jennifer Daniel is a favourite new newsletter of mine. I enjoyed this recent edition, especially the color variance exploration of emoji hearts across various platforms. I had no idea that these weren’t standardized and that perception of colour spaces can vary across different cultures.
Bruno Gonçalves launched a newsletter earlier this year to provide background on some graph data science. I have plans for something not-dissimilar to this — it’s nice to see how someone else approaches it.
I love the eclectic, home-grown aesthetic of this video game level scraph. Scraping data from the Fandom Wiki for the fan-made sequel-of-sorts to a popular RPG, this graph is a cluttered but endearing map of the various game levels and creators of each.
Things have been a tad slow around here as I’ve been prepping my talk for BSides Vancouver next week. Pre-recording talks for online conferences is a joint curse-blessing: there’s no risk of live performance glitches but it’s hard to break the cycle of slide tweakage and repeated audio takes.
Thanks for reading, see you in a few weeks!
]]>It’s Leonhard Euler’s 314th birthday today. In network circles the grandfather of graph theory is perhaps best known for his 1735 solution to the problem known as the Seven Bridges of Königsberg. Using novel graph theory techniques, Euler was able to show that a route across the seven bridges without crossing the same one twice was impossible.
Euler’s work was fundamental for graph theory and I find it delightful to overlay the original problem statement over the roads and bridges that make up Königsberg, now Kaliningrad, today.
The problem was inspired by real bridges across real rivers, connecting real roads, and it sparked a domain that routinely builds roads and connections from messy, raw datasets.
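Euler’s argument reduces to counting. A walk that crosses every bridge exactly once exists only if zero or two land masses touch an odd number of bridges. Here’s a minimal sketch of that check for the classic seven-bridge layout (the land-mass labels A–D are my own shorthand for the island of Kneiphof, the two river banks, and the eastern land mass):

```python
from collections import Counter

# The seven bridges of Königsberg as a multigraph: each pair is one bridge
# between two land masses (A = Kneiphof island, B/C = river banks, D = east).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# Euler's condition: an every-bridge-once walk exists only when the number
# of odd-degree land masses is zero or two.
odd = [node for node, d in degree.items() if d % 2 == 1]
has_euler_walk = len(odd) in (0, 2)
```

All four land masses have odd degree, so `has_euler_walk` is false – exactly Euler’s 1735 conclusion.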
In the world of visualization, there’s a tendency to put network graphs on a pedestal and expect some “aha” moment from the observer.
There’s also a tendency towards visualizations that look novel or flashy. This isn’t unique to network visualization but in a rush to create something eye-catching we trip over ourselves.
I’m guilty of this! Building clear graph visualizations requires constant re-evaluation of the visualization purpose and the intent of the data presentation.
In his influential book “The Visual Display of Quantitative Information,” Edward Tufte suggests that effective data presentation should allow for viewing at three different levels.
What is seen from a distance, an overall structure usually aggregated from an underlying microstructure.
What is seen up close and in detail, the fine structure of the data
What is seen implicitly underlying the graphic
His book was written before the explosion of technology and tooling that enable intricate, interactive visualizations today. New approaches come with new pitfalls but Tufte’s core principles still resonate, especially for network visualizations.
There are many data analysis projects that meet Tufte’s conditions for effective design.
Historian Cameron Blevins has released a dataset tour de force and accompanying website tracking the spread of post offices across the USA.
Between 1848 and 1895 the federal government wove together a “gossamer network” across the West, a sprawling and fast-moving web of post offices and mail routes that connected the region’s far-flung settlements into a national system of communications. The US Post was the underlying circuitry of western expansion.
This circuitry tells a story of colonization through thousands of dots placed across a map of the USA, sliced across pivotal moments in time for the country.
Most of the maps omit the roads from the time so it’s up to the observer to imagine the route between the post offices - this isn’t hard to do! I love Dan Cohen’s summary of the spread of post offices in his newsletter.
Blevins’ data highlights, perhaps better than any other evidence, how the westward expansion of the United States was strongly tied to state power rather than individual or local activity by European settlers, as it was the Post Office infrastructure (linked, of course, to the military and other levers of the state) that enabled the kind of communication network and support lines that eventually led to the seizing of native lands. (Just look at those tendrils shooting west from the Mississippi.)
Cohen also highlights a complementary project from Justin Gage that explores the correspondence and visitation networks of Native Americans.
Despite colonial control and confinement, western Native Americans spread ideas and information important to them along vast communication networks.
The map of Königsberg above is from source/target favorite Andrei Kashcha’s City Roads project which allows any cityscape to be visualized in a bare, zoomable form of roads, bridges and paths.
Kashcha’s latest project eschews real roads in favour of inferred connections across an artificial continent representing online communities. Map of Reddit is the confluence of all the stuff he does best: leveraging real social networks and smart technology choices to build impressive interactive visualizations.
If you click a subreddit in the map you get links that bridge to other subreddits with a strong crossover of comments. Once you start exploring it’s hard to stop! This application gives a better indication of the vastness of a social network like Reddit than any other attempt I’ve seen.
I’ve touched upon a real-world routing application of graph theory, the use of network analysis for deep historical research and the emulation of these techniques to explore online communities.
A recent post from Matt Webb highlighted something a little different: maps of urban layouts extracted from ‘mental images’ through sketches and interviews.
Webb draws from lecture notes on the work of Kevin Lynch and highlights one example of Boston:
Here’s an example of one of Lynch’s maps: Boston.
What you’ll see from that map is that it’s totally recognisable as a city, and you could totally use it to navigate, but it’s also what you would scribble on the back of a napkin. It’s also way more memorable. If you gave me a glimpse of Boston from Google Maps and asked me to sketch it for someone else, I can’t imagine it would include any of the salient details. But given a Lynch map, I bet I could pass on the most relevant bare bones, just from memory.
Webb notes the use of five elements to build up an image of a city: Paths, Edges, Districts, Nodes & Landmarks. Sound familiar?
I really like this thread of graph drawing experimentation – it shows how difficult it is to balance aesthetics with clarity
A six-second GIF illustrating herd immunity. This is originally from 2017 but has found a new virality in 2021
It’s trendy for those with the time and inclination to wrest ownership of their data from the giants of technology like Google. There are a number of vendors catering to this crowd but the approach remains inaccessible for your average consumer.
This map of personal data infrastructure from @karlicoss provides a fascinating look at how complex it is to undertake this ownership.
Dig a little further on their site and you’ll find an elaborate, home-grown personal knowledge management implementation in the same vein.
Oh hey, there are quite a few events coming up I should highlight:
There is so much diversity when it comes to the ideas and people working in the graph space. It’s great to see when conference line-ups reflect and encourage this diversity. Lots of work still to be done. If you know of an event that can use some extra attention, please reach out!
I’ve spent a lot of time over the last month working on the new home for source/target and I’m pretty proud of the result. With a dedicated site I have a space for the newsletter on the web and am better able to showcase projects and related materials to the wider world.
Take a look and let me know what you think. I have a laundry list of tweaks and fixes to make but I’m relieved to say the bulk of the work is done!
Thanks for reading, see you again in a few weeks!
]]>For everyone sick of NFTs and the Blockchain, don’t worry, we’ll be jumping back into graph fundamentals next time.
In 2008, 36 dairy farm owners, milk traders and purchasers were arrested for their direct connection to a compromised supply chain of milk and infant formula in China. Their cost-cutting actions had fatal consequences: six infants died and many more were hospitalized. An estimated 200,000 people were affected by the spiked milk products.
In “Blockchain Chicken Farm” author Xiaowei Wang mulls over whether this aberration in the supply chain could have been avoided through The Blockchain, a shared immutable ledger of transactions and transfers. Their conclusion:
The contamination came from farmers, driven by economic pressures. Blockchain wouldn’t have helped prevent falsification, but it would have made the milk more expensive.
As with the majority of apparent innovations in this area, I suspect this is another case of a solution looking for a problem. As per the illuminating “Blockchain, the amazing solution for almost nothing,”
Out of over 86,000 blockchain projects that had been launched, 92% had been abandoned by the end of 2017, according to consultancy firm Deloitte.
Why are they deciding to stop? Enlightened – and thus former – blockchain developer Mark van Cuijk explained: “You could also use a forklift to put a six-pack of beer on your kitchen counter. But it’s just not very efficient.”
The main character in the NFT story is the blockchain graph of transactions. This graph is fueled by network effects leading to a surge in token “value.”
Due to the terminology and the complex “mathematical puzzles” that support their apparent magic, the blockchain and NFTs are opaque, with a firm barrier to entry. Platforms like Zora and OpenSea make the process of minting and speculating easier. Their process may be built on “proof-of-work” but there’s no work required on the part of the user. So while the barrier is lowered, the understanding still eludes.
These platforms are simply sweeping the details away—I learnt little about the minting process from using their arcane command-line APIs and web interfaces.
This combination of a general lack of knowledge and increasing options for accessing the blockchain has led to a tidal wave of get-rich-quick enthusiasts. Innovators, entrepreneurs and multi-level marketers crowd these new channels, trying to push their coin.
In most cases, and for most consumers, the blockchain is indistinguishable from a proprietary database held by a private entity. Xiaowei Wang has more to say on this:
Under governance by blockchain, records are tamperproof, but the technical systems are legible only to a select few. Even exploring transactions on a blockchain requires some amount of technical knowledge and access.
The technology of record-keeping has become increasingly more complex. This complexity requires trust and faith in the code—and trust in those who write it. For those of us who don’t understand the code, trusting a record written in natural language on a piece of paper seems at the very least a lot clearer.
Sure, I could entrust my transactions to the globally-warming server farms humming along in a bid for the utopia of a decentralized currency. But tales of loss, social engineering and volatility can make traditional banking look safe in comparison.
Let’s see how to apply graph thinking to better understand NFTs. I introduced CryptoPunks last week: the pixelated characters often thought of as the original NFT.
These unique, collectable characters were initially released for free to anyone with an Ethereum wallet. The punks themselves are diminutive pixelated characters that now fetch tens of thousands of dollars. CryptoPunk #6965 was sold last month for 800 ETH – that’s around 1,500,000 USD.
I’m going to explore these pixelated pals through three distinct graph models.
Playing heavily on the collectibility of the tokens, each punk has between zero and seven accessories and can be one of nine types.
Attributes range from the rare “Alien” (fewer than 0.1% of punks) to the common “Male” (just over 60% of them). The most common accessory is the earring: nearly 25% of punks have an earring.
The record-breaking sale I mentioned last time has since been eclipsed by the purchase of the only existing Alien with a pipe, #7804:
The punk characteristics were originally randomly generated – building a network graph of all 10,000 punks connected to the accessories they hold, we get an extremely dense, mostly-connected view.
Look closely and you’ll spot there are two connected components, 9,996 punks with characteristics and 4 without any characteristics at all:
I don’t get much from this visualization although it’s fun to zoom and pan around the interactive version. We get a clearer perspective when removing the links:
As usual when working with densely-connected data, this large graph provides us a helpful place for an initial look but is really best-suited as a springboard to dive deeper.
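Assembling that punk-to-attribute graph is mechanical once the data is scraped. Here’s a sketch with invented toy data standing in for the full 10,000-punk dataset (the punk ids and attribute lists below are made up for illustration), including a plain breadth-first search to find the connected components:

```python
from collections import defaultdict

# Toy stand-in for the scraped CryptoPunks data: punk id -> attribute list.
punks = {
    1: ["Male", "Earring"],
    2: ["Female", "Earring", "Hoodie"],
    3: ["Alien", "Pipe"],
    4: [],  # a punk with no attributes at all
}

# Build a bipartite graph: punk nodes on one side, attribute nodes on the other.
adjacency = defaultdict(set)
for punk, attrs in punks.items():
    adjacency[("punk", punk)]  # ensure attribute-free punks still appear as nodes
    for attr in attrs:
        adjacency[("punk", punk)].add(("attr", attr))
        adjacency[("attr", attr)].add(("punk", punk))

def connected_components(adjacency):
    """Flood-fill over the adjacency map to collect connected components."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        queue, component = [start], set()
        while queue:
            node = queue.pop()
            if node in component:
                continue
            component.add(node)
            queue.extend(adjacency[node])
        seen |= component
        components.append(component)
    return components

components = connected_components(adjacency)
# Punks 1 and 2 share an earring so they sit in one component; the
# attribute-free punk 4 forms its own, just like in the full graph.
```

The same structure scales to the real scrape – the shared-attribute nodes are what make the full visualization so dense.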
An alternative network take on the CryptoPunk story is a transaction graph showing the history of each token since its inception. The majority of these transactions are sales or transfers of the tokens from one ETH wallet to another. A lot of the transaction graphs look like this one:
These little graphs confirm the explosion in the punks’ value. It’s not unusual to see a fat link at the far right of the network corresponding with the incredible prices fetched for the punks on the ‘chain. In the example above, the original sale was for .28 ETH – $72 back in 2017 – before being sold for 9.90 ETH earlier this year – now worth roughly $18,000. That’s almost 250 times the value!
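To make the arithmetic above concrete: in ETH terms the jump from 0.28 to 9.90 is about a 35x multiple, and the headline “almost 250 times” figure comes from the dollar conversion, since ETH itself appreciated over the same period. A sketch with a hypothetical event list (wallet names and the zero-cost claim event are invented for illustration):

```python
# Hypothetical transaction history for one punk, mirroring the example above:
# each event is (from, to, sale price in ETH).
events = [
    ("contract", "wallet_a", 0.00),  # original claim, free
    ("wallet_a", "wallet_b", 0.28),  # the 2017 sale, ~$72 at the time
    ("wallet_b", "wallet_c", 9.90),  # the 2021 sale, ~$18,000 at the time
]

sales = [price for _, _, price in events if price > 0]
eth_multiple = sales[-1] / sales[0]  # ~35x measured in ETH alone
usd_multiple = 18_000 / 72           # 250x once ETH's own rise is included
```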
Scrubbing through the networks for each punk, the first thing I notice is that the vast majority of punks have very few associated events. The raw data corresponds with this: by my calculation more than half of the punks have only ever been transferred once.
These links aren’t just transactions: some are alternative events in the history of each punk. I was dimly aware of “wrapping” in the crypto world. In the context of CryptoPunks, this means the use of an alternative currency to trade the NFTs.
When a Punk is wrapped our tree-like graphs get a little more complicated. The “wrapped” node in the following example is a sort of circular diversion, a gateway to an alternative blockchain. Without weighting the links by points in time my graph layout doesn’t handle this diversion very well.
Ethereum is pseudonymous, which means it’s entirely possible for multiple wallets with unique IDs or accounts to be controlled by one individual. The matching of wallets to individuals isn’t an exact science, but our graphs start to show patterns on closer inspection.
Let’s remix our data one last time to show interactions between punks and wallets – this is a mix of the two approaches we’ve explored thus far. Here’s one example account along with the CryptoPunks purchased or sold:
I’ve only explored two “hops” (account to punk to account) to give a flavor of the connectivity here. Most account graphs are very dense, which highlights the “gotta catch ’em all” collectible tendency of CryptoPunk fans.
I’m only scratching the surface of exploration; there are many further avenues of investigation.
Suddenly, what was opaque and intangible comes into focus. We have the inherent collectability and rarity of attributes that make it likely someone will get attached to a punk. We can trace the type and accessory combinations that have resulted in higher value over time. Finally, we gain a greater understanding of the usual patterns of accounts buying and selling the NFTs since 2017.
Using graph techniques to interpret and visualize the blockchain makes the actual NFT transactions much more transparent. This is ironic when you consider the decentralized and semi-anonymous nature of the blockchain – partially driven by the aims of untraceable cash. When we look at this in contrast to the traditional art world, notoriously rife with fraud and money-laundering, the inherently digital nature of NFTs may actually allow analysts to spot patterns more effectively than with physical transactions. All without standing up from their computer chair.
Thanks for reading, see you again in a few weeks!
]]>It’s a kind of game and it’s a game where you have points.
This is how Vitalik Buterin, co-creator of Ethereum and Ether, the second-largest cryptocurrency, answered when asked to define money. He goes on:
And it serves a lot of useful functions. And so it kind of just survives in society as a meme for thousands of years.
Something that hasn’t existed for thousands of years is NFTs—non-fungible tokens. For those unfamiliar with the mushroom-sounding F-word, something fungible can be interchanged with other goods or assets of the same type, money itself being the canonical example. By saying these tokens are non-fungible we recognize that they are unique: they can’t be replicated.
The most formative example is CryptoPunks, the influential non-fungible token first seen back in 2017. These unique, collectable characters were initially released for free to anyone with an Ethereum wallet.
The punks themselves are diminutive pixellated characters that now fetch tens of thousands of dollars. CryptoPunk #6965 was sold last month for 800 ETH – that’s around 1,500,000 USD. How did we get here?
Technologies and topics bubble under the surface of general popular culture until a critical mass of activity punctures through the discourse.
If you read anything at all you’ve likely had your fill of articles describing NFTs. The most accessible of these stick to surface-level commentary and are broadly of two themes with a rough split in coverage:
Neither treatment cuts into the why of NFTs, what’s the appeal? Why bother?
There are (at least) four broad categories for you to choose your own NFT adventure.
(Oh and if you are looking for a primer, try this coverage in the Verge or this good glossary of NFT terms.)
When newspaper headlines tout gigantic financial reward for a digital asset it’s hard not to see dollar signs. If something doesn’t exist in the real world then surely it should be easy to obtain? Put differently, if others are creating money out of thin air, why can’t I?
In NFT lingo this creation or commitment is known as minting. It certainly feels that if people are minting and selling such banal things as tweets then the barrier to entry is remarkably low. An email? This newsletter? Your grandmother?
As with Tulip fans in the 17th century or, say, GameStop fans in 2021, it’s easy to get caught up in the rush.
One thing that differentiates NFT speculation from tulips, meme stocks and beanie babies is the overwhelming influx of “cash” from multi-millionaires and/or billionaires with literal, virtual cash to burn.
Looking again at the Ethereum ecosystem, many who participated in the original “pre-sale” of the currency and bought in at the pre-mining event in 2014 now find themselves with mountains of Ether, the currency fuel that powers the Ethereum network.
Seven years later, early-adopters are faced with the option of a hefty tax bill upon exchanging or tossing some loose change (bytes?) to an artist. Many crypto-holders are choosing the latter. It doesn’t take much imagination to see how money laundering and similar nefarious motives might also be “appealing” options.
But one thing is certain. The non-fungible nature of these tokens offers something a number on a cryptocurrency tracking screen struggles to match: exclusivity.
For some NFT enthusiasts, minting cryptoart represents a greenfield opportunity to coin and own a new aesthetic.
Take a look at the home page of popular cryptoart marketplace Zora. The showcase of recently minted digitally enhanced smorgasbords of content feels in equal parts gaudy, online and, sometimes, an in-joke.
Hypnotic loops of generative art, pristine three-dimensional renders and vaporwave visions. You may have opinions on whether any of these count as art, but isn’t that the very definition of art itself—that it doesn’t have one?
Drew Schwartz in Vice notes that cryptoart isn’t as innovative as it first appears. But this may be irrelevant for those born into a world with a fully-realized internet. By making art less tactile it reflects an online, permanently connected world. One that feels even more real in (groan) a global pandemic.
For more context on the environmental impact I also recommend cryptoart.wtf which gives a breakdown of the impact of random pieces of cryptoart from across the marketplaces.
Putting the brakes on the feverish speculation of NFTs and the aesthetic of cryptoart, there’s a significant moral argument—prominently laid out in Everest Pipkin’s scathing depiction of a core building block of NFTs: “proof of work” places a direct lien against the future.
During unprecedented temperature increases, sea level rise, the total loss of permanent sea ice, widespread species extinction, countless severe weather events, and all the other hallmarks of total climate collapse, this kind of gleeful wastefulness is, and I am not being hyperbolic, a crime against humanity.
The ecological argument against NFTs is an easy one to sympathize with. You may argue, “what’s one more game with points?” but why does that construct have to be enormously wasteful?
Pipkin dismisses the idea of something called Proof-of-Stake, which would look to reduce the environmental impact of cryptocurrency and related technologies. I’m hopeful that the suggestion that Proof-of-Stake is vaporware is inaccurate.
My background in computer science and cryptology created a strong pull towards the promise of technologies that sustain the blockchain.
To the occasional chagrin of cryptology experts, it’s hard to detangle modern cryptography from cryptocurrency technology. It doesn’t help that both are shortened to “crypto.”
Renowned cryptographer David Chaum’s creation of the first anonymous digital money system was a major catalyst for the cypherpunk counterculture of the 90s. This blended crypto(graphy), politics and mathematics to advocate for the use of technology to protect rights and freedoms.
This history lives on today in the continued proliferation of electronic money and smart contracts by cryptocurrency fans.
I was charmed by author Robin Sloan’s creation of “amulets” last week. Explained and documented over on his site, these are unique NFTs defined thusly:
An amulet is a kind of poem that depends on language, code, and luck.
- Its complete Unicode text is 64 bytes or less.
- The hexadecimal SHA-256 hash of the text includes four or more 8s in a row.
This is a fancy technical way of saying that amulets are both very short and very lucky.
Here’s an example. You can try it out on the scratchpad:
Winter evening, a leaf, a blue sky above.
This is an extremely rare amulet from Albert Granzotto – rare as it has ten 8s in a row, right at the beginning of the hash.
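Checking the amulet property takes only a few lines. A minimal sketch of the two rules above – the function names here are my own, not Sloan’s:

```python
import hashlib
import re

def longest_run_of_8s(hexdigest: str) -> int:
    """Length of the longest run of consecutive '8' characters in a hex string."""
    runs = re.findall(r"8+", hexdigest)
    return max((len(run) for run in runs), default=0)

def is_amulet(text: str) -> bool:
    """True if text is at most 64 UTF-8 bytes and its SHA-256 hex digest
    contains four or more 8s in a row - the two amulet rules."""
    encoded = text.encode("utf-8")
    if len(encoded) > 64:
        return False
    digest = hashlib.sha256(encoded).hexdigest()
    return longest_run_of_8s(digest) >= 4
```

Longer runs of 8s correspond to rarer tiers (the “mythic” emoji amulets below have long runs), and brute-forcing candidate strings through `is_amulet` is exactly the “churning out randomly generated text” approach I describe next.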
I’m not a poet and it felt obvious to start churning out randomly generated text in the hope of stumbling upon a string that was both an amulet and a satisfactory poem.
Stretching the definition of “poems” and inspired by various projects of the past, I looked to generate emoji to see if there was any low-hanging fruit hidden in (ideally short) strings of emoji. And indeed, I found a number of rare amulets.
👩🏽⚕ 🦓 🕺 (mythic)
👽 🙌 👹 (legendary)
👌 🚐 🏛️ (mythic)
👉🏻 🧜♀️ 🧜🏾♂ (mythic)
In a bid to make a (tiny) mark in the space I wasted hours of my life following arcane steps to mint one of my rarest discoveries and list it on Zora. Behold the “mythic”:
It’s a stretch to pretend this “poem” has a narrative but I do enjoy the incongruence of each of its three components.
Is this the best I can do? I’m still looking!
The fact I bothered to follow through with minting one of these silly emoji combinations is a testament to the appeal of three of the four NFT adventure paths I described above. The raw gamification of writing code to discover them and the rush of minting to plant a flag in the chain are rewarding.
As for the environmental consideration: following the amulet rules I offset the carbon produced by the emoji poem’s life on the blockchain. Considering the innovation and excitement surrounding NFTs it’s a shame that this problem underMINEs the entire crypto sphere.
Flowchart.fun – I love the simplicity of this single page app to build flowcharts
Vaguely reassuring state machines and a regular expression crossword
Exploring Stagflation Explanations with Interactive Networks
Grow bonsai trees in your terminal with cbonsai. See it in action in the browser using repl.it
The Office interaction Graph – A much improved creation for the popular TV show
Thanks for reading, have a great fortnight – see you soon!
]]>One piece of lore about Steve Jobs, mainly perpetuated by his 2005 Stanford commencement address, is that his early exposure to and interest in typography set him on a path of innovation and attention-to-detail that, ultimately, fuelled the growth of Apple.
As an outsider, I’ve always found the world of font design and typography daunting. It’s hard not to feel intimidated by a topic that’s thousands of years old. I like to think I appreciate good typefaces when I see them but – like with other design elements – I struggle to understand how fonts should be evaluated and selected for maximum effect. Fortunately, I’ve recently found two resources to help me with this.
Adventures in Typography from Robin Rendle is an inspirational weekly look at fonts and typefaces that have caught their attention. It’s always fun to read someone’s deep dive into a creative topic and I hope to pick up some of Robin’s enthusiasm along the way.
I also finally found time to digest two of Edward Tufte’s famous books on design, “Beautiful Evidence” and “The Visual Display of Quantitative Information.” Whilst not directly about typography, these are searing, authoritative resources on design and there are enough lessons in both to last a few lifetimes. May I be the latest in a long line of people to recommend them to anyone with even a passing interest in data visualization.
My typographical toe-dipping all came to a head last week when I started work on the new website for source/target. My current site is bolted on top of my personal site and cobbled together with some dubious build processes and questionably-crafted CSS. New(ish) year, clean slate etc. It was all coming along nicely until I hit a roadblock: selecting complementary typefaces.
I’m told that fonts should have personality and complement each other. On one hand, I could just pick a couple at random that I think look nice but surely I should do the selection more justice than that? At a risk of sounding immodest, WWSJB?
Fifteen minutes of rabbit-holing on Google later–including this illuminating article–I emerged with some new instincts on choosing the “right” fonts for my site (I’m probably going with Proza Libre & Cormorant Garamond, thanks for asking). Of course, I was also distracted by the following table of suggested rankings of various typefaces, as published in U&lc from 1992.
Some readers will recognize this table as an adjacency matrix. By tracing the columns and rows you’re rewarded with a judgement on how suitable each pair of typefaces is for combination.
Other readers may also find it reminiscent of Mario’s Picross from Nintendo. I’m not sure what the overlap is between these two groups of people, but I know it includes at least me.
There are at least two interesting characteristics of this matrix:
Firstly, it provides a judgement on using the same typeface as both the display (otherwise known as a header) and the body text. This isn’t a given in these matrices: the center diagonal is often blanked out as it’s not possible to cross an entity with itself – for example, if you were encoding the likelihood of a win between tournament participants, I can’t beat myself at chess.
Scanning down the diagonal we can spot at least one example where it’s not recommended to combine two of the same typefaces. Matching Bembo with Bembo looks like a no-no.
The other characteristic is that various typeface pairings are deemed suitable one way round but not vice versa. It reflects the opaque world of typography that the author decides a given typeface works as a header when matched with another as body text, but dissuades the opposite arrangement.
This table excels once the reader has chosen a typeface and wishes to find a suitable partner. I spotted the above two matrix characteristics with ease but found it much harder to generalize them: it’s difficult to trace the diagonal (especially on a screen) and almost impossible to mentally compare the elements mirrored by this diagonal.
These challenges are mainly a result of the spare, stark styling of the adjacency matrix. I was curious if I could apply any of the advice from Tufte in building a fresh representation of this chart.
I was particularly interested in surfacing the actual typefaces in question. One of the key takeaways from Tufte is the value in making visualizations as easy to interpret as possible. One approach to this could be as simple as reducing the amount of flipping between pages to compare charts or find supplementary figures, references and footnotes.
This also reminded me of the design methodology promoted by Don Norman in his book “The Design of Everyday Things”, that of “self-documentation”. What if we do the usual transformation of adjacency matrix into a graph but, crucially, surface the actual typefaces in question in the chart?
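The matrix-to-graph transformation is mechanical, and it also makes the two characteristics above easy to extract programmatically. A sketch with invented ratings (the real U&lc scores aren’t reproduced here; 0 means “think again” and 3 means a great pairing):

```python
# Invented subset of a typeface pairing matrix: keys are (display face,
# body face), values are suitability scores.
RATINGS = {
    ("Gill Sans", "Bembo"): 3,
    ("Bembo", "Gill Sans"): 1,   # asymmetric: fine one way round, weak the other
    ("Bembo", "Bembo"): 0,       # a diagonal entry: the face paired with itself
    ("Futura", "Futura"): 3,
}

def asymmetric_pairs(ratings):
    """Pairs whose score changes depending on which face is display vs body."""
    return sorted(
        (a, b) for (a, b) in ratings
        if a < b and (b, a) in ratings and ratings[(a, b)] != ratings[(b, a)]
    )

def discouraged_self_pairings(ratings, threshold=0):
    """Diagonal entries (same face twice) scored at or below the threshold."""
    return sorted(a for (a, b), score in ratings.items()
                  if a == b and score <= threshold)

def edges(ratings, min_score=1):
    """Drop low-scoring cells to turn the matrix into a graph edge list."""
    return sorted((a, b, s) for (a, b), s in ratings.items() if s >= min_score)
```

The `edges` output is what feeds a graph drawing; the other two helpers do in one pass what I found almost impossible to do by eye across the printed matrix.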
By the way, it’s actually quite hard to find some of these fonts in 2021. Perhaps it’s because I’m on a Mac but fewer were already installed on my machine than I thought would be the case.
After the initial hairball view of all fonts connected to each other I decided to focus on the font pairs that had the lowest rankings, the “think again” pairings. By visualizing just these typefaces I can begin to understand why they may not go well together.
What if we take another step out and add additional nodes with examples of the two fonts together? The result is messy and overwhelming but our view of the pairings looks a little like this:
I leaned on my trusty force-directed graph for these examples but I think the use of an alternative representation would be a good next step.
I learned a new term this week. The “asymmetric dominance effect” is the counter-intuitive behavior where an additional, “decoy” option can result in a change in behavior.
There are clear examples of this in marketing and sales. Take the addition of a product tier with a lower, “anchor” price. By introducing this into the mix a consumer automatically changes their perceived value of the other tiers.
But the phenomenon isn’t limited to humans. It also applies to slime mold and their apparent favorite food:
The slime mold likes the small, unlit pile of oats about as much as it likes the big, brightly lit one. But if you introduce a really small unlit pile of oats, the small dark pile looks better by comparison; so much so that the slime mold decides to choose it over the big bright pile almost all the time.
Something about your friendly neighborhood slime mold striving for the largest number of oats really captured my imagination. But there was something this reminded me of that I couldn’t place. Flubber? Alex Mack?
Turns out it was a video I first saw a few years ago: by emulating Tokyo and the surrounding areas with strategically positioned oats, slime mold was found to spread in a way that closely resembles the city’s metro system.
Perhaps we need to take time to reconsider our infrastructure projects. I can see the headline now:
“Slime mold completes metro system replacement significantly under budget with surplus of oats”
TikTok, as per Nathan Fielder, "is a children’s dancing app.”
I’m not on TikTok, partially because I think I’d get addicted in fewer than 15 seconds. I have, however, noticed the increase in innovation and creativity on show in the countless 'toks uploaded daily. A follow-up article from Eugene Wei on the network effects of TikTok is an interesting deep dive. I liked this section:
TikTok beatmaker Ricky Desktop pictured, in his head, dancers performing some movement. Then he wrote a piece of music that included a musical cue intended to elicit that exact movement.
Then, later, some dancers on TikTok performed the movement he had pictured, exactly at the moment he had inserted the musical prompt. It’s not just that he choreographed the human body via music, but how he did it. Ricky Desktop is a marionettist manipulating human bodies not via strings but music.
The program WinAmp used to do software visualizations of music. TikTok is like Mechanical Turk for visualizing music.
Finally, over on Twitter this week, MC Hammer shared a research paper exploring citations across various scientific disciplines.
Yes, you read that correctly.
Quantum physicist and computer scientist Michael Nielsen highlighted some graphs of note from the paper:
I really wasn't expecting MC Hammer to become a source for ongoing research projects. But, well, here we are. This graph is striking, showing the share of citations to philosophy of science papers from other fields. https://t.co/WQD6hy1ari pic.twitter.com/bVqOGZD4A1
— Michael Nielsen (@michael_nielsen) February 24, 2021
As MC Hammer says,
Elevate your Thinking and Consciousness. When you measure include the measurer.
Last call for one minute of your time: the annual (sure) anonymous survey on source/target is still open! Your feedback has really helped me plan for the future with this newsletter.
Until next time! 🤹♀️
There’s a reason it’s called link analysis: graphs are links. Looking at and analyzing graphs without links is like using a laptop without an internet connection or meeting people without forging relationships.
Strictly speaking, graphs without links are still graphs – a hill I’m happy to rest on for a while.
Classifying the many different types of graph links leads us to theoretical definitions of “bidirectional,” “multiplex” and other leaden terms. These are helpful, but let’s cast our net wider than graph theory and social network analysis and consider an alternative classification of links.
I’ve split all possible link types into four distinct categories: Observations, Inferences, Paths and Relationships – something I’m going to call the OIPR model of links.
If I were to ask you – quick – “what’s the first network you think of?” I suspect the answer would be that of a social network. Your social and familial connections from the real world, represented in the digital one.
The links in these networks are relationships. You may have a husband, a sister, a cat or (hopefully) at least one friend.
Our relationships can also be less concrete – you could have an acquaintance or a tenuous, globe-spanning relationship with, say, the person who reverts your Wikipedia edits.
Graph modelling guides will tell you that the best graph models use verbs to describe relationships: father of, babysitter for, etc. This is good advice but it’s interesting to note that these verbs are invariably passive.
This way of describing a relationship could lead us to think of these relationships incorrectly. After all, what do they mean to an outsider looking in? In this graph, being a mother or owing someone money isn’t about what that intimate relationship actually means to those individuals, but how viewers perceive and respond to that relationship in their own understanding of the world.
Paths are found in networks all around us but they are commonly understood in the geospatial context. What’s the shortest route between A & B? Can I reach the museum by bus? Will my new apartment have fibre?
There’s a strong overlap between networks and the world of optimization: What’s the software dependency chain with the fewest elements? What task do I need to complete to get this off my plate?
There’s a difference between the existence of a path and the action of taking it: we travel along paths all the time.
It’s helpful to generalize this path utilization in an “observation” link type:
Observations can take advantage of projection: you and I may join the same conference call via a service like Zoom, but it’s the co-occurrence of us on the same call that’s interesting to a social network analyst. On the other hand, a Site Reliability Engineer at Zoom cares more about the volume and structure of the many thousands of Zoom calls than the relationships forged.
After about a year of COVID, all my examples gravitate toward online meetings.
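The projection trick described above is simple enough to sketch in a few lines of Python. The calls and attendee names below are invented; the point is how a two-mode person-to-call structure collapses into weighted person-to-person links:

```python
from itertools import combinations
from collections import Counter

# Two-mode "observation" data: which (hypothetical) people joined which calls
calls = {
    "standup":  ["ana", "ben", "cal"],
    "planning": ["ana", "ben"],
    "retro":    ["ben", "cal"],
}

# Project onto people: one link per pair that co-occurred on a call,
# weighted by how many calls they shared
cooccurrence = Counter()
for attendees in calls.values():
    for pair in combinations(sorted(attendees), 2):
        cooccurrence[pair] += 1

print(cooccurrence)
# ana & ben share 2 calls, ben & cal share 2, ana & cal share 1
```

The Site Reliability Engineer's view is just the other projection of the same data: link calls that share attendees instead of people that share calls.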
A common trick in network analytics is to take a relationship or observation network, defined above, and derive fresh – sometimes artificial – links out of their raw structure.
One example of this is an inference, commonly used in semantic knowledge graphs: I’m your brother and you have a wife, so they must be my sister-in-law.
Another example is an inferred similarity. Here’s a project I found this week that demonstrates this:
TV Tropes seeks to be a comprehensive breakdown of every trope that has ever graced popular media: a familiar storyline, a narrative device, visual shorthand for a common theme; no matter how vague the trope, it’s likely documented here.
If you’re not familiar with TV Tropes, try clicking the random trope button a few times to get a feel for it. Their page describing a number of “Forgotten Tropes” is also an interesting way to see how certain tropes have fallen out of favor.
Diving into the vast web of tropes is a task that rewards the reader with a category mapping of shared tropes. This network is neat but isn’t illuminating. It’s a summary of the most popular tropes, sure, or a pointer to popular TV sitcoms with cliched, lazy, but accessible storylines.
These links, by the way, are examples of “observation” links – we’re observing a certain trope in a TV show or movie. If we assess how often tropes are observed between shows a naturally interesting network with a different form of link bubbles up.
Taking advantage of a technique called Jaccard similarity, Reddit user /u/theotheredmund analyzed the TV Tropes database and created this visualization of interconnected tropes. What’s remarkable about this is the neighbourhoods of various TV shows. We see a “late night comedy” cul-de-sac, a “Law & Order” lane, a terrace of sitcom titans and a side-alley of sci-fi shows brought together by the tropeful Babylon 5.
/u/theotheredmund describes their process:
For each pair of shows, I count number of tropes that exist in both shows and divide by number of tropes that exist in either. That’s their “similarity.” Then I go through each show and find the most “similar” other show, and link it. So a chain indicates that show A has most in common with B, but B might have most in common with C, onto D, E, etc.
TV Tropes is manually curated and there’s sure to be some artistic license on theotheredmund’s part when building this visualization. Regardless, I think these are a different type of link to a relationship, path or observation.
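The process theotheredmund describes is easy to reproduce in miniature. Here is a sketch in Python with invented trope sets; the real analysis runs over the full TV Tropes database, but the mechanics are the same: score each pair by shared tropes over total tropes, then link each show to its most similar neighbour.

```python
def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of two sets."""
    return len(a & b) / len(a | b)

# Hypothetical trope sets for three shows
tropes = {
    "Show A": {"laugh_track", "will_they_wont_they", "bottle_episode"},
    "Show B": {"laugh_track", "bottle_episode", "very_special_episode"},
    "Show C": {"space_opera", "techno_babble", "bottle_episode"},
}

# For each show, link it to its most similar other show -- these
# one-way "most similar" links are what form the chains described above
for name, t in tropes.items():
    best = max((other for other in tropes if other != name),
               key=lambda other: jaccard(t, tropes[other]))
    print(name, "->", best, round(jaccard(t, tropes[best]), 2))
```

Note the asymmetry the quote mentions: A's closest match being B doesn't mean B's closest match is A, which is why chains (A to B to C) emerge rather than neat mutual pairs.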
For another example of a similarity graph check out source/target favorite Andrei Kashcha’s sayit project which, given a subreddit, shows a graph of related subreddits.
Many networks could have links that span across different pairs of the above classifications. I’m struggling, however, to think of a network link that isn’t covered by any of my definitions of observations, inferences, paths & relationships.
What do you think? Have I missed an obvious link type?
I attended the Outlier 2021 conference this past weekend. It was a wonderful collection of talks by data visualization practitioners from around the world. I really enjoyed the emphasis on an asynchronous experience: it would be (almost) impossible to catch all the talks live as they spanned a number of timezones. The recordings, transcripts and collaborative boards & docs meant it was rewarding to catch up on talks that happened while you were sleeping. As a virtual conference it also afforded a “rest day” on the Saturday, a smart decision from the organizers.
Outlier helped me find a number of people doing great work in the graph space and I’ll be showcasing their work here in source/target. This week I’m starting with two delightful projects from Krist Wongsuphasawat.
First up we have SentenTree, a novel way to summarize text by plotting the various deviations and forms around a topic from social media. It’s a generalized but streamlined approach to my Song Lyrics Graph project. It’s intuitive to see the sentence variants play out across the screen and see the sources in the tooltips – I vastly prefer “SentenTrees” to word clouds!
Most people are probably over Game of Thrones by now but I like this project giving another summary of tweets. This time we learn more about the characters discussed together as well as the emoji used in the various tweets.
Like my link classification from earlier, these projects showcase observation links: we both observe the paths taken by words in sentences and observe the co-occurrence of characters from Game of Thrones in a set of tweets.
For more insightful visualizations check out Krist’s projects page.
While we’re on the topic of Twitter, this thread on US Election voter fraud claims from Andrew Beers is worth a read.
Here's a thread on this graph I made, which has been bouncing around Twitter a bit. Broadly, it shows the connections between "influential" Twitter accounts during the election.
But it's always more complicated than that with networks! Here's some insights you might be missing. pic.twitter.com/OSpxS2aR7c
— Andrew Beers (@beeeeeers) February 6, 2021
Here’s an accompanying Nature article on the work.
I was almost finished up for this week but I spotted this fantastic project from the Open Syllabus Project:
Excited to release a big update to the @opensyllabus Galaxy visualization today, which shows node2vec -> UMAP embeddings of the OS citation graph. Now showing the 1.1M most-assigned works. Using the @RAPIDSai UMAP implementation, which is amazing! https://t.co/uepz5vlL6N pic.twitter.com/jJNh78Ecn8
— David McClure (@clured) February 10, 2021
It’s a wonderful implementation with a detailed technical write-up over on their blog. Impressive stuff.
My source/target anonymous survey is still open, I’d love to know what you think! See you in two weeks.
Recently, I’ve been thinking a lot about newsletters. I’ve been reading a lot of newsletters and reading about a lot of newsletters, but most of all I’ve been thinking about them.
Both Anna Wiener’s article for the New Yorker and Clio Chang’s special report for the CJR describe how popular newsletter service Substack is hard to classify as it morphs into “not-a-media-company”; failing the duck test.
Robin Rendle’s visual essay questions what we might have lost as the early internet drifted into today’s internet of apps. It’s an arresting view, expertly delivered through a slow-web guided essay.
I was surprised by the combative tone of this interview with one of the Substack co-founders — reading it set the stage for this companion article warning of the imminent danger from a lack of moderation.
And I found myself regretfully agreeing with this guest post’s severe suggestion that newsletters, mostly found and published daily on Substack and promoted via leaderboards, lead to homogeneous content and a violation of a “Social Fog of War”.
In the end, I figure none of these above hot takes matter. The best thing about newsletters is how they allow you to turn down the incessant noise of the internet and strip it all back to a short (or long) conversation between a writer and their readers.
The keynote from Vicki Boykis at rstudio::conf touched upon this:
One of the recurring themes of Normcore Tech, my newsletter, has been that, even though we all sit at machines and write code or munge data for a great deal of the day, we ultimately crave human connection. We don’t have a way to create and handoff physical things. Cultivating our own garden online that’s not subject to variations online … can be a way to give us that satisfaction.
When I started writing source/target—around this time last year—I wasn’t sure of much. I wasn’t sure if it was something I would stick with. I wasn’t sure how often I would write it. I wasn’t sure of the topics I would cover. I wasn’t sure if anyone would even read it!
One thing I’m sure of now is that the original idea of “news about graphs” turned out not to be what I’m interested in writing about. This isn’t news, and I said as much back in edition #14. There are loads of other places you could go to scratch that itch.
In earlier editions I spent a large amount of time thinking about “why” graphs and networks are interesting. I tried to sneak up on the topic; looking to discover a new perspective on their appeal and understand why I even wanted to write the newsletter.
Twenty-five editions later and I feel much more comfortable about my topic of choice. It turns out source/target isn’t really about the graphs and networks. It’s not about the ways they’re implemented or stored. It’s not even about the way they’re visualized. It’s clear now that source/target is about the world around graphs and how we grapple with the connections around us.
Three “unofficial rules of source/target” have bubbled up over the last year:
I’m a sarcastic person who decided early on that I wanted to keep source/target and takeaways as positive as possible.
This may seem “safe” or “boring” at times but I’m not looking to spark any viral beefs.
There’s an infinite stream of content out there on the internet. I don’t want to add to the firehose by copy-pasting any link with a tangential connection to graphs and networks.
Everything I share in source/target has been read, digested and sparked interest for me in some way.
Corporations are people too. But, as with people, that doesn’t mean everything they say is necessarily interesting. It’s easy to say “everything is a graph” or “everything is connected” but digging a little deeper is much more rewarding.
With source/target I want to share projects and content from individuals from around the world that show a spark of curiosity fanned by graphs and visualization.
With all that reflection out of the way I want to set the tone for source/target for 2021 and beyond. I want source/target to exist in the gaps between a variety of topics:
As a fail-safe against the “Substack milquetoast”, I think the breadth and depth of these topics will be the strength of source/target in the future.
I’d love to hear what you think, feel free to hit reply and let me know. I’ve put together a short, anonymous survey to make the feedback process as easy as possible.
Oh and for the 5% of you who want to hear more from me on writing the newsletter and related projects you can now support source/target here.
As ever, thank you for reading – see you in two weeks.
Two podcasts this week:
Okay, okay, I don’t speak Spanish so I just broke rule #2. The podcast looks great though!
What’s the difference between envy and jealousy? I wondered this recently and was surprised to find the answer in The Simpsons, of all places.
Homer Simpson: I’m not jealous, I’m envious. Jealousy is when you worry someone will take what you have. Envy is wanting what someone else has.
Another resource I found was this endearingly-2006 site on Emotional Competency which includes the following helpful map.
What about spite? The map above shows how jealousy could lead to anger via insult, but what about the case — as with spite — where harm is caused without any benefit to the perpetrator?
Take so-called “spite houses”: “building[s] constructed or substantially modified to irritate neighbors or any party with land stakes”.
The closest example to me is the “Sam Kee Building” in Vancouver, which The Guinness Book of Records lists as the “shallowest commercial building” in the world:
The unusual proportions arose from a dispute whereby the City had expropriated most of the lot for street-widening without compensating the owner, the Sam Kee Company, for the residue, believed to be unusable.
This event has value as a gauge of the disrespect shown to Chinese-Canadians by the civic authorities; and owner Chang Toy’s response in building on the much-reduced site is an indicator of the Chinese community’s defiance to this discrimination.
A recent paper in Nature models spite as a dynamic network where a subset of the network is given spiteful characteristics. They found, per Northeastern, that:
spiteful agents targeted non-spiteful players, draining their resources so the spiteful agents looked better in comparison.
This resulted in the initially non-spiteful agents realizing they were worse off and perpetuating the spite to get ahead. Researchers found that it continued to spread until there were no cooperative players left.
“Spite is contagious” is a dour takeaway but, hey, now we know.
I stumbled upon this graph data art from Brendan Dawes in 2016. It’s a graph visualization for Cancer Research UK showing collaboration between scientists fighting to beat cancer globally. Used as a cover for a research journal it’s an uplifting use of visualization that offers an antidote to all that spite.
If you’ve read this far you should take 5 minutes to fill out this short survey on source/target – help me shape the future of source/target.
Back in December of 2019 I received an email from Air Canada alerting me that I hadn’t reached some modest minimum number of points for the year and would therefore be losing my “status”. I was mildly disappointed but understood that I wasn’t travelling enough for whatever status I’d missed out on to make a tangible improvement on my life.
Six months later, the number of flights taken on a single day in April had plummeted to a 10-year low.
I have thousands of points spread across countless different point schemes. Groceries: swipe, airlines: scan, pharmacy: tap, coffee shop: stamp.
It took me a while to understand that the primary benefit of these points isn’t for the consumer, they’re a way for corporations to promote a sense of loyalty — thereby likely selling more in the future — with the added benefit of being able to track consumers across purchases. As a customer, I only receive a small reward for the sizable gains made on the other side.
Points are hard to track and understand. Companies mildly obfuscate the true value of their points to disorient customers striving to get a good deal. There’s a rewards scheme here in Canada where 1,000 points are worth approximately 1 cent. I’m not horrendous at mental math, but it still takes a second to translate before realizing you’ve earned some pretty low-value points despite those tantalizingly high numbers. Obscurity-via-too-many-zeroes.
At first it may seem that these rewards schemes are walled gardens: points earned with one retailer can’t typically be redeemed at another. This is only half true. By undertaking what is known as a “transition event” one could sever ties with one airline and become a fervent fan of another via one simple transaction.
I now pronounce you my #1 favorite airline.
This possibility to swap and transfer leaves us with a tangled network of value translations mapped across the thousands of companies willing to trade points for other points.
It’s not dissimilar to a barter system, where items are assessed by their comparative value. Take this excerpt from “Year of the Rabbit,” Tian Veasna’s graphic novel describing “one family’s desperate struggle to survive the murderous reign of the Khmer Rouge in Cambodia”.
Many assume systems in this style were the precursor to modern currency-based economies but there’s very little evidence this was the case.
Back to the present day we can use graph theory to understand our best option in these translation networks. In this graph each of the links has an accompanying points cost. By finding the minimum distance between two points we can find the minimal cost for maximal benefit.
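A minimal sketch of this idea in Python, using Dijkstra's algorithm over an invented conversion network. The schemes and costs below are made up for illustration; each edge weight represents the points lost in making that transfer:

```python
import heapq

def cheapest_route(graph, start, goal):
    """Dijkstra's shortest path over a weighted conversion network.
    graph: {scheme: {neighbour: cost}}. Returns (total_cost, path)."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []  # no conversion route exists

# Hypothetical conversion costs (points lost per transfer)
conversions = {
    "grocery": {"credit_card": 50, "airline_a": 200},
    "credit_card": {"airline_a": 30, "airline_b": 80},
    "airline_a": {"airline_b": 40},
}

print(cheapest_route(conversions, "grocery", "airline_b"))
# -> (120, ['grocery', 'credit_card', 'airline_a', 'airline_b'])
```

The direct grocery-to-airline transfer looks tempting, but routing through the credit card scheme loses fewer points overall, which is exactly the minimal-cost, maximal-benefit path described above.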
The most tangible network in the world of points are those of humans taking flights. Origin/destination networks give us a fascinating look at migration and travel for people around the world.
In a pre-9/11 world it was remarkably common for people to fly using tickets issued for a completely different person. There are possibly apocryphal tales of entire basketball teams racking up points for a single, well-point-endowed, person. More concrete and fraudulent was the NBA referee travel expense scandal of 1994.
Airline points used to be precisely aligned with the number of miles flown across the planet. Eventually, most airlines introduced a minimum miles policy: even the shortest flights would reward the intrepid road warrior businessperson.
Just like the shortest path in our cost network, humans are wired to find approaches that maximize reward for minimal effort. And that’s exactly what happened:
Flying back and forth between two short-leg cities, a rewards ticket to Hawaii could be earned in just eight continuous hours of flying. “One of the most popular ones was Dallas to Austin, people would do that eight, nine, 10 times in a day.” Source
It’s weird to think that these pointless (pointful?) jaunts aren’t technically as bad for the environment as one may think. Common commuter routes are likely to have empty seats ready to be paid for and sat in:
Airline seats were perishable; planes take off, full or not.
Nevertheless, these airport hops wasted time, burned extra fuel and artificially inflated demand.
The illuminating podcast series The Missing Crypto Queen tracks journalist Jamie Bartlett as he investigates the murky world of OneCoin. Marketed as a cryptocurrency, there’s no evidence of any bona-fide blockchain underpinning OneCoin and it’s now widely-regarded as a Ponzi scheme.
The podcast documents how a network of affiliates promoted educational packages that sold for between €100 and €225,500. Each package includes “tokens” which can be assigned to “mine” OneCoins. US Prosecutors have alleged OneCoin brought in approximately $4,000,000,000 worldwide.
It’s safe to say credit card point rewards schemes don’t qualify as a Ponzi scheme – although it’s interesting to learn that they are unregulated. They are, however, entangled in the intricate world of affiliate marketing. To give one example, by promoting a credit card, bloggers and others receive commission on the purchases or commitments ultimately made as a result.
With continued COVID disruption I expected some sort of points collapse event. Perhaps each of these networks – the points barter web, the human ambition to travel and the interlacing world of affiliate marketing – are keeping the points plane in the air for now.
I’ve used the intuitive diagram sketching tool Excalidraw in the past. It’s free and well-suited for both small visuals and complex, collaborative diagrams. Their review of 2020 gives a good summary of its breadth and includes this beautifully curated intro to graph theory from Anas Ait Aomar. Check out the full notes here.
One of the most remarkable things about the internet is just how reverse-engineer-able it is. If you want to understand how a website works or find the host for some piece of content, the developer tools are right there. Sure, the underlying code will likely be minified, but the fact that you can view and observe it is surprising.
In this tutorial video, Mathieu Jacomy and Jonathan Gray show how the ad network hostnames can be scraped from websites to gain a greater understanding of funding sources for these sites; in this case, sites about anti-vaccination. It’s a solid application of the “follow the money” adage and worth a watch.
The Digital Methods Initiative tool used in the video is Tracker Tracker and Gephi is used for the graph analysis and visualization.
Last up we have a project from Keith McNulty. He scraped every script from the hit sitcom Friends and plotted a graph of the interactions between each of the characters. I like that you can step through each of the seasons, and the “minimum number of scenes” filter bar is a nice touch.
These visualizations are pretty popular. I spotted a similar one for The Office over on /r/dataisbeautiful that was a lot less well-received in the comment thread; most criticism being about the apparent inaccuracy of the data scraping approach.
Turns out graph visualizations are a fun pastime — just don’t annoy the fanbase!
I’m Christian (👋) and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graphs.
If you’ve reached this far I’d love to know what you thought of this week’s edition. Hit reply or hit me up on Twitter. See you in a few weeks.
Here we are at the end of 2020. I’m slipping in one more source/target before the end of the year. It’s that weird time between holidays so I’ve been hard at work on a fresh project.
Back in source/target #17 I admired a newsletter called Winning the Internet built by Russell Goldenberg of The Pudding. As I wrote back then:
The idea of the newsletter is simple: they aggregate links from the most popular link-sharing newsletters—a daunting list of over 100 newsletters—and produce a breakdown of the most-shared links of the last 7 days.
This past week they released a dataset of over 100,000 links found in 113 newsletters since June 18th; a wonderful snapshot of the last half of a pretty, uh, remarkable year. It gives us an exhaustive glimpse into the common articles and topics covered by the ever-expanding newsletter-sphere. A zeitgeist visualization, if you will.
Before the year was out, I wanted to take this dataset and build something to show off the interconnectivity of all these newsletters.
A source/target take on the “end of year” list.
After much data wrangling I present:
Instead of the ASCII Perl approach I shared in #17 I built a much more capable JavaScript app that allows you to explore the network and see the linked sites from each of the newsletters. I scraped additional metadata for each of the links and newsletters – you can see this when you hover or click on a node of interest.
This didn’t work for some of the links. Makes sense, it’s a lot of links!
I pre-compute my layout to minimize the amount of time and fan noise when opening the app. Saving these coordinates ahead of time has the side effect of really ballooning the size of the graph on page load.
Another downside of this is that it takes a while to load at first: the loading animation is very necessary!
I’ve built a number of projects for source/target this year and my process is now pretty familiar:
It never ceases to surprise me that the last bullet point seems to take up the vast majority of time spent on a project. It’s a familiar observation that sits somewhere between the ninety-ninety rule, which states:
The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
and the Pareto principle that (roughly) suggests that 80% of my efforts will be spent on the last 20% of tasks. Some of this can be attributed to the use of new (or new-to-me) technologies for each project. I’m trying to build up a library of common code to use across projects but these things take time.
Over the year I’ve become increasingly ruthless with project scope and continually reaffirm that done is better than perfect. In the case of “Half Year Hyperlinks” the result is an application that barely takes advantage of the underlying graph of links and newsletters.
I regularly tell people that it’s just plain hard to build interactive graph visualization applications. Challenges include a heady mix of data wrangling, interactivity design, performance considerations and the moving target of the modern web.
That’s not to say I don’t enjoy it–I’m clearly building these projects for some reason! I think source/target in 2021 will aim to build up a blueprint for the design and implementation of interesting, insightful graph experiences.
Until then, let me know what you think of Half Year Hyperlinks.
I featured the new GitHub home page globe as part of #23’s “12 days of graphmas.” There’s now an interesting 5-part article series on how it was made.
An article from The Brookings Institution on a paper using network analysis to measure the representation of different groups across 1,600 individual board members at nearly 100 globally-prominent organizations.
To establish the core of the network that is distinct from the periphery, we use an iterative process called k-coring, which identifies the group of organizational leaders within the network that are more connected to one another than the rest.
[We] find that women—and non-white women in particular—are proportionately more confined to peripheral positions within the network. We find that the representation of white males increases as we move from the periphery to the core. For the women, it is the opposite: Their representation declines as we move toward the core.
While it is perhaps not surprising that men still make up the majority of governing boards, what is significant is how being a woman disadvantages her from being included in the “core” of the network.
This is the network equivalent of not having a seat at the table.
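The k-coring procedure the researchers describe can be sketched as repeated pruning: strip any node with fewer than k connections, and repeat until the graph stabilizes; whatever survives is the densely connected core. A toy Python version with an invented board-interlock network (the names and edges are made up):

```python
def k_core(edges, k):
    """Return the nodes of the k-core: repeatedly remove any node
    with fewer than k connections until none remain."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    changed = True
    while changed:
        changed = False
        # snapshot the under-connected nodes, then prune them
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return set(adj)

# Toy interlock network: letters are board members, edges shared boards
edges = [("a", "b"), ("a", "c"), ("b", "c"),  # tightly knit core
         ("c", "d"), ("d", "e")]              # peripheral chain
print(k_core(edges, 2))  # the triangle a-b-c survives; d and e are peripheral
```

Note the cascade: removing "e" drops "d" below the threshold too, which mirrors how peripheral positions in the real network stay peripheral even after several rounds of pruning.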
I like to read writing by developers who post regular updates on their work and process. It’s a great way to pick up new approaches to work and to help fend off imposter syndrome.
One of my favorites is Road to Ramen from DK. It’s a daily journal covering all aspects of being an independent developer working on new products to aid productivity. I love DK’s approach to product development, he guides clear vision into compelling apps and I’m excited to see his output in 2021.
I met Duncan Geere virtually at the formation of the Data Visualization Society last year. Duncan’s a thoughtful and inspiring information designer and I enjoy reading new editions of his newsletter chronicling his work and thoughts in 10-day increments.
Finally, Will Lyon (Neo4j) just started his own newsletter last month where he covers graph projects he’s working on. Will’s a very generous and capable graph advocate so I recommend subscribing here. For his latest project he’s building a Podcast Search app with Neo4j & GraphQL.
🎉 Happy new year to all Sourcerers throughout the world! See you in two weeks for #25.
Something a little different this week. To close out the year I’ve collated some of my favorite finds from the last little while.
Siteswap, also called quantum juggling or the Cambridge notation, is a numeric juggling notation used to describe or represent juggling patterns. I had no idea this was a thing!
Exploring the themes and forms of lineage diagrams with Paul Kahn and DVS Nightingale.
I’ve previously marvelled at the wide range of data formats used to represent graphs. Here’s a new one from Andreas Kollegger at Neo4j. Gram is a “textual format for data graphs” and it will look familiar to anyone with Cypher experience. The best place to get a feel for this new format is over on Kollegger’s Observable.
The Arrows graph drawing app from Alistair Jones has been reimagined and reimplemented under the Neo4j Labs banner. I gave it a spin this week and can already see using this a lot to prototype graph models.
It’s officially the rainy season here in Victoria, B.C. Contrary to popular belief we actually get less rain here than in Vancouver or over the border in Washington. I like this visual breakdown of Seattle precipitation patterns from Eric Lo. It’s neat to see it integrated with his “WeatherWheel” for cross-filtering goodness.
Trees appear to communicate and cooperate through subterranean networks of fungi. A fascinating article in the New York Times.
Before Simard and other ecologists revealed the extent and significance of mycorrhizal networks, foresters typically regarded trees as solitary individuals that competed for space and resources and were otherwise indifferent to one another. Simard and her peers have demonstrated that this framework is far too simplistic.
I found this graph on Instagram (of all places!) and I have no idea where it’s from. Reverse Google Image searching it gives me thousands of results for the original diagram that inspired it: The Shield of the Trinity.
This article from Nature tracks the collaborations between AI researchers across institutions and countries around the world. I like the breakdown of local and international collaboration patterns.
🎵 Simply having a wonderful Christmas (graph) 🎵
I really enjoyed this podcast interview with Sir Paul McCartney this past weekend. It spurred me to watch “Two of Us” (2020) starring Jared Harris & Aidan Quinn which I’d also mildly recommend.
Past Sourcerer Anvaka has updated his fantastic spaceship app originally mentioned all the way back in source/target #2. Fly through a galaxy of NPM code repositories.
Protip: press space to enter steering mode and drag the mouse for the best experience. Add in a bit of Shift key action to enter warp speed.
“A map of my School’s network as ‘seen’ from my dorm room”
GitHub launched a new home page this week with a live 3D globe front and center highlighting pull requests opened and closed across the world. Hover over the links to show the source and target of each PR.
Whew! Stay safe, warm and healthy – I’ll see you all next year.
Why not share this edition with someone you think would enjoy it?
Do you ever think about how many different types of music there are in the world? Across time and history the amount of music that has been generated is astounding – enough for infinite lifetimes of headphone-wearing.
We’re nearing the end of the year, a time full of festivities. One familiar tradition for me is that of the “Year End List” – a time when online and offline media converge in their desire to rank the songs and albums released in the past year. It’s a curious tradition on an arbitrary scale – what about the calendar year makes it suitable to split and rank record releases? Surely an album released in January is likely to be lost in the noise of the year in favor of a Summer or late-fall hit?
One project that puts this into perspective is that of British comedian James Acaster. While handling mental health issues Acaster distracted himself by poring over records released in 2016. His efforts spawned a book, podcast and expansive coverage on his site of 350+ albums of note from 2020. In a recent interview Acaster suggested that 2016 was a turning point in the music industry: it suddenly feels impossible to listen to all the music released in a single calendar year.
Another way to look at the vastness of music can be found on “Every Noise at Once” – a fascinating project that plots out thousands of genres from Spotify. The site is deceptively simple but once you click around and play snippets of different musical styles you’ll quickly realize it gives a unique window into an astonishingly diverse set of music. Drilling down into a newly discovered genre, you’ll be rewarded with additional text maps of artists also grouped by similarity. It’s amazing.
I’m also a fan of Radiooooo and the inimitable Poolside FM.
Every Noise at Once is a link-less graph, otherwise known as a scatter plot. The node locations reflect the underlying music:
The calibration is fuzzy, but in general down is more organic, up is more mechanical and electric; left is denser and more atmospheric, right is spikier and bouncier.
The core concept is adjusted in a number of ways on the site. In “The Sounds of Places”, for example, countries are mapped “not by their coordinates on the crust of the Earth, but by the acoustic characteristics of their music.” There’s also a neat breakdown of the most popular music of 2020 from around the world.
Another revival of esoteric music has occurred courtesy of YouTube, the largest music streaming platform in existence. When listening to music using YouTube the addictive sidebar of recommended videos is a graph of musical journeys just clicks away.
Coined “YouTubeCore”, the style is loosely defined in a delightful article from Ars Technica this week:
admittedly open-ended in terms of genre and style, but for our purposes, we can limit it to soft, instrumental fare—specifically, an algorithm-driven hierarchy of ambient albums that leans, for one reason or another, to the island nation of Japan.
This isn’t a small trend: we’re talking millions of views and enough new fans to surprise artists and even bring them out of retirement. Due to YouTube music licensing policies these views can result in significant compensation for artists, sometimes considerably more than on other streaming services.
The algorithm behind this phenomenon is unknown to us YouTube outsiders, but this hasn’t stopped some researchers:
Massimo Airoldi, a professor at Emlyon Business School, co-authored a 2016 paper titled “Follow the algorithm: An exploratory investigation of music on YouTube”. It proposes that the algorithm partially leans on sequential viewing: if a significant number of users watch video B after video A, the two are considered related and therefore recommended.
Within this framework, genres stop being simple technical distinctions and become granular concepts based on crowdsourced human-behavior patterns. Utilizing network analysis, the study estimates that viewing habits cause the algorithm to connect videos via recommendations, thereby knitting tight genre cliques in the process.
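The sequential-viewing idea can be sketched in a few lines. This is a toy reconstruction with made-up session data, not the study’s actual method or dataset:

```python
from collections import Counter

# Hypothetical watch histories: each list is one user's viewing session.
sessions = [
    ["ambient_a", "citypop_b", "citypop_c"],
    ["citypop_b", "citypop_c", "ambient_a"],
    ["citypop_b", "citypop_c"],
]

# Count how often video b directly follows video a across all sessions.
coviews = Counter()
for session in sessions:
    for a, b in zip(session, session[1:]):
        coviews[(a, b)] += 1

# Pairs watched in sequence often enough become edges; dense cliques of
# such edges are what the study reads as behavior-driven "genres".
edges = [(a, b, n) for (a, b), n in coviews.items() if n >= 2]
print(edges)  # [('citypop_b', 'citypop_c', 3)]
```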
After years of significant negative press it’s refreshing to hear a seemingly positive result of the YouTube algorithm at large. What’s more the comment sections on popular YouTubeCore videos are a surprisingly wholesome place.
This week’s source/target was brought to you by all the music I stumbled upon during my “research”:
Klaus Doldinger’s Passport – Schirokko (1973) – Amazing dueling drumming at the end
Herb Ellis & Remo Palmier – Windflower (1978) – Generous jazz
Tatsuro Yamashita – Sparkle (1982) – Powerful intro
Beckett – Four (2016) – Turns out I have a soft spot for *checks notes* “synthwave & talkbox with a backdrop of 80s beats”
Sourcerer Jan Žák has put together an experiment in Observable that showcases a great technique for reducing network complexity. By reducing the proportion of edges drawn in a graph you get a visual shorthand for the density and connectivity of the nodes.
Partial Edge Drawing (PED) graph visualization experiment https://t.co/CILvt04kxB #observablehq
— Jan Žák (@zakjan) November 23, 2020
I see this as a good middle ground between a link-less graph and an overwhelming hairball. I think it has the potential to be used a lot more in other applications.
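As a rough sketch of the technique (the function name and default stub fraction here are my own, not from Žák’s experiment), Partial Edge Drawing replaces each full edge with two short stubs, leaving the middle empty:

```python
def edge_stubs(p, q, fraction=0.25):
    """Return the two partial segments of the edge p -> q.

    Instead of the full line, Partial Edge Drawing keeps only a stub
    at each endpoint (`fraction` of the edge length), which thins out
    dense graphs while preserving a sense of connectivity.
    """
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    a = (x1 + dx * fraction, y1 + dy * fraction)  # end of source stub
    b = (x2 - dx * fraction, y2 - dy * fraction)  # start of target stub
    return (p, a), (b, q)

# An edge from (0, 0) to (8, 0) keeps only its first and last quarter:
print(edge_stubs((0, 0), (8, 0)))
# (((0, 0), (2.0, 0.0)), ((6.0, 0.0), (8, 0)))
```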
I’ve been thinking about the breadth of topics I cover in source/target and wonder if I could make it easier for readers to pick out the content they are most interested in. Images are a great way to do this but time-consuming to add and curate. Perhaps a little tag icon or similar would be good? What do you think?
All that’s to say there’s some fantastic data art out there so please imagine a little “data art” icon here:
Cursor Home from Sophia Schomberg and Nikolaus Baumgarten is a trippy journey into a seemingly-infinite world of working from home and graph motifs. Also see their amazing prior work Arkadia.xyz.
A mixture of data art and ruminations into the role of data, Stefanie Posavec’s sketches as part of her art residency with People Like You are a delight. Updates will be posted to her Instagram.
First round of sketches from my @PersonalisePLY residency, understanding how those who work w/ medical ‘big data’ perceive the ‘people behind the numbers’ who consent to their data being used and stored for future research.
— Stefanie Posavec (@stefpos) November 30, 2020
Follow my IG for updates: https://t.co/nAO2Urlh7D
/1 pic.twitter.com/Hrgkv3Ezvu
Stefanie has been producing beautiful work for years: one project that caught my eye was a “Literary Organism” piece from Writing Without Words which
explores methods of visually representing text in order to highlight the similarities and differences in writing styles when comparing different authors
In this case the first part of On the Road by Jack Kerouac.
Another was the lovely “Relationship Dance Steps”, a graph project from 2013 that slips into the physical world in the form of spatially-aware dance steps.
Popular design podcast 99% Invisible recently featured another podcast on the Enron collapse, in particular the email data released by regulators in 2001.
When these emails became public, for the first time there was a database of thousands of real emails sent by real people that were available to the public, and researchers.
The episode touches upon the novelty of a public dataset of these real connections between people. It serves as a solid introduction to graph analytics from an interesting origin.
Today I present at @ivconf a conference paper titled "Drawing Network Visualizations on a Continuous, Spherical Surface." The pre-print is available on Google Docs at https://t.co/f9DLxkcFtc pic.twitter.com/2Ku13xybL6
— Dario Rodighiero (@dariorodighiero) September 8, 2020
Why not share this edition with someone you think would enjoy it?
At its best, the internet connects and inspires those with common interests. The stranglehold of social media on most web content in 2020 makes it easy to forget that we have also enjoyed 40+ years of web forums, bulletin boards, Usenet communications and other discussion groups.
I’m not old enough to have really embraced Usenet but Andy Baio’s investigation into a weirdly specific group is worth the read — if you’re into that sort of thing.
Disparate interests can sometimes converge and intersect in fascinating ways. Take this tweet featuring a simple coincidence of initialism:
Putting "IPA enthusiast" in my dating profile that doesnt exist so when girls want to talk to me about craft beer I can smoothly redirect the conversation towards phonetics
— Michael Chertoff ❁ (@MichaelShirtOff) April 24, 2020
This week I’m going to thoroughly dissect the frog and look at graphs from two surprising angles: linguistics and craft beer.
The International Phonetic Alphabet (IPA) is a set of symbols that linguists use to describe the sounds of spoken languages. You’ve probably seen these in the dictionary (remember those?) or at the top of a Wikipedia page as a way of clarifying the pronunciation of a particular entry. While actual pronunciations can vary wildly, the IPA symbols — otherwise known as glyphs — help us classify groups of sounds that are roughly the same.
On the topic of accents, this video is quite the tour of the variety you’ll find across the UK and Ireland.
For the unfamiliar, the characters can look pretty intimidating. Turns out there’s a cottage industry of instructional language videos on YouTube for those looking to learn English and adjust their accent.
I’ve been obsessed lately with developer Josh Comeau’s site — not only is it a fabulous collection of educational content but the user experience on the site is delightful. The first post I saw from his site was one suggesting the internet should be louder. Josh explains that while this can be a real annoyance on a lot of websites,
When done tastefully, sound can make a product feel more tangible and real.
Clicking around on Josh’s site I’d agree. Audio in graph visualization is underutilized and something I’d like to explore in the future.
Sound is also informative when presenting or explaining something inherently audible. The Interactive IPA Chart could be seen as a graph of vowels and other symbols that help one get a better feel for IPA:
It’s fun to slide up and down the close <-> open range and hear the tonal changes of sound. Note the ə in the center of the graph — this glyph is pronounced “schwa”. As per Wikipedia it’s the most common vowel sound in the English language.
You’ll find it lurking in lots of words:
but it can even be “unwritten” as in the word “rhythm” [ˈɹɪðəm]
IPA also stands for India Pale Ale — a name given to describe beer that was traditionally heavily hopped in order to last the voyage between the UK and India. As Bon Appetit notes in their “abridged version”:
British sailors, while sailing to India, loaded up barrels of beer with hops, because hops were a preservative. The hops hung around in the beer for so long that they lost their fruity flavor and left a bitter tasting beer. So … British IPAs are malty, bitter, and one-noted. […] These are best consumed on some kind of a cliff with sea mist spraying in your face.
As a Canadian newcomer unable to make it back to the UK this year I’m actually missing British IPAs quite a bit. My quest to find the equivalent over here is foiled somewhat by all the strong Canadian beer.
There is now a wide variety of beers classified as an IPA, and not all of them are as bitter as traditional British ales. That’s why you’ll often hear conflicting opinions on IPA preference – how can a beer be both “too weak and bitter” and “extremely strong and floral”?
PopChart’s “The Magnificent Multitude of Beer” chart helps to clarify the classification of certain tipples. For a more gra(i)nular look, this graph from Reddit user takeasecond provides a summary of the ingredients of over 6,000 different IPA recipes:
The node color maps to either the grain color or hop bitterness while the relationship width is determined by the number of times two ingredients appear together.
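That co-occurrence rule is easy to sketch. The recipes and ingredient names below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical IPA recipes, each a set of ingredients.
recipes = [
    {"pale malt", "cascade", "citra"},
    {"pale malt", "citra", "mosaic"},
    {"pale malt", "cascade"},
]

# Edge weight = number of recipes in which a pair of ingredients co-occurs.
weights = Counter()
for recipe in recipes:
    for pair in combinations(sorted(recipe), 2):
        weights[pair] += 1

# These weights would drive the relationship width in the visualization.
print(weights[("cascade", "pale malt")])  # 2
print(weights[("citra", "pale malt")])    # 2
```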
If a) this newsletter was American and b) each edition was a year it would be apt that edition #21 — predominantly about alcohol – would reach the legal drinking age. Neither is true so I’m not sure what point I’m making here.
The explosion in popularity of craft beer across the world hasn’t gone unnoticed by Big Alcohol. There’s been a dramatic increase in acquisitions and mergers that may mean your favorite hand-crafted brewery is now owned by a multi-national conglomerate.
There have been a few attempts to map out the circumstances of “craftwashing” as it has been coined. Phil Howard updated his 2017 summary of the landscape to reflect a number of recent changes to the space.
Meanwhile, Mike at The Mad Fermentationist collated a web of brewery collections and has been updating it for a few years. Landscape maps with hundreds of logos always capture the imagination and his latest reflects the landscape as of December 2019.
After the rabbithole of research into linguistics, language and the IPA for this edition my head is spinning — it’s a rich domain with fascinating backstories for the language we use today.
As part of a supply chain webinar for Cambridge Intelligence I presented an application that took data from the Liquor Control Board of Ontario to visualize the various tasting notes of alcohol. I enjoyed building this demo and the recording is available here.
The name of the project is pretty inflammatory but this visualization from Sophie E Hill shows a tight web of connections between “Tory politicians and companies being awarded government contracts during the pandemic.” It’s pulled together from a variety of sources and all the data and code is available on GitHub.
Hill’s great work has been picked up by a variety of media reports. It’s great to see how her graph has captured interest and prompted some interesting further analysis by others.
Visual works from influential network scientist Albert-László Barabási and the Barabási Lab are featured in a new exhibit at Ludwig Múzeum in Budapest. Background for sections of the exhibit are being posted weekly on Instagram and there’s a long-form discussion over on YouTube.
25 Years of Network Visualization
— Albert-László Barabási (@barabasi) November 2, 2020
For the Ludwig/ZKM Museum Exhibit we prepared a timeline, to show the visual vocabulary of the BarabásiLab. In the coming week I will tell the story of each image within the timeline on https://t.co/L3W0NrMXIS#barabasilab pic.twitter.com/2Q9dEOmhr5
Not the first time I’ve featured hot takes from Conor White-Sullivan in this here newsletter but on the topic of inflammatory comments his suggestion that it’s “All downhill since [1995]” is pretty bold! What do you think?
Also from Barabási Lab and new to me is this earlier exhibit on the “(Virtual) Physicality of Networks” with a neat complementary website.
Thank you for subscribing, I’ll see you in gasp December!
Have you found something interesting you’d like me to share? Let me know! Why not share this edition with someone you think would enjoy it?
It’s a wet and dark week here on Vancouver Island. Edition #20 of source/target feels like a minor milestone. This edition we’re looking at migration graphs – I’d love to hear what you think.
Every year, like clockwork.
This regular pattern of searches for the Mariah Carey holiday megahit is a neat reminder that some things are inevitable; after a distressing year there’s some stability in our song choices.
A similarly humorous Google trend was the spike in searches for “how to apply for canadian citizenship” reported after the first US Presidential debate earlier this year. The taller peak of similar searches was well reported back in 2016 and there was indeed a noticeable uptick in permanent residency applications from Americans in 2017.
My partner and I moved to the west coast of Canada just over a year ago. We packed our car and took our time on a two-week drive across the country. Choosing not to go via the USA we stuck to the Trans Canada Highway with some diversions.
We now live in one of the mildest and most temperate parts of Canada. As we head into winter there’s an influx of “snowbirds” — Canadians who would usually spend the cold season in the southern states of the USA but who are stuck above the border for the foreseeable future due to the pandemic.
Migration across the country is something that’s tracked by Statistics Canada and released as part of their comprehensive Open Data platform. I was curious how much the pandemic has changed usual migration so took a look this week.
Here’s an interactive origin/destination map showing the relative net migration between the different provinces and territories. You can pick two quarters and see the relative difference in migration patterns between them.
![](/optim/assets/st20/animated.gif "I built this in React using MapBox and the Flowmap.gl library.")
Migrations in nature aren’t aware of state and country lines. One of my favorite examples of this is from the Voyageurs Wolf Project, a group that tracks wolves and their prey around Voyageurs National Park in Northern Minnesota.
At first glance this looks like a pretty random walk, but when combined with other wolves in the area we get an idea of the territory clusters.
For more details check out this Q&A with a Wildlife Biologist from the Project. If you’re itching for more wolf content check out American Wolf.
The Voyageurs Wolf Project get extra points for this absolutely adorable video of wolf pups practicing their howling.
At the beginning of the year I fell into a pandemic-induced rabbithole of productivity & life hacks, gurus & guidance. One interesting suggestion I read was to take time to introduce people in your life who may have mutual interests. This is the sort of networking I can get behind. Sparking fresh and insightful connections between those you know is a favor that is likely to endear and assist as you seek like-minded people and opportunities.
This 2019 article from Valdis Krebs explains this activity through a graph lens: it’s your responsibility to “weave” graphs by closing triangles of contacts.
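Krebs’ triangle-closing idea can be sketched as a search for “open triangles” in a small contact graph (all names and the function are hypothetical):

```python
# Hypothetical contact network as an adjacency-set dictionary.
contacts = {
    "you": {"ada", "bo", "cy"},
    "ada": {"you", "bo"},
    "bo":  {"you", "ada"},
    "cy":  {"you"},
}

def open_triangles(graph, person):
    """Pairs of `person`'s contacts who don't yet know each other.

    Each pair is an open triangle that a network weaver could close
    with an introduction.
    """
    friends = sorted(graph[person])
    return [
        (a, b)
        for i, a in enumerate(friends)
        for b in friends[i + 1:]
        if b not in graph[a]
    ]

print(open_triangles(contacts, "you"))  # [('ada', 'cy'), ('bo', 'cy')]
```

Closing either pair above is exactly the favor the article describes: you become the common neighbor who completes the triangle.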
Lots of flow maps this week. This infographic from Lapham’s Quarterly is a brief and topical look at some contagious diseases throughout history.
Some of the most impressive visualizations of nature GPS tracking data I’ve seen are from 422 South. Check out their bee, bird and shark visualizations!
This newsletter typically focuses on data visualizations that push the envelope in some way — usually an innovative approach to readability or interactivity. Sometimes visualizations are delivered in surprising places.
Aircal is a script that takes a directed acyclic graph (DAG) of Airflow schedules and exports them for use and visualization in Google Calendar. Graphs don’t always have to look graphy to be useful.
This tweet from September seems apt for those following the US election this week.
I have to admit this is not quite what I was going for pic.twitter.com/HY0fi93z1s
— Charlie Smart (@charlie_smart_) September 30, 2020
Thank you for subscribing, I’ll see you in a few weeks.
Have you found something interesting you’d like me to share? I’d love to hear from you!
Why not share this edition with someone you think would enjoy it?
Hello to all the new joiners from Neo4j’s NODES conference this week — it’s lovely to have you here. To get a feel for the newsletter you should check out the archive. Subscribers have enjoyed editions #7 (the movie one), #13 (the design one) and #17 (the ASCII one).
Regular sourcerers might notice something different this week. Thanks to Lis Xu for bringing a fresh look to the newsletter – you should definitely check out her fantastic work.
Have you ever wondered how you might visualize life? Y’know the thing we’re all experiencing right now? Birth to death. Cradle to grave. Everything.
Let’s start with a concrete approach. Right now you’re reading source/target from some real-world geospatial location. It’s easy to represent that: a dot on a map. This dot represents a snapshot of you at this precise moment in time.
Over time your position will change and that dot will move. Geospatial positioning over time is something we’re comfortable visualizing. Here’s an example from a recent run I recorded with my Garmin watch:
This is a linear time-sliced ribbon of me on a run. In this example my pace is encoded as a color gradient; you can see where I picked up speed (running in the road to avoid coughing people) and slowed down (hills).
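Encoding a value like pace as a color gradient boils down to linear interpolation. A minimal sketch, with made-up anchor paces and a simple blue-to-red ramp:

```python
def pace_to_color(pace, slow=7.0, fast=4.0):
    """Map a pace (minutes per km) onto a blue-to-red gradient.

    `slow` and `fast` are hypothetical anchor paces; values outside
    the range are clamped, then linearly interpolated to an RGB tuple.
    """
    t = (slow - pace) / (slow - fast)  # 0.0 at slow pace, 1.0 at fast
    t = max(0.0, min(1.0, t))
    return (int(255 * t), 0, int(255 * (1 - t)))

print(pace_to_color(7.0))  # (0, 0, 255)  slow = blue
print(pace_to_color(4.0))  # (255, 0, 0)  fast = red
```

Each segment of the run's ribbon would then be drawn in `pace_to_color(pace_at_that_point)`.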
If we were to take this idea and extrapolate to life more broadly, what other variables could we encode? Perhaps general state (e.g. awake, asleep) or emotions (e.g. happy, angry)?
I recall that back in 2012 Stephen Wolfram plotted a variety of his personal analytics from logs of composed emails and files.
From Wolfram’s plot we have a pretty good idea of the times that he was asleep. Eight years later we have lots of tools that make this state data easier to collect. Quantified-self analysis through devices that track sleep and heart rate is very popular.
The story told by this dot plot has the potential to be more interesting than focusing on your coordinates in the real world. For my visualization, let’s keep the time component but instead of a ribbon unfurling over a geospatial map let’s imagine a multi-dimensional ribbon showing the many facets of your life.
For a similar 2D example of this check out the influential Movie Narrative Charts from XKCD and this recent work on building these charts with reinforcement learning techniques.
My life visualization would be animated with the velocity of the ribbon reflecting how fast time appears to be passing. The direction along a certain axis could mirror emotions. The color of the ribbon could represent whether you are achieving your goals (unconsciously or not).
Bear with me…
Personal connections and relationships are an important part of life. If we piled everyone who’s ever lived into the same chart and applied a form of force-directed physics we could model the alignment between individuals as ribbons converging. Repelling forces could represent a difference of opinion or ethics.
One thing’s for sure: a visualization of everyone who’s ever lived would result in the hairball to end all hairballs.
Humans can’t perceive enough dimensions to fully comprehend how this could look (plus it’s a pretty half-baked idea) but I do sometimes envision myself as this little ribbon in time and space.
Time Curves is a project from 2015 with an innovative way of depicting lives or “patterns of evolution.” The curve bends and twists to show the degree of difference between states. Color gradients are used instead of animation and the resulting plots are compelling 2D representations.
These Time Curves are largely data agnostic: they provide similar depictions whether the input is derived from changes between frames in a video or the deltas between Wikipedia pages. I’m fascinated by how these beautiful ribbons give you an instinctive understanding of how something transitions over time.
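Placing points so that their 2D distances echo a given matrix of pairwise deltas is the job of multidimensional scaling, which is the style of projection the Time Curves work builds on. A textbook classical-MDS sketch, assuming numpy is available:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Project an n x n matrix of pairwise distances into `dims` dimensions.

    Textbook classical MDS: double-center the squared distances, then
    keep the top eigenvectors scaled by the square root of their
    eigenvalues.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # Gram matrix of the points
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]  # largest eigenvalues first
    scale = np.sqrt(np.clip(vals[order], 0, None))
    return vecs[:, order] * scale          # n x dims coordinates
```

Feed it the deltas between successive document revisions (or video frames) and the returned coordinates trace out a curve over time.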
Early versions of Time Curves “folded like a snake” — a comparison which reminds me a little of the “Clover” motif commonly found when visualizing song lyrics.
Check out the paper for the gory details and some illuminating figures. If you’re feeling brave you can generate your own Time Curve over on the project page.
In a case of life imitating, uh, life, there’s a wide array of bioinformatic visualization tools that share similarities with Time Curves.
RNArtist from Fabrice Jossinet is a work-in-progress application to design and visualize RNA 2D structures. These dense ribbons of Ribonucleic Acid, one of the three major biological components essential for all known forms of life, can be drawn, arranged, annotated and exported.
Another tool that comes to mind is Bandage from Ryan Wick. Bandage can be used to visualize de novo assembly graphs, a specific type of DNA sequencing. The visualization of the force directed loops can help computational biologists to “better understand, troubleshoot and improve their assemblies.”
To me, graph visualization is strongest when it borrows gratuitously from physics to provide a natural depiction of the attracting and repelling forces between nodes. The Time Curves and similar loopy approaches are a solid example of this in action.
Wikipedia corner: Cat gap
Scott Aaronson’s essay “Who Can Name the Bigger Number” was a gateway for me into the world of computer science and math. It introduced me to a curious concept that’s rattled around in my brain since – the Ackermann sequence.
Ackermann’s idea was to create an endless procession of arithmetic operations, each more powerful than the last. First comes addition. Second comes multiplication, which we can think of as repeated addition […] Third comes exponentiation, which we can think of as repeated multiplication. Fourth comes … what? Well, we have to invent a weird new operation, for repeated exponentiation.
Aaronson goes on to describe this sequence:
If each operation were a candy flavor, then the Ackermann sequence would be the sampler pack, mixing one number of each flavor. First in the sequence is 1+1, or (don’t hold your breath) 2. Second is 2×2, or 4. Third is 3 raised to the 3rd power, or 27. Hey, these numbers aren’t so big!
Fee. Fi. Fo. Fum.
Fourth is 4 tetrated to the 4, or 4^4^4^4, which has 10^154 digits. If you’re planning to write this number out, better start now. Fifth is 5 pentated to the 5: a power tower of 5s with ‘5 pentated to the 4’ numerals in the stack. This number is too colossal to describe in any ordinary terms. And the numbers just get bigger from there.
That escalated quickly.
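The procession of operations in the quote is easy to express as a recursive hyperoperation. A sketch (only safe for small levels and arguments):

```python
def hyper(level, a, b):
    """Level 1 = addition, 2 = multiplication, 3 = exponentiation,
    4 = tetration, ... Each level is the previous level repeated."""
    if level == 1:
        return a + b
    result = a
    for _ in range(b - 1):
        result = hyper(level - 1, a, result)
    return result

# The first three terms of the Ackermann sequence from the essay:
print([hyper(n, n, n) for n in (1, 2, 3)])  # [2, 4, 27]
# hyper(4, 4, 4) would have roughly 10^154 digits; don't try it.
```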
This week I learned about a similar sequence that has an even greater acceleration in size: TREE. I was delighted to learn it had roots (pun intended) in graph theory. This Popular Mechanics article gives a good summary of the sequence. The embedded video from Numberphile is also a great watch.
Sourcerers Dave Bechberger & Josh Perryman have completed their book “Graph Databases in Action”. Dave & Josh have actual, bona fide experience building production graph applications and it’s exciting to get their smarts distilled into book form.
I notice their publisher Manning has a nifty graph widget on their catalog pages. This is a neat way to show similar books via the purchasing habits of their readers.
A key aim for many graph layout algorithms is to minimise the number of overlaps between nodes. This prevents nodes from being hidden behind one another, skewing the perspective. However, in some use cases it may be desirable to show these overlaps – manual clustering or aesthetic preference come to mind. Piling.js is a JavaScript library for Interactive Visual Piling. Used for scalable “small multiple” visualizations, the library would be well-suited for use in graph visualization. Check out the background and introductory video.
Congratulations to Scott Dobbin for rising to the challenge of finding a better Ghibli graph pun than “The Graph Returns”… “Graph of the Fireflies” is a much better title and I’m ashamed I didn’t think of it.
My Ghibli Galaxy from last edition is getting rave reviews from around the world. I particularly enjoyed the following description:
“Бесполезно, но красивенько.” (“Useless, but pretty.”)
Quite! ;-)
The next time you hear from me your clocks will have changed (if you’re into that sort of thing). Stay safe, warm and well.
Why not share this edition with someone you think would enjoy it?
I always look to provide a subtle theme for each edition of source/target. This week the theme is far from subtle: I seem to have a gravitational pull towards Wikipedia.
“Howl’s Moving Castle” was the first Studio Ghibli movie I saw. One rainy Sunday morning I became idly aware of a trippy animated cartoon with a fantastical storyline playing on a nearby TV. It reminded me of the Pokémon anime I’d grown up watching but it was so much more compelling.
It’s pretty reductive to call Studio Ghibli the Japanese version of Disney but that’s the easiest way to explain them to those who are unfamiliar. Even if you’ve never seen any movies written and directed by Hayao Miyazaki you’d recognize some of his more famous characters. You’ve likely met your neighbor Totoro via, I don’t know, some child’s backpack or The Internet At Large.
It took me a while to make the connection between the wildly popular “Spirited Away” (aka “The Best Animated Film of All Time”) and the film I ended up watching that drizzly Sunday morning. Years later, I was again surprised to learn that Howl’s is based on a story by British novelist Diana Wynne Jones, an author I’d read a fair amount.
I’m not sure I know anyone who detests Ghibli movies. Is this an example of one of those personality questions you can ask to check if your interests align with someone else?
There’s a manually-curated dataset on GitHub that caught my attention the other month. It’s called the Studio Ghibli API and it allows REST requests to be made against an endpoint in order to get the films, people, locations, species and vehicles featured in the movies. I wondered at the time how this data would look in graph form?
It turns out the answer is: “not very good”. Getting a classification of a character in a movie as “Human” is helpful for tutorials or mockup applications. But it doesn’t make for an edifying graph to know that so many Ghibli characters share the same species.
One way the Studio Ghibli collection strays from the Disney comparison is there are very few sequels. (There are also, to my knowledge, 0 animated CGI blockbuster remakes on the cards.) This also means the data from the Studio Ghibli API has very few connections across different clusters of characters and the movies they are in.
The final nail in the coffin for using the Studio Ghibli API for graph purposes was that the data was getting to be quite stale. It would be much harder to fill the gaps of movies and details than to build the graph itself.
For an upcoming talk I’m giving at Neo4j’s NODES conference (register for free here!) I’ve spent a lot of time working with the Wikipedia/MediaWiki API. This API allows you to make queries against both the full edit history of Wikipedia pages as well as the content of the pages and categories on the site.
MediaWiki actually underpins a lot of the wikis you’ll see online, including Fandom, home to many exhaustive pop culture wikis including Wookieepedia, the Pokémon Wiki and, you guessed it, Ghibli Wiki.
The Ghibli Wiki is extremely comprehensive, and that gave me an idea. What if I scraped this dedicated Ghibli MediaWiki instance to get the data I need to build a “Ghibli graph”? I could collect the movies, characters and all other pertinent parties and build a graph of the connections between them.
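The nice thing about MediaWiki is that every instance (Fandom wikis included) exposes the same `api.php` endpoint, and `action=query` with `list=categorymembers` walks the pages in a category. A minimal sketch (the base URL is real, but the category name is an assumption about how the Ghibli Wiki organizes its pages):

```python
# Sketch: listing the pages in a category via the MediaWiki API.
# action=query + list=categorymembers is standard MediaWiki; the
# category name below is an assumption about the Ghibli Wiki.
from urllib.parse import urlencode

def categorymembers_url(base, category, limit=500):
    """Build a MediaWiki API URL for the members of one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{base}/api.php?{urlencode(params)}"

def member_titles(response):
    """Pull page titles out of a categorymembers response."""
    return [m["title"] for m in response["query"]["categorymembers"]]

url = categorymembers_url("https://ghibli.fandom.com", "Characters")
# A truncated, hand-written response in the documented shape:
sample = {"query": {"categorymembers": [{"pageid": 1, "title": "Totoro"}]}}
# member_titles(sample) -> ["Totoro"]
```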
This worked surprisingly well, especially when I picked up the images associated with each movie and character along the way:
I liked this but felt I could do more to highlight the joy of seeing all these characters clustered and linked across different movies. Switching over to a 3D graph helped me achieve this goal: there’s a lot of fun to be had panning and zooming around a 3D map of the Ghibli universe, a Ghibli Galaxy if you will.
I had a lot of fun tweaking the 3D render of this graph. I like how the sphere nodes for the movie characters came out. Check it out here and let me know what you think.
“The Graph Returns” (um, The Cat Returns) was the best Ghibli graph pun I could come up with. I feel there has to be a better one out there? “My Neighbour Nodetero”? If you can think of any please write in ASAP.
Using graph neural networks to monitor tree health
Graph based applications for academic discovery
Real time dashboard of Belgian startup tweets
Wikipedia corner: Human disease networks
Obsidian have completely revamped the graph view in their Markdown-based connected note-taking tool. A question from the release announcement on Twitter prompted an interesting response from Conor White-Sullivan, CEO of Roam Research. I think Conor is right to have prioritized other features ahead of Roam’s graph view, but it is definitely lagging in terms of performance, usability and elegance.
The team behind Obsidian has also released a way to publish notes in a clean web application — one which keeps a dynamic navigation graph on every page. I see this as an interesting push into productizing the buzzy world of “digital gardening.” Check out this example from Nick Milo to see “Obsidian Publish” in action.
Obsidian’s choice to treat the graph visualization as a first-class component will pay dividends as they grow.
The team at TerminusDB have documented a project from this year’s DBpedia Autumn Hackathon. In it they blend DBpedia — an extract of the structured content that underpins information in Wikipedia — with a dataset I was unfamiliar with: Seshat, a project that aims to:
bring together high quality datasets describing every human society that has existed since 10,000 BCE, covering all aspects of social evolution. the most current and comprehensive body of knowledge about human history in one place.
It’s unusual to see a project explore the intersection of knowledge graphs with archaeology & anthropology. I didn’t expect the article to end with a deep dive on the “Late Antique Little Ice Age” and I learned a new word along the way:
polity: a form or process of civil government or constitution; an organized society; a state as a political entity.
Check out their resulting visualization of the matched polities from Seshat along with historical battles from DBpedia. Red nodes indicate battles while blue and purple nodes show polities with and without standing armies, respectively.
The article ends with a modest aim for the project:
Documenting and enriching this data allows for deeper understanding of the drivers of resilience and will hopefully allow us to better understand how societies can prepare for cataclysmic change.
Who doesn’t want to be prepared for cataclysmic change?
That’s all from me, thanks again for subscribing to source/target! You’re the best.
Why not share this edition with someone you think would enjoy it?
Hi, I’m Christian and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graph.
How are you holding up? A warm welcome to all my new subscribers, it’s nice to see you.
My younger self had no idea that most of his professional life would be spent using chat applications just like the ones he used to chat with friends after school. As a remote employee (going on 6 years) Slack is my main medium of communication with colleagues across the world.
I’m a particularly chatty Slack user and I blame a childhood spent on MSN, AIM, IRC and chat rooms. I type on Slack in a very conversational way:
I sometimes wonder whether it would be better to communicate in full paragraphs of fully-contained thoughts but that feels so formal for the medium. It also doesn’t seem to be my style.
For the many millions of Slack & Microsoft Teams users around the world, one way to express a style is through emoji reactions to posts and messages. Back in the 90s & 00s we had display pictures to represent us in cyberspace. However the _\~true~/_ outlet for creativity was found in the status & away messages. Taking advantage of a toolbox of characters and symbols one could craft an ASCII masterpiece:
,.-~*´¨¯¨`*·~-.¸-(source/target)-,.-~*´¨¯¨`*·~-.¸
Faced with toxic air from wildfire smoke drifting up from Washington, California & Oregon last week, I spent some extra time on a personal project from earlier this year: Twitter bots that post weather maps in emoji form.
Using emoji for weird little projects like this is the logical extension of the age-old ASCII aesthetic. One key difference is that we have a wide array of multicolored glyphs at our disposal.
Oh and I know what you’re thinking: “isn’t the plural of emoji actually ‘emojis’?” I note your right to tack an “s” on the end, but respectfully disagree. We don’t say “I’m going to go eat some sushis,” do we?
Back in source/target #13 I threw out the phrase “curating the curators.” I should have known this was an entirely unoriginal sentiment—19,100 Google results and counting.
Two weeks later Russell Goldenberg and the excellent data visualization & visual essay website The Pudding launched a newsletter entitled “Winning the Internet.”
The idea of the newsletter is simple: they aggregate links from the most popular link-sharing newsletters—a daunting list of over 100 newsletters including one of my favorites, bnet—and produce a breakdown of the most-shared links of the last 7 days.
As they put it:
We decided to curate the curators. What’s one more newsletter, anyways? No more link FOMO, just the statistically best links + cute charts, weekly. That’s our pitch. Subscribe! Or don’t ¯\_(ツ)_/¯.
That little shrugging chap is a great example of the sort of emoticon that thrived on MSN. It’s actually a subtype of emoticon known as a “kaomoji” — unlike :-), :-D, :o( and similar, there’s no head-turning required to visualize the emotion portrayed by those 9 characters of text. A smiling shrug.
The aesthetic of “Winning the Internet” is firmly rooted in the world of chat rooms: gratuitous text art combined with the unique approach of plotting line graphs using ASCII.
To me the color palette and form evoke an era of The Matrix and CRT monitors. In many ways ASCII art was the first multi-platform graphics system: as long as you used a monospaced font, a chart would look mostly the same regardless of the text editor, platform, device or screen.
Did you see the recent Foone thread on running Doom on a digital pregnancy test? I’ll just leave this here.
These line charts give a neat, if rudimentary, look at the popularity of each linked article over time but there’s a facet of the data that isn’t represented in these plots. If we created a graph of the newsletters and the articles they link to we would instead be able to see:
At source/target we’re no strangers to building graphs like this. What if we make it extra difficult and attempt to match the Winning ASCII plots? Can we generate an ASCII network graph from the connections?
Of course there’s a tool for that. Graph::Easy is a 16-year-old Perl module that can be used to generate ASCII graphs. It’s a unique library; I couldn’t find many other libraries that export to ASCII, and it was impossible to find anything I could run directly in the browser. In a world of CSS3, WebGL and WebComponents this makes sense!
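Graph::Easy can read Graphviz DOT but also has its own compact description language. From memory of the module’s manual (so treat the exact syntax as an assumption), the pipeline described below could be written as:

```
[ XML ] - convert -> [ JSON ] - render -> [ DOT ] - layout -> [ TXT ]
```

Piped through the bundled `graph-easy` command-line tool, a description like this is laid out as the boxes-and-arrows ASCII you’ll see throughout this edition.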
It’s been a long time since I’ve written any Perl code, so it was a mild gauntlet to get it running on a laptop mainly equipped for web development. Parsing the data from Winning the Internet was easier: there’s a helpful RSS XML feed of the latest edition of the newsletter which was easy to convert to the right format.
The bulk of the work for little tasks like this comes from shuttling between file formats. My vague plans of building a web application that built an ASCII graph from arbitrary newsletter feeds introduced a JSON step into an already convoluted pipeline:
+-----+ +------+ +-----+ +-----+
| XML |-->| JSON |-->| DOT |-->| TXT |
+-----+ +------+ +-----+ +-----+
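The JSON→DOT hop was the only part needing real code. Here’s a minimal sketch in Python; note that the input shape (newsletter names mapping to lists of shared links) is my own invention for illustration, not the actual Winning the Internet feed format:

```python
# Sketch: emitting a DOT graph from a JSON-ish dict of
# newsletter -> [shared article URLs]. The input shape is invented.
import json

def to_dot(shares):
    """Render newsletter -> article links as a DOT digraph."""
    lines = ["digraph winning {"]
    for newsletter, articles in sorted(shares.items()):
        for article in articles:
            lines.append(f'  "{newsletter}" -> "{article}";')
    lines.append("}")
    return "\n".join(lines)

shares = json.loads('{"bnet": ["article-a"], "dense-discovery": ["article-a"]}')
print(to_dot(shares))
# digraph winning {
#   "bnet" -> "article-a";
#   "dense-discovery" -> "article-a";
# }
```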
So here we have the full Winning Graph in all its ASCII glory:
And yes I appreciate the irony of providing this as an image – click it for the full plain text version.
There’s a live, online version that updates with each newsletter update. Thanks to Georgi Gerganov for serving a helpful API against a Graph::Easy instance.
It’s a unique view of connected data with a number of limitations. For one, the layout seems to make nodes with a high degree quite large to squeeze in all the links. It also offers zero interactivity in its current form. Despite these and other limitations I think this approach has a certain -~=_=_=_charm_=_=_=~-
Netzschleuder, a new network data catalogue and repository from @tiagopeixoto
A graph <-> timeline hybrid experiment in d3.js from @markiaaan
A LinkedIn group for Women in Graph
Wikipedia corner: Krackhardt kite graph
The Knowledge Graph Conference Slack has been re-born under the enthusiastic facilitation of Ellie Young. It’s a welcoming, active group for those working with or interested in learning more about Knowledge Graphs. There are lots of ways to connect with the community but the first step is to join here.
Sameer Singh’s profile describes him as a “Network Effects Advisor” — his blog at breadcrumb.vc is home to a number of articles discussing the role networks play in company management and venture capital. I enjoyed this article on innovation and market-building through the lens of network effects.
This post from Ben Thompson at Stratechery back in July follows a similar train of thought. He notes the result of new Slack features that enable collaboration across multiple organizations:
… the more companies that take advantage of Slack Connect the more of a moat Slack has. That’s the thing about social networks: their best feature is whether or not your friends are on it, or, in this case, whether or not the companies you are working with are using Slack.
And if you’re interested in reading (even) more analysis on the graphs that underpin the largest social media companies today, I recommend this coverage of TikTok from Jackson Mohsenin.
A new notebook from Mathieu Jacomy demonstrates how distances between nodes in force-directed networks remain broadly the same despite a change in node positions.
Jacomy has a treasure trove of a blog. I recommend this post summarizing a paper on tensions in the network science community, this post on what we do when perceiving networks and this post on the role of big data visualizations (“digital glitter”).
I sometimes search for content suitable for source/target on the usual sites and have to manually parse through references to graphs (read: plots and bar charts) and graphs (read: source/target).
It’s usually enough to add “network” to my query to find relevant results but it’s a bit of trial and error. I recently stumbled on a YouTube comment thread and learned this isn’t a problem in all languages:
“In Dutch we actually have 2 different words: ‘grafiek’ is a graph with a function plotted on it, ‘graaf’ is a graph with vertices & edges.” - Peter Van Camp
"In Portuguese we also have two different words. Vertices and edges are in a “grafo” while x-axis and y-axis are in a “gráfico”. - Nuno Salvaterra
"well in greek we also have two words, “grafos” for GVE graph and “grafima” for a coordinates graph, the word graph literally means “write” " - Theofilos Mouratidis
“Same in Spanish. Grafo y gráfica/o” - Alex Blanco
"In Polish, there are also two different words: the plotting of a function on the coordinate plane is called a 'wykres' (etymologically meaning something like 'drawing out'), while the vertices-and-edges kind is called a 'graf'." - Kuj2
I’m interested to learn of other examples — do you know of cases in other languages?
- ----------------------------------------------------------------------
/ __ __ __ __ ___ / ___ __ __ ___ ___ /
/ /__` / \ | | |__) / ` |__ / | /\ |__) / _` |__ | /
/ .__/ \__/ \__/ | \ \__, |___ / | /~~\ | \ \__> |___ | /
/ / /
/ ----------------------------------------------------------------------
I’m finishing up this week’s source/target on the equinox, marking another milestone in a disorientating and unnerving year. Stay safe and well, I’ll see you in a few weeks.
I’m Christian (👋) and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graph.
Lewis Carroll’s Sylvie and Bruno Concluded has the following exchange:
“What do you consider the largest map that would be really useful?”
“About six inches to the mile.”
“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country on the scale of a mile to the mile!”
“Have you used it much?” I enquired.
“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”
In 1931, the Polish-American scholar Alfred Korzybski coined the phrase “the map is not the territory” in a paper presented in New Orleans, capturing the same confusion of models of reality with reality itself that Carroll lampooned.
The reality of territories can be extremely complicated. One organization bringing this complexity to life is a Canadian non-profit, Native Land Digital. Their crowd-sourced map, Native-Land.ca, shows how ancestral territories, languages and treaties are interwoven with other ways of understanding our world. In embracing the entanglement it paints a much fuller picture of reality.
When the map website first loads a caveat is posted:
this map is not perfect
While this understatement reflects the complex nature of the data depicted in the tool, it prompts a larger question: what would a perfect map even look like? Surely it would have to be an exact replica, just like Carroll’s mile-to-the-mile map. But what purpose would this map have?
Perhaps I could re-formulate the beloved aphorism from statistician George Box:
Some maps are useful, but none of them are perfect
Marshall McLuhan, a Canadian communication theorist, echoed Korzybski in 1964 when he devised another now-ubiquitous phrase:
The medium is the message
In short, McLuhan is highlighting that the methods we use to communicate can be more significant and influential than the message we’re communicating. And it’s mediums all the way down: the content of a medium is always another medium. As McLuhan notes:
thus, speech is the content of writing, writing is the content of print, and print itself is the content of the telegraph.
For us, graphs are our maps. We traverse to highlight paths & connections. We apply common tasks to read networks and get insights. We use constructs such as the small world network to model the density of relationships in the world around us.
When visualizing graphs the depiction of the network is the medium: from the lowly static infographic to the dynamic interactive application. Our choice of medium betrays the supposed complexity of our data and signposts to others that connections need to be seen to be understood.
A perfect graph model of the world would be a vast knowledge graph of reality, a plethora of connections replicating the world in its entirety. But by choosing this medium what sunlight are we shutting out?
A long read on the resiliency of global supply chains from McKinsey
A Twitter thread summarizing a network analysis of Ireland’s climate action plan
Wikipedia corner: Five Room Puzzle
The adorably-named Little ball of fur graph sampling library
A musing on belief networks prompted by, of all things, Disney’s Wreck-it-Ralph
You’ve probably heard of the Bechdel test: a quick litmus test to consider the representation of women in fiction. Since its (re-)formulation in 1985 it’s reached a level of ubiquity in film criticism and persists as an interesting rule of thumb.
I hadn’t heard of the technological re-formulation of the test until this week: source code could be said to pass this test if it:
US Government agency 18F attempted to evaluate their software repositories against this test and documented their process and results. I think this analysis would be interesting to map out as a network graph although I suspect code attribution would be the hardest challenge.
Movie data is far more readily available, and it’s used to great effect in this poster from Jill Marie Hackett exploring gender portrayal in film.
The write-up of the process behind the poster creation is excellent — the behind-the-scenes explanation and analysis goes hand-in-hand with the visual analysis of a dense network such as this one.
I also love the description of “guerrilla user testing at Starbucks” used as a method of getting feedback from unsuspecting coffee shop patrons. Feedback is important and who better to consult than a bunch of caffeine-addled PSL drinkers?
Although there’s significant overlap between COVID-19 topics and the topics I cover in source/target I’m a little averse to sharing COVID-19 articles for the sake of it. I therefore benched this document a few weeks ago after reading the fateful phrase “I’m not an epidemiologist, but”…
In this case, at least, it turns out my concerns don’t apply: the author, Brooke Foucault Welles, PhD, is extremely qualified to apply network science to the thorny challenge of reopening schools in a “safe and pedagogically sound” way.
Reminiscent of the ground-breaking Washington Post article with its mesmerizing blobs of people bouncing around the screen, Foucault Welles uses multiple small networks to illustrate important details in the strategy to re-open schools. Data and additional context are also available.
I love the node styles for this Madrid subway station complaints by station. It reminds me of the visuals in this classic article from Elijah Meeks.
I’m often asked for a dead simple application to explore and visualize data in one of the many Apache TinkerPop-enabled databases including JanusGraph, Neptune & DSE Graph.
Graph Explorer from Ravi Raja Merugu is an early release of a promising application that fits that description. Check out the latest demo video here and a background article here. Definitely one to watch!
Well, here we are again. Don’t be upset but this is the end of source/target for this week. Next time we’ll be looking at some further content curation through particularly lo-fi graph visualizations.
Hi, I’m Christian and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graph.
Happy Thursday! To all the new subscribers, welcome to the club! I recommend taking a peek at the archive to get a feel of how we do things around here.
I’m finding I have lots of amazing content and topics I want to share. I want to keep the signal-to-noise ratio high but I’m toying with the idea of sending out new editions every week. What do you think of that? I’d love if you would hit reply to this email and let me know.
Users of visual graph analytics know that the use of networks to explore data is a force-multiplier for recognizing patterns and understanding relationships. But what do we actually mean by this? What is it about this visual representation of connected data that makes it superior to other forms of visualization?
I took some time this week to look into this deceptively simple idea and I’ll be focusing today on a few analytic tasks that would be difficult (or even impossible) to undertake using non-graph visualizations.
When using data visualizations in an analytics context it’s important to remember the underlying deliverable expected from viewing the visualization. All too often you’ll see something akin to the famous “Step Three Profit” trope:
Even without the ridiculous “???” step, the above plan has so much “magic” it’s unbelievable. What insights? Do you expect to only model your data once? The finality of step three implies that there’s a logical conclusion to your work: in reality the analytics hotel will be open for business forever, even if you’ve checked out.
I’ve somehow only just realized I’m this close to ripping off the Eagles with that tenuous metaphor
Of course individuals don’t start their analytics journey outright planning the above; the magic expectations slip in, often due to time constraints. Visual graph analysis with the above approach is unlikely to result in an actionable output or deliverable. Instead, working backwards from the decision or output to be made is the key to designing insightful applications and unlocking smart analysis.
In reality the flow could look more like this:
This diagram could be drawn countless ways but note the key differences from before: we’re in a loop generating insights, there’s an expectation that we’ll need to remodel our data and there’s less hand-waving when it comes to actually executing pre-processing to set the stage for graph analytic activities.
This isn’t all too dissimilar from Don Norman’s “Iterative Cycle of Human-Centered Design” I highlighted in #13.
It also brings to mind the classic OODA loop you’ve probably heard mentioned by the gruff Chief Security Officer who was never at their desk pre-Covid.
So we’ve graduated from “???” to Analytic Activities. These are collections of tasks and subtasks undertaken to understand a visualization: in our case, a node-link diagram.
We do these tasks without even thinking about them. I think it’s helpful to give them a name and identify them as the baseline of our capability when reading graphs. These considerations may seem rudimentary but it’s worthwhile going (metaphorically) “back to school” to remind ourselves of the subconscious techniques we use.
There’s extensive research and categorization of these tasks — I’m going to draw from “Task Taxonomy for Graph Visualization,” a paper from Bongshin Lee et al. and highlight a few.
It’s fun to see the familiar food web example used as one example in that paper to illustrate their points.
I hope it will be pretty clear that the following tasks are vastly easier to complete using graphs.
This is so fundamental to graph analytics it feels weird even typing this, but here goes… By modelling and visualizing data as a graph we can see that nodes connect to each other. There. I said it.
The adjacency task is an example of a core topology task: we’re using the network structure to highlight patterns and in this case we declare a relationship between two nodes.
Described as the repetition of the Adjacency task, here we identify whether a node is “reachable” from another.
Another topology task, a Common Connection task is seeing and understanding where there’s commonality in interactivity or classification, based on whether two nodes have the same neighbours.
Bongshin Lee et al. refer to Pathfinding as a “Browsing” task and to me it’s the logical extension of Adjacency & Common Connection: we can visually or procedurally trace a path between two nodes, classically used for route planning.
I think this one is even more subconscious and typically comes first. When we observe a graph we immediately and instinctively recognize a number of factors: the size of the network, the number of distinct components and the distribution of disconnected component sizes.
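Putting names to these tasks makes them easy to mechanize. Here’s a toy illustration using plain Python dicts and sets (the node names are invented; any graph library bundles equivalents of these operations):

```python
# Toy adjacency-set graph illustrating the tasks above.
from collections import deque

graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),  # a disconnected component, visible at a glance in Overview
}

def adjacent(g, a, b):
    """Adjacency: are two nodes directly connected?"""
    return b in g[a]

def common_connections(g, a, b):
    """Common Connection: neighbours shared by two nodes."""
    return g[a] & g[b]

def shortest_path(g, start, goal):
    """Pathfinding (and Accessibility): BFS for a shortest path, or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in g[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# adjacent(graph, "A", "D")           -> False
# common_connections(graph, "A", "B") -> {"C"}
# shortest_path(graph, "A", "D")      -> ["A", "C", "D"]
# shortest_path(graph, "A", "E")      -> None  (E is unreachable)
```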
Have you ever noticed these fundamental tasks as something you do when you look at a graph or have you chalked it up to intuition?
Justin Walsh (ISS Archaeological Project) walks through the creation of this visualization of “crew relationships on Soviet-Russian space stations”.
An interactive tool from Frederik Jørgensen to teach the theory behind various graph width metrics. (GitHub)
New New York Times tweets whenever a word appears in the New York Times for the first time. This new animated visualization brings the process to life.
I’ve fallen in love with this tactile, organic mind-mapping tool from Pirijan Ketheswaran, one of the co-founders of Glitch. Kinopio is described as a “visual thinking tool for new ideas and hard problems” and allows you to build and collaborate with others on rich cards of text, images and media. Links can be drawn between cards to make connections between them.
Kinopio takes the endearing Glitch aesthetic one step further as a sort of mash-up of Apple HyperCard, Web 1.0 and Japanese video games — it’s probably not a coincidence that the name matches a popular Super Mario character. I think it has an incredible UX and great potential as a learning and thought-supporting tool.
To get a feel for the versatility here are a few spaces to start: My guitar setup, My mind garden & plantasia.
Kinopio is quite distinct from the gamut of note-taking-apps-with-a-graph view (see #1, #10) primarily because you curate the graph from the outset rather than see it built up from your notes.
Check out Ben Tsai’s introduction for more details and don’t skip Pirijan’s blog for some behind-the-scenes process notes on building this dynamic tool.
Oh and here’s a depiction of this very same edition of source/target re-built in Kinopio!
Lots of blockchain visual analytics companies jumped on the “Twitter hack post-mortem” bandwagon last month to show off their credentials in the wake of a breach targeting high-profile celebrity accounts. One of the best articles I’ve seen is from Elliptic as they walk through some interesting Bitcoin graphs and what they actually show us.
Most of these will be familiar to Sourcerers (still pushing that) but I have a new project page to collect all my source/target projects in one place. My neuomorphic graph visualization prototype app from #13 now works in Chrome so check it out if you weren’t able to before.
I could have sworn I had already shared this fascinating article from a few weeks ago but I thoroughly recommend this (longish) read on the relative superiority of the social graph that’s catapulted TikTok to semi-global-domination (for now).
The results are in from my (again, unscientific!) graph drawing Turing Test from last time — looks like source/target readers fared a little better than the original study that inspired this trial: 65% of participants guessed correctly! A few people answered exactly wrong which (to steal a joke from one participant) suggests I’m actually a computer. Beep boop.
As ever I’d love to know what you think of source/target, hit reply to let me know.
Stay safe & see you soon!
If you’re new here, hello and welcome! Pull up a chair. Do you want some coffee? How’s your day going? This week I’m covering an interestingly graphy take on the Turing Test, which you probably know all about. I’ve also collated a number of interesting projects plucked from the internet — let’s get to it.
As some of you source/target old-timers (sourcerers? targeteers?) may recall I originally expected the newsletter to dig into releases and updates to graph technologies. Six months in, I’m finding myself less interested in that and focused more on the intersection between graphs and other topics. I think the result is a newsletter that’s a little more compelling than just a record of point releases from graph database vendors.
For the graph database vendors reading, ignore the above. I love your point releases and read every single changelog, I promise.
I also don’t want to be someone pretending to know absolutely everything about graphs. The edict of “write what you know” is more apparent to me than ever and there’s a lot I don’t know about graphs. source/target is a great outlet for me to learn new things. My intent is to bring you, dear reader, along for the ride.
The domain of graph drawing is a particularly deep one whose surface I’ve barely scratched here. Thanks to Patrick Mackey for drawing my attention to the recent paper entitled “The Turing Test for Graph Drawing Algorithms” by Helen Purchase et al.
You’ve probably heard of the Turing Test, especially if you’ve seen Blade Runner (I haven’t). If not, let’s look to Wikipedia:
a test of a machine’s ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human.
Reasonable enough. Out of interest, here’s how Simple Wikipedia describes it:
a test to see if a computer can trick a person into believing that the computer is a person too.
It’s a definition of broad strokes but I like this version a lot. The spin that the computer is misleading humans is delightful.
The Turing Test is one of those evergreen scientific concepts referenced and applied across a gamut of different topics. I think of it as Schrödinger’s Cat but for Computer Science rather than Quantum Mechanics.
“When I hear about Schrödinger’s cat,” Stephen Hawking once said, “I reach for my gun.” By the way, when did naming something after or invoking Turing become such a lazy attempt at implying intelligence? Was it Turing Pharmaceutical?
2014’s Alan Turing biopic “The Imitation Game” starring Benedict Cumberbatch takes its name from the original paper that introduced Turing’s test: “Computing Machinery and Intelligence.” There is some disagreement on whether the paper describes a single or multiple forms of Tests but in a nutshell the idea is the same: can we be tricked by machines to think they are human?
Alright, alright this was just an excuse to shoehorn that awful graph pun into the title
In “The Most Human Human” Brian Christian ponders the boundaries of the Turing Test. Christian describes his research and preparation to enter the competition known as the Loebner Prize—a sort of championship to build the best-performing chatbot to pass a restricted form of the Turing Test. The title of the book comes from the twisted prize he intended to compete for: the most convincingly human contestant–chatbot or actual person–is awarded the title of “most human human.”
By focusing on the aim of tricking judges into believing that a chatbot is human, the Loebner Prize is incidentally closer to the Simple Wikipedia definition of the Turing Test, especially considering the number of restrictions applied to the format.
Purchase et al. have applied the idea of the Turing Test to the drawing (or arrangement) of nodes and relationships in a graph. Can a human be tricked to believe that a graph layout was manually arranged?
In our experience as graph drawing researchers, it is often preferable to draw a small graph ourselves, how we wish to depict it, than be beholden to the layout criteria of automatic algorithms. The question therefore arises: are automatic graph layout algorithms any use for small graphs? Indeed, for small graphs, is it even possible to tell the difference?
The authors went to admirable lengths to make the test as fair as possible. They:
The conclusion reached by Purchase et al. was as follows:
In general, over all graphs and algorithms, participants can correctly distinguish hand-drawn layouts from algorithmically created ones: graph drawing algorithms (in general) effectively fail the Turing Test.
There are a few caveats to this conclusion:
“However, we did not find evidence that force-directed and (marginally) [“stress-based”] algorithms could be reliably distinguished from hand-drawn layouts – they therefore effectively ‘pass’ the Turing Test. We speculate that this is the case because of the prevalence of these algorithms in the popular media (e.g., for depicting social networks).”
Taking the opportunity to canvass their audience, the authors also asked their participants a supplementary question: which of the two graphs was better? Deliberately leaving “better” undefined, they looked to record the subjects’ instinctive, subjective preferences between graphs.
On average, participants preferred the human-drawn layouts. Of particular interest were the results for orthogonal drawings, which were both considered “worse” than hand-drawn graphs and failed the graph drawing Turing Test.
How do you think you’d fare in this test? Which of these two graphs is drawn by a human and which by an algorithm?
I’ve put together a short, completely unscientific experiment. Click here to make your guess and find out the answers.
Built from scratch with React + SVG manipulation, Graphisual (video) from Lakshya Thakur is a pretty good attempt at a graph-prototyping web app.
Friend of the newsletter (Sourcerer) Dave B showcased visualizing a Neptune Graph Database using VisJS & SageMaker in a recent stream
Karate Club is a new unsupervised machine learning extension library for Python & NetworkX with an inspired name
Human Disease Network from Anjushree Shankar, featured as Tableau’s Viz of the Day last Monday
The speakers and talks at the (online, obv.) GraphQL Summit were of a consistently high quality. In particular I loved Ashi Krishnan’s coverage of federation: in short, the blending of distinct GraphQL schemas into a unified index. The talk was equally interesting and mesmerizing, with animated visuals built using 3d-force-graph to bring the content to life. I felt the visuals really allowed time for the technical content to breathe.
I stumbled upon this “use this to get your friends into Math Rock” graph on Reddit this week. I’m not sure of the ultimate source but I love the mix of structured links with irreverent commentary.
I spend a lot of time procrastinating. A lot of that procrastination time is spent wondering whether the software I use is the absolute best for the task at hand. In the olden days when “email” didn’t largely just mean “Gmail”, I used to spend a lot of this time trialing and testing various email clients to ensure my workflow for triaging my inbox and sending emails was as ~efficient~ as possible. Yes, I’m aware this was a problem.
A particular favorite of mine was (and probably remains) Mozilla Thunderbird; appealing due to the rich ecosystem of plugins, each one promising a “level-up” of capability unmatched by your Outlook Expresses of the day.
Depressing news this week of 250 layoffs at Mozilla. If you’re hiring check out the directory of Mozillians looking for a new role.
Multi-person threads of cc-ing, fwd-ing, bcc-ing, re-ing & reply-all-ing are almost a literal nightmare for me—especially when Gmail sputters to a halt when attempting to show me chains of 30+ messages.
Threadvis brought a little bit of calm to these frayed threads, embedding an interactive graph of the current email thread nestled between your inbox and message pane.
All of that is to point you to this interesting post from Ryan Bell on the Wolfram Community forum, effectively re-creating an alternative Threadvis in Mathematica.
Thank you for making it all the way to the very end. While you’re here why not forward this to a friend? Full newsletter archives are available for delayed consumption.
See you in a few weeks! Anyone else looking at the calendar and shaking their head in disbelief that it’s basically September?
Hi, I’m Christian and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graph.
I’ve been busy this week with lots of little tiny changes to the source/target archives—the sort of little tiny changes that no one will ever notice but still take a remarkable amount of time. I think after over 10 years of hacking around with CSS I should probably take the time to learn the basics.
Following some unknown pull of the universe, I downloaded the developer beta of the latest macOS this week. Named after Big Sur, California, it’s slated for release later this year.
I don’t usually bother guinea-pigging these things but something about the promise of a fresh UI drew me in.
For Big Sur and recent macOS releases the common refrain from users is that Apple is making their desktop OS into a mobile operating system piece by piece, one curved bezel and enlarged dialog at a time.
Updates to the interfaces of the products we use are, as a rule, immediately and vigorously objected to by users. This backlash to new designs can sometimes be the death knell for a website or company. Just ask Digg: the social bookmarking site had enjoyed meteoric growth until 2010, when it began to die a slow death after a complete rewrite and slight functionality pivot left its millions of daily users cold. It’s still limping along in its latest incarnation, but not long for this world.
The success of other migrations is harder to evaluate: Strava took two years to renege on the removal of its “chronological feed,” which had forced users to see their friends’ activities in an inscrutable order determined by some omniscient algorithm.
oh look, Cody ran 3.2km three months ago. Thanks Strava
If you scroll down to the comments on release announcements it won’t take long to find a detractor sharing a familiar sentiment:
“It’s like they’ve never even used the site themselves”
This of course isn’t the case. Unless a company is particularly negligent a whole process of design and re-development has been undertaken by a bevy of designers, editors, developers, engineers and managers.
This doesn’t mean the organization is unimpeachable in its decisions; it’s just that it deserves the benefit of the doubt. You have to assume the organization is attempting to maximize some important metric with its new interface. It’s often the metric itself that’s ultimately misguided.
In The Design of Everyday Things, Don Norman introduced the “Iterative Cycle of Human-Centered Design”: observing a set of target users, generating ideas, prototyping, and testing before returning to observation. I see it as a flywheel because each stage of the process strongly influences the next, as well as other concurrent product design processes.
Originally published in 1988, it’s an exceedingly well-regarded book that looks at the relationship between users, objects and products. One takeaway (there were many) that I found particularly humorous was Norman’s law, a succinct and familiar description for anyone who’s ever designed and built a product.
A project is behind schedule and over its budget the day it is started.
This and many other observations have changed (in a small way) how I see the world. I therefore shouldn’t be too surprised to have found some overlap between Norman’s book and Atomic Habits by James Clear.
perhaps by design (pun intended)
Clear outlines how small, incremental improvements to an everyday routine can make it easier to start and stick to a new habit. One example: by designing your living space to make it easy to take the steps required to accomplish your habit—a guitar on a stand, an already-rolled yoga mat—you’re setting yourself up for success.
Norman taught me that good design is hard to get right and Clear affirmed that, when executed well, it has the ability to be both transformative and invisible.
I’ve discussed how fresh takes on the design of popular products can be polarizing. In the last few years, Apple has moved away from overt skeuomorphism, a fascinating attempt to take real-world interfaces and transplant them into digital spaces.
The more egregious examples of this are long gone, but examples of skeuomorphism persist, whether it’s the floppy-disk save icon or the manila file folders that very few people use in their day-to-day. In his article “The Comeback of Fun in Visual Design,” Michael Flarup highlights the playful aspects of Apple’s design shift in 2020 and showcases some modern skeuomorphic components of icons.
Neumorphism caught my eye earlier this year as a bold, somewhat polarizing take on user interfaces. Reminiscent of pristine sci-fi dashboards and Swedish portable synthesizers, it looks both futuristic and tactile.
For equal parts academic interest and fun I wondered what it would be like to apply neumorphic techniques to a network graph. Inspired by Adam Giebl’s interactive tool I put together a similar tweakable app to play with some of these stylings against the classic Les Misérables character network dataset.
![](/optim/assets/st13/3.png "Built in React using Semiotic, dat-gui-react, inspired by neumorphism.io and with data from Observable")
“I didn’t have time to make it work in Chrome, so it just works in Firefox”
I think my little app misses the point a touch. The interactive knobs and scales to change the design are inarguably the bits that should be neu-ified rather than just the network graph itself. Nevertheless I think it’s a novel take on network graph design that refreshes the playfulness of an interactive visual analysis application.
Climate change shifts — new to me, this article presents a number of origin-destination maps giving an idea of the projected temperature of towns across the US in 2080. (h/t Luz Ka)
At first I wondered why there weren’t any arrows to indicate direction of change but then I realized the reason for that should be obvious.
This week I’m listening to the audiobook of Andy Greenberg’s Sandworm, a riveting true-crime take on the introduction and growth of various cyberwarfare techniques over the last 50 years.
On Wednesday millions of Garmin device users lost access to their accounts due to a suspected cyber attack. Customers are only just managing to sync their fitness activities after five days and Garmin has reported that there’s no indication that any customer data was accessed by the attackers.
As the outage was initially reported, Daniel Cuthbert took some time to explore the footprint of Garmin’s servers using network forensics tool Maltego and various web services. He tweeted some of his findings along the way.
Brandon Locke has released an open-source tool, NERtwork, to extract, label and visualize the co-occurrence of entities in text documents. He’s provided comprehensive steps to run the tool and a few choice examples of visualizations resulting from applying NERtwork to some key historical texts.
Co-occurrence networks are a fundamental technique for understanding the shared connections across large collections of documents through their common extracted entities: in this case people, places and organizations.
NERtwork uses the Stanford Named Entity Recognizer (NER) which is a great tool but can produce exceedingly messy extracted entities. Locke recommends using OpenRefine to make the process of moderation and refinement a little easier.
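The counting step at the heart of a co-occurrence network is simple enough to sketch in a few lines of Python. This is a hypothetical, simplified illustration (not NERtwork’s actual code), assuming the NER stage has already reduced each document to a set of entity names:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how often a pair of entities appears in the same document.
    `docs` is a list of entity sets, one per document (e.g. NER output)."""
    edges = Counter()
    for entities in docs:
        # Sort so each pair gets one canonical (a, b) key
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges

docs = [{"Lincoln", "Douglass", "Washington"},
        {"Lincoln", "Douglass"}]
cooccurrence(docs)[("Douglass", "Lincoln")]  # → 2
```

The resulting counter is already an edge list with weights, ready to load into Gephi or any graph library.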
One of my favorite newsletters is “Data is Plural” from Jeremy Singer-Vine — it’s a curated list of datasets sent out every other week.
Data is Plural may perhaps be the first newsletter I actively subscribed to. What was yours?
Recommending Data is Plural in my newsletter is an example of what I think of as “curation-all-the-way-down”. Individuals & organizations lovingly curate rows and columns in their datasets. These are in turn curated and promoted by others with a keen eye for interesting content. Then I’m here just curating the curators.
Shared the other week, the citation graph from the Caselaw Access Project gives a full summary of the legal cases cited in court decisions. It’s a vast dataset covering 43 million citations.
The Caselaw Access Project frequently releases visualizations of its graph, and I found this map-and-grid (read: adjacency matrix) view to be a particularly interesting aggregation of some core patterns in the dataset. Certainly a lot easier to digest than the 43 million source records.
This reminds me of the viral “Every State’s Least Favorite State” graphic from earlier this year—a visualization that would be greatly improved by shedding the confusing color legend and using, I don’t know, a graph or something.
Whew, I wrote more than I expected this week. Let me know if you found it interesting! See you again in two weeks ☀️
Hi, I’m Christian and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graph. I’d love to hear your feedback and suggestions—hit the reply button to let me know what you think.
Years ago I collected, and apparently committed to memory, words that define words. I don’t mean your garden-variety grammatical “verbs” or “adjectives”. Instead I was drawn to the polysyllabic language one could use to describe things we write and say every day.
I know, weird flex but okay.
Take paronomasia; it’s an uncommon word, but you’ve almost certainly heard its other name. It’s a long way of saying “pun.” My false etymology was that it’s a contraction, but “pun” and “paronomasia” are just two words that mean the same thing. “Pun” probably evolved from “punctilio,” which sounds more like something you’d take out on the lake, espresso in hand.
Incidentally, Wikipedia reliably informs me that “graphomania” is an example of a pun in Japan—it’s not clear what this means but it’s also a pretty good never-ran name for this newsletter.
Synecdoche was another word. It means taking a small part of a thing to refer to its greater whole, for example referring to credit cards as “plastic.” Before the movie (and, let’s be honest, even now) I read the word as “Sigh-Neck-Dosh”—a result of reading these definitions but never using them in speech.
I know a bundle of these words and they surface in my mind more often than I’d imagine. They often come in contrasting pairs: initialisms/acronyms, anaphoric/cataphoric and hyponyms/hypernyms.
To make an anaphoric reference is to link or callback to a previously used concept. Cataphoric is the other way round: a reference to something that happens later. In most media these callbacks bring real depth to the material. A joke from a standup comedian made 45 minutes into her set, calling back to the beginning, lands that much harder. I liken this to lightning bolts of references in the heads of the audience; the synapses are firing.
I listened to a short podcast interview with Justin Duke last week, the creator of Buttondown, the service I use to send out source/target. I’m half a year into my newsletter journey and the podcast emphasized to me the importance and value of writing on a regular schedule. It’s satisfying to see the number of previous editions build up and the habit of curating, writing, editing (hi Katharine!) and re-drafting editions is helping me grow as a writer.
Writing content on the same topic every newsletter, I naturally find myself referring back to previous editions. The value of this is two-fold: new subscribers probably haven’t read the archives, but if I’m writing about something of interest to them they may find previous discussions worth exploring; secondly, I think it helps fire those lightning bolts for those who’ve subscribed all along. Turns out, week by week, I’m building the foundations of a reference graph.
If I extract each of these self-references I get a view that looks like this:
![An arc diagram of links back to previous newsletter editions](https://buttondown.s3.us-west-2.amazonaws.com/images/3c8b5784-19ce-4c15-9cbf-c3b513ff0e7c.png "An arc diagram of links back to previous newsletter editions")
I chose to use an arc diagram to retain the chronological ordering of the editions from top to bottom.
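The extraction itself needs little more than a regular expression over each edition’s text. Here’s a hypothetical Python sketch, where the edition texts and the `source/target #N` link format are assumptions for illustration:

```python
import re

# Hypothetical edition texts, keyed by edition number
editions = {
    10: "… as discussed in source/target #9 …",
    11: "… see source/target #1 and source/target #9 …",
}

def reference_edges(editions):
    """Build (from_edition, to_edition) arcs for an arc diagram."""
    edges = []
    for number, text in editions.items():
        for ref in re.findall(r"source/target #(\d+)", text):
            edges.append((number, int(ref)))
    return edges

reference_edges(editions)  # → [(10, 9), (11, 1), (11, 9)]
```

Each tuple becomes one arc, and keeping the editions in numerical order preserves the top-to-bottom chronology of the diagram.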
Extracting links and other tags from the “World Wide Web” of linked pages is a decades-old practice—here’s a graphy example from just this week. As I take stock of the first six months of writing source/target I feel particularly reflective. As I write more, it will be neat to see my self-reference graph slowly extend down the page with each and every edition.
Hat tip to my esteemed colleague Phil Rodgers for sharing this flowchart diagram of relationships between penguins (and humans) at Kyoto Aquarium. It’s a truly fascinating depiction of the apparently intricate love lives of penguins.
There’s a high resolution version of the chart available and an unofficial English translation.
Posted a few days ago, a previous version of the chart went viral last year and spawned this CNN article, probably one of the most mainstream articles to ever describe graph analysis.
I binge-watched a 49-episode season of Terrace House last year and would gladly commit to the same length or more for the penguin version.
Lots of readers were taken with my lyrics graph visualizer from last time. One particularly inspired observation described the commonly found “circular loop” graph as a “clover.” I think there’s potential for more with this app and I definitely want to add more features in the future.
I’m loving the accelerated innovation and efforts being made to turn virtual conferences into more playful, organic spaces for discovery and networking. Yotribe is one example of this: a visual chatroom where attendees can wander between spaces and strike up conversations. It’s reminiscent of Habbo Hotel but suitable for a work environment.
“Habbo Hotel? Oh, it’s like an isomorphic, pixelated hotel, but virtual and over slow dial-up internet… Hello are you still there?”
I’m not sure about the name; Google autocorrect keeps changing it to “YouTube” which is probably an instant fail when choosing a name for a company.
This isn’t just an attempt to refer back to as many old editions as possible: this interactive visualization of European migration flows is an alternative depiction of a similar dataset I shared back in March. There’s a full write-up of the methodology from authors Francisco Rowe & Nikos Patias, along with accompanying code.
Thanks again for subscribing, I hope you enjoyed this edition. Please continue to send suggestions and thoughts to this email, I love hearing from you.
Stay safe, see you in two weeks.
Hi, I’m Christian and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graph. I’d love to hear your feedback and suggestions—hit the reply button to let me know what you think.
In 2017 I became obsessed with the music of Destroyer, the Canadian band led by singer and lyricist Dan Bejar. The name belies a pop-leaning discography of albums ranging across glam rock, obtuse chiptune orchestras, yacht-synth and Spanish folk guitar.
Destroyer isn’t for everyone. The biggest detractors cite his laconic, sometimes monotonous delivery as a barrier to entry. I found the music meandered in a way that was peaceful and compelling rather than boring. As I slowly listened to more and more albums from the Destroyer back catalogue, I could see a whole world of lyrical motifs that became more apparent over time. This lyrical world-building is something I appreciate in some of my other favorite artists like The Mountain Goats & Owen Pallett.
When I started to pay more attention to the lyrics, I could see references to previous album names, future album names, other bands, Destroyer song titles, repeated references to the band name and even meta references to the song that’s currently being sung. Bejar is well known for these references littered through his music—so much so there’s a drinking game.
The connections between Destroyer songs and albums are so dense that I thought a lyrical analysis would be an interesting project to explore. I booted up a Python Jupyter notebook, scraped all the Destroyer lyrics from Genius and looked to generate n-grams from all of the lyrics and album titles. An n-gram is a basic Natural Language Processing (NLP) construct: a chunk of n consecutive words from a text. For example, extracting the n=3 grams of the following sentence gives:
[tell your friends about my newsletter] ⇒
[ [tell, your, friends], [your, friends, about], [friends, about, my], [about, my, newsletter] ]
On its own this isn’t particularly interesting, but I figured that if I were to look at the most common n-grams for all songs, I would get a cool breakdown of common phrases and themes.
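The split above needs only a couple of lines of Python, no NLP library required (a minimal sketch, not the notebook’s actual code):

```python
def ngrams(text, n=3):
    """Split text into overlapping chunks of n consecutive words."""
    words = text.split()
    return [words[i:i + n] for i in range(len(words) - n + 1)]

ngrams("tell your friends about my newsletter")
# → [['tell', 'your', 'friends'], ['your', 'friends', 'about'],
#    ['friends', 'about', 'my'], ['about', 'my', 'newsletter']]
```

Feeding every song’s n-grams into a counter then surfaces the most common phrases across the catalogue.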
It turns out that my approach was probably a little naive when it came to getting the output I was hoping for. In my (limited) experience, NLP leads to slippery slopes involving tweaking model parameters and data cleaning functions. I created a pipeline of text transformations that became quite unwieldy.
I had to put a pin in this project, but last month I felt compelled to dive back in. I had been wondering about applying a similar technique to any song or artist rather than the full repertoire of one artist. Instead of worrying about the connections across songs, what if I focused on the lyrics within a single song?
The new idea was to look at each word in a song and build a graph that focused on the connections between words based on the order they’re sung—not dissimilar to a Markov chain. A network of words would be naturally created as common words are linked across verses, bridges and choruses.
[ Very superstitious Writing's on the wall ] ⇒
[very] → [superstitious] → [writing's] → [on] → [the] → [wall]
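Counting those word-to-word transitions is the whole trick. A minimal Python sketch (the lyric snippet simply repeats the line above for illustration):

```python
from collections import Counter

def word_pairs(lyrics):
    """Count consecutive word pairs: the weighted edges of the lyric graph."""
    words = lyrics.lower().split()
    return Counter(zip(words, words[1:]))

edges = word_pairs("very superstitious writing's on the wall very superstitious")
edges[("very", "superstitious")]  # → 2, a heavier edge for the repeated pair
```

A `Counter(words)` over the same tokens gives the node sizes, and together they describe the whole graph.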
Inspired by Andrei Kashcha’s work and the vast collection of songs this technique could be applied to, I built a webapp that does the following:
I scaled the word sizes by their relative occurrence and did the same for the link widths for frequency of word pairs. I’ve only visualized a small collection of songs but it’s been fascinating (and addictive) to see the variety between various songs. You can check it out yourself at https://cjlm.dev/lyrics-graph/ but here are some notable examples:
The arc in the top left is a pretty common pattern to see in these networks, typically showing the verses of songs.
Harder, Better, Faster, Stronger by Daft Punk
Unsurprising for anyone familiar with the song but quite an amusing, tight, network.
This is perhaps the closest graph I have found to a “single line” song—a reasonably long one with no repeated words.
Of course I had to try it on the original inspiration for the project:
Painter in Your Pocket by Destroyer
So what’s next with this project?
There are some tweaks that could be made to the data cleansing to include or exclude specific stop words depending on their importance to the song (check out Take on Me by a-ha and try and spot the title in the lyrics!).
I’d also love to generalize the graph creation to work for full albums, but that will need a strategy for the over-linking of common but uninteresting phrases.
Finally there’s always more work that can be done on the visuals. I like the minimalist aesthetic but highlighting other data points provided by the analytics would definitely take it to the next level. I wonder if graph diameter or similar metrics could give an interesting statistic for the lyrical form of your favorite song?
The popular webcomic xkcd gets more and more apt by the day. The recent comics on COVID-19 were a highlight for me. I don’t check the site regularly, so when I do I really enjoy binging weeks-worth of dry, science-adjacent humor. I often find the payoff of each comic is actually in the alt-text of the image itself.
Flicking back a few weeks I came across the comic focusing on carcinization, a great term for—well—I’ll let Wikipedia explain it in a little more depth:
an example of convergent evolution in which a crustacean evolves into a crab-like form from a non-crab-like form. The term was introduced into evolutionary biology by L. A. Borradaile, who described it as “one of the many attempts of Nature to evolve a crab”.
Isn’t this fascinating? Vaguely reminiscent of simultaneous invention (see also: cadmium, calculus, and, uh, Dennis the Menace) it turns out there are quite a few examples where various organisms have independently evolved into their crab-like form we know and…love?
Here are a few examples:
Exploring carcinization in graphs is the obvious next move here—sideways perhaps?—and there’s a variety of literature on the topic, including this excellently-named paper from 2019: “What is it like to be a crab?”. In true source/target meta fashion here’s a graph utilizing Connected Papers highlighted in the last edition to show prior art for this article, mainly papers on the use of complex networks for similar analysis.
Itching for more crab content? The eye-catching crustacean graph above accompanies this essay in National Geographic on the fascinating life of crabs.
For a visual cleanse after all that crab imagery, check out this (I swear I’m not sponsored by Nat Geo) short article on the family tree of citrus fruits. I found this through the fruit version of the “this X does not exist” trend. Check it out for some truly bizarre fruit mashups.
Here’s an introduction to link analysis from a journalistic lens. I never saw Channel 4’s Who Knows Who project. It looks like it’s been offline for a while and I have no idea how I’d run a Flash applet in 2020.
A small project from Nicholas de Jong to generate and visualize circle-of-trust networks from keybase.io—acquired by Zoom earlier this year.
I’m coming around to videos as a particularly efficient way to learn new skills. After my foray into Gephi last edition I found these great videos from Mathieu Jacomy that would have been very helpful when I was learning the ropes.
Thanks again for subscribing, I’d love to know what you think. Oh and don’t forget to send me lyrics graphs you find interesting. There’s (possibly) a prize for the first person to find a fabled “single line” song…
Stay safe, I’ll see you again in two weeks.
It’s been almost a month since I’ve emailed—it didn’t feel right to send out source/target two weeks ago. I was torn between sending a note of solidarity and support for Black Lives Matter and adding to the noise when I should instead be listening.
As ever, continue to send me any links or content that you think your fellow subscribers would find interesting. I’m particularly interested in highlighting the work of marginalized creators in the field and I will showcase and amplify these voices in my newsletter.
Last time, in source/target #9, I discussed the role of moderators on social networking websites like Reddit. I was particularly interested in the news and fallout from the release of an incendiary list of moderators that pointed to potential content collusion across subreddits.
David Pierce at Protocol has written an in-depth article of some of these issues and others at the end of last month and it’s worth a read. I figured it would be fun to take some time to do my own original analysis for source/target.
My first step was to do some web scraping of RedditMetrics to get a list of subreddits in descending order of popularity. I figured it made the most sense to focus on the most popular subreddits as they have the furthest reach on the site. Other explorations of the moderator graph have focused on the smaller communities.
I ended up downloading an ordered list of nearly 20,000 subreddits. I took each of these in turn and parsed them to get a list of moderators using Reddit’s built-in Moderators API. I really shouldn’t have been surprised by this, but there are a lot of subreddits and, in turn, a lot of moderators for those subreddits.
Due to the rapid pace of change online this data is already stale; as noted by Pierce some moderators have quit Reddit as a direct result of the public posts about potential moderator collusion. Regardless we still have an interesting snapshot of moderators from the middle of April.
I usually start with the most naive data model possible to get an idea of the form of the graph I’m working with. In this case that’s the moderator-to-subreddit network.
I took this opportunity to try graphology, an open-source JavaScript/TypeScript library that serves as a sort of “standard library” for working with graph data. I like to leverage existing libraries whenever possible, and the graph API is well thought out, letting me define and manipulate a graph structure with NodeJS. This means I can write minimal code to export to common formats for other libraries and tools: Gephi for some heavy-duty visual network analysis lifting, and various web libraries once I have an understanding of the data and wish to share it online.
Upon loading my data into Gephi, the first thing I spotted was the large number of bots in the dataset. These aren’t as malicious as you may expect; bots play a valuable role in the Reddit ecosystem, moderating and detecting bad behavior. As helpful bots are self-declared (they typically have “Bot” in their name), it’s easy to do a naive removal using their names. Bots are interesting to me but not too relevant to this exploration of human moderators, so I pruned them out of the dataset.
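That naive name-based pruning might look something like this (a sketch with made-up moderator names, not the actual cleanup code):

```python
def prune_bots(moderators):
    """Drop self-declared bots: any moderator with "bot" in their name."""
    return [m for m in moderators if "bot" not in m.lower()]

prune_bots(["HelperBot", "alice", "ModBot2000", "bob"])
# → ["alice", "bob"]
```

It’s crude — any bot without “bot” in its name slips through — but it was good enough to clear out the bulk of the noise before layout.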
Next up for pruning: the science and askscience subreddits, which have the most moderators by a large margin.
It’s unlikely that a few moderators of these subreddits would hold a disproportionate amount of power, so I took them out of my visualization.
So, after these manual tweaks, what does the network look like? This is my first time trying Gephi with large data sources, so it took lots of trial-and-error clicks in the GUI to get something that looks reasonable. Part of the challenge is providing the right parameters to the ForceAtlas2 layout to produce a visual that isn’t just a cluttered mess. I’m not convinced I succeeded, but for now, here’s a look at the network of 82,578 nodes and 96,297 edges. Pink nodes are moderators, green nodes are subreddits.
As you can see above, there’s a very long tail of smaller subreddit communities here. To filter these out and focus on the most influential subreddits, I sized nodes by Page Rank and reduced the network down to the most commonly-connected nodes and edges (around 4% of the full network). Now, a number of prolific moderators against key subreddits jump out.
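To make the PageRank step concrete, here is a self-contained power-iteration sketch over a toy moderator/subreddit edge list — pure Python rather than Gephi’s built-in statistics, with made-up names:

```python
def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over an undirected edge list."""
    nodes = {n for edge in edges for n in edge}
    neighbors = {n: [] for n in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Each node redistributes its rank evenly among its neighbors
        rank = {n: (1 - damping) / len(nodes)
                   + damping * sum(rank[m] / len(neighbors[m])
                                   for m in neighbors[n])
                for n in nodes}
    return rank

# Two mods on r/pics, one on r/tiny: the better-connected subreddit ranks higher
rank = pagerank([("mod_a", "r/pics"), ("mod_b", "r/pics"), ("mod_c", "r/tiny")])
```

Keeping only nodes above a rank threshold is then a one-line filter, which is roughly what reducing the network to its top 4% amounts to.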
My work on this so far has been fairly rudimentary but after tinkering with various tools I have a stronger understanding of the dataset I’m working with. Stay tuned for further exploration of this data next month. In the meantime, if you have any helpful materials to get the most out of Gephi or parameters for your favorite layout algorithm I’m all ears!
Long-time readers of this newsletter will recall the first edition focused on networked-thought tool Roam Research. Since then Roam has exploded in popularity and there’s a prolific set of individuals and tools in the space scratching the connected-note-taking itch.
The recent addition of a bare-bones theming capability appears to have unleashed a wide array of creativity from users. Particularly exciting is this CSS + JavaScript attempt at interactive, live references between writing and sources. It’s a wizzy, modern take on “zippered lists” of compounded documents, a form coined and popularized by Ted Nelson back in the 1960s.
A number of Roam alternatives and complementary tools can be found in this collaborative Notion database. One that caught my attention, due to its fast development cycles, support for offline notes and transparent pricing, is Obsidian. Notes in Obsidian are written in Markdown and have the wiki-style, bi-directional links that help to build a network map of knowledge.
Obsidian has a strong WebGL graph component that scales well and works with a higher volume of notes than other tools. As with Roam, it shows how topics can be useful in connecting potentially disparate notes, both with recollection and discovery of themes across your work and research.
Speaking of research, a new visual web application “Connected Papers” was released earlier this month and makes good on the oft-referenced potential of drawing connections between academic papers, their authors, related citations and prior work.
These sorts of graphs have been promoted heavily in the COVID-19-related marketing sphere—here’s a similar static example of a co-citation graph. I’m impressed by the design and accessibility of Connected Papers. See here for more context on how the graphs are built and check out the selected examples on their homepage.
For more context on the connected notes phenomenon and how it relates to the world of knowledge graphs, Anne-Laure Le Cunff’s new essay is a wonderful look at the history of “maps” with a number of arresting examples.
This is definitely a candidate for the topic of a future source/target edition. Do you have any favorite family trees you’d like to share? It’s amazing to think of each symbol in this chart as a real person in the real world and to reflect on their individual impact.
Central to the graph is the Normal distribution but there are interesting discussions in the comments on how this graph representation reflects the relative “centrality” of each distribution.
Turns out this is a relatively simplified version of the relationships between distributions. Check out this site for an interactive version.
Lju Lazarevic (Neo4j) has been diligently streaming and documenting the creation of a “Wine Graph” in Neo4j; it’s not too late to join their streams on the official Neo4j Twitch channel.
Stay safe, I’ll see you again in a few weeks.
Part of my motivation to start this newsletter was to give myself an outlet for working on and sharing personal graph projects. Eight editions in and these mythical projects have failed to materialize—until today!
In his 2008 book, “In Defense of Food” author Michael Pollan provides the following seven words as diet advice:
“Eat food, not too much, mostly plants.”
Craving more? As Pollan noted in an interview with NPR:
“That’s it. That is the short answer to the supposedly incredibly complicated and confusing question of what we humans should eat in order to be maximally healthy,”
Pollan’s advice may be short but each clause does some heavy lifting.
This is an example of a purposefully simplified approach to take for personal moderation but what about moderating the behavior of others? A familiar example of this task can be seen in moderating online environments where individuals post and share content. Ideally, such chat rooms, forums and comment sections would have a code of conduct that each piece of communication can be judged against. Unfortunately, the real world isn’t that simple.
Moderators tasked with keeping conversations on-topic can wield a seemingly-intoxicating amount of power online. A moderator of a niche group of, say, Doctor Who fans on a private server may be able to censor or remove content with an assumed superiority over non-moderators in the group. I’ve mentioned it before, back in source/target #2, but this recent article on the moderation approach taken by Hacker News is particularly interesting.
There are a variety of tools at a moderator’s disposal, depending on the website in question. These tools can be wide-reaching: from rudimentary account management steps like disabling accounts to strange psychological approaches like shadow-banning.
In my nostalgia-tinged image of the internet it’s easy to assume that most individuals online are acting in good faith, but in reality, there are quite a few factors at play. Apparent anonymity (or pseudo-anonymity) on online forums (fora?) can give individuals a sense of invincibility as they provoke or “flame” others. A less aggressive but more widespread version of this has entered the popular lexicon as “trolling”—individuals who purposely cause discord for fun or for malicious political or societal reasons.
Motivation for questionable behaviour online can also be borne out of financial incentive. Seemingly helpful messages could be obscuring a brand or company planting seeds of good intent and familiarity with users. You can find many possible examples of this on Reddit’s “r/HailCorporate” subreddit (one of many sub-communities on the site), a community with the unofficial slogan: “Let us show you the ads you didn’t know you were seeing.”
In online moderation there are a few different types of collusive, premeditated bad behaviour. Sock-puppeting refers to the coordinated effort to present multiple opinions or personas that give the impression of a wider base of support. Users are said to be brigading if they rally others to manipulate or influence decisions or actions. Examples of this can vary wildly: it could be organized downvoting to reduce the visibility of an opinion, or the harassment of an individual in the real world. Incidentally, brigading is outlawed under Reddit’s “rule #2”.
Front Page of the Internet
Reddit has had its own share of content and policy challenges since inception. This has led to moderation challenges and the site has often found itself on the backfoot as it responds to allegations of bias and general failure to act. One incident that comes to mind was from 2016, when the CEO was found to have edited the comments of others in various posts on the site. This, unsurprisingly, led to outrage from users as they realized their comments and messages weren’t as immutable as they perhaps thought. This illuminating interview with ex-Reddit product head Dan McComas was striking, especially in his summary of the overall contribution of the site:
”I Fundamentally Believe That My Time at Reddit Made the World a Worse Place”
Just last week a screenshot was posted on Reddit (and reposted many times) that drew attention to the overlap between a small number of moderators on some of the most popular subreddits. This revealing screenshot in turn sparked moderation as it was seen as an example of targeted harassment against the moderators themselves. When I saw this screenshot I immediately thought the one-to-many mapping of subreddits to moderators would lend itself quite nicely to some graph-thinking. There are hundreds of thousands of subreddits moderated in turn by thousands of individual users—the overlap between these would surely show interesting trends and features of subsets of the Reddit community.
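That one-to-many mapping makes the subreddit/moderator data a bipartite graph, and the overlap in the screenshot falls out of a simple projection: link two subreddits whenever they share a moderator. A minimal sketch, using an entirely hypothetical mapping (none of these moderator names are real):

```python
from itertools import combinations
from collections import Counter

# Hypothetical subreddit -> set-of-moderators mapping
mods = {
    "r/funny": {"alice", "bob", "carol"},
    "r/pics": {"alice", "bob"},
    "r/aww": {"alice"},
}

# Project the bipartite subreddit-moderator graph onto subreddits:
# an edge between two subreddits is weighted by shared moderators.
overlap = Counter()
for a, b in combinations(sorted(mods), 2):
    shared = mods[a] & mods[b]
    if shared:
        overlap[(a, b)] = len(shared)
```

The heaviest edges in `overlap` are exactly the subreddit pairs (and, transitively, the moderator cliques) that the screenshot drew attention to.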
As both Reddit and graphs have been around for a fair amount of time there’s a vast array of prior work on this topic. Here’s a great approach from 6 years ago with an accompanying list of other examples. The most recent example graph analysis on the topic I found was this from last week.
As the data for these examples are either not readily available or somewhat stale I took the opportunity of a long weekend at home to brush up on my web scraping skills (trying out a few new technologies in the process). I used RedditMetrics to get a list of subreddits in descending order of popularity and parsed through each in turn to get a list of moderators using Reddit’s built-in Moderators API.
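For the curious, the scraping step boils down to fetching the moderators listing for each subreddit and pulling names out of the JSON. Here’s a rough sketch; the response shape is an assumption based on what the endpoint returned at the time of writing, not a documented contract:

```python
import json

def moderators_url(subreddit):
    # Reddit's public moderators listing for a given subreddit
    return f"https://www.reddit.com/r/{subreddit}/about/moderators.json"

def parse_moderators(payload):
    # Assumed shape: {"data": {"children": [{"name": ...}, ...]}}
    data = json.loads(payload)
    return [child["name"] for child in data["data"]["children"]]

# A stubbed response, standing in for an actual HTTP fetch
sample = json.dumps({"data": {"children": [{"name": "mod_one"},
                                           {"name": "mod_two"}]}})
names = parse_moderators(sample)
```

In practice you’d fetch `moderators_url(...)` with a proper user agent and rate limiting, then feed the accumulated subreddit/moderator pairs into your graph tool of choice.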
I find it difficult to restrict the scope of a graph-related research project, especially before getting a feel for the shape and volume of data you’re set to be using. After a weekend of scraping, munging, modelling and early visualisation I’m ready to take a deeper dive into the 2020 subreddit/moderator graph and look forward to sharing more with you in two weeks.
I love Observable as a platform for guided code walkthroughs with editable and interactive web examples. In this notebook Tom Shanley has gone into considerable detail as he walks you through his deconstruction and exploration of additional capability for his d3-sankey-circular fork of the d3-sankey library.
I read a letter to the editor today from a 17-year-old who described the challenge of working with “old technologies” like email. If email is “old” they have probably never heard of “ARPANET” – the precursor to the internet. Here’s a charming, hand-drawn, origin-destination graph of its form fifty years ago:
There are some extremely fresh, innovative data visualization approaches in this Interactive Data Visualization Final Project showcase from MIT. One of my favorites is this example of a graph exploration of the frequency of Chinese words from Beining (Jenny) Zhang.
The UN has suggested that COVID-19 may force the closure of one in eight museums globally. Also from the MIT showcase, check out this cool gamified depiction of art gallery data in graph form from Diana Nguyen & Darius Bopp. You can try it out live here. I’ve played with the extremely comprehensive Rijksmuseum API in the past—exposing this data to the public is a savvy way to promote and raise awareness of these important institutions.
Thanks again for subscribing. Stay safe, you’ll hear from me again in two weeks.
The newsletter is out a day late this week as I was attending the virtual Knowledge Graph Conference and wanted to draw from the whole event. A warm welcome ( 👋 ) to all those who found source/target through the conference. I hope you like the newsletter, you can check out the archive here to see all the previous editions.
Don’t forget you can respond directly to this email if you have any feedback or suggestions—I’d love to hear from you.
As a teenager I was an avid reader of two magazines that I’d buy from the newsagent in my town: the Beano and Computeract!ve.
For those who aren’t familiar, the Beano was the home of the famous comic book character Dennis the Menace. I bet you’re picturing Dennis right now. Frankly, most of you will be way off. Some argue that the American version of Dennis the Menace was the original as the character was conceived 5 days before the British version. Of course these people are just plain wrong.
Anyway, I digress. Computeract!ve —the other magazine I’d buy—appealed to my young, computer-obsessed mind. It covered the confluence of two very exciting new developments: computer software and the internet. Check out these retro covers to get a flavour of the contents:
The magazine proudly noted a recommendation from the “Plain English Campaign”, a group that promotes the use of simple, easy-to-understand language in all content. I used to think this was some sort of ultimate badge of honor for publications but looking into it now it’s a curious little group.
In the past few years I’ve come to recognize writing as an important skill worth practicing and honing, especially when you’re looking for maximum clarity. It’s especially important in the technology domain as it’s easy to “over-write” or rely on jargon and domain-specific language. If you’re interested in writing, I recommend the classic Zinsser book “On Writing Well.” It helped me understand that words that don’t contribute to the overall message can often be removed from technical writing.
As regular readers will know, my background is primarily in graph theory and industry applications of graph analytics and visualization. My go-to graph database uses a model commonly known as a Property Graph: a deceptively simple, intuitive model for building and reasoning with connected data.
I wasn’t quite sure what to expect from the Knowledge Graph Conference. I’ve had some exposure to “Knowledge Graphs” before but my understanding of them paled in comparison to Property Graphs I work with day-to-day. My aim was to gain a greater understanding of the delineation between the two.
My first takeaway was that I’d naively walked into a hot topic that has been discussed at length for decades.
But first of all, congratulations to the conference organizers; it’s heartening to see a conference move to virtual in a way that didn’t prevent the sharing of interesting ideas and opinions.
Wading into the world of “Knowledge Graphs”, it’s easy to get lost in a world of jargon. In fact, one might expect a stern phone call from the Plain English Campaign. This is entirely as expected; the field skews academic since knowledge graphs and their direct ancestor, the “semantics” field, interleave with logic, knowledge science, philosophy and mathematics. To a lowly graph practitioner such as myself it can be quite hard to get a handle on the field.
As core community member & Principal Scientist Juan Sequeda noted in a blog post from 2018:
“… we don’t want people searching for “Knowledge Graphs” and finding a bunch of papers, problems, etc instead of real world solutions (this is what happened with Semantic Web).”
The same goes for terminology: here’s just a small flavor of the words and acronyms you’ll find when doing a cursory exploration of the space. How many do you think you could define on the spot?
Ontology, Taxonomy, Reasoning, RDF, Triple Stores, Inference, SPARQL, Semantics, Predicates
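If some of those terms feel abstract, the heart of RDF is surprisingly small: everything is a (subject, predicate, object) triple, and a SPARQL query is essentially a pattern matched over those triples. Here’s a toy, pure-Python stand-in (not real SPARQL, and the facts are invented) to make the idea concrete:

```python
# A toy triple store: RDF boils down to (subject, predicate, object) facts.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "graphs"),
}

def match(s=None, p=None, o=None):
    # None acts like a SPARQL variable (?x); this is roughly what
    # SELECT ?o WHERE { :alice :knows ?o } would express.
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

friends = match(s="alice", p="knows")
```

Real triple stores add inference, named graphs and much more on top, but the pattern-matching core is the part that tends to get lost under the jargon.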
In her talk, Melliyal Annamalai (Oracle) noted that
“Knowledge Graph Conferences typically attract 100s not 1000s”. Could this reflect the accessibility of the field, or rather the restriction imposed by the terse disagreements over definitions?
The work being done in the world of knowledge graphs is interesting and important, yet it is my understanding (as hinted in the quote from Sequeda above) that discussion and tooling around the Semantic Web has impeded growth and accessibility of the field. As Ying Ding (University of Texas, Austin) mentioned in her talk on drug discovery techniques:
"people say you need a PhD to write SPARQL queries.”
Let’s return to the pervasive question: what exactly is a knowledge graph? I’m not going to take a stab at providing a definition here, as tempting as it may be. The “holy grail” definition to solve all disagreements and answer all questions isn’t likely to exist. Instead I’m going to focus on some real-world use-cases discussed at the conference.
Here are three personal takeaways from the Knowledge Graph Conference this week.
I often feature smart and innovative graph visualizations in this newsletter. Many of the conference talks reminded me that knowledge graphs in industry typically surface themselves in, frankly, unsexy ways. It’s hard to get people excited about faceted search interfaces, no matter how compelling the design. Nevertheless these systems are fundamental to business users and researchers so—does it matter?
In contrast, this demo from conference sponsor Causality Link demonstrated a clear visual comparison of the relative impact of external factors to various entities in financial markets.
But is this graph just a visual distraction from the excellent, underlying insights generated by the product that could be presented in another, superior way?
A throwaway comment from Bethany Sehon caught my attention. In her presentation with Brian Donohue she walked through the work of the Enterprise Data Management team at Capital One as they sought to wrangle the data and definitions used by business teams.
I know first hand from working with global enterprise companies that the amount of data produced by employees is overwhelming. Mass proliferation of data silos, personal knowledge bases, stray documentation and disjointed folksonomy makes it seemingly impossible to make informed decisions.
Bethany mentioned the importance of looking after the “crown jewels”— those critical pieces of information and knowledge that are buried in the know-how data created and curated by employees. The extensive work undertaken by teams such as Bethany & Brian’s is seen as especially important if you think of it as a way to protect and surface those jewels.
I especially enjoyed the talks from individuals working on some of the top RDF graph database companies today. In particular, the talks from Ora Lassila (Amazon Web Services) and Michael Grove (Stardog) stood out. They both took a no-nonsense approach to explaining concepts and core competencies of their respective databases with full coverage of what it means to build “knowledge graphs.”
Near the end of two days of tremendous but concentrated coverage of knowledge graph topics, Grove’s reminder of the fundamental formulation that
“Data + Context = Knowledge”
felt especially helpful. Maybe, at least for me, the definition of “Knowledge” in “Knowledge Graphs” doesn’t have to be more complicated than that? Special Mention goes to use-cases presented by some great speakers: Konstantin Todorov on verifying controversial claims, Huda Khan on the discovery of library resources and Ron Bekkerman on Real Estate Ecosystem graphs. Look out for the release of recordings in the near future.
Graphext has done a great job building a platform that makes it easy to load data and create annotated insights against a wide array of data.
As Lake notes, the article covers the use of “multidimensional hypergraphs to build a generative model of the universe.”
I think it’s natural at this time of lockdown to feel as if you could be more productive. Whenever Wolfram’s name is mentioned nowadays I can’t help but think of the following photo from this article:
Stephen is sporting his strap-on walking computer desk to get some exercise whilst simultaneously crushing beefy problems in theoretical physics.
Since then there’s been a hearty update to the app with the addition of a slick timeline. Creator Ilya Boyandin is presenting at the Zurich Data Visualization online Meetup later this month.
I’m a big fan of the illustrated aesthetic and love the hand-annotated links. Read the comments on Twitter for extra context and some recommended reading.
Thanks again for subscribing to source/target, I hope you and yours are safe and well. I’ll see you again in just under two weeks.
Don’t forget you can respond directly to this email if you have any feedback or suggestions—I’d love to hear from you.
Do you ever feel you’ve completely missed out on some quintessential pop culture? I regularly get this with movies. It turns out I’ve managed to somehow miss watching the vast majority of movies you see on top ten lists.
The Godfather? Jaws? Pulp Fiction? Blade Runner? It’s a Wonderful Life? Forrest Gump?
I haven’t seen any of them.
When these movies come up, I seem to be able to bluff my way through the core concepts or make at least one reference to it. Maybe I’m actually misremembering and I’ve seen them a long time ago? Weirdly, in some cases, the soundtrack has found its way into my psyche and buried itself for recall at the oddest moments. In almost all cases, I’ve probably seen something that has subtly (or not so subtly) parodied the story or feel of one of these “must-see-before-you-die” movies.
Due to the widespread self-isolation orders, I found time to watch all of the Indiana Jones movies and had this strange form of deja vu. It’s probably down to the unimpeachable soundtrack from John Williams; when the main theme swells near the end of the movie after being teased at length there’s a strong emotional resonance.
One classic centerpiece in these movies is the transitioning travel scene. Indy’s plane is shown travelling between destinations with a red line marking the path along the way.
In 2020 the time taken to show this line crossing the globe is quite indulgent but I’m sure it was relatively impressive when originally shown in 1981. It also works on a separate level as it’s portraying 1930s aviation in the Indiana Jones universe.
Of course this map is a graph, or more specifically an origin-destination graph. Here’s a collated version from all the movies:
A similar (and also classic) visual set-design piece is the use of photos and papers on a wall connected by yarn attached with pushpins. Extra points if the yarn is red. Here’s one from HBO’s documentary McMillions showing the connections between Jerome Jacobson and those he recruited to claim millions of dollars worth of prizes via stolen game tokens.
You’ve perhaps seen this trope so many times without giving it a second thought but I promise you’re now going to notice these everywhere.
I’ve spoken before on the prevalence of what I thought were grimly called “murder boards” but I recently discovered they are more commonly known as “crazy walls.” Fundamentally, they’re a tactile, highly visible, analogue way to give the impression of complexity and conspiracy. They work well as a visual device to show off-screen analysis. They’re also a shortcut to indicate time has passed and that individuals are fully invested in the analysis of connections between key players. In other words, as a viewer, you can almost imagine yourself walking up to the wall and drawing the fresh connection that cracks the case wide open.
It’s always nice to stumble upon collections of things once you know the name for them. Phil Gyford has been collating excellent examples on his Crazy Walls Tumblr since 2011. It’s fun to scroll through and see the similarities between various walls. I suspect the most familiar example of a crazy wall is this one from A Beautiful Mind (and no, true to form, I haven’t seen it…)
There’s a loose classification at play here, suggesting some walls simply exist to expose a frenetic energy. The archetypal example of this comes from It’s Always Sunny in Philadelphia:
You probably recognize this one as a meme that is generally seen as a shorthand for “conspiracy theorist”.
On the other end of the spectrum we have extremely clean, organized boards such as these two from Wes Anderson’s Isle of Dogs.
They’re reminiscent of the practice of knolling—another form that you probably didn’t realize had a name—and bring a sense of calm to the situation at play.
(I love good meta content so here’s one more crazy wall I couldn’t resist sharing: a crazy wall of TV crazy walls)
In writing this newsletter I’m exploring how real-world connections between people and things exist beyond the categorization into node-link diagrams or graphs. I think crazy walls are the closest thing we have to a physical, recognizable manifestation of the craziness of connections. While the vast majority of people have likely never bought yarn, a bulletin board and post-its to build their own crazy wall, the existence of them in popular culture is surprisingly familiar and in a crazy way, almost comforting.
Usage of web conferencing has exploded in the past few months. Take popular web conferencing tool Zoom, used by more than 200 million callers in March, up from 10 million in December.
There’s something satisfying about selecting virtual backgrounds when using a webcam, a feature Zoom provides to those blessed with a powerful-enough laptop. I’m not sure whether it’s because it feels especially futuristic or just liberating to hide whatever mess may be behind your desk. Regardless, there’s quite the global competition to find the most subtle, hilarious or outrageous backdrops for your now-fully-remote workday.
All that’s to say here are a few Zoom virtual backgrounds you’re welcome to use to show off your graph credentials.
Let’s start off with something arty. Here’s an eye-catching neural network visualized as a large directed graph using Gephi. This was created by Matt Fyles (GraphCore).
Yet another example of something you won’t stop seeing now that I’ve pointed it out: particles.js is a library used by literally thousands of websites as an atmospheric page background. You could make your own background here; why not rep the source/target color palette?
Familiar to anyone who’s taken an “introduction to graph theory” course, here’s Euler’s Seven Bridges of Königsberg. It was Euler’s birthday on the 15th, why not celebrate his legacy with this iconic map?
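Euler’s argument is easy to replay in code: a walk crossing every bridge exactly once exists only if zero or two land masses touch an odd number of bridges. A quick sketch of the classic layout (one of several equivalent labelings of the four land masses):

```python
from collections import Counter

# The four land masses of Königsberg and its seven bridges (a multigraph)
bridges = [
    ("A", "B"), ("A", "B"),  # two bridges between A and B
    ("A", "C"), ("A", "C"),  # two between A and C
    ("A", "D"), ("B", "D"), ("C", "D"),
]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# Euler's result: an Eulerian walk exists only if
# zero or two vertices have odd degree.
odd = [n for n, d in degree.items() if d % 2]
euler_walk_exists = len(odd) in (0, 2)
```

All four land masses have odd degree here, so no walk exists—the result that kicked off graph theory.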
And here’s one more, strictly for fans of Neo4j… Here you’re the result of a Cypher query in the Neo4j Browser.
Oh and Phil totally beat me to the punch with his collection of crazy walls, all perfect for use as a virtual background.
I missed this article from the DVS Journal Nightingale last year. In it Johnathan Dunne walks through using hive plots and similar techniques for the visualization and exploration of categorical data.
TiddlyWiki has been around for over 10 years but was featured on Product Hunt last week. It’s another example of a knowledge base where the core storage is of connected data. I recently found the TiddlyMap plugin, which allows maps to be built on top of a TiddlyWiki instance.
NebulaGraph has reasonable coverage of the differences between common graph query languages but as usual I find the comments from the Hacker News submission to be just as illuminating. Maybe I’m just drawn to anecdata but there are a lot of interesting perspectives in that thread.
Miro is a remote collaboration tool for teams. Research Engineer Andrey Gasparian has posted a good primer on their blog looking at the use of graph visualization for understanding collaboration.
Thanks again for subscribing. Stay safe, I’ll see you all again in two weeks.
I reached a minor milestone this week: 100 subscribers (and counting)—thank you for signing up and for your comments and feedback. I have a few tweaks to the format I’m looking to try but overall I’ve enjoyed putting this together every other week. I hope you’ve enjoyed it so far.
Feel free to forward this to a friend if you think they’d like it. You can respond directly to this email if you have any feedback or suggestions.
For obvious reasons there’s been an incredible uptick in webinars and conferences switching from meatspace to online over the last few months. Many pre-existing events are making necessary switches but on top of this, there’s a global bid to occupy the attention of everyone self-isolating. This is especially the case for all parents juggling jobs and childcare.
On that topic, I watched something a little different this week: Data Visualization Pro Elijah Meeks presented an introduction to networks and graphs to an audience of Grade 5-9s. It was far-reaching and a valiant effort at engaging his audience with what could have easily become a very technical discussion to a group of distracted children:
As a graph practitioner, I found it refreshing to look at the field from a complete beginner’s standpoint and was reminded of the far reach of our space. The children on the call seemed genuinely interested in the networks that they interact with every day (Family trees! Computer networks! Transport between home and school!)
One interesting, real-world network that Elijah showed—and one I had completely forgotten about since elementary school—was the food web. For those who don’t remember, this is a network of the animals that consume other animals and often looks something like this example from WikiHow:
I remember being fascinated by this concept at school, particularly how far-reaching the web was and how you seem to be able to keep drawing it forever. Analysts and other graph users seem to love calling network graphs “webs”; perhaps it’s a holdover from the fabled “world wide web” and the tangible “food web” taught in school?
The idea of a food chain has been around since the 9th century, courtesy of Arab scientist and philosopher Al-Jahiz. The contemporary use of the “food web” concept (the one I remember from school) appears to derive from Charles Elton’s “food cycle” defined in his book Animal Ecology from 1927. The earliest known published example of what we call a “food web” is the following weevil-focused graph from Pierce, Cushman & Hood (1912):
I love the decision to summarize the types of species in this graph with shapes built of tessellated squares. The authors already understood the seemingly infinite connections between species. For a larger, more recent example take this “simplified food web” — it’s almost unintelligible!
— David Lavigne (2003). Chapter 2
It turns out this is perhaps intentional:
Food webs are the road-maps through Darwin’s famous ‘entangled bank’ and have a long history in ecology. Like maps of unfamiliar ground, food webs appear bewilderingly complex. They were often published to make just that point.
– Pimm, S. L.; Lawton, J. H.; Cohen, J. E. (1991). “Food web patterns and their consequences”
I recognize some core graph features intrinsic to a food web such as the one above:
Directionality is important. The arrows indicate which animal preys on which; if you were to switch it around it wouldn’t make any sense.
Self-links are also a thing. After recoiling at the complexity, the first thing I noticed in this graph is that certain species are cannibalistic:
Self-links or loops are often a bane for graph analytics but in this case they’re an important part of the ecosystem.
Finally, just like in the above (highly scientific) WikiHow example of a desert biome, we’re always going to find supernodes at the centre of our ecological communities. One classic species playing this role is “bacteria”; meanwhile, the Northwest Atlantic web seems highly dependent on “cod”.
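These three features—directed edges, self-loops and supernodes—are easy to pick out programmatically. Here’s a sketch against a made-up desert web (the species and links are illustrative, not ecological fact):

```python
from collections import Counter

# Hypothetical desert food web: each edge is (consumer, consumed)
web = [
    ("hawk", "lizard"),
    ("snake", "lizard"), ("snake", "snake"),  # cannibalism: a self-loop
    ("lizard", "insect"), ("coyote", "snake"),
    ("bacteria", "hawk"), ("bacteria", "snake"),
    ("bacteria", "lizard"), ("bacteria", "coyote"),
    ("bacteria", "insect"), ("bacteria", "cactus"),
]

# Self-loops: species that consume their own kind
self_loops = [(a, b) for a, b in web if a == b]

# Supernodes: the species involved in the most relationships
# (a self-loop conventionally contributes 2 to a node's degree)
degree = Counter()
for a, b in web:
    degree[a] += 1
    degree[b] += 1
supernode = degree.most_common(1)[0][0]
```

Even on this tiny web, the decomposer shakes out as the highest-degree node, echoing the “bacteria at the centre” pattern above.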
The intersection of graph and biology is vast; I’m barely scratching the surface of this topic and there’s a reason one of the most popular graph platforms was originally for biological use-cases. But for the layperson, food webs are yet another example of an engaging window into a rich & compelling world of connections.
Last edition I took a look at a number of graph tools built to help developers understand the source code behind applications. I focused on tools that help highlight and enforce rules around quality as well as the dependencies between components and third-party code brought into a repository. One such tool I didn’t get a chance to mention was the Visual Code plugin nDepend which has a dizzying array of features for visual exploration of codebases.
I was taken by this video that focuses on the benefits of using a dependency matrix representation of a graph over the more common-place node-link diagram. Matrices vs. graphs is a topic I want to cover in more depth because this is a perfect example of function over form (as you can see below).
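For a feel of why matrices can beat node-link views, here’s a minimal sketch building a dependency matrix from an edge list (the module names are invented):

```python
# Hypothetical modules and their dependency edges (src depends on dst)
modules = ["ui", "core", "utils"]
edges = [("ui", "core"), ("ui", "utils"), ("core", "utils")]

index = {m: i for i, m in enumerate(modules)}
matrix = [[0] * len(modules) for _ in modules]
for src, dst in edges:
    matrix[index[src]][index[dst]] = 1
# Row = "depends on", column = "depended on by": dense coupling shows up
# as filled rows/columns instead of a tangle of crossing lines.
```

A cell on the diagonal would be a self-dependency, and a symmetric pair of cells flags a cycle—both far easier to scan for in a matrix than in a hairball diagram.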
Another graph underpinning a source code repository is that of the individuals contributing to the codebase. Gource is an especially vibrant multi-platform application that’s been around since 2009 and it definitely has to be seen. Take a look at this recording from last year visualizing the development of the core code behind cryptocurrency Bitcoin.
I’ve used Gource to visualise a number of repositories over the last 8 years and it’s never failed to produce interesting and eye-catching results. This is especially true when showcasing results to an audience composed of contributors to the source code repository being visualized.
The thing I like the most about Gource is how dynamic the visualization is: seeing individuals flying around the screen creating and editing files with gusto is really fun. I often talk about focusing on the utility of graphs and the need to be cognizant of the actual benefit of “thinking graph” rather than being distracted by pretty pictures. However with a tool like Gource I certainly think there’s a middle ground for a sort of non-functional benefit. Seeing contributors move around like superheroes can bring energy and camaraderie to a team as it really highlights the work ethic and contribution of developers to a code base. It’s inspiring! In this article Leonardo Faria (Thinkific) describes this exact use-case: showing off team contributions to code bases at monthly Product town halls.
Another strong use-case for using graphs to explore source code repositories is to aid program comprehension. It can take hours, days, even weeks to “grok” or understand a new codebase. Any tool that can help with this situational awareness can help reduce a major cost factor in maintaining software systems.
Open Source software Sourcetrail looks like a great approach for C, C++ & Java developers. Check out this walkthrough from Bartłomiej Filipek for an exploration of the benefits and pitfalls.
I’m looking forward to attending the (digital) Knowledge Graph Conference as they have a stellar line-up and there’s no travel costs required.
Graph database vendor Neo4j have just announced Connections, an online event with a focus on graph data science.
Here’s a nice primer from Karthik Deivasigamani (Walmart Labs) on building a product knowledge graph, in particular using heuristics and Natural Language Processing to extract entities from titles and descriptions.
Arrows is a very popular tool for prototyping and diagramming, and this article from Ljubica Lazarevic (Neo4j) has some great tips and tricks for working with it. I feel a lot of these tricks should be integrated into the tool itself. Perhaps this directed graph editor from Uber could be a good basis for a fresh approach to Arrows?
Thanks again for subscribing, I hope you found this week’s edition interesting. See you all again in two weeks’ time.
This week I’m still looking to avoid the c-word and bring you some content to distract you, even for a little while. Please take care out there.
If you like source/target please do forward it to a friend! You can also respond directly to this email if you have any feedback, suggestions or content submissions.
This week I’m taking a first look at a number of use-cases for applying graph thinking to the way we explore and analyze source code.
The story of left-pad is reasonably well-known at this point, but for readers who aren’t familiar here it is in a nutshell:
Back in the heady days of 2016 a JavaScript developer unpublished a number of packages on npm, the NodeJS package manager’s global registry. One of the unpublished packages was the innocuous-sounding left-pad, consisting of just 11 lines of code.
Due to the way npm worked, the simple removal of those 11 lines of code almost immediately had a knock-on effect for applications around the world. Developers at Facebook and elsewhere faced cryptic errors in their workspaces and took to GitHub to report the issue. In an attempt to restore order, npm were forced to “un-un-publish” the package.
It turned out thousands of applications depended on packages that, either directly or indirectly, depended on left-pad. The story is a little more complex than the above; have a look here for an entertaining look at the full picture.
The issue was a stark reminder of the complex web of software dependencies that are intrinsic to modern development, especially so for the JavaScript ecosystem. While npm (recently acquired by GitHub (Microsoft)) took steps to prevent the issue from arising again it’s clear that unchecked dependencies are a liability.
It turns out it’s actually really hard to get a handle on the dependencies used by your average software repository. The five direct dependencies you may use in your React application could, in turn, import five more. It’s a classic example of exponential growth; pretty soon you’ll have either an extremely long list or, if you’re smart, a visual representation of the tree of complex dependencies you’ve pulled into your application.
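To see why the flat list explodes so quickly, here’s a toy sketch of walking a dependency map. The package names and relationships below are illustrative, not real npm metadata:

```python
# Hypothetical dependency map: package -> list of direct dependencies.
DEPS = {
    "my-app": ["react", "lodash"],
    "react": ["loose-envify", "object-assign"],
    "lodash": [],
    "loose-envify": ["js-tokens"],
    "object-assign": [],
    "js-tokens": [],
}

def transitive_deps(pkg, deps, seen=None):
    """Collect every package reachable from pkg, depth-first."""
    if seen is None:
        seen = set()
    for dep in deps.get(pkg, []):
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, deps, seen)
    return seen

# Two direct dependencies fan out to five transitive ones.
print(sorted(transitive_deps("my-app", DEPS)))
```

Even this tiny example more than doubles the direct dependency count; real applications routinely pull in hundreds or thousands of packages this way, which is exactly why a visual tree beats a list.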
This analysis and awareness is particularly important because of the intrinsic risk of using third-party code. In the case of the missing left-pad package there was no malicious intent behind its deletion. That isn’t always the case: so far this year there have been 39 advisories released by npm warning of packages compromised by malicious parties.
The other reason dependency analysis is important may seem a little surprising. All packages released and published in repositories like npm (should) have a license protecting the rights of the original author. These can range from permissive to restrictive depending on the intent and aims of the creator. Close to the restrictive end of the license spectrum is the GNU General Public License. GPL and some other licenses are known as copyleft which, loosely speaking, is the opposite of copyright: instead of protecting intellectual property from being distributed or copied, it protects the ability to distribute or copy the software.
If GPL or similarly licensed code finds its way into a closed-source commercial application, that license is considered “viral”: the company is obliged to release the entirety of the application containing the GPL piece. This would be disastrous for most commercial companies as users and competitors would have access to the source code free of charge. It’s therefore imperative that large companies understand the exact licenses used by the dependencies of their applications. I recommend this article from David Marin at Toptal for closer coverage of the concerns around mixing software licenses.
Tools to analyze dependency trees have been around for a long time; a number of companies offer applications and services that successfully monitor dependencies to minimize the risk of insecure or commercially-risky code. Readers of source/target are, of course, most interested in the ones with a strong connected, visual emphasis.
Sticking to the JavaScript ecosystem, dependency-cruiser is a neat package to analyze dependencies and enforce rules including potentially incompatible licenses. It also includes the powerful ability to export dot files that can be easily converted into extremely clean dependency graphs.
There are some great examples of these in the dependency-cruiser documentation. If you want to try it yourself Netlify has a good article walking through using dependency-cruiser for your project.
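For context, dot is just Graphviz’s plain-text graph description language, so a dependency export boils down to emitting a list of edges. Here’s a minimal sketch of the idea (the file names are invented, and this is not dependency-cruiser’s actual implementation):

```python
def to_dot(edges):
    """Render (source, target) dependency pairs as a Graphviz DOT digraph."""
    lines = ["digraph deps {"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical module-level dependencies in a small JavaScript project.
edges = [
    ("src/index.js", "src/utils.js"),
    ("src/index.js", "left-pad"),
]
print(to_dot(edges))
```

Feeding text like this to Graphviz (or any dot-compatible renderer) is what produces those clean dependency diagrams.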
Featured back in source/target #2, Anvaka’s code galaxy is a whizzy space-inspired application that allows you to fly around a galaxy of dependencies. There’s also a 2d version linked from PikaPKG.
If you’re keen on rolling your own graph dependency analysis, academic stalwart Gephi is a great option. In this article from just over a year ago, Matthias Meschede walks through the process taken to build a dependency graph visualization of a particularly large collection of open source software.
So far we’ve focused on dependencies between libraries but the same approach is used to explore the individual procedures, functions and calls that make up any given application. I’m going to take a closer look at some of these next time but I couldn’t resist mentioning a few below.
I’m a recent, extremely reluctant convert to Visual Studio Code from Sublime Text. The conversion is partly due to the rich ecosystem of actively-maintained extensions available in VS Code as well as a mild case of FOMO. One such extension is GraphBuddy: an extremely slick integration that exposes an interactive graph of your Scala project. I don’t know Scala but I’m looking forward to the TypeScript support that’s in the works.
Right on cue there’s an update this week to Observable, the impressively dynamic data notebook that includes an interesting take on the “minimap” that’s all the rage in your favorite text editor. Instead of showing clumps of text it actually takes the dependencies between “cells” in your application and distills them down into a neat little widget. Check out the motivation in this illuminating Twitter thread.
Because all graph roads lead to Neo4j I have to give an honorable mention to jqAssistant which has been focusing on the efficiency, inconsistency management and rule-compliance of source code for a number of years now. As it has Neo4j on the backend it’s a very powerful approach to source code quality assurance. This is partly due to the capability to write Cypher to streamline rule creation and exception detection. Check out this podcast and transcript from Rik van Bruggen from a few years ago for a good introduction.
Finally I’m a big fan of CodeFlower. Built with d3.js it generates compact little force-directed networks giving a birds-eye view of code. Nodes are sized by the number of lines of code so it’s a nice way to spot bloat in your application.
Another music example is this chord diagram from On the A Side showing off the interconnectedness of Toronto artists and their records. This is a neat graph but I’d love to build something a little more comprehensive one day. So many projects, so little time!
To close out the music theme I updated a blog post for work recently that uses the DBpedia knowledge graph to plot out music genres and influences in graph form. I’d love to do more work in this area in the future.
Jan Žák has posted a follow-up to his popular graph visualization walkthrough on using Pixi.JS to visualize large volumes of graph data in the browser. I’ve always been a little intimidated by the abstractions behind GPU programming but this article and accompanying code could be exactly what I needed to get started.
There have been countless Game of Thrones graph projects but this is the first Lord of the Rings one I’ve seen. It walks through the use of projection, an important tool when working with graph data. The APOC support for this in Neo4j is excellent.
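Projection, in a nutshell, collapses a bipartite graph (say, characters appearing in chapters) into a one-mode graph of characters linked by the chapters they share. Here’s a minimal pure-Python sketch of the idea with invented data; the real article uses Neo4j’s APOC procedures rather than anything like this:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical bipartite data: which characters appear in which chapters.
appearances = {
    "ch1": {"Frodo", "Gandalf"},
    "ch2": {"Frodo", "Sam"},
    "ch3": {"Frodo", "Gandalf", "Sam"},
}

# Project onto characters: each shared chapter adds weight to a pair's edge.
weights = defaultdict(int)
for chars in appearances.values():
    for pair in combinations(sorted(chars), 2):
        weights[pair] += 1

print(dict(weights))
```

The resulting weighted character-to-character graph is usually far more amenable to layout and community detection than the raw bipartite one.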
Finally, check out this hierarchical graph todo list application doing the rounds on Hacker News. The graph approach isn’t compelling enough to tempt me away from Workflowy/Roam Research but it’s interesting nonetheless.
That’s it for now. Let me know if you have any feedback or suggestions. Otherwise I’ll see you again in two weeks.
It’s satisfying to see this little newsletter reach edition #4. The cadence of every other week feels right as well, not too regular but right on cue when you’ve almost forgotten about it. What do you think?
I’m very grateful to buttondown.email for an excellent, no-nonsense service for newsletter sign-ups and mailings. I’ve moved my archive over to my new personal site to keep everything in one place. You can now check out the archive here.
If you like source/target please do forward it to a friend!
Every once in a while there’s a world event or craze that leads to a massive spike in interest in a particular topic. Yet very few, if any, seem to have captured the same general interest as the novel coronavirus and the disease it causes, COVID-19. Take a look at this—admittedly unscientific—Google Trends view of the popularity of the search term vs. the most popular other topics I could find:
This isn’t surprising. COVID-19 is, of course, a global event with severe repercussions for personal connections, public safety and industry.
To be honest the content being produced on this topic is pretty overwhelming. It’s felt like a bit of an arms race for companies and organizations to release flashy websites and graphics to convince you that they are the authoritative source on this sort of analysis.
On that note, I really appreciated Amanda Makulec’s pointers and related article calling for practitioners to pause, think and carefully consider what they are communicating before producing content around the virus.
So that’s why today I’m not going to cover the anxiety-inducing minute-by-minute updates on COVID-19. Instead I’ll look to the recent past at the network science that informs our understanding of the spread of diseases.
I picked up a copy of Linked last week. Written by the renowned and influential Albert-László Barabási, it’s a breezy primer for those interested in the networks that support the world around us. It’s also a good introductory text as it avoids diving into the gory mathematical details. Of particular interest to me was chapter 10: fads & viruses. In it Barabási uses a variety of compelling real-world examples to explore how infections and trends percolate through a network of contacts.
Linked was published nearly 20 years ago and there are some dated references; the world of 2020 is a different one to 2002.
For a start, there are a number of awkward references to formative technology concepts of interest. Take “crackers”, a term used to describe hackers with a penchant for breaking computer systems in a malicious way—as a contrast to “hackers” who mess around with computers for personal satisfaction.
There’s also a mention of groundbreaking and polarizing research into “parasitic networks”—the attempt to get a remote computer to execute some function without being detected. To the technologically-minded reader in 2020 this would be your bog-standard “botnet.” I wonder what terms we’ll find quaint in 20 years’ time?
I should also point out that the depiction of Gaëtan Dugas at the beginning of the chapter recounts the homophobic narrative pushed by a since-discredited book from 1984. Dugas was a Canadian widely-regarded as the primary case (or “Patient Zero”) for the AIDS epidemic in the United States. Barabási notes that “it is not clear whether Dugas brought AIDS to North America” and, indeed, this theory was thoroughly debunked in 2016.
To gain an understanding of and prevent the spread of diseases, experts analyze the complex networks of connections between individuals in a population. In medicine this analysis is known as epidemiology and typically focuses on tracing the connections that make up your “contact network”.
A contact network includes but isn’t limited to your social network. Your friends and family make up your social network while your contact network also includes anyone you interact with in your day-to-day life. It could include colleagues at work, strangers on your bus commute home or an acquaintance you shook hands with at a meet-up or a wedding.
As described in this excellent summary from Maclean’s, it is now understood that Dugas may have been the victim of a misunderstanding and summary bias in the original study:
“When Darrow published his report, Patient 57 was renamed Patient O, with the letter “O” standing for “Out-of-California.” Somewhere along the paper trail, the “O” got confused with a zero. Adding to the confusion was the fact that Dugas was the study’s “original” patient, placed at the centre of the cluster diagram, between patients from L.A. and New York.”
Here’s the original network from the 1984 paper for reference—I think it’s a timely reminder of the capacity for data analysis and visualization to be a vessel for misinformation.
Hubs are a particularly interesting concept in contact networks; these are nodes or individuals with a higher-than-average number of connections to others in the network. The classic example of hubs is the set of major airports at the core of the network of flight routes across the world.
In a population of individuals a hub may be a particularly “popular” person. It’s common for hubs to be individuals who are in contact with many others in a comparatively short span of time—often doctors or other healthcare workers. Hubs are critical in analysis as they could easily become super-spreaders of diseases in a population.
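To make the hub idea concrete, here’s a tiny pure-Python sketch of spotting the highest-degree individual in a contact network. The people and contacts below are entirely invented:

```python
from collections import Counter

# A toy contact network as a list of (person, person) contact pairs.
contacts = [
    ("doctor", "alice"), ("doctor", "bob"), ("doctor", "carol"),
    ("doctor", "dave"), ("alice", "bob"), ("carol", "eve"),
]

# Count each person's degree: how many contacts they appear in.
degree = Counter()
for a, b in contacts:
    degree[a] += 1
    degree[b] += 1

# The hub is simply the highest-degree individual.
hub, links = degree.most_common(1)[0]
print(hub, links)  # doctor 4
```

Libraries like NetworkX offer the same idea as degree centrality, but even this crude count shows why a healthcare worker with many brief contacts dominates the network.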
The concepts that underpin epidemiology are applicable to a number of other areas, elegantly described in Linked with further examples. These include:
Chapter 10 from Linked is a good starting point if you’d like to learn more. For a lower-level mathematical view you can turn to the seminal text Network Science, also from Barabási. As for online resources, the coverage of Transmission Network Analysis from Orgnet provides a useful walkthrough of contagion through the lens of COVID-19 and has some effective visualization examples to boot.
One final recommendation around this topic: my partner recently read “City of Omens: A Search for the Missing Women of the Borderlands” by epidemiologist Dan Werb, PhD and has shared a lot of snippets from it with me: “I learned so much about how epidemiology works and how we can/should dig deeper to look out for our most vulnerable populations.” Dr. Werb has published extensively on global drug policy and HIV policy and is the winner of a 2014 Canadian National Magazine Award for his popular science writing.
Back in the first edition of source/target I featured work from @MenanderSoter that analyzed followers and cliques on Twitter using a graph rather than the lists that Twitter provides. This has since been taken a (giant!) step further to become a searchable, interactive application built around some Twitter community groups of interest. The FAQ is worth a read for some additional context around the community detection and layout algorithms used. The visualization implementation uses three.js and it’s quick to render if a little fuzzy around the edges.
One good (relevant) example is Barabási and other network scientists in this particular community. Also take a look here for a brief but insightful exploration of a particular network subset from Anne-Laure Le Cunff (Ness Labs). Le Cunff highlights a great visual summary technique: Twitter users with a large intersection of neighbors are shown as a single node which removes a lot of cluttering links from the view.
There’s been lots of early, positive feedback for the comprehensive introduction to Knowledge Graphs made by many prominent academics in the field. To say it’s comprehensive is probably an understatement; there are 18 co-authors and 547 references (filling 27 pages on their own). Full disclosure: I haven’t read all 130 pages, but I intend to try!
Swinging over to commercial use-cases, this application preview from Data Visualizer Du Hoang (Xaxis) caught my eye on LinkedIn. It’s called Copilot and it looks to be an interesting blend of traditional dendrograms and node-link diagrams. Take a look here for a brief write-up and summary of the intent to prioritize form over function in this application.
That’s it for now. Let me know if you have any thoughts or suggestions. Otherwise I’ll see you again in two weeks.
This is the third edition of the newsletter and I’m really interested to hear if it’s at all interesting. Drop me a line to let me know what’s good and what’s… less good.
This edition was sent out to subscribers on the 27th of January. Sign up at the bottom to get source/target delivered to your inbox every other Thursday!
When you look up at the night sky what do you see? If you’re graph-inclined (as I’m sure my readers are!) you probably spot constellations straight away. Indeed, an early historical example of a network graph is that of the night sky. As ever there’s a practical application for visualizing these graphs: the positions of the stars in this network are important for, say, navigation or the forensic analysis of photographs.
To an observer looking upwards it makes sense that these stars appear to be at fixed points in the sky. The same can be said when we work with graphs with a geospatial element—typically latitude/longitude coordinates. There’s a snapshot or range of fixed locations for the vertices that makes sense to the viewer. There’s limited benefit to running a layout algorithm to highlight clusters of interest as you lose the geospatial context that makes the view so useful.
There are lots of competing terms available when we talk about graphs. This is especially true for the confluence of graphs and geospatial data. I’ve spoken to many intelligence & insurance fraud analysts who refer to graphs as “maps”. For individuals in the orchestration and network management space, “network maps” is synonymous with “graph”. Sometimes it seems the only universally-acceptable term is “hairball”.
I hadn’t heard of “origin-destination” as a term to describe geospatial graphs until last year but it’s an apt way to describe something I previously referred to as “graphs on maps”. Just like “source & target”, “origin & destination” explains the role of the nodes we’re considering.
One stellar example of this sort of visualization is flowmap.blue – a project I admire for bringing together a number of technologies into a slick and cohesive package:
Take a look at my top three favorite Flowmaps to see what I mean:
Let me know if you have any favorites. If you don’t have newsletter fatigue already you can sign up to the flowmap.blue release newsletter on the homepage.
Just in time for this week’s look at origin-destination graphs here’s a recent survey and attempt at building a taxonomy of geospatial graphs from Sarah Schöttler. I could probably write a whole thing about some of these.
To bring us back down to Earth here’s a mildly-frustrating article from TED that clearly prioritizes busy and distracting visualizations over insightful ones.
Janos Szendi-Varga has updated their Graph Technology Landscape repository for 2020, with guest coverage on the GraphAware blog.
It’s not an easy job to label and classify hundreds of companies, projects & books but Janos has made a pretty good go of it.
Hat tip to Micah Stubbs for highlighting a “no-code knowledge graph” webapp from ThinkNum. Take a look at this blogpost for a neat approach for analyzing Facebook platform data with the tool.
The Future of Graph Panel from the Global Graph Summit has been posted to YouTube and it’s a breezy coverage of the processes and challenges facing graph practitioners in 2020 and beyond.
I took a stab at visualizing some Pokemon data a little while ago but there’s a fresh blogpost from Joe Depeau (Neo4j) exploring it in a lot more detail. I’ve been spending some time with the Cypher query language recently and (as showcased here) it’s really second-to-none for graph-focused querying.
I stumbled upon this blogpost from Snowplow Analytics in 2018 and it’s a thorough look at the pitfalls and benefits of different approaches to event-data modelling for graphs. Definitely worth a bookmark for the next time you’re umm-ing and ahh-ing over the right model for your event data.
I enjoyed this article from Venkatesh Rao. It highlights Roam Research (see source/target #1!), newsletters (like this!), static websites (like mine!) and Twitter threads (!) as an emerging space of a sort of lower-fi internet. The article gets pretty rambly by the end but there’s nothing wrong with that, right?
That’s it for now. Let me know if you have any thoughts or suggestions. Otherwise I’ll see you again in two weeks.
source/target will share content that uses “graph-thinking” to find real insight into the world around us.
Thanks for signing up! I’d love to know what you think. Please feel free to reply to this email with comments, suggestions & criticism. Don’t worry, I can take it.
As with any novel technology, there is an expectation that applying graph techniques to your data and model will immediately generate insights. This isn’t the case. Visualization or analytics based on connections needs to be approached just like any other project: with an understanding of which tools to use and the output you’re expecting to generate. The great thing is that this doesn’t have to be complex!
This week I’m featuring a developer of compelling, graph-centric applications whose understanding of this is apparent from his work.
I first became aware of Andrei Kashcha’s work when I stumbled upon his Amazon product graph site, YASIV. Here you can provide a keyword or title and be rewarded with a rich visualization of the connections.
The intent is clear and by showing the organic clusters of books people buy, one immediately spots patterns:
I’ve previously used techniques such as frequent itemset mining to efficiently generate sets of objects that are often seen together. In the case of YASIV we’re leveraging Amazon’s vast product graph API for our networks.
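For a flavor of the technique, here’s a minimal pure-Python sketch of frequent pair counting over invented purchase “baskets”—a drastically simplified cousin of proper frequent itemset mining algorithms like Apriori:

```python
from collections import Counter
from itertools import combinations

# Hypothetical "baskets" of books bought together in single orders.
baskets = [
    {"Linked", "Network Science", "Graph Databases"},
    {"Linked", "Network Science"},
    {"Linked", "Graph Databases"},
]

# Count how often each pair of books co-occurs across baskets.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs seen in at least 2 baskets become edges of a co-purchase graph.
frequent = [pair for pair, n in pair_counts.items() if n >= 2]
print(sorted(frequent))
```

Those frequent pairs are exactly the kind of edges a product graph like YASIV draws, with the co-occurrence count available as an edge weight.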
Since 2016 I’ve enjoyed seeing Andrei’s projects bubble up all over the internet. It’s a testament to his creative eye and enthusiasm that his projects capture the imagination of users around the world.
Take his “vs” graph app. As with the Amazon product graph, Andrei leverages the vast amount of data collected by Google’s search bar. The app scrapes the suggestions shown when users start typing a query. Doing this we can effectively crowdsource suggestions when users type “[X] vs” to give a visual model of alternative or contrasting topics.
Here’s an example for the popular data visualization library, d3.js:
In the accompanying help section on the site Andrei signs off his description with “passionately, Andrei.” I think his design decisions highlight that passion for presenting data in a dynamic and interactive form.
Let’s focus on the animation: upon execution there’s a smooth, almost mesmerizing animation as results pop into view and float around the screen as if looking for a home.
The animation here is masking the time taken to scrape and parse the results from Google and it reminds me of the apparent historical reason for mirrors accompanying elevators.
As it settles the connection animation emphasizes the backbone or skeleton of the network. In this application the size of nodes and links don’t reflect anything more complicated than simply the degrees of separation from the initial query but it still provides visual variety.
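To make that concrete, here’s a rough sketch of computing degrees of separation via breadth-first search and deriving node sizes from them. The suggestion graph is invented, and this is just the idea—not Andrei’s actual code:

```python
from collections import deque

# Hypothetical suggestion graph rooted at the initial query.
graph = {
    "d3": ["highcharts", "chartjs"],
    "highcharts": ["amcharts"],
    "chartjs": [],
    "amcharts": [],
}

def hops_from(root, graph):
    """BFS distance (degrees of separation) from the root query."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

dist = hops_from("d3", graph)
# Size nodes inversely to distance: closer suggestions draw bigger.
sizes = {n: 30 / (1 + d) for n, d in dist.items()}
print(dist)
```

It’s a reminder that visual variety doesn’t require a sophisticated metric; even a plain BFS depth gives the eye a hierarchy to follow.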
Power users can generalize this project using the hidden “pattern” parameter to visualize other auto-complete phrases. However, without leveraging the chain of topics these can lead to pretty underwhelming results. This is because generic outer searches can bear no relation to the original product or entity of interest.
By restricting the domain down to a specific subset of entities this app blossoms into one that provides countless surprises to the user. Now turning to the popular link-sharing community Reddit, Andrei took a smart use of a similarity metric to generate graphs of sub-communities (“subreddits”) related by comparing lists of users posting in each.
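The exact metric isn’t specified, but Jaccard similarity over the sets of posting users is a natural fit for this kind of comparison. Here’s a sketch with invented subreddits and users—an illustration of the general approach, not Andrei’s implementation:

```python
def jaccard(a, b):
    """Similarity of two communities by overlap of their posting users."""
    return len(a & b) / len(a | b)

# Hypothetical sets of users posting in each subreddit.
posters = {
    "r/dataisbeautiful": {"ann", "ben", "cat", "dev"},
    "r/visualization": {"ben", "cat", "eve"},
    "r/aww": {"fox"},
}

sim = jaccard(posters["r/dataisbeautiful"], posters["r/visualization"])
print(round(sim, 2))  # 2 shared users out of 5 total -> 0.4
```

Thresholding pairwise scores like this one yields the edges of a subreddit-to-subreddit similarity graph.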
Context and interactivity are critical in applications like this and Andrei doesn’t miss a beat: a quick click on an interesting subreddit shows the latest discussion, while double-clicking on a node sends you down a rabbit hole by showing the related subreddits.
Andrei has countless other projects that I don’t have time to explore here but it’s well worth digging around in his Github profile. Here are a few (very) honourable mentions:
I’m looking forward to seeing what other projects Andrei cooks up in the future. Follow him on Twitter to keep an eye out.
Neo4j released version 4.0 of their leading graph database this week. With the addition of sharding, security and the ability to easily run multiple databases at once it’s a compelling release. The Neo4j platform continues to be the most accessible graph database and their Sandbox and vast online content make it really easy to get started.
Jennifer Reif (Neo4j) has written a thorough summary of the best places to look online for information about the new release.
Another edition of source/target and another conference I wasn’t able to attend. This time it’s FOSDEM in Brussels earlier this month. Amazingly nearly 90% of the talks already have recordings online. The Graph DevRoom has a strong track record of excellent talks and there are a few on graph algorithm libraries this year worth a watch.
Released last year FeX: Forum Explorer is a graph-oriented browser for Hacker News comments built by Andrew McNutt. Instead of a page of nested, indented comments we can get a global view of all comments and how they relate.
Hacker News is known for its interesting approach to content moderation so it’s interesting to see that reflected in the comment graphs in such a nice, dynamic form. Compare and contrast this comment graph on Brexit-related technology news with this comment graph around a new release of Firefox. Bonus points for the keyword and topic extraction and subsequent sub-graph highlighting and chunking; very nifty.
Julien subsequently deleted the original tweet due to an inaccurate interpretation of the complexity this represented. This inadvertently highlights the risk of jumping to conclusions when faced with an interesting graph, something I find easy to do as a graph practitioner.
Thanks again for subscribing, I hope you enjoyed this edition. Please tell your friends (the nice ones) and don’t hesitate to hit that unsubscribe button if this just isn’t your bag.
source/target will share content that uses “graph-thinking” to find real insight into the world around us.
Thanks for signing up! I’d love to know what you think. Please feel free to reply to this email with comments, suggestions & criticism. Don’t worry, I can take it.
There’s lots of buzz online around the note-taking web application “Roam Research”. Billed as a tool for “networked thought” Roam makes it easy to auto-magically build up a network of interconnected notes on different topics. Through “bi-directional links” users can hop back and forth between notes that reference the same idea or concept. This is particularly interesting for note-takers: references to other pages are automatically promoted to pages that may already reference concepts mentioned in previous notes. Cool!
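Under the hood, the core trick of bi-directional linking is just an index of backlinks. Here’s a tiny sketch of the idea using Roam-style `[[wiki links]]`—the pages are invented and this is not Roam’s actual implementation:

```python
import re
from collections import defaultdict

# Toy pages whose bodies use [[wiki link]] syntax; content is invented.
pages = {
    "Graph Thinking": "See [[Networks]] and [[Roam Research]].",
    "Networks": "Hubs matter; related: [[Graph Thinking]].",
    "Roam Research": "A tool for networked thought.",
}

# Invert the forward links into a backlink index: target -> linking pages.
backlinks = defaultdict(set)
for page, text in pages.items():
    for target in re.findall(r"\[\[(.+?)\]\]", text):
        backlinks[target].add(page)

print(sorted(backlinks["Graph Thinking"]))  # ['Networks']
```

Maintaining this inverted index as you type is what lets a tool surface “pages that reference this one” instantly—and the same page/link data is the graph those network views are drawn from.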
You can hear from the many fans on Twitter and there’s an impressive collection of logos on the homepage showcasing users from reputable organisations around the world.
I believe the appeal of Roam in 2020 is a result of the slick web application experience trading on the interest around “graph-adjacent” tools.
Of course, this type of modelling notes as a graph isn’t particularly innovative. You don’t have to dig too far into Roam recommendation threads to find someone speaking fondly on their “second brain” systems leveraging org-mode or, aptly-named, The Brain.
Even less of a game-changer is the notes graph network view baked into the tool. Built with CytoscapeJS, it gives a rudimentary view of all of the links between your pages. Nodes/notes are sized by the amount of content found on each page. Hidden away in a context menu is the ability to swap between the Dagre & Cose layout algorithms.
Your mileage may vary depending on how anal you may be about tagging and linking different concepts in your notes but more often than not you’re going to get a tangled mess of connections.
I’d like to see an improvement in this area of the tool. It’s not easy to create a truly useful visualization as part of a wider application—it’s easy to get distracted with pretty pictures that catch the eye but ultimately disappoint when it comes to extracting Real Insights™.
Some power users on the Roam Slack have forged this path themselves. Exporting their notes out of Roam, they lean on tools such as Mathematica to get a wholly more interesting visualization of their notes and research. These approaches showcase just how powerful this model and its visualizations can be with a little extra effort.
Visualization by @vkryukov
I’m still digesting this impressive work on Disinformation spread across Twitter by Alexa Pavliuc, funded by the Mozilla Foundation.
Colouring nodes by the date ranges is a neat trick and the accompanying videos are electric and feel very tactile. Expect more on this next time.
Still on Twitter but for a different purpose. @MenanderSoter wonders aloud in this excellent thread on analyzing followers and cliques on Twitter as a graph rather than as lists of followers. I love his exploration of his Twitter network where he actually maps what he sees in the charts with the real people he interacts with online.
Slipping from the content to the medium for a second, I feel the tweet thread tells us a little about storytelling with graph data. A lot of practitioners using graph analytics and visualizations assume that a slick transition is important for telling a story (I’m looking at you, scrollytelling) but sometimes well-annotated static screenshots with accompanying text give the content more room to breathe.
It probably wouldn’t be a graph newsletter without a mention of graph databases. Perhaps content tags would be helpful for those who care less about databases (or visualization for that matter). What do you think?
This month’s DB-Engines rankings show that ArangoDB has overtaken OrientDB according to whatever arbitrary rating they use over there. I suspect OrientDB will continue to fall out of favour, there’s a certain dilution of their brand following the acquisition by CallidusCloud and then the in-turn acquisition by SAP. It’s only a minor drop right now but we’ll continue to see other technologies/vendors eat OrientDB’s CallidusCloud’s SAP’s lunch.
I didn’t make it to the Global Graph Summit in Austin this year but there are a few talks I wish I’d seen. Marko A. Rodriguez’ talk on what he’s describing as a “database virtual machine” seems like a fever dream featuring trends in economic structures, Amazon and Open Source Software. Plus it has this wild slide:
Max Demarzi (Neo4j) is staying true to form in 2020 and writing great articles about Neo4j and related technologies. His frank breakdown of database performance benchmarks and the horrors of unchecked query languages is as informative and drily humorous as ever.
Leo Meyerovich (Graphistry) posted an interesting take on the graph database landscape drawing from his extensive experience working with customers on large-scale graph problems. I like his (what I’ll call a) quadrant of OLAP/OLTP/BI/App Dev classifications.
Here’s a neat approach to avoid complicated, pesky graph layout algorithms: simply arrange nodes into a grid and overlay the links accordingly. Here the nodes are in alphabetical order, which of course doesn’t do anything to minimize edge crossings. Nevertheless I’m surprised how nicely this works—especially when selecting nodes of interest.
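The layout itself really is trivial—roughly something like this sketch, where hypothetical node names get assigned (column, row) slots in alphabetical order:

```python
import math

# Hypothetical node names; a grid layout just assigns row/column slots.
nodes = sorted(["ant", "bee", "cat", "dog", "eel", "fox", "gnu"])

# Aim for a roughly square grid.
cols = math.ceil(math.sqrt(len(nodes)))

positions = {
    name: (i % cols, i // cols)  # (column, row) in alphabetical order
    for i, name in enumerate(nodes)
}
print(positions["ant"], positions["dog"])
```

No force simulation, no iteration, perfectly stable between renders—the edge crossings are the price you pay, and selection highlighting is what makes them tolerable.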
Speaking of hybrid chart types with selection highlighting, there’s an interesting bubble chart in this Washington Post article (halfway down the page). It shows the number of words uttered by House managers and lawyers six days into the impeachment trial. Links are revealed on hover, which rewards exploration. When the data volume is small I love fluid force-simulation visualizations as I feel their instability reflects the people and actions behind the data.
Here’s a great Tumblr blog of examples of network visualization library Cytoscape being cited in research papers. I love a lot about this. It’s fascinating to see deeply-technical datasets typically from biology and related domains visualized in wildly different network charts. It’s a rich library of (sometimes questionable!) color choices, styling designs and data models.
That’s it for now. Please tell your friends (the nice ones) and don’t hesitate to hit that unsubscribe button if this just isn’t quite the sort of content you were expecting.