Sankey Diagram of the Speaker of the House Votes

This week the House was struggling to elect a Speaker. I wanted to see how the votes changed between rounds, so I made this Sankey diagram to visualize the process.

Figure 1: Speaker of the House votes by round

Here's the source code (written in AWK, using GNU awk extensions) that processes the Clerk's XML data and writes input for Sankeymatic.com.

speaker_sankey.awk:

#!/usr/bin/awk -f
# Copyright Remington Furman, January 2023.

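# Generate a Sankey diagram to track the Speaker of the House votes
# from House Clerk XML data.
#
# Requires GNU awk (gawk): the script uses arrays of arrays,
# three-argument match(), and PROCINFO["sorted_in"].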
#
# Generate diagram here:
# https://sankeymatic.com/build/?layout_style=auto&default_node_colorset=a&default_flow_inherit=outside_in&label_first_pos=before&font_face=sans-serif

BEGIN {
    r_color  = "#900"
    d_color  = "#009"
    p_color  = "#909"
    nv_color = "#999"

    candidate_colors["Jeffries_"]       = d_color
    candidate_colors["McCarthy_"]       = r_color
    candidate_colors["Banks_"]          = r_color
    candidate_colors["Biggs_"]          = r_color
    candidate_colors["Donalds_"]        = r_color
    candidate_colors["Jordan_"]         = r_color
    candidate_colors["Zeldin_"]         = r_color
    candidate_colors["Hern_"]           = r_color
    candidate_colors["Donald_J_Trump_"] = r_color
    candidate_colors["Present_"]        = p_color
    candidate_colors["Not_Voting_"]     = nv_color
}

# Each input file holds one round of voting, so count files as rounds.
FNR==1 {
    FNUM++
}

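/<recorded-vote>/ {
    # Capture the legislator's name (m[1]) and the name they voted for (m[2]).
    match($0,/>([^<]+)<\/legislator>.*>([^<]+)<\/vote>/,m)
    voter = m[1]
    # The trailing "_" keeps candidate nodes distinct from voter nodes
    # that share a name (e.g. Jordan1 vs. Jordan_1).
    candidate = m[2]"_"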
    gsub(/ /, "_", voter) # Replace spaces.
    gsub(/ /, "_", candidate)
    gsub(/[\(\)\.,]/, "", voter) # Remove punctuation.
    gsub(/[\(\)\.,]/, "", candidate)

    votes[FNUM][voter]=candidate
    candidate_totals[FNUM][candidate]++
}

END {
    for (round in votes) {
        split("", defections) # Clear array.
        PROCINFO["sorted_in"] = "@val_str_asc"
        for (voter in votes[round]) {
            candidate = votes[round][voter]
            node_colors[candidate round]=candidate_colors[candidate]
            if (round == 1) {
                printf("%s%d [1] %s%d\n",
                       voter, round,
                       candidate, round)
            }
            else {
                previous_candidate = votes[round-1][voter]
                if (candidate != previous_candidate) {
                    defections[candidate]++
                    printf("%s%d [1] %s%d\n",
                           previous_candidate, round-1,
                           voter, round)
                    printf("%s%d [1] %s%d\n",
                           voter, round,
                           candidate, round)
                }
            }
        }
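        if (round != 1) {
            # Voters who kept their previous choice are drawn as one
            # aggregate flow per candidate rather than one flow each.
            for (candidate in candidate_totals[round]) {
                same_votes = candidate_totals[round][candidate] - defections[candidate]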
                if (same_votes != 0) {
                    node_colors[candidate "voters" round]=candidate_colors[candidate]
                    printf("%s%d [%d] %svoters%d\n",
                           candidate, round-1,
                           same_votes,
                           candidate, round)
                    printf("%svoters%d [%d] %s%d\n",
                           candidate, round,
                           same_votes,
                           candidate, round)
                }
            }
        }
    }
    for (node in node_colors) {
        printf(":%s %s\n", node, node_colors[node])
        printf(":%s %s << >>\n", node, node_colors[node])
    }
}
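
For readers without gawk, the parsing step can be approximated with POSIX awk alone. The following is only a sketch of that one step, not a replacement for the script above: parse_vote is a hypothetical helper name, it uses two-argument match() with RSTART and RLENGTH instead of gawk's capture array, and it assumes (as the regex above does) that each recorded vote sits on a single line.

```shell
#!/bin/sh
# Sketch: extract "voter candidate_" pairs from <recorded-vote> lines
# using only POSIX awk features. parse_vote is a hypothetical helper name.
parse_vote() {
    awk '
    /<recorded-vote>/ {
        # Voter: the text between ">" and "</legislator>" (13 characters).
        if (!match($0, />[^<]+<\/legislator>/)) next
        voter = substr($0, RSTART + 1, RLENGTH - 14)

        # Candidate: the text between ">" and "</vote>" (7 characters).
        if (!match($0, />[^<]+<\/vote>/)) next
        candidate = substr($0, RSTART + 1, RLENGTH - 8) "_"

        # Same normalization as the gawk script: spaces become
        # underscores, then punctuation is removed.
        gsub(/ /, "_", voter);     gsub(/ /, "_", candidate)
        gsub(/[().,]/, "", voter); gsub(/[().,]/, "", candidate)

        print voter, candidate
    }'
}
```

Fed a made-up line such as `<recorded-vote><legislator party="X" state="XX">Doe (XX)</legislator><vote>Not Voting</vote></recorded-vote>` on standard input, parse_vote prints `Doe_XX Not_Voting_`. The attribute names in that sample are placeholders; only the legislator and vote elements matter to the pattern.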

The XML data can be downloaded from the House Clerk's website:

https://clerk.house.gov/Votes?Question=Election%20of%20the%20Speaker

For example:

http://clerk.house.gov/evs/2023/roll016.xml
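
Downloading several rounds by hand gets tedious, so here is a small sketch that builds the URL from a roll call number and fetches it with curl. The helper names roll_url and fetch_rolls are hypothetical, and the URL pattern is assumed from the single example above (year 2023, three-digit zero-padded roll number).

```shell
#!/bin/sh
# roll_url N: print the XML URL for 2023 roll call N.
# Pass N without leading zeros (16, not 016), because printf would
# read a leading zero as an octal prefix.
roll_url() {
    printf 'http://clerk.house.gov/evs/2023/roll%03d.xml\n' "$1"
}

# fetch_rolls N...: download each listed roll call into the current
# directory, keeping the remote file name (roll016.xml and so on).
fetch_rolls() {
    for n in "$@"; do
        curl -fsS -O "$(roll_url "$n")"
    done
}
```

Call fetch_rolls with whichever roll call numbers the Clerk's listing page shows for the Speaker election; they are not hard-coded here.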

Once all the data is downloaded, it can be processed like so:

#!/bin/bash

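# The glob expands in lexical order, which is also numeric order here
# because the roll call numbers are zero-padded (e.g. roll016.xml).
./speaker_sankey.awk roll*.xml > speaker.sankey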

The output (speaker.sankey) can be fed to https://sankeymatic.com.

All the files (including data) can be downloaded here:

speaker_votes.tar.gz

Update [2023-01-11 Wed]

I colorized the output to match party affiliation and sorted the names for a better layout.

© Copyright 2023, Remington Furman

blog@remcycles.net

@remcycles@subdued.social