Sankey Diagram of the Speaker of the House Votes
This week the House was struggling to elect a Speaker. I wanted to see how the votes changed between rounds, so I made this Sankey diagram to visualize the process.
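Here's the source code (written in AWK, using GNU awk extensions such as arrays of arrays, the three-argument match(), and PROCINFO["sorted_in"]) that processes the Clerk's data and outputs it as Sankeymatic.com input.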
#!/usr/bin/awk -f

# Copyright Remington Furman, January 2023.

# Generate a Sankey diagram to track the Speaker of the House votes
# from House Clerk XML data.
#
# Generate diagram here:
# https://sankeymatic.com/build/?layout_style=auto&default_node_colorset=a&default_flow_inherit=outside_in&label_first_pos=before&font_face=sans-serif

BEGIN {
    r_color = "#900"
    d_color = "#009"
    p_color = "#909"
    nv_color = "#999"
    candidate_colors["Jeffries_"] = d_color
    candidate_colors["McCarthy_"] = r_color
    candidate_colors["Banks_"] = r_color
    candidate_colors["Biggs_"] = r_color
    candidate_colors["Donalds_"] = r_color
    candidate_colors["Jordan_"] = r_color
    candidate_colors["Zeldin_"] = r_color
    candidate_colors["Hern_"] = r_color
    candidate_colors["Donald_J_Trump_"] = r_color
    candidate_colors["Present_"] = p_color
    candidate_colors["Not_Voting_"] = nv_color
}

FNR==1 { FNUM++ }

/<recorded-vote>/ {
    match($0,/>([^<]+)<\/legislator>.*>([^<]+)<\/vote>/,m)
    voter = m[1]
    candidate = m[2]"_"
    gsub(/ /, "_", voter)            # Replace spaces.
    gsub(/ /, "_", candidate)
    gsub(/[\(\)\.,]/, "", voter)     # Remove punctuation.
    gsub(/[\(\)\.,]/, "", candidate)
    votes[FNUM][voter]=candidate
    candidate_totals[FNUM][candidate]++
}

END {
    for (round in votes) {
        split("", defections)        # Clear array.

        PROCINFO["sorted_in"] = "@val_str_asc"
        for (voter in votes[round]) {
            candidate = votes[round][voter]
            node_colors[candidate round]=candidate_colors[candidate]
            if (round == 1) {
                printf("%s%d [1] %s%d\n", voter, round, candidate, round)
            } else {
                previous_candidate = votes[round-1][voter]
                if (candidate != previous_candidate) {
                    defections[candidate]++
                    printf("%s%d [1] %s%d\n", previous_candidate, round-1, voter, round)
                    printf("%s%d [1] %s%d\n", voter, round, candidate, round)
                }
            }
        }
        if (round != 1) {
            for(candidate in candidate_totals[round]) {
                same_votes = candidate_totals[round][candidate] - defections[candidate]
                if (same_votes != 0) {
                    node_colors[candidate "voters" round]=candidate_colors[candidate]
                    printf("%s%d [%d] %svoters%d\n", candidate, round-1, same_votes, candidate, round)
                    printf("%svoters%d [%d] %s%d\n", candidate, round, same_votes, candidate, round)
                }
            }
        }
    }
    for (node in node_colors) {
        printf(":%s %s\n", node, node_colors[node])
        printf(":%s %s << >>\n", node, node_colors[node])
    }
}
The XML data can be downloaded from the House Clerk's website:
https://clerk.house.gov/Votes?Question=Election%20of%20the%20Speaker
For example:
http://clerk.house.gov/evs/2023/roll016.xml
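Judging from the pattern the AWK script matches, each vote sits on one <recorded-vote> line with the member's name inside a legislator element and the choice inside a vote element. The sketch below does the same extraction and name cleanup with sed, so it runs without GNU awk. The sample line, its attribute, and the names "Doe" and "Doe (XX)" are hypothetical and hand-written for illustration, not copied from the Clerk's file.

```shell
#!/bin/bash
# Hypothetical sample line, shaped like what the script's regex expects.
sample='<recorded-vote><legislator party="X">Doe</legislator><vote>Jeffries</vote></recorded-vote>'

extract_vote() {
    # Print "voter candidate" for one <recorded-vote> line.
    printf '%s\n' "$1" |
        sed -n -E 's#.*>([^<]+)</legislator>.*>([^<]+)</vote>.*#\1 \2#p'
}

normalize() {
    # Same cleanup as the script's gsub calls: spaces become
    # underscores, then parentheses, periods and commas are dropped.
    printf '%s\n' "$1" | sed -e 's/ /_/g' -e 's/[().,]//g'
}

extract_vote "$sample"
normalize "Doe (XX)"
```

The cleanup matters because the names are reused as node names in the diagram input, where a space would split one name into two fields.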
Once all the data is downloaded it can be processed like so:
#!/bin/bash

FILES=$(ls roll*.xml | sort -n)
./speaker_sankey.awk $FILES > speaker.sankey
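The order of the file arguments matters: the AWK script numbers the rounds with FNR==1 { FNUM++ }, so the first file on the command line becomes round 1, the second round 2, and so on. Here is a small sketch of that counter using only portable awk. The function name round_of_each_line and the placeholder file contents are made up for the demonstration.

```shell
#!/bin/bash
# FNR resets to 1 at the start of every input file, so the rule
# FNR == 1 fires once per (non-empty) file and fnum counts files.
round_of_each_line() {
    awk 'FNR == 1 { fnum++ } { print fnum ": " $0 }' "$@"
}

# Placeholder files standing in for two downloaded roll calls.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/roll001.xml"
printf 'c\n'    > "$tmp/roll002.xml"
round_of_each_line "$tmp"/roll*.xml
rm -r "$tmp"
```

Because the roll numbers are zero-padded, the shell's own glob order already matches the vote order here.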
The output (speaker.sankey) can be fed to https://sankeymatic.com.
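Going by the script's printf calls, the file holds two kinds of lines: flows written as "Source [amount] Target", and node colors written as ":Node color". As a rough sanity check before pasting the file into the site, the sketch below sums the amounts flowing into each target node. The sample data and the voter names are hypothetical, and tally_targets is a helper invented for this example.

```shell
#!/bin/bash
# Hypothetical sample in the shape the script prints.
sample_sankey='VoterA1 [1] Jeffries_1
VoterB1 [1] Jeffries_1
VoterC1 [1] McCarthy_1
:Jeffries_1 #009
:McCarthy_1 #900'

tally_targets() {
    # Sum the bracketed amounts flowing into each target node,
    # skipping the ":"-prefixed color lines.
    awk '!/^:/ && NF == 3 {
             amount = substr($2, 2, length($2) - 2)  # strip "[" and "]"
             total[$3] += amount
         }
         END { for (node in total) print node, total[node] }' | sort
}

printf '%s\n' "$sample_sankey" | tally_targets
```

On the real file it would be run as tally_targets < speaker.sankey, and the per-candidate sums for a round should match that round's vote totals.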
All the files (including data) can be downloaded here:
Update
I colorized the output to match party affiliation and sorted the names for a better layout.