After the FIFA Women’s World Cup this summer I became very interested in the US Women’s National Team (USWNT) and have been following their games since. When coming up with an idea for a network analysis assignment, I thought it would be interesting to analyze the team’s goals from the last year and see whether any patterns emerged. Soccer is very much a team sport, and when a goal comes from a play involving two players, the names of both the goalscorer and the player who assisted are recorded. I decided to chart who was assisting whom on goals. I was curious whether any definitive patterns, or pairs of players, would stand out in the group.
Before starting this project I was inspired by the network graphs of the characters in Les Misérables that we looked at in class. I liked how color-coding the nodes by “type” of character helped illuminate some of the logic of the text: similar characters tended to appear with one another, while the whole graph was mostly centered on Valjean. I wanted to take that approach to my chart and see whether it would confirm my assumptions about goalscoring, namely that the midfielders, and to a lesser extent the defenders, feed the ball up and assist the forwards’ goals.
There wasn’t a dataset that tracked who was assisting on goals. While it is easy to find information on goalscorers, assists were often left off, so I had to build my own dataset. Luckily Wikipedia has a page on US soccer in 2019 that lists every game the team played and links to a news report about each game, and those reports did mention who made the assists. So I built an edges table that recorded the assisting player (source), the goalscorer (target), the edge type (they were all directed), and the date of the goal. For the purposes of this chart I didn’t include goals that did not have an assist, or own goals by the opposing team. To make sure I had enough data for an interesting network, I expanded my dataset to include goals from both 2018 and 2019.
In addition I made a nodes table that recorded the names of all of the players in my edges file, the positions they (typically) play, and the total number of goals they scored in 2018 and 2019 (including goals without an assist). I wanted to size each node by this total and color it by the player’s position.
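For anyone who would rather script the two tables than type them by hand, here is a minimal sketch in Python. The players, dates, and goal totals are hypothetical placeholders, not real match data, and the function names are made up for illustration. The column names (Source, Target, Type for edges; Id, Label for nodes) follow the headers Gephi's spreadsheet import looks for, while Date, Position, and Goals are assumed names for the extra attribute columns.

```python
import csv
import io

# Hypothetical placeholder rows (assister, goalscorer, date of goal).
# These are NOT real match data; they only show the shape of the table.
assists = [
    ("Player A", "Player B", "2019-01-01"),
    ("Player A", "Player B", "2019-02-01"),
    ("Player C", "Player A", "2019-02-01"),
]

# Hypothetical placeholder rows (name, usual position, total goals).
players = [
    ("Player A", "Midfielder", 1),
    ("Player B", "Forward", 3),
    ("Player C", "Defender", 0),
]


def edges_csv(rows):
    """Return edges-table CSV text with one row per assisted goal."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Date"])
    for assister, scorer, date in rows:
        # Every edge is directed: it points from the assister to the scorer.
        writer.writerow([assister, scorer, "Directed", date])
    return buf.getvalue()


def nodes_csv(rows):
    """Return nodes-table CSV text with one row per player."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Id", "Label", "Position", "Goals"])
    for name, position, goals in rows:
        # The player's name doubles as the node Id and its display label.
        writer.writerow([name, name, position, goals])
    return buf.getvalue()


print(edges_csv(assists))
print(nodes_csv(players))
```

Writing each string to a .csv file would give two files in the same shape as the tables described above: one row per assisted goal, and one row per player.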
I used Gephi to create the actual network analysis using the steps outlined below.
After creating my datasets I imported my nodes table and then my edges table into Gephi. I ran the basic statistical analyses on the data and tried a few different layouts to make sense of the network. Pretty quickly I saw that the Yifan Hu layout struck a good balance: it spread the network out enough to keep it from clustering too tightly, without spreading it too thin.
I assigned the nodes the following colors: purple/pink for forwards, orange for midfielders, green for defenders, and blue for the goalkeeper. Gephi automatically merged repeated edges to simplify my data. For example, Rapinoe assisted Morgan on multiple goals, so Gephi combined those rows into a single edge with a larger weight, and the thickness of each edge increased the more times the two players were connected. Because of this, though, the dates became useless: I am not sure which date ended up assigned to each merged edge, but the others were discarded.
This came into play when I tried to animate my network. After some searching I was able to find easy-to-follow instructions on how to do that, and I used the section on transforming an existing column to create a time interval. However, because most of the dates were no longer recorded, the resulting animation was effectively meaningless. But it was a fun road to go down!
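One possible way to keep the dates from being lost is to do the merging yourself before importing, so that each pair of players becomes one edge with an explicit weight and a full list of dates. The sketch below is an illustration under assumptions, not what Gephi does internally: the rows are hypothetical placeholders, and merge_edges is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical placeholder rows (assister, goalscorer, date of goal).
# These are NOT real match data.
assists = [
    ("Player A", "Player B", "2019-02-01"),
    ("Player A", "Player B", "2019-01-01"),
    ("Player C", "Player A", "2019-02-01"),
]


def merge_edges(rows):
    """Collapse repeated (source, target) pairs into one weighted edge.

    The weight is the number of assisted goals for the pair, and every
    date is kept in a sorted list instead of being discarded.
    """
    dates_by_pair = defaultdict(list)
    for source, target, date in rows:
        dates_by_pair[(source, target)].append(date)
    return [
        {
            "Source": source,
            "Target": target,
            "Weight": len(dates),
            "Dates": sorted(dates),
        }
        for (source, target), dates in dates_by_pair.items()
    ]


for edge in merge_edges(assists):
    print(edge)
```

With the full list of dates kept alongside each weight, the timing of every goal is still available for an animation or for splitting the data by year.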
In the end, as mentioned, I used the Yifan Hu layout. The nodes are sized by the total number of goals scored and colored by the position of the player; the edges are weighted by the number of times each pair was connected and colored by the position of the source player.
As I initially thought, the players in the center of the chart were generally forwards. In fact Morgan, who had the most goals in the time frame, was the center-most player. Lloyd and Morgan had the two highest in-degrees, and they had also scored the most goals overall. The next ring of the graph around Morgan was made up mostly of other forwards and midfielders, and the defenders typically fell in the ring outside of that. Goalkeepers don’t typically assist on goals: Naeher did appear on the graph, but was pushed to the outskirts. There weren’t many surprises. For the most part, players who are positioned further back on the field were feeding the ball up to the forwards, and forwards were also assisting each other quite a bit. Rapinoe, Press, Heath, and Horan were very successful bridges, assisting many players, and they all typically play on the wings, while Lloyd and Morgan are generally central players who score often but don’t assist many other players.
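The in-degree comparison above can be sketched outside Gephi as well. In this graph, in-degree is the number of different teammates who assisted a player, out-degree is the number of different teammates that player assisted, and weighted in-degree is the total number of assisted goals the player scored. The edges below are hypothetical placeholders, not the real data, and degree_counts is a made-up helper name.

```python
from collections import Counter

# Hypothetical merged edges (assister, goalscorer, weight).
# These are NOT real match data.
edges = [
    ("Player A", "Player B", 2),
    ("Player C", "Player B", 1),
    ("Player A", "Player C", 1),
]


def degree_counts(edge_rows):
    """Return (in_degree, out_degree, weighted_in_degree) counters."""
    in_degree = Counter()
    out_degree = Counter()
    weighted_in = Counter()
    for source, target, weight in edge_rows:
        out_degree[source] += 1        # distinct teammates this player assisted
        in_degree[target] += 1         # distinct teammates who assisted this player
        weighted_in[target] += weight  # assisted goals this player scored
    return in_degree, out_degree, weighted_in


in_degree, out_degree, weighted_in = degree_counts(edges)
print(in_degree.most_common())
print(out_degree.most_common())
print(weighted_in.most_common())
```

A player with a high in-degree and a low out-degree fits the central scorer pattern described above, while a high out-degree matches the bridge role.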
There are a few aspects of the chart I would like to improve if this project were to be continued. While having the edges take on the color of the source is clear when different positions are connected, when an edge connects two players of the same position it isn’t clear which way it is pointing. Either the edges should be arrows, or there should be some other way to show who is the source and who is the target. I would also like the node labels to be positioned just below each node rather than in the middle of it. I am not sure how much control Gephi gives you over that, but the current placement makes some of the names tricky to read. Overall I probably should have made the text larger.
Finally, since I created the dataset, the USWNT has played two more games, finishing their 2019 season. Those games had a combined 8 more goals, and I would love to see them added to the dataset to round everything out so that it truly represents the team dynamics of 2018 and 2019. Another way to potentially expand the project would be to make small multiples for each year under Jill Ellis, who was head coach of the team from 2014 to 2019.