Craigslist has been an online source for selling or trading objects and services since 1995. The amount of traffic and money exchanged on the site quickly placed it under both legal scrutiny and social observation. While it does not offer a unique type of transaction, the platform itself was initially a novel approach to bartering. Allowing free and unlimited posts, anonymous users and a variety of categories has encouraged sections to expand and develop.
This report analyzes data collected from the Missed Connections section of Craigslist. A general definition describes missed connections as “a type of personal advertisement which arises after two people meet but are too shy or otherwise unable to exchange contact details” (Wikipedia) or “section of craigslist where people can post about strangers they saw on the subway, at work, in elevators, at rock shows” (Urban Dictionary). Unlike other ads on Craigslist, which facilitate ambiguous contracts between strangers before they meet in real life, this section uses the internet to reconnect people who first encountered each other in person.
Over the course of 7 days, messages posted in Missed Connections by users located in New York City’s five boroughs were reviewed and compiled into one dataset. The visualizations created with this information capture trends in gender, location and age. The report reviews whether the content of the posts has strayed from the section’s original intent and whether gender or interest affects the rate of posting.
Materials and Methods
The popularity of Craigslist is, in part, due to relaxed information requirements. This quality imposed by the platform made it tricky to create a clean dataset. Posters only need to input enough information to place their message in the correct borough/city and category (‘Missed Connections’ in this instance). There are options to include a specific location, age, body type and other attributes, but these are not mandatory. ‘Specific location’ is almost always used to note neighborhood, while a secondary source of geographical information exists in the body of the message. This is where the setting of the sighting would be mentioned, such as subway, cafe, theater, etc.
The data points chosen for review were age, neighborhood, borough, post type, gender, place and subway line. The last category was chosen to further break down the most frequently noted place for sightings. Once this formatted dataset was connected to Tableau, its tools were used to build the analysis and interactive visualizations. Drag-and-drop options made it possible to review information by column and row.
This NYC Missed Connections story is told through four visualizations: ‘Types of Post’, ‘Age’, ‘Location’, and ‘Subway Lines’. The first divides posts in Missed Connections into three message types: date (someone you saw and want to reconnect with), love/hate letter (an undesignated note about general feelings toward a relationship), or solicitation (requesting professional or personal services). The second reviews age among posters. The third reviews which city sites are hotspots for sightings. This influenced the final visualization of subway lines, which builds on the most popular place noted for a missed connection.
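The seven data points listed above could be captured in a simple record before being handed to a tool like Tableau. The following is only a minimal sketch in Python, not the method used for this report: the field names, the `Post` class, the `to_csv` helper and the sample rows are all invented for illustration. Optional attributes default to empty because Craigslist does not require them.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class Post:
    # Hypothetical column names; the report does not give the real ones.
    borough: str                        # required by the platform
    post_type: str                      # "date", "letter" or "solicitation"
    gender: str                         # interest group, e.g. "m4w"
    age: Optional[int] = None           # optional on Craigslist
    neighborhood: Optional[str] = None  # from the 'specific location' field
    place: Optional[str] = None         # setting mentioned in the message body
    subway_line: Optional[str] = None   # only when the place is the subway


def to_csv(posts):
    """Serialize posts to CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Post)])
    writer.writeheader()
    for post in posts:
        writer.writerow(asdict(post))
    return buffer.getvalue()


# Invented sample rows, not taken from the collected dataset.
sample = [
    Post("Brooklyn", "date", "m4w", age=27, neighborhood="Williamsburg",
         place="subway", subway_line="L"),
    Post("Manhattan", "letter", "w4w"),
]
csv_text = to_csv(sample)
```

Keeping the optional fields empty rather than guessing at them preserves the gaps that the platform's relaxed requirements create.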
Types of Posts:
An area chart plots the number of posts by type to review trends by gender. It is immediately clear that far more posts are made by men (both m4m and m4w). There is also an inverse trend between m4m and w4w: the latter post almost solely love and hate letters, while the former appear in the other two categories.
A line chart was appropriate for visualizing the average age of posters. Color divides the posts by interest and gender, while horizontal lines show the wide range in users’ ages, from 19 to 96 years old. The dominant age group is 25-29, with the early 30s a close second. The most diverse interest group is m4w.
Location was an opportunity to use a bar chart effectively. A variety of spots were named for sightings, and viewing how these differed (or remained the same) across interest groups created a dynamic visualization. Interest groups and location mentions are sorted in descending order by number of posts to display the frequency of occurrences by group. Across the board, the subway is the number one hotspot for checkouts, with the street coming in second. A filter provides the option to view location posts by day, to uncover whether trends exist by day of the week.
So many posts list the subway as the place of their sighting that it seemed worthwhile to review which lines are noted most frequently. While there are a few popular lines with a similar rate of mention, the L beats out all of them with nearly twice as many mentions. This is displayed using packed bubbles grouped and sized by the sum of their records, with color distinguishing the subway lines. Posts that note a meeting on the subway but fail to mention a line were excluded.
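Age groups such as 25-29 imply that ages were bucketed into five-year bins. A rough Python sketch of that grouping is below. The bin width is an assumption inferred from the "25-29" label, and the `age_group` helper and the list of ages are made up for illustration rather than taken from the dataset.

```python
from collections import Counter


def age_group(age, width=5):
    """Return a label such as '25-29' for the bin containing `age`."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


# Invented ages, used only to show the grouping.
ages = [27, 31, 26, 45, 19, 33, 28]
group_counts = Counter(age_group(a) for a in ages)

# The largest group in this invented sample.
top_group, top_count = group_counts.most_common(1)[0]
```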
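The descending sort and the day filter described here were built in Tableau, but the same idea can be approximated in a few lines of Python. This is a sketch under stated assumptions: the `place_counts` function, the tuple layout and the rows themselves are hypothetical.

```python
from collections import Counter

# Invented rows: (interest group, place, day of week).
posts = [
    ("m4w", "subway", "Mon"),
    ("m4w", "subway", "Tue"),
    ("m4w", "street", "Mon"),
    ("w4m", "subway", "Mon"),
    ("m4m", "gym", "Tue"),
]


def place_counts(rows, day=None):
    """Count posts per (group, place), optionally for one day, most frequent first."""
    selected = [r for r in rows if day is None or r[2] == day]
    counts = Counter((group, place) for group, place, _ in selected)
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


all_days = place_counts(posts)
mondays = place_counts(posts, day="Mon")
```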
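The exclusion rule for subway posts can be shown in a short Python sketch: only posts that name the subway as the place and also mention a line are counted. The `line_counts` function, the dictionary keys and the rows are assumptions made for illustration. The actual tally was produced in Tableau.

```python
from collections import Counter

# Invented rows; None marks a field the poster left out.
rows = [
    {"place": "subway", "subway_line": "L"},
    {"place": "subway", "subway_line": "L"},
    {"place": "subway", "subway_line": None},  # excluded: no line mentioned
    {"place": "subway", "subway_line": "6"},
    {"place": "cafe", "subway_line": None},    # excluded: not a subway sighting
]


def line_counts(rows):
    """Count mentions per subway line, skipping posts without a line."""
    return Counter(
        r["subway_line"]
        for r in rows
        if r["place"] == "subway" and r["subway_line"]
    )


counts = line_counts(rows)
```

Each count corresponds to the size of one bubble in a packed-bubble view.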
View Interactive Story and Dataset: