This series features mid-course projects for our Data Science Bootcamp. Students were first tasked with posing an interesting data question and finding a dataset to address that question. Next, they spent time cleaning, wrangling, and exploring the data, before designing and building an interactive Shiny app to display their findings and allow for further exploration.
As an art-lover, Data Science Cohort 4 student Savannah Sew-Hee was intrigued by the data that New York City’s Museum of Modern Art (MOMA) publishes on their artwork and artists. For her mid-course capstone project, she wanted to do an inventory analysis that would be useful for other companies, but use a data set that appealed to her personal interests.
“By doing this inventory analysis, I can take what I have learned and made for this R Shiny app and apply it to a company’s data set of their products in order to help a company visualize the products they have, [identify] discrepancies in their dataset, and suggest new unique products to add to their collection.”
Working with two CSV files she obtained through the Museum of Modern Art’s API, Savannah explored the data to discover how diverse the museum’s collection of artwork is, both in terms of the nationality and gender of the artists and the category and medium of the art.
She initially had five things she was hoping to learn through her analysis.
With her questions identified, it was time to clean the data. Savannah quickly discovered her first challenge would come from the date field. It’s not uncommon to find different formats when multiple people update a database, and the MOMA’s database was no exception. Savannah found multiple formats in the date field, including, for example, “Dec. 2, 1955,” “1960-1962,” and “c. 1884.” Her solution was to convert each date value into a single year, allowing for comparison across pieces. She was also curious about how old the artists were when their work was created, so she added a new column to the dataset in which she calculated the artists’ ages. She also added a geographic region variable based on the artist’s nationality.
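To make the date cleanup concrete, here is a minimal sketch of one way to reduce mixed date strings to a single year and derive an artist’s age at creation. It is not Savannah’s actual code: the function names, the choice to keep the first year of a range, and the 0 to 120 age sanity check are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: a "year" is any standalone four-digit number from 1000 to 2099.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_year(raw_date: object) -> Optional[int]:
    """Return the first four-digit year in a free-form date string, or None."""
    if not isinstance(raw_date, str):
        # Missing values (None, NaN) are passed through as None.
        return None
    match = YEAR_PATTERN.search(raw_date)
    return int(match.group(1)) if match else None


def age_at_creation(birth_year: Optional[int], work_year: Optional[int]) -> Optional[int]:
    """Return the artist's age when the work was made, or None if implausible."""
    if birth_year is None or work_year is None:
        return None
    age = work_year - birth_year
    # Assumption: ages outside 0..120 signal a data error rather than a real value.
    return age if 0 <= age <= 120 else None


if __name__ == "__main__":
    # The three formats quoted in the article.
    for raw in ["Dec. 2, 1955", "1960-1962", "c. 1884"]:
        print(raw, "->", extract_year(raw))
```

With pandas, a function like this could be applied to a whole column with `Series.map`. Keeping the first year of a range such as “1960-1962” is only one option; the midpoint or the end year would be equally defensible depending on the question being asked.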
Savannah used Python and the pandas library to clean the dates and reorganize the columns as well as for some preliminary exploratory analysis to see how well the resulting data could answer her questions.
Savannah’s Shiny app, built in R with the help of the tidyverse, features three dashboards: Artists, which explores the artists in the MOMA collection; Acquisition Over Time, which explores the mediums and categories/types of artwork acquired by decade; and a Text Analysis of the Collection’s Mediums.
In the Artists dashboard, she created a sunburst plot with plotly to explore the country and region the artists are from. “The visual demonstrates which parts of the world the museum draws most heavily from (North America and Western Europe),” she shares.
A word cloud on the Acquisitions Over Time dashboard visualizes the artworks’ mediums and was created using the tm and wordcloud packages.
Finally, on the Text Analysis dashboard, “the scatter plot uses text mining to plot pieces of artwork near artwork with similar mediums,” Savannah explains. “This is useful for curators or art enthusiasts looking to acquire similar pieces of artwork by medium. They can use this feature of my app to identify similar pieces that they or their client might be interested in.”
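The article does not say which text-mining method places similar mediums near each other, so the following is only a rough sketch of the underlying idea, using a deliberately simple stand-in technique: Jaccard similarity between the word sets of two medium descriptions. The function names, the stop-word list, and the sample medium strings are hypothetical and are not taken from the app or the MOMA data.

```python
import re
from typing import Iterable, Set

# Assumption: these filler words carry no information about the medium itself.
STOP_WORDS = {"on", "and", "with", "of", "the"}


def medium_tokens(medium: str) -> Set[str]:
    """Lowercase a medium description and return its set of meaningful words."""
    words = re.findall(r"[a-z]+", medium.lower())
    return {word for word in words if word not in STOP_WORDS}


def jaccard(medium_a: str, medium_b: str) -> float:
    """Return shared words divided by total distinct words (0.0 to 1.0)."""
    tokens_a, tokens_b = medium_tokens(medium_a), medium_tokens(medium_b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def most_similar(target: str, candidates: Iterable[str]) -> str:
    """Return the candidate whose medium description is closest to the target."""
    return max(candidates, key=lambda candidate: jaccard(target, candidate))


if __name__ == "__main__":
    # Illustrative medium strings only.
    others = ["Gelatin silver print", "Oil and charcoal on canvas"]
    print(most_similar("Oil on canvas", others))
```

A scatter plot like the one described would typically go a step further, turning each medium into a numeric vector and projecting those vectors into two dimensions, but the pairwise similarity above captures the intuition of “pieces with similar mediums sit close together.”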
Savannah noted several interesting results.
“I learned that the majority of the artists in MOMA come from countries with predominantly western cultures and 85% of artists are male. This aligns with the national average of artists in museums globally, and aligns with the demographic of artists whose artwork has been preserved over time.”
She also observed two findings with the scatter plot.
Discover more of what Savannah learned by visiting her Shiny app.