In Fall 2020, the final group project in Problem Solving With Data asked us to use newly learned R and Python skills to analyze tweets, answering self-selected research questions aimed at some kind of social good. My two-person team opted to look into Disability Twitter, a topic I proposed. I also pulled, filtered, and merged the data and performed a large share of the content analysis, writing the corresponding sections of the report.
While there are many aspects of the analysis I would do differently under other circumstances (see p. 28), it was a great opportunity to pair my interest in and knowledge of a Twitter community with developing technical skills.
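The pull-filter-merge step described above could be sketched in Python with pandas. This is a minimal illustration, not the project's actual code: the column names, keywords, and sample tweets are all hypothetical.

```python
import pandas as pd

# Hypothetical tweet pulls from different days (columns are illustrative)
batch_a = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["a11y meetup tonight", "new wheelchair review", "unrelated spam"],
    "user": ["u1", "u2", "u3"],
})
batch_b = pd.DataFrame({
    "id": [3, 4],
    "text": ["unrelated spam", "#DisabilityTwitter thread"],
    "user": ["u3", "u4"],
})

# Merge the pulls, dropping tweets collected twice
tweets = pd.concat([batch_a, batch_b]).drop_duplicates(subset="id")

# Filter to tweets matching community keywords (case-insensitive)
keywords = ["a11y", "wheelchair", "#disabilitytwitter"]
mask = tweets["text"].str.lower().str.contains("|".join(keywords))
relevant = tweets[mask]
```

The same concat/deduplicate/filter pipeline maps naturally onto R's dplyr as well.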
After Yahoo acquired Tumblr, Search leadership asked me to find a way to feature Tumblr content in web search results.
I started out with several things to consider:
Understanding the type of content on Tumblr
Determining what content, if any, could map to real web search user needs
Figuring out what metadata we could extract from Tumblr posts and whether it was enough to work well in our content management platform
Learning as much as we could from what little data the Tumblr team could share with us
Because our logs showed little evidence of existing search-to-Tumblr behavior, and because Tumblr's content is freewheeling and relatively unstructured, we had to experiment.
The first test featured content from specific Tumblr users (celebrities, online personalities, organizations, and other entities with discrete matching queries) in a simple image carousel. This approach had limitations: only image-type posts could be displayed, so blogs built around text posts, links, etc. would appear with limited results or none at all, despite frequent updating; and we could only trigger on keywords that clearly matched a single blog (e.g., Beyonce, ZooBorns). As a result, coverage was low, and leadership tasked us with significantly expanding the experience.
To accomplish this, I needed to rely on automatic triggering methods that offered far less control over what content appeared in search results. Despite concerns about relevance and quality, we launched a test for a small percentage of search traffic. The initial test had to be taken offline within days because, although the backend team took steps to remove content flagged as “adult,” pornographic results (and worse) slipped through.
Search leadership was determined, however, and resources were provided to dramatically improve the indexing for quality and cleanliness. The backend team also added logic for when to return content at all, based on timeliness and other factors. A designer was brought in to collaborate on a unique Tumblr template that accounted for the variable content types and included more Tumblr branding (colors, logos). The UX and content improvements launched in a test bucket, and although metrics weren't impressive, the feature didn't cause major problems and launched for all desktop web traffic.
Seeking to experiment further in hopes of improving and better understanding its performance, I took the initiative to categorize queries that triggered the Tumblr module and identify categories that might be well served by Tumblr content. I used existing keyword lists roughly mapping to a dozen or so categories and set up a test-bucket version of the module limited to those categories, with logging for each. I also wanted to see whether other factors affected performance, including where the module appeared on the page ("slotting") and how consistently it appeared (whether to ignore backend display logic). I tracked and compared my experiment's performance against the primary module's on a weekly basis, using that data to make small tweaks to each category along the way.
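The category-triggering-with-logging idea above can be sketched as follows. The categories, keywords, and queries are placeholders, and the real system lived in a search content platform rather than standalone Python, but the core mechanic is the same: match a query against per-category keyword lists and log each trigger for later comparison.

```python
from collections import Counter
from typing import Optional

# Hypothetical keyword lists roughly mapping to module categories
CATEGORY_KEYWORDS = {
    "food": ["recipe", "cupcake", "brunch"],
    "books": ["novel", "book club"],
    "tv_series": ["episode", "season finale"],
}

trigger_log = Counter()  # per-category trigger counts for weekly review

def categorize(query: str) -> Optional[str]:
    """Return the first category whose keyword appears in the query."""
    q = query.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            trigger_log[category] += 1
            return category
    return None  # no Tumblr module for this query

for q in ["best cupcake recipe", "season finale spoilers", "weather"]:
    categorize(q)
```

Comparing `trigger_log` (and downstream engagement) per category week over week is what drove the small per-category tweaks described above.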
The great Tumblr in search experiment ended after about a year and a half, when leadership decided the investment was no longer justifiable. Despite the effort’s ultimate failure, I was recognized for my contribution and creativity.
Key categories in my final experiment did show some lift in performance: food, books, holidays, fictional characters, TV series, and movie series.
Tree knowledge graph with growing information extracted from Cal Poly's website and images via Flickr Creative Commons APIs
(no longer live)
Search leadership asked our editorial team to identify and develop “low hanging fruit” reference-type content as part of a competitive parity initiative. In some cases, we took content that had already been curated for standard search features and turned it into expanded right-side “Knowledge Graph”-style elements.
For existing content, this was effectively a UX or template migration: the content was there, it just needed to be moved to a different format. Trees and food nutrition facts were two good examples. The only trick was that the images already curated weren't large or high-quality enough to serve as a large "hero"-type banner image, so I took advantage of the Yahoo subsidiary Flickr, whose search-based API let us serve only Creative Commons-licensed user images.
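A request against Flickr's public API of the kind described above might be built like this. The API key is a placeholder, and the license IDs are an assumption here; Flickr publishes the current ID-to-license mapping via its `flickr.photos.licenses.getInfo` method.

```python
from urllib.parse import urlencode

def flickr_cc_search_url(query: str, api_key: str = "YOUR_API_KEY") -> str:
    """Build a flickr.photos.search URL restricted to CC-licensed photos."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": query,
        "license": "4,5",      # assumed IDs for CC BY and CC BY-SA
        "sort": "relevance",
        "format": "json",
        "nojsoncallback": 1,
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)

url = flickr_cc_search_url("coast live oak")
```

Filtering by license at the API level meant every candidate hero image was already safe to display with attribution.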
Though these features did not target high-volume queries, the effort did not go unnoticed by leadership, and it also served as an opportunity for less experienced team members to build their skills and flex creative muscle.
Community Season 6 – featured Yahoo Screen video content + keyword list cultivation for TV series knowledge graph
(no longer active due to deprecation of Yahoo Screen)
When Yahoo Screen made its foray into full-length original TV series with Community and others, we were ready to go in search. I made sure that the latest episodes carousel had excellent coverage and TV Series Knowledge Graph contained accurate, detailed profiles.
Vertical search experience embedded in web search results – set up in support of Yahoo’s Digital Magazines strategy (Tech, Style, Movies, etc.)
Yahoo launched several new media verticals called “Magazines” and did not migrate any corresponding vertical search experiences, which were based on an older platform. Instead, a search product manager was tasked with adding vertical content to web search, filtered according to the user’s site of origin, and they enlisted my support to create and launch the necessary features in our search content management platform.
Each request sent from a search box included more than just the user's query: it carried referral information, usually unique to the property or even the page. I used this information to determine when a search experience should appear and to pass variables to the backend. The news backend, which indexed news from hundreds of sources worldwide, including Yahoo's own sites, could return articles matching any query, filtered by property and sorted by freshness.
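The routing logic can be sketched like this. The parameter name and property values are hypothetical stand-ins for the real referral fields, but the mechanic is the same: read the site-of-origin from the request and map it to a vertical filter for the news backend.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from referral value to news-backend vertical
VERTICAL_BY_PROPERTY = {
    "yahoo_tech": "tech",
    "yahoo_style": "style",
    "yahoo_movies": "movies",
}

def vertical_for_request(url: str):
    """Return the vertical to filter by, based on the site-of-origin param."""
    qs = parse_qs(urlparse(url).query)
    origin = qs.get("fr", [""])[0]   # "fr" is assumed as the referral param
    return VERTICAL_BY_PROPERTY.get(origin)

v = vertical_for_request("https://search.example.com/?p=iphone+review&fr=yahoo_tech")
```

When the lookup returns `None`, the query falls through to ordinary web search with no vertical module.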
Expanding on a template design already in use for news results in general web search, I created a “vertical search” feature that included up to 10 results with thumbnails for each and paginated results if there were more than 10 stories matching a given query. This large search feature appeared on top of the usual web algorithmic links and any other non-monetized search features.
Product owners also asked for Magazine stories to be highlighted in web search results (in a less aggressive form, of course). I created simple “navigational” features with a max of 3 stories to appear on searches by Magazine name and featured writers to satisfy their primary need. Because Magazine stories were indexed along with all news content, these stories could appear in regular news search results without any extra effort.
No-maintenance, low-effort vertical search launched on all new Magazines. Sites and big-name authors were effectively promoted in web search.
2016 US Presidential Election search features, including candidate Knowledge Graph with fundraising data from OpenSecrets.org and polling from Real Clear Politics.
Fact-checked quotes from the PolitiFact Truth-O-Meter.
Where candidates stand on the issues, with researched positions from ProCon.org and “explore related” suggested queries manually curated.
I was approached by search product teams working on distinct experiences around the then-upcoming 2016 U.S. primary election to offer feedback on proposed designs and organize editorial efforts in content curation and quality validation.
Led team effort to develop detailed list of potential features along with timeline, content source(s), and priority. Features based on team knowledge and real user search data/query patterns.
Researched content sources for features that would be relevant in early 2016 (general politics, candidate research). Key requirements included high-quality, politically neutral data; structured in a way that was compatible with our content management platform; served in XML or JSON format or able to be extracted and converted to usable form.
Joined product team meetings to give updates on content development and share feedback on whether design matched real content and user needs.
Created and launched experiences on web search and provided support for other platforms using our work.
By February 2016, we launched several features on web search:
Presidential candidate knowledge graph, incorporating party affiliation, donation data from OpenSecrets.org, polling data, and political office history.
Latest quotes with "truthfulness" rating from PolitiFact for all presidential candidates.
Candidate stances on 20+ key political issues extracted from ProCon.org with manually curated browse element to help search users explore candidate opinions.
Political cartoon of the day.
Additionally, detailed requirements for election results and future election experience ideas were documented.
Search leadership wanted to take advantage of our then-new content management platform to release a complete suite of Olympics results features in each of 12 key markets, including the Arabic-language site Maktoob. As an editorial leader and tool expert, I was tapped to organize the global team in this complex, ambitious effort.
While the search front-end engineering team developed templates designed specifically for the Olympics (the first time our platform was used for an important tentpole experience), our editorial team organized into content/query experts and technical builders capable of wrangling backend data and tricky template mapping. I oversaw this work and maintained detailed per-market tracking.
In lieu of engineering-heavy front-end localization, I created an editorially driven "localization" data source that was easy to use in the content management platform and made it simple for the global team to input and update text strings for UX copy. This made it easier to build features centrally and deploy them simultaneously in almost every market.
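Conceptually, that data source was a key-by-market string table with a fallback language. A minimal sketch (the keys, markets, and strings here are illustrative, not the actual UX copy):

```python
# Editorially maintained string table: one key, one string per market
UX_STRINGS = {
    "medal_table_title": {
        "en-US": "Medal Count",
        "fr-FR": "Tableau des médailles",
    },
}

def localized(key: str, market: str, default_market: str = "en-US") -> str:
    """Look up UX copy for a market, falling back to the default market."""
    strings = UX_STRINGS.get(key, {})
    return strings.get(market, strings.get(default_market, key))
```

Because editors owned the table, a copy fix in any market was a data update rather than a front-end engineering change.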
Keyword creation was an immense undertaking: we built whitelists of thousands of athlete names and variations (including event and country), and we developed numerous patterns to address results by event/sport and country.
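The two keyword approaches above, whitelist expansion and pattern matching, can be sketched as follows. The athlete, variant phrasings, and regex are examples of the technique, not the production lists.

```python
import re

def name_variants(name: str, country: str, event: str) -> set:
    """Expand one athlete entry into the query variants to whitelist."""
    return {
        name,
        f"{name} {event}",
        f"{name} {country}",
        f"{name} olympics",
    }

# Pattern-based triggering for results queries by event/sport
RESULTS_PATTERN = re.compile(
    r"^(?:olympic\s+)?(?P<event>[\w\s]+?)\s+(?:results|medals)$", re.I
)

variants = name_variants("usain bolt", "jamaica", "100m")
m = RESULTS_PATTERN.match("olympic 100m sprint results")
```

Whitelists gave precise control for head queries, while patterns covered the long tail of event/country phrasings that no list could enumerate.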
I was responsible for keeping stakeholders up to date, ensuring all delegated work was completed on time, supporting pre-launch QA, and understanding how it all worked well enough to address bugs and concerns as they arose.
Our successful global Olympics experience demonstrated the power of the content management platform and the non-technical editors who worked with it. It also highlighted ways to improve the process to reduce engineering overhead and make even more complexity and customization possible.