FamilySearch Wiki:History of Content Organization, Browsing, and Categories

The Past
When FamilySearch Wiki was first published on the Internet in early 2007, it ran on Plone, an open source content management system, not on the MediaWiki software. In Plone, we used a folder structure to organize articles. This was problematic, however, for articles that seemed to fit the topics of multiple folders. Should an article about the Western States Marriage Index be placed in the Idaho folder, the Utah folder, or the United States folder? The whole purpose of using folders was to make it easier for users -- especially authors -- to quickly survey all the articles we had on a topic. But if we constantly had to place an article in one folder when it really fit the topics of several, would that purpose be served? Clearly, we needed a way to make a many-to-one association between topics and articles, so that a single article could be tied to every topic it fit.

This led us to Keywords. In Plone 2.x, authors could associate articles with keywords. This would offer the required many-to-one association between topics and articles. However, we saw how huge a task it would be to populate the system with FamilySearch's deep and wide topic authorities in multiple languages and train authors in their use. Our tiny team didn't have the resources to tackle the project, and meanwhile, the search engine was doing a pretty good job of finding the articles we needed for production work, so we put this project on hold.

In autumn 2007, FamilySearch directors decided to switch the site to a different platform. We would migrate the content from Plone to MediaWiki because other MediaWiki sites doing what we were trying to do had already shown that the platform could scale to a large audience. In December 2007, we launched Beta 2, inviting all past contributors of the Plone site to join us in testing the new MediaWiki platform.

MediaWiki has no folder structure for content. It incorporates the use of Categories, which allow users to browse all the articles associated with a topic. Although Categories allow browsing, we have not yet explored whether they can be used to filter MediaWiki's search results.

Good metadata requires tagging standards. Because we had only just migrated to a platform that allows tagging, and because standardizing our tagging would be a very large project, we planned to address the issue later.

Then life -- or at least integration with other systems -- got in the way. In 2008 we spent a lot of time testing a tool called Semantic MediaWiki, whose purpose was to move categories from a taxonomy to an ontology. This was billed as something that would improve search a great deal. But the tool was one only an engineer could love, not one that we could expect our contributor base to understand, so we didn't deploy it. Later, we worked with the Standards or Authorities team here at FamilySearch to see how our categories could align with theirs. Still later, as we designed elements of a series of community applications we wanted to build, we explored aligning the wiki's categories with those being developed for all the FamilySearch products.

Each time we explored one of these solutions or an integration with these other products, we held back from spending time to build, organize, or fix the wiki's current categories. Why, we wondered, should we spend time working on something we'd just be replacing soon? So the wiki's categories went undeveloped through 2011.

2011
In 2011, we decided we couldn't wait any longer. Although Shangri-la solutions like ontologies, alignment with the rest of FamilySearch, or alignment with the FamilySearch Catalog were enticing, they weren't getting us anywhere. We made final efforts at understanding the current state of thinking of the other FamilySearch teams concerned with authorities, and decided to forge ahead with categories developed by the team that governs development of the main FamilySearch site. In the first half of 2011 we implemented two of that team's four levels of categories. The final two we planned to implement in the fourth quarter.

Browsing by Place or Topic
Genealogical topics include two major subsets – those which are associated with a place and those which aren’t. We plan to use MediaWiki’s Categories to classify our system’s content. MediaWiki allows users to create Categories with Subcategories beneath them, which is good for classifying, say, towns within a county. However, MediaWiki’s Special Page listing all Categories shows Categories and Subcategories together in a flat list, not a hierarchy. We recommend that product management test with users to see whether they would find it useful to include another page which lists all Categories and Subcategories in a hierarchical or outline form.
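As a rough illustration of the outline form suggested above, the following Python sketch turns a flat list of categories into an indented outline. The sample categories, the `PARENTS` mapping, and the `render_outline` function are hypothetical and are not part of MediaWiki. The sketch also assumes each category has exactly one parent, which is a simplification.

```python
from collections import defaultdict

# Hypothetical sample data: each category mapped to its parent category.
# None marks a top-level category. One parent per category is an assumption.
PARENTS = {
    "United States": None,
    "Washington State": "United States",
    "Washington County": "Washington State",
    "Washington Township": "Washington County",
    "Idaho": "United States",
}

def render_outline(parents, indent="  "):
    """Return the categories as indented outline lines, siblings sorted by name."""
    children = defaultdict(list)
    for name, parent in parents.items():
        children[parent].append(name)

    lines = []

    def walk(parent, depth):
        # Visit each child of `parent`, then descend into its own children.
        for name in sorted(children[parent]):
            lines.append(indent * depth + name)
            walk(name, depth + 1)

    walk(None, 0)
    return lines

print("\n".join(render_outline(PARENTS)))
```

A flat list would show these five names in alphabetical order with no hint of which contains which. The outline makes the county-within-state relationship visible at a glance.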

Authorities for Topics and Places
When seeking information in the wiki, users need to see browsing options and search results which are consistent, unambiguous, and if possible, even familiar. The Library of Congress authorities (subject headings) meet these criteria for subjects other than places. Where the LC catalog fails is in the disambiguation of common place names used for multiple levels of jurisdiction. For instance, place names like Grant, Washington, Montgomery, Jefferson, Lake, and Summit are used for towns, townships, parishes, counties, and sometimes states. If our wiki were to employ a category of “Washington” without including its jurisdiction, and a user typed “Washington” into the wiki’s search engine, the search results would contain entries for all these jurisdictions.

Filtering
Users need to be able to filter a search not only by [all places named Washington], but by [Washington that is a county in Washington state].

In MediaWiki, users can create a category United States with a subcategory Washington. However, they can’t add more subcategories named Washington for counties, parishes, townships, and towns with that name. A category name can be used only once.[1]

Standards Clarify Place Categories
To create place categories that users and the system will find unambiguous, we must employ standards found in genealogy programs and the FamilySearch Catalog. An entry for Washington Township would look like this:

United States, Washington, Washington, Washington

…or this:

Washington, Washington, Washington, United States

…or this:

United States, Washington State, Washington County, Washington Township

We need to choose one of these three standards and implement it.
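To compare the three candidate standards side by side, here is a minimal Python sketch that builds each form from one list of jurisdictions. The `LEVELS` structure and the three function names are hypothetical, chosen only for illustration. The sample values come from the Washington Township example above.

```python
# Jurisdictions ordered from largest to smallest (sample values from the text).
LEVELS = [
    ("country", "United States"),
    ("state", "Washington"),
    ("county", "Washington"),
    ("township", "Washington"),
]

def largest_first(levels):
    """First standard: bare names, largest jurisdiction first."""
    return ", ".join(name for _, name in levels)

def smallest_first(levels):
    """Second standard: bare names, smallest jurisdiction first."""
    return ", ".join(name for _, name in reversed(levels))

def labeled(levels):
    """Third standard: largest first, each name followed by its jurisdiction type."""
    parts = []
    for kind, name in levels:
        # The country name is left bare, as in the example in the text.
        parts.append(name if kind == "country" else f"{name} {kind.title()}")
    return ", ".join(parts)

for build in (largest_first, smallest_first, labeled):
    print(build(LEVELS))
```

In the first two forms, the name for the county alone would be identical to the name for the state, so a shorter entry is told apart only by how many segments it has. The labeled form keeps every level distinct at the cost of longer names.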

Categorizing by Wiki Code vs. WYSIWYG Interface
Categorizing an article by adding wiki code is problematic for two reasons. First, it’s not simple. For the same reasons most people prefer a WYSIWYG operating system over DOS or UNIX, most people also prefer WYSIWYG controls over wiki coding. Just as the DOS operating system was a barrier to many people using computers, wiki coding is a barrier to many people categorizing MediaWiki content.

Another problem with categorizing-by-coding is that authors must spell the category name exactly right. If they get even one character wrong, the system creates a new category. Since our system’s place category names will be long, such as United States, Washington, Washington, Washington, the probability of error in adding category codes will be high. This would necessitate the recruitment and management of a fairly large category cleanup team, which is a high-maintenance solution.
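One way a cleanup team, or the editor itself, might catch such misspellings is to compare each typed category name against the existing ones. The Python sketch below uses the standard library's `difflib.get_close_matches` for this. The `EXISTING` list, the `check_category` function, and the 0.9 similarity cutoff are assumptions made for illustration, not features of MediaWiki.

```python
import difflib

# Hypothetical list of category names already in use on the wiki.
EXISTING = [
    "United States, Washington, Washington, Washington",
    "United States, Idaho",
]

def check_category(typed, existing, cutoff=0.9):
    """Classify a typed name as 'exact', 'near' (a likely typo), or 'new'."""
    if typed in existing:
        return ("exact", typed)
    # get_close_matches returns the most similar names scoring at least `cutoff`.
    close = difflib.get_close_matches(typed, existing, n=1, cutoff=cutoff)
    if close:
        return ("near", close[0])
    return ("new", None)

# A name with one dropped letter is flagged as a near match, not a new category.
print(check_category("United States, Washington, Washingtn, Washington", EXISTING))
```

The cutoff is a judgment call. Set too low, it would flag legitimately different places with similar names. Set too high, it would miss longer typos.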

Proposed Solution
Since categorizing articles by adding wiki code increases errors and decreases the number of active authors, other solutions should be considered. Users wanting to categorize an article should be able to choose a category from a hierarchy of all active categories. They should also be able to search for the category name rather than browsing the hierarchy.
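The search half of this proposal could work along the following lines. This Python sketch assumes categories are stored as full comma-separated paths and matches the query against the last segment of each path. The `ACTIVE_CATEGORIES` list and `search_categories` function are hypothetical.

```python
# Hypothetical list of active categories, each stored as its full path.
ACTIVE_CATEGORIES = [
    "United States",
    "United States, Idaho",
    "United States, Washington State",
    "United States, Washington State, Washington County",
    "United States, Washington State, Washington County, Washington Township",
]

def search_categories(query, categories):
    """Return categories whose last path segment starts with the query.

    Matching ignores case and surrounding whitespace.
    """
    needle = query.strip().casefold()
    if not needle:
        return []
    matches = []
    for path in categories:
        last_segment = path.split(", ")[-1]
        if last_segment.casefold().startswith(needle):
            matches.append(path)
    return matches

for path in search_categories("wash", ACTIVE_CATEGORIES):
    print(path)
```

Because the author picks from names that already exist, no new category can be created by a misspelling. Showing the full path lets the author tell the state, county, and township apart.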

Disambiguating Article Titles
Customers who search by a common place name like Washington, and who don’t know how to filter their search by Category, will find articles for all places named Washington in their search results. To make their options less ambiguous, we will recommend that authors use standards in titling their articles. Instead of titling an article Vital Records or Vital Records in Washington, we may recommend they use something like Vital Records of Washington (town), Washington County, Washington. Again, we need to select a standard and implement it.

[1] In MediaWiki, categories are identified only by name. To categorize an article for Washington County and another for Washington State, users may choose to add a category tag to each article’s body text. If the categories for the state, county, and township had the same name, the user (and even the system) wouldn’t know how to tell them apart.