In the last year I've done a bunch of research around how Microsoft implemented metadata, specifically taxonomies and terms, in SharePoint 2010. This research has lived in a OneNote file for a long time but finally I've started to pull it together and put it into something a bit more cohesive and something easier to comprehend and use as a reference.
In this post I'm going to document and talk about all the parts that you'll find (and some you might not find) in SharePoint 2010 that allow Microsoft to implement the taxonomy capabilities throughout the product.
Note that I am only going to cover metadata as it relates to taxonomies in this post… not syndicated content types (aka: enterprise content types).
Before I go too far I do want to give a shout out to two people who've written a good bit about these two topics. Both Wictor Wilen (here & here) and Ari Bakker (here & here) have some great posts on these subjects. Yes, some of the stuff you'll find here you might find on their blog posts. So why am I writing about it? Well, my goal is to make this more of a reference post and I also want to post my take on it. I've also reached some different conclusions on some of the aspects of this so here we go...
First, let's look at all the pieces and parts that make up the SharePoint 2010 metadata story as they relate to taxonomies (term sets):
At the core the taxonomy store is founded on the new Managed Metadata Service (MMS) application. This service application does two things:
- Provides and manages taxonomies to all site collections.
- Manages the content type syndication, or as SharePoint calls them "enterprise content types", where one site collection serves the role as the "hub" and pushes select content types to other site collections.
These two service offerings are available to any site collection that is in a content database associated with a Web Application that has been linked to an instance of an MMS via a service application proxy.
The MMS service application is really comprised of four core pieces, only two of which are important when it relates to taxonomies:
- Term Store Manager: Browser-based experience for creating & managing term sets and keywords.
- Database: This single database is where the taxonomies are stored and where information related to the syndication of content types is kept. Want to move a taxonomy from production to staging or dev? Check out my other post on the subject.
- Timer Jobs: A single timer job, Taxonomy Update Scheduler, is used to update Web applications on an hourly basis with any changes to terms used within site collections. Specifically this scheduled process looks for all terms that have changed since it last ran and updates a special hidden list (TaxonomyHiddenList) in the top-level site in a site collection with any changes. This list is covered in more detail below...
What if you are looking to do this in a hosted scenario or multi-tenancy and partitioning is important to you? I'm not going to try to explain it any better than the expert, Spence Harbar, does. Check out his series on multi-tenancy, specifically the part about partitioning the MMS.
Note: An instance of the MMS service application is also referred to as a term store within the SharePoint taxonomy API.
There are multiple ways to consume and interact with the taxonomies provided by the MMS. First, let's take care of the one most people will look for, because you need to know up front about its sad state of affairs.
Yes, there is a Web service for taxonomies in SharePoint 2010: TaxonomyClientService.asmx. Unfortunately you can just ignore that it's there. It only contains a few methods; some don't even have associated WSDL, and those that do are not very useful. If you look at the documentation for it in the SDK on MSDN you'll see virtually every method description starts with "Reserved for internal use." What the heck does that mean to you? It means they built it for limited functionality on some of the Office 2010 client apps' Backstage screens. Aside from that, it's worthless.
If you need to talk to taxonomies from off the server via a Web service, you need to roll your own.
SharePoint 2010 includes two field types that are added to the base SharePoint Foundation 2010 (SPF2010) field types once one of the server licenses is installed. These two are defined in the [..]\14\TEMPLATE\XML\fldtypes_taxonomy.xml file and are loaded when SharePoint loads (when IIS starts up). The two types are:
- TaxonomyFieldType: Listed as Managed Metadata when creating a site column or list column.
- TaxonomyFieldTypeMulti: Hidden, not seen in the browser user interface.
Both field types are based on the SPF2010 lookup field types. The first one is visible and is used to create Managed Metadata columns as site columns or list columns that allow single selections. The latter one is used for when you say a field can store multiple terms. It isn't visible, but when you check the "Allow multiple values" check box, SharePoint makes sure it uses the correct field type in creating the field.
You'll find the code for both field types within the Microsoft.SharePoint.Taxonomy.TaxonomyField class, which has a few properties that are worth noting:
- CreateValuesInEditForm: Boolean value that specifies whether new Term objects can be added to the TermSet while typing in the field's tagging control.
- IsKeyword: Boolean value that indicates whether the TaxonomyField value points to the Enterprise Keywords term set.
- IsPathRendered: Boolean value that specifies whether the default Label objects of all the parent Term objects of a term in a TaxonomyField object will be rendered in addition to the default label of that term.
- IsTermSetValid: Boolean value that specifies whether the TermSet object identified by the TermSetId property exists and is available for tagging.
- Open: Boolean value that specifies whether the TaxonomyField object is linked to an open TermSet object or a closed one.
- SspId & TermSetId: GUID values for the unique ID of the MMS instance (SspId) and the unique ID of the term set in the term store (TermSetId).
- TextField: GUID that identifies the hidden note field in an item (more on this below).
The TaxonomyField class references two other classes like many other custom field types. The TaxonomyFieldValue class is used when accessing Managed Metadata columns via the API and exposes properties such as the Term label and unique ID among others. This class also has a companion, TaxonomyFieldValueCollection, that's used when the Managed Metadata field allows multiple selections.
Another class is the TaxonomyFieldControl, which is used to load the field control when editing a list item. This triggers loading of the TaxonomyPicker.ascx field rendering control, the dialog with a term picker icon we see when we go to edit a Managed Metadata column.
Finally we have TaxonomyFieldEditor.ascx, the user control shown when you select Managed Metadata as the column type; it is where you pick the term set that backs the column.
One more *important* thing to understand about the field type. The TaxonomyFieldType, specifically the class TaxonomyField, has a little trigger-like capability that all field types have. You can implement an OnUpdated event (think trigger) that fires when you update the column (not the data within it, the column itself). When you add a column of type Managed Metadata to a list the OnAdded() method fires. This method verifies that two event receivers have been added to the current list (these are detailed below) & that the hidden site column TaxCatchAll is in the list (more on this below). When you remove the column the OnDeleting() method checks whether it is the last Managed Metadata column in the list and if so, it removes the event receivers.
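The add and remove behavior described above can be modeled in a few lines. This is only a toy Python model, not SharePoint code: the ContentList class and its on_added / on_removed methods are hypothetical stand-ins for a list and for the field's add-time and delete-time hooks. Only the receiver names and the TaxCatchAll column name come from this post.

```python
from dataclasses import dataclass, field

# Names taken from the post; everything else in this model is hypothetical.
RECEIVERS = {
    "TaxonomyItemSynchronousAddedEventReceiver",
    "TaxonomyItemUpdatingEventReceiver",
}
CATCH_ALL = "TaxCatchAll"


@dataclass
class ContentList:
    """Toy stand-in for a list; this is not the real SPList API."""
    taxonomy_columns: set = field(default_factory=set)
    hidden_columns: set = field(default_factory=set)
    event_receivers: set = field(default_factory=set)

    def on_added(self, column_name: str) -> None:
        """Mimic the checks made when a Managed Metadata column is added."""
        self.taxonomy_columns.add(column_name)
        self.event_receivers |= RECEIVERS   # ensure both receivers are attached
        self.hidden_columns.add(CATCH_ALL)  # ensure the catch-all column exists

    def on_removed(self, column_name: str) -> None:
        """Mimic the cleanup when a Managed Metadata column is removed."""
        self.taxonomy_columns.discard(column_name)
        if not self.taxonomy_columns:       # that was the last such column
            self.event_receivers -= RECEIVERS


docs = ContentList()
docs.on_added("Department")
docs.on_added("Region")
docs.on_removed("Department")
print(sorted(docs.event_receivers))  # both receivers remain: "Region" needs them
docs.on_removed("Region")
print(sorted(docs.event_receivers))  # []
```

The point of the model is the asymmetry: every add re-checks the receivers, but only removing the last Managed Metadata column detaches them.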
As with everything in SharePoint, we'll find some Features that make a lot of this stuff work. For the taxonomy stuff, there are five Features worth noting… first a quick reference of what you'll find:
Now a brief explanation of what each one does:
- TaxonomyFeatureStapler: Staples the TaxonomyFieldAdded Feature to quite a few site templates. Which ones? Check out the table right after this list...
- TaxonomyFieldAdded: This could be considered the most important Feature of all… it does the following things:
  - Adds links to the Site Settings & List Settings pages within a site collection.
  - Creates the hidden list TaxonomyHiddenList in the top-level site of the current site collection (more on this list below).
  - Adds the site columns TaxKeywordTaxHTField, TaxCatchAll and TaxCatchAllLabel to the current site collection (more on these columns below).
- TaxonomyTenantAdmin: Simply adds a link to the Term Store Manager tool to the tenant admin page in Central Administration.
- TaxonomyTenantAdminStapler: Staples the TaxonomyTenantAdmin Feature to the TenantAdmin site template.
- TaxonomyTimerJobs: Creates the two timer jobs associated with the MMS.
So which site templates does the TaxonomyFeatureStapler attach the TaxonomyFieldAdded feature to?
|Site Template||Template Name|
|MPS#0||Basic Meeting Workspace|
|MPS#1||Blank Meeting Workspace|
|MPS#2||Decision Meeting Workspace|
|MPS#3||Social Meeting Workspace|
|MPS#4||Multipage Meeting Workspace|
|SGS#0||Group work site|
|OFFILE#0||(obsolete) Records Center|
|SPS#1||SharePoint Portal Server Site|
|SPSPERS#0||SharePoint Portal Server Personal Space|
|SRCHCEN#0||Enterprise Search Center|
As explained above, the TaxonomyFieldAdded Feature adds a few columns to the site column gallery in the top level site in the site collection. These are as follows:
So, what's special about these? The TaxCatchAll & TaxCatchAllLabel columns are lookup fields pointing to the special hidden list in the site collection (TaxonomyHiddenList). They point to the CatchAllData & CatchAllDataLabel fields in that hidden list and are used within each list that has a column of type Managed Metadata.
Site Collection RootWeb Hidden List
As explained previously the TaxonomyFieldAdded Feature creates a special hidden list in the top level site within a site collection. This list, the TaxonomyHiddenList, serves a unique role for taxonomies within SharePoint 2010, primarily in the area of performance. When a user updates a list item's Managed Metadata column, the actual term(s) selected are added to this hidden list. The values in the hidden list are then referenced from the content list using the lookup fields (TaxCatchAll & TaxCatchAllLabel) that point back to the hidden list.
There is a performance benefit here: when the Taxonomy Update Scheduler job runs hourly, it looks for terms in the term sets that have changed and gets a list of those. It then, hourly per web application, walks through each site collection in the web application and finds this hidden list. If it finds within the hidden list a term that matches its master list of changed terms, it updates that term in the hidden list. Therefore all lists that contain Managed Metadata columns see these changes, as they are simply referencing the hidden list via lookups.
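To make that walk concrete, here is a minimal Python sketch of the idea. It is a simplified model under stated assumptions, not the actual timer job: HiddenListItem and run_update_scheduler are hypothetical names, term IDs are plain strings instead of GUIDs, and each site collection's hidden list is just a Python list.

```python
from dataclasses import dataclass


@dataclass
class HiddenListItem:
    """One row of a site collection's hidden list (heavily simplified)."""
    term_id: str  # plays the role of IdForTerm
    title: str    # plays the role of Title


def run_update_scheduler(changed_terms, site_collections):
    """Push changed term labels into every hidden list that uses them.

    changed_terms: dict of term id -> new label (changes since the last run).
    site_collections: dict of site URL -> list of HiddenListItem.
    Returns the number of hidden-list rows that were updated.
    """
    updated = 0
    for hidden_list in site_collections.values():
        for item in hidden_list:
            new_label = changed_terms.get(item.term_id)
            if new_label is not None and new_label != item.title:
                item.title = new_label
                updated += 1
    return updated


# Two site collections reference the same term; one also uses another term.
sites = {
    "/sites/hr": [HiddenListItem("t1", "Human Resources")],
    "/sites/it": [HiddenListItem("t1", "Human Resources"),
                  HiddenListItem("t2", "Helpdesk")],
}
count = run_update_scheduler({"t1": "People Operations"}, sites)
print(count)                        # 2
print(sites["/sites/it"][0].title)  # People Operations
```

Notice that no content list is touched in this model. Only the hidden-list rows change, which is where the performance benefit comes from.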
Each item in the hidden list represents a term referenced in a Managed Metadata column in a list within the site collection. The hidden list has the following columns:
|Title||Name of the term.|
|IdForTermStore||GUID of the term store (aka: MMS instance).|
|IdForTermSet||GUID of the term set.|
|IdForTerm||GUID of the term.|
|Path||Term path selected.|
|CatchAllData||Data used by search.|
|CatchAllDataLabel||Data used by search.|
|Term[LCID]||Localized term for each language pack installed.|
|Path[LCID]||Localized term path for each language pack installed.|
The two fields CatchAllData and CatchAllDataLabel need a bit more explanation. When SharePoint writes to a Managed Metadata column within a content list, an internal method is called that updates the values in these two columns in this hidden list. Both contain compressed data that is used for search. CatchAllData is a pipe-delimited string of compressed GUIDs: the term store and term set for the term, followed by as many GUIDs from the term's ancestry as will fit in the field. The CatchAllDataLabel column holds the same information, except that instead of GUIDs it contains the labels used in the term.
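The "as many as it can fit" idea is easy to show in code. The sketch below is an illustration only, and several details are assumptions of mine rather than anything documented: the compression is modeled as base64 of the GUID's 16 raw bytes, the 255-character limit is a made-up field size, and compress_guid and build_catch_all are hypothetical helper names.

```python
import base64
import uuid


def compress_guid(guid: uuid.UUID) -> str:
    """Assumed compression: base64 of the 16 raw bytes (24 chars, not 36)."""
    return base64.b64encode(guid.bytes).decode("ascii")


def build_catch_all(term_store, term_set, ancestry, max_len=255):
    """Join store, set, then as many ancestry GUIDs as fit within max_len."""
    parts = [compress_guid(term_store), compress_guid(term_set)]
    for guid in ancestry:
        candidate = "|".join(parts + [compress_guid(guid)])
        if len(candidate) > max_len:
            break  # field is full; the remaining ancestors are dropped
        parts.append(compress_guid(guid))
    return "|".join(parts)


store, term_set = uuid.uuid4(), uuid.uuid4()
ancestors = [uuid.uuid4() for _ in range(20)]
value = build_catch_all(store, term_set, ancestors)
print(len(value), value.count("|") + 1)  # 249 10
```

Under these assumptions each GUID costs 24 characters plus a separator, so a 255-character field holds the term store, the term set, and eight ancestors. The rest of a deep ancestry is silently cut off.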
Next up are the columns found in an actual content list or document library where a Managed Metadata column is created. When you create a new Managed Metadata column on a list, SharePoint does a few things behind the scenes. First, here's a list of the columns:
|DisplayName||InternalName||Data Type||Visible||Created By|
|Taxonomy Catch All Column||TaxCatchAll||Lookup||No||SharePoint|
First it creates your field using the field type TaxonomyFieldType or TaxonomyFieldTypeMulti (depending on whether you said it can hold multiple values). This field is basically a lookup field that points to the Term[LCID] or Path[LCID] column (depending on whether the user chose to show the whole path or just the term in display mode) in the special TaxonomyHiddenList, at the specific item that is the actual term selected. At the same time SharePoint adds another hidden field of type Note that contains a reference to the actual term in the term set that was selected.
If this is the first time a Managed Metadata column is being added to the list, SharePoint also creates an additional hidden field called TaxCatchAll. This lookup field points to the CatchAllData column in the special TaxonomyHiddenList.
When a new column of type Managed Metadata is added to a list, SharePoint ensures two event receivers are also attached to the list. These two event receivers, TaxonomyItemSynchronousAddedEventReceiver & TaxonomyItemUpdatingEventReceiver, fire when items are added to or updated in the list. Their job is to update the hidden columns in the content list whenever the values within the Managed Metadata column change.
That wraps up almost all the pieces and parts involved in the taxonomy stuff in SharePoint 2010. This knowledge should help you understand what happens when columns are added and updated in a list, and help you troubleshoot any problems you may be having in your deployment.