Link Attributes Extension: Template Building Block

Earlier we talked about customizing an existing default popup in Tridion, and that post kicked off our Link Attributes GUI Extension. As a recap, we were creating an extension that lets us add custom attributes to the hyperlinks created with the Hyperlink button. We decided that the GUI Extension would add hash parameters to the link’s href attribute, and that’s pretty much where the post left off. Today we will add a Template Building Block that strips out the hash parameters added by the GUI Extension and converts them to attributes on the link element itself.

In another post we mentioned using HtmlAgilityPack in our Template Building Blocks, so although the regex for this task would be fairly simple, I’ll be using that library in our TBB.  If you are going to be using this extension and this TBB yourself, you’ll have to add the HtmlAgilityPack DLL to the GAC on your Tridion servers as well as reference it in your TBB project.

Next you will want to create a new class in your TBB project and add the following code:

using System;
using HtmlAgilityPack;
using Tridion.ContentManager.Templating;
using Tridion.ContentManager.Templating.Assembly;

namespace ContentBloom.TemplateBuildingBlocks
{
    /// <summary>
    /// This TBB is responsible for converting the linkAttr- hash params created by the GUI Extension into actual
    /// attributes on the link element on which they've been placed.
    /// </summary>
    [TcmTemplateTitle("Link Attributes Converter")]
    public class LinkAttributesConverter : ITemplate
    {
        public void Transform(Engine engine, Package package)
        {
            bool outputModified = false;
            Item outputItem = package.GetByName(Package.OutputName);
            string outputString = outputItem.GetAsString();

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(outputString);
            doc.OptionOutputOriginalCase = true;

            var linksWithAttributes = doc.DocumentNode.SelectNodes("//a[contains(@href, 'linkAttr-')]");
            if (linksWithAttributes == null)
            {
                return;
            }
            
            foreach (var link in linksWithAttributes)
            {
                string url = link.Attributes["href"].Value;
                int hashIndex = url.IndexOf('#');
                if (hashIndex < 0)
                {
                    // "linkAttr-" appeared outside a hash fragment, so leave this link alone
                    continue;
                }

                string hashString = url.Substring(hashIndex + 1);
                string[] hashParams = hashString.Replace("&amp;", "&").Split('&');

                bool hasLinkAttributes = false;

                url = url.Substring(0, hashIndex);
                hashString = String.Empty;

                foreach (string hashParam in hashParams)
                {
                    if (hashParam.StartsWith("linkAttr-"))
                    {
                        // If it's a link attribute, add it to the element; it is left out of the rebuilt hash string
                        hasLinkAttributes = true;
                        outputModified = true;

                        // Split on the first '=' only, so attribute values may contain '='
                        string[] rule = hashParam.Split(new[] { '=' }, 2);
                        string attributeKey = rule[0];
                        string attributeValue = rule.Length > 1 ? rule[1] : String.Empty;

                        link.Attributes.Add(attributeKey.Replace("linkAttr-", String.Empty), attributeValue);
                    }
                    else
                    {
                        // Keep any existing hash info there...
                        hashString += hashString.Length == 0 ? "#" : "&amp;";
                        hashString += hashParam;
                    }
                }
                if (hasLinkAttributes)
                {
                    link.Attributes["href"].Value = url + hashString;
                }
            }

            if (outputModified)
            {
                package.Remove(outputItem);
                outputItem.SetAsString(doc.DocumentNode.OuterHtml);
                package.PushItem(Package.OutputName, outputItem);
            }
        }
    }
}

Simple, no? You’ll want to put this TBB directly after the Building Block that produces the output (your DWT, Razor… whatever), or possibly as the first item in your Default Finish Actions. To test it, create a component with some links made through our modified Link popup from the earlier article, then run the template.
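If you want to test the hash handling without spinning up Tridion or HtmlAgilityPack, the string work can be pulled out into a small pure function. The following is only a sketch of that idea, not part of the TBB above: the class name LinkAttributeHash and the method Extract are made up for illustration, and it assumes the same "linkAttr-" prefix and "&amp;" separator that the GUI Extension produces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the TBB above): isolates the hash parsing
// so it can be exercised without Tridion or HtmlAgilityPack.
public static class LinkAttributeHash
{
    private const string Prefix = "linkAttr-";

    // Returns the href with all linkAttr- hash params removed and fills
    // 'attributes' with the name/value pairs that were found.
    public static string Extract(string href, IDictionary<string, string> attributes)
    {
        int hashIndex = href.IndexOf('#');
        if (hashIndex < 0)
        {
            return href; // no hash fragment, nothing to convert
        }

        string baseUrl = href.Substring(0, hashIndex);
        string[] hashParams = href.Substring(hashIndex + 1).Replace("&amp;", "&").Split('&');
        var kept = new List<string>();
        bool found = false;

        foreach (string hashParam in hashParams)
        {
            if (hashParam.StartsWith(Prefix, StringComparison.Ordinal))
            {
                // Split on the first '=' only, so values may contain '='
                string[] rule = hashParam.Substring(Prefix.Length).Split(new[] { '=' }, 2);
                attributes[rule[0]] = rule.Length > 1 ? rule[1] : String.Empty;
                found = true;
            }
            else
            {
                kept.Add(hashParam); // keep any existing hash info
            }
        }

        if (!found)
        {
            return href; // leave links without linkAttr- params untouched
        }

        return kept.Count == 0
            ? baseUrl
            : baseUrl + "#" + String.Join("&amp;", kept.ToArray());
    }
}
```

Calling Extract with "http://www.test.com#blah=meh&amp;linkAttr-t=v" and an empty dictionary gives back "http://www.test.com#blah=meh" and leaves a single entry, t = v, in the dictionary. The TBB would then only need to copy those entries onto the link element.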

Output prior to Link Attribute Converter:

<article>
    <p>Testing link attributes and what not and other stuff.</p>  
    <p>Should include a <a href="tcm:14-103941#linkAttr-newLink=test" title="a title">component link</a> as well as a normal <a href="http://www.example.com#linkAttr-custom=check" title="blah">http type of link</a>.</p>
    <p>We also need to see how it plays with <a href="http://www.test.com#blah=meh&amp;linkAttr-t=v">links with hashes</a> already.</p>
    <p><a href="mailto:e@mail.com#a=b&amp;c=d&amp;linkAttr-mailed=them" title="title">Multi hashed Link</a></p>
 </article>

And our output after our TBB:

 <article>
     <p>Testing link attributes and what not and other stuff.</p>  
     <p>Should include a <a href="tcm:14-103941" title="a title" newlink="test">component link</a> as well as a normal <a href="http://www.example.com" title="blah" custom="check">http type of link</a>.</p>
     <p>We also need to see how it plays with <a href="http://www.test.com#blah=meh" t="v">links with hashes</a> already.</p>
     <p><a href="mailto:e@mail.com#a=b&amp;c=d" title="title" mailed="them">Multi hashed Link</a></p>
 </article>

Not Quite Done

Our extension is not quite done just yet. Stay tuned once more as we add some enhancements and fixes to our little extension, such as the ability to add more than just one link attribute!

Searching and Modifying Output In Tridion Template Building Blocks With HtmlAgilityPack

In your Tridion career, you’ve probably written countless C# Template Building Blocks. You’ve probably rolled your own custom Link Resolver, your own “Add or Extract Binaries From…”, your own cleanup templates… dozens of Building Blocks where the goal was to search for specific html patterns and attributes or modify the output in some way. And, if you’re like me, you’ve probably had to write numerous regular expressions to accomplish your tasks. I’ll admit that I’m no regex ninja or anything; it’s usually through trial and error that I get my regular expressions working correctly. Recently however, I attempted to write an expression that handled nested elements that could go any number of levels deep, with the possibility of several different variations of the pattern that I was looking for. My regex skills were just not good enough, and I thought for sure there must be a better way to parse the html output string for these patterns.

The search was short, but I found exactly what I was looking for: Html Agility Pack

Using HtmlAgilityPack in your Tridion Template Building Block is simple… just reference the DLL, and make sure you install the DLL into the GAC on the CMS and Publishing servers.

Now that we are set up and ready to run, let’s go ahead and write some code that will take the output from the package, and load it into an HtmlDocument.  Your code should look something like the following:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Tridion.ContentManager.Templating;
using Tridion.ContentManager.Templating.Assembly;
using HtmlAgilityPack;

namespace CodedWeapon.Samples
{
    [TcmTemplateTitle("HtmlAgilityPack Tester")]
    public class HtmlAgilityPackTester : ITemplate
    {
        private TemplatingLogger _logger = null;

        protected TemplatingLogger Logger
        {
            get
            {
                if (_logger == null)
                    _logger = TemplatingLogger.GetLogger(this.GetType());

                return _logger;
            }
        }

        public void Transform(Engine engine, Package package)
        {
            Item outputItem = package.GetByName(Package.OutputName);
            string outputString = outputItem.GetAsString();

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(outputString);
        }
    }
}

If you’re familiar with XmlDocument, then HtmlDocument should not look so strange to you. All we are doing in the above is grabbing the output, and passing the output (as a string) into our HtmlDocument instance. Simple enough right? And now for the fun… seeing how simple it is to use this library to get the exact elements that we are looking for. Since a lot of the time we work with links, I’ll show some examples of grabbing some anchor tags.

Grabbing Every Anchor

Let’s say we want to grab every anchor element present in the output:

IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants("a");

You can loop over each of the nodes and perform any necessary operations that you want:

foreach (var node in nodes)
{
    Logger.Debug("Found node with url: " + node.GetAttributeValue("href", ""));
}

Just like with XmlDocument, you can also use XPath:

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a");

Querying for Specific Elements

You can use XPath to search for the specific elements that you are looking for. For example, if we wanted to match all links with a specific url:

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href=\"http://www.example.com\"]");

Or what if we wanted to grab every link that actually contains a title attribute:

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@title]");

Or better yet… what if we want to grab every link that does not have a title attribute at all?

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[not(@title)]");
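One thing to watch with these XPath queries: SelectNodes can hand back null rather than an empty collection when nothing matches, so a foreach over the result may throw. A small extension method is one way around that. This is just a sketch; the names HtmlNodeExtensions and SelectNodesOrEmpty are my own and not something HtmlAgilityPack ships with.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: wraps SelectNodes so callers can always enumerate the result.
public static class HtmlNodeExtensions
{
    public static IEnumerable<HtmlNode> SelectNodesOrEmpty(this HtmlNode node, string xpath)
    {
        // Fall back to an empty sequence when SelectNodes returns null (no matches)
        return (IEnumerable<HtmlNode>)node.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>();
    }
}
```

With that in place, a loop such as foreach (var link in doc.DocumentNode.SelectNodesOrEmpty("//a[not(@title)]")) simply does nothing when there are no matches.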

If you’re not a fan of XPath, then you’ll be happy to know that you can also use LINQ, as in the following example where we are looking for any element that has an href attribute of “#”:

var nodes = doc.DocumentNode.Descendants()
    .Where(n => n.Attributes["href"] != null && n.Attributes["href"].Value.Equals("#"));

Modifying Output

Normally when we’re searching for elements it’s because we need to modify the markup in some way, either by adding attributes, removing elements, etc. In the following example, we are adding a custom attribute onto our anchors that contain a value of “#” in the href attribute:

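var hashLinks = doc.DocumentNode.SelectNodes("//a[@href=\"#\"]");
if (hashLinks != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in hashLinks)
    {
        node.Attributes.Add("data-my-custom-attribute", "true");
    }
}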

But what if we have a more complicated requirement, where we need to strip a <tcdl:ComponentPresentation> tag (but ensure that we keep the inside markup)? One may be tempted to try the following:

foreach (var node in doc.DocumentNode.Descendants("tcdl:ComponentPresentation"))
{
    node.ParentNode.RemoveChild(node, true); // this strips the tcdl tag while preserving the children... but it's modifying the collection...
}

package.Remove(outputItem);
outputItem.SetAsString(doc.DocumentNode.OuterHtml);
package.PushItem(Package.OutputName, outputItem);

However, while running the above, you’ll get an error that says Collection was modified; enumeration operation may not execute. This is because you cannot modify the collection (add/remove) while enumerating it. But, we can keep a record of the nodes and operate on them after:

List<HtmlNode> nodesToChange = new List<HtmlNode>();
foreach (var node in doc.DocumentNode.Descendants("tcdl:ComponentPresentation"))
{
    nodesToChange.Add(node);
}

// Now we strip the wrapping tags...
foreach (var node in nodesToChange)
{
    node.ParentNode.RemoveChild(node, true);
}

package.Remove(outputItem);
outputItem.SetAsString(doc.DocumentNode.OuterHtml);
package.PushItem(Package.OutputName, outputItem);
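If the two loops feel like a lot of ceremony, LINQ’s ToList() gives you the same snapshot in one call. Here is a sketch of that variation wrapped up as a helper that works on a plain string; WrapperStripper and StripWrappers are names I made up for the example, and I have only had simple, non-nested wrappers in mind.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: removes every element with the given tag name
// while keeping the markup inside it.
public static class WrapperStripper
{
    public static string StripWrappers(string html, string tagName)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputOriginalCase = true;
        doc.LoadHtml(html);

        // ToList() copies the matches up front, so removing nodes below
        // no longer modifies the sequence being enumerated
        var wrappers = doc.DocumentNode.Descendants(tagName).ToList();
        foreach (var wrapper in wrappers)
        {
            // true = keep the wrapper's children in place of the wrapper
            wrapper.ParentNode.RemoveChild(wrapper, true);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Inside a TBB you would pass in outputItem.GetAsString() and "tcdl:ComponentPresentation", then push the returned string back into the package as shown above.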

Hopefully this will end the regex blues out there for anyone who may be struggling!