Complex Taxonomies Through Directory Structure

February 1st, 2020

I’m trying to figure out a way to represent several blog posts as part of a series. I could probably do it through some simple front matter on related posts, but I felt some need to do it just using the directory structure.

The structure will be posts > [category] > [series] > [post-content].md. Any markdown files directly under the category directory will appear on their own. Those within a series directory will be grouped with that series. The series directory requires a _series.md file to provide details about the series. Everything else should be magic.

The first step is to examine the path for each markdown file and use that to populate some new fields in the GraphQL schema. This will be accomplished using Gatsby’s onCreateNode API.

/* gatsby-node.js */

const { createFilePath } = require(`gatsby-source-filesystem`);
const { slugToTitle } = require(`./ultilities`);
exports.onCreateNode = ({ node, getNode, actions }) => {
  const { createNodeField } = actions;
  if (node.internal.type === `MarkdownRemark`) {
    const slug = createFilePath({ node, getNode, basePath: `posts` });
    const hierarchy = slug.split(`/`).filter((el) => el);
    const parent_nicename = slugToTitle(hierarchy[hierarchy.length - 2]);
    const fields = {
      slug,
      parent_nicename,
      parent_slug: `/${hierarchy.slice(0, hierarchy.length - 1).join("/")}/`,
      category: hierarchy.length > 2 ? hierarchy[1] : hierarchy[0],
      series: hierarchy.length > 2 ? hierarchy[1] : null,
      series_overview: false,
    };
    if (`_series` === hierarchy[2]) {
      fields.slug = `/${hierarchy[0]}/${hierarchy[1]}/`;
      fields.category = hierarchy[0];
      fields.parent_slug = `/${hierarchy[0]}/`;
      fields.parent_nicename = slugToTitle(hierarchy[0]);
      fields.series_overview = true;
    }
    const createNodes = (keys) => {
      keys.forEach((key) => {
        createNodeField({
          node,
          name: key,
          value: fields[key],
        });
      });
    };
    createNodes(Object.keys(fields));
  }
};

This ugly wall of JavaScript takes every markdown file that Gatsby processes and assigns it a handful of fields that are used later when creating post lists and individual post pages.
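The slugToTitle helper is imported but never shown. Judging only by the values it produces in the examples ("learning-gatsby" becomes "Learning Gatsby"), a minimal stand-in might look like the sketch below. This is an assumption about its behavior, not the actual contents of the utilities file.

```javascript
// Hypothetical stand-in for the slugToTitle helper used in gatsby-node.js.
// Assumption: it turns a hyphenated directory name into a title-cased label.
const slugToTitle = (slug = "") =>
  slug
    .split("-")
    .filter(Boolean) // skip empty pieces left by doubled hyphens
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");

console.log(slugToTitle("learning-gatsby")); // "Learning Gatsby"
console.log(slugToTitle("moto")); // "Moto"
```

The default parameter keeps the sketch from throwing when a markdown file sits directly under posts and there is no parent directory to name.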

Now the GraphQL API is going to return an individual post node like this:

{
  "fileAbsolutePath": "/eben.gilkenson.com/src/posts/moto/2020-plans.md",
  "frontmatter": {
    "title": "Planning Trips for 2020"
  },
  "fields": {
    "category": "moto",
    "parent_nicename": "Moto",
    "parent_slug": "/moto/",
    "series": null,
    "series_overview": false,
    "slug": "/moto/2020-plans/"
  }
}

A post that’s part of a series looks a little different:

{
  "fileAbsolutePath": "/eben.gilkenson.com/src/posts/code/learning-gatsby/css-modules-with-postcss.md",
  "frontmatter": {
    "title": "Setting up CSS Modules with PostCSS"
  },
  "fields": {
    "category": "learning-gatsby",
    "parent_nicename": "Learning Gatsby",
    "parent_slug": "/code/learning-gatsby/",
    "series": "learning-gatsby",
    "series_overview": false,
    "slug": "/code/learning-gatsby/css-modules-with-postcss/"
  }
}

Or, in the case of a series overview:

{
  "fileAbsolutePath": "/eben.gilkenson.com/src/posts/code/learning-gatsby/_series.md",
  "frontmatter": {
    "title": "Learning Gatsby"
  },
  "fields": {
    "category": "code",
    "parent_nicename": "Code",
    "parent_slug": "/code/",
    "series": "learning-gatsby",
    "series_overview": true,
    "slug": "/code/learning-gatsby/"
  }
}
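To check the three shapes above without running a Gatsby build, the path logic can be pulled out into a pure function. This is only a sketch: deriveFields is a hypothetical name, the slugToTitle here is an assumed stand-in for the real helper, and the slugs are assumed to arrive with leading and trailing slashes, the way createFilePath produces them by default.

```javascript
// Assumed stand-in for the real slugToTitle helper (title-cases a hyphenated slug).
const slugToTitle = (slug = "") =>
  slug
    .split("-")
    .filter(Boolean)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");

// Hypothetical pure version of the field logic inside onCreateNode.
const deriveFields = (slug) => {
  const hierarchy = slug.split("/").filter(Boolean);
  const inSeries = hierarchy.length > 2;
  const fields = {
    slug,
    parent_nicename: slugToTitle(hierarchy[hierarchy.length - 2]),
    parent_slug: `/${hierarchy.slice(0, -1).join("/")}/`,
    category: inSeries ? hierarchy[1] : hierarchy[0],
    series: inSeries ? hierarchy[1] : null,
    series_overview: false,
  };
  if (hierarchy[2] === "_series") {
    // The overview file takes over the series URL and reports the root category.
    fields.slug = `/${hierarchy[0]}/${hierarchy[1]}/`;
    fields.category = hierarchy[0];
    fields.parent_slug = `/${hierarchy[0]}/`;
    fields.parent_nicename = slugToTitle(hierarchy[0]);
    fields.series_overview = true;
  }
  return fields;
};

console.log(deriveFields("/moto/2020-plans/"));
console.log(deriveFields("/code/learning-gatsby/css-modules-with-postcss/"));
console.log(deriveFields("/code/learning-gatsby/_series/"));
```

Note that a post inside a series reports the series name as its category, while the overview reports the root category. That difference is what lets the series overview show up in the root category listing while the individual series posts stay out of it.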

All of this extra metadata will come into play when the actual HTML pages are created from the GraphQL data. There will be three templates used to generate pages:

category.js produces a list of posts and series within a certain category
series.js combines the series overview content with a list of posts in that series
post.js formats the actual individual post pages

Gatsby’s createPages API will take in the list of markdown posts and generate pages using these templates.

const path = require("path");
exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions;
  const posts = await graphql(`
    {
      allMarkdownRemark {
        edges {
          node {
            id
            fields {
              category
              series
              series_overview
              slug
            }
            frontmatter {
              pubDate
            }
          }
        }
      }
    }
  `);

  // find the latest post date in a series
  const latestDate = (posts, series) => {
    const filtered = posts.data.allMarkdownRemark.edges
      .filter((edge) => {
        return (
          edge.node.fields.series === series && edge.node.frontmatter.pubDate
        );
      })
      .map((edge) => {
        return edge.node.frontmatter.pubDate;
      })
      .sort();
    return filtered[filtered.length - 1];
  };

  // collect for category pages
  const seriesDates = {};
  const categories = [];

  // loop over all markdown entries to generate posts and series pages
  posts.data.allMarkdownRemark.edges.forEach(({ node }) => {
    if (node.fields.series_overview) {
      seriesDates[node.fields.series] = latestDate(posts, node.fields.series);
      createPage({
        path: node.fields.slug,
        component: path.resolve("./src/templates/series.js"),
        context: {
          id: node.id,
          series: node.fields.series,
        },
      });
    } else {
      createPage({
        path: node.fields.slug,
        component: path.resolve("./src/templates/post.js"),
        context: {
          id: node.id,
          slug: node.fields.slug,
        },
      });
    }
    // collect unique, root level categories
    if (
      node.fields.category !== node.fields.series &&
      categories.indexOf(node.fields.category) === -1
    ) {
      categories.push(node.fields.category);
    }
  });

  // create a post/series list page for each category
  categories.forEach((category) => {
    createPage({
      // Gatsby page paths should start with a forward slash
      path: `/${category}/`,
      component: path.resolve("./src/templates/category.js"),
      context: {
        category: category,
        seriesDates: seriesDates,
      },
    });
  });
};

This code fetches all the markdown posts from Gatsby’s GraphQL API and loops through them, generating pages for every post and series overview. It also collects the unique categories to be looped over later and finds the date of the most recent post in a series, which will be used to place the series overview in the correct spot on the listing page.
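How the category template uses seriesDates isn't covered here, but the idea of placing a series overview by the date of its newest post can be sketched as a small sorting helper. Everything in it is an assumption for illustration: sortListing is a hypothetical name, the front matter date field is taken to be pubDate (as in latestDate), and dates are assumed to be ISO-style strings that sort correctly as text.

```javascript
// Hypothetical helper for category.js: orders standalone posts and series
// overviews newest-first. A series overview is positioned by the date of the
// latest post in its series (from seriesDates); a post uses its own pubDate.
const sortListing = (nodes, seriesDates) => {
  const sortKey = (node) =>
    (node.fields.series_overview
      ? seriesDates[node.fields.series]
      : node.frontmatter.pubDate) || ""; // undated entries sink to the bottom
  // Copy before sorting so the caller's array is left untouched.
  return [...nodes].sort((a, b) => sortKey(b).localeCompare(sortKey(a)));
};

const nodes = [
  { fields: { slug: "/moto/old-trip/", series: null, series_overview: false },
    frontmatter: { pubDate: "2020-01-10" } },
  { fields: { slug: "/moto/big-tour/", series: "big-tour", series_overview: true },
    frontmatter: {} },
  { fields: { slug: "/moto/2020-plans/", series: null, series_overview: false },
    frontmatter: { pubDate: "2020-03-05" } },
];

const ordered = sortListing(nodes, { "big-tour": "2020-02-01" });
console.log(ordered.map((node) => node.fields.slug));
```

With this sample data the series overview lands between the two standalone posts, because its newest entry is dated between them.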

What’s Next?

There’s obviously a lot more to do before these posts are actual, usable webpages, but this explains the process of taking a simple directory structure and one extra markdown file and using them to create hierarchical taxonomies.

The next step for me is to figure out how to reproduce the same functionality using a headless CMS, such as Sanity. I’m not sure if it will be easier or harder.