The generate-sitemap GitHub Action generates a sitemap for a website hosted on GitHub Pages. It has the following features:
- Supports both XML and txt sitemaps.
- When generating an XML sitemap, uses the last commit date of each file for the `<lastmod>` tag of that file's sitemap entry. If a file was created during the current workflow run and has not yet been committed, the current date is used instead.
- Includes URLs for HTML and PDF files in the sitemap, with inputs to control which of these file types are included.
- Can also include URLs for a user-specified list of additional file extensions.
- Checks the content of HTML files for noindex directives, and excludes any file that contains one from the sitemap.
- Parses the robots.txt file, if one is present at the root of the website, and excludes from the sitemap any URLs that match Disallow rules.
- Sorts the sitemap entries in a consistent order. URLs are sorted first by depth in the directory structure (i.e., pages at the website root appear first, followed by pages one directory deep, and so on), and pages at the same depth are then sorted alphabetically.
- For files named index.html, assumes that the preferred URL for the page ends with the enclosing directory rather than with index.html.
- Provides an option to exclude the .html extension from URLs listed in the sitemap (GitHub Pages automatically serves the corresponding HTML file).
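The filtering, URL-mapping, and ordering rules above can be illustrated with a minimal Python sketch. It is not the action's actual source code. Every function name here (`has_noindex`, `path_to_url`, `sort_key`, `robots_filter`, `last_commit_date`, `to_xml`, `to_txt`) is hypothetical. The noindex check is a simplified regular expression that assumes `name` precedes `content` in the meta tag. Directory depth is assumed to be the number of `/` characters in the URL path. The robots.txt check uses Python's standard `urllib.robotparser`, which applies the first matching rule rather than the most specific one, so it may disagree with other parsers on overlapping Allow/Disallow rules. The last commit date is read with `git log -1 --format=%cI`, falling back to the current time for uncommitted files.

```python
import re
import subprocess
from datetime import datetime, timezone
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser
from xml.sax.saxutils import escape

# Simplified (assumed) noindex check: <meta name="robots" content="...noindex...">,
# with the name attribute written before the content attribute.
NOINDEX = re.compile(
    r"""<meta\s+name\s*=\s*["']robots["']\s+content\s*=\s*["'][^"']*\bnoindex""",
    re.IGNORECASE,
)


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    return NOINDEX.search(html) is not None


def path_to_url(base_url: str, path: str, drop_html: bool = False) -> str:
    """Map a file path (relative to the site root) to its preferred URL."""
    if path == "index.html" or path.endswith("/index.html"):
        # Use the enclosing directory, e.g. blog/index.html -> blog/
        path = path[: -len("index.html")]
    elif drop_html and path.endswith(".html"):
        # GitHub Pages serves about.html when /about is requested
        path = path[: -len(".html")]
    return base_url.rstrip("/") + "/" + path


def sort_key(url: str):
    """Sort by directory depth (count of '/' in the path), then alphabetically."""
    return (urlsplit(url).path.count("/"), url)


def robots_filter(urls, robots_txt: str):
    """Keep only the URLs that robots.txt allows for all user agents ('*')."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch("*", u)]


def last_commit_date(path: str) -> str:
    """Last commit date of path, or the current UTC time if it is uncommitted."""
    try:
        out = subprocess.run(
            ["git", "log", "-1", "--format=%cI", "--", path],
            capture_output=True, text=True,
        ).stdout.strip()
    except OSError:  # git is not installed
        out = ""
    return out or datetime.now(timezone.utc).isoformat(timespec="seconds")


def to_xml(entries) -> str:
    """Render (url, lastmod) pairs as an XML sitemap."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, lastmod in entries:
        lines.append(f"<url><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def to_txt(urls) -> str:
    """A txt sitemap is simply one URL per line."""
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    base = "https://example.com"
    pages = {  # hypothetical site contents: path -> HTML
        "index.html": "<p>Home</p>",
        "blog/post.html": "<p>Post</p>",
        "about.html": "<p>About</p>",
        "blog/index.html": "<p>Blog</p>",
        "draft.html": '<meta name="robots" content="noindex">',
        "private/secret.html": "<p>Secret</p>",
    }
    robots = "User-agent: *\nDisallow: /private/\n"
    url_to_path = {
        path_to_url(base, p): p for p, html in pages.items() if not has_noindex(html)
    }
    urls = sorted(robots_filter(list(url_to_path), robots), key=sort_key)
    print(to_xml([(u, last_commit_date(url_to_path[u])) for u in urls]))
```

Run as a script, the example keeps `https://example.com/`, `about.html`, `blog/`, and `blog/post.html`, in that order. It drops `draft.html` because of its noindex tag and `private/secret.html` because of the Disallow rule. Sorting on a `(depth, url)` tuple produces the depth-first, then alphabetical, order in a single pass.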
The generate-sitemap GitHub Action is implemented in Python and licensed under the MIT License. Its source code is hosted on GitHub, and the repository also provides detailed usage instructions, including several sample GitHub workflows.
The generate-sitemap GitHub Action is developed by Vincent A. Cicirello. I originally implemented it for my own use but have decided to share it with others.
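Examples of websites that use generate-sitemap:

| Website | Workflow | Sitemap |
| --- | --- | --- |
| Vincent A. Cicirello's website | sitemap-generation.yml | sitemap.xml |
| Documentation site for Chips-n-Salsa | docs.yml | sitemap.xml |
| Documentation site for JavaPermutationTools | docs.yml | sitemap.xml |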