
Importing a static HTML site to Drupal

Submitted by Autumn on July 12, 2013 - 11:00am

Recently, one of our clients wanted to import a static site into Drupal. There was no RSS feed, no XML file, and the pages were not all that similar to one another. On top of that, the site didn't follow any consistent directory structure, which ruled out importing pages in any sequential order.

Some modules we've looked at when trying to import a static HTML site with no XML or RSS include:

So far, this is the workflow we have managed to set up using the Feeds and Feeds XPath Parser modules:

  1. Enter a URL from the old site (e.g. http://example.com/directory/tags/mypage.html).
  2. Click the import button.
  3. A node is created.

Even this has some problems:

  • Legacy URLs aren't maintained.
  • No formatting is brought over.
  • If we have hundreds of pages, we don't want to repeat that process hundreds of times (or track down every page's URL by hand).
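To illustrate what addressing the first two problems could involve, here is a minimal stdlib-only Python sketch. It is not how Feeds or Feeds XPath Parser work internally. It keeps the markup of a page's main content region instead of flattening it to plain text, and it derives a path from the legacy URL so the old address could be reused as the node's alias. It assumes every legacy page wraps its content in a hypothetical `<div id="content">`. `PageExtractor`, `body_html` and `legacy_alias` are illustrative names, and pages built from different templates may each need their own container id.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageExtractor(HTMLParser):
    """Collect the <title> text and the inner HTML of one container <div>.

    container_id="content" is a hypothetical default; legacy pages built
    from different templates may each need their own id.
    """

    def __init__(self, container_id="content"):
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.container_id = container_id
        self.title = ""
        self.body_parts = []
        self._in_title = False
        self._depth = 0  # <div> nesting depth inside the container; 0 = outside

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        if self._depth:
            # Copy the start tag exactly as written, attributes included.
            self.body_parts.append(self.get_starttag_text())
            if tag == "div":
                self._depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.container_id:
            self._depth = 1  # entering the container; its own tag is not copied

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag to copy.
        if self._depth:
            self.body_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        if self._depth:
            if tag == "div":
                self._depth -= 1
                if self._depth == 0:
                    return  # this is the container's own closing tag
            self.body_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._depth:
            # Re-escape text so "&" and "<" stay valid HTML in the node body.
            self.body_parts.append(escape(data, quote=False))

    def body_html(self):
        return "".join(self.body_parts)


def legacy_alias(url):
    """Old URL path without the leading slash, e.g. for reuse as a path alias."""
    return urlparse(url).path.lstrip("/")


page = ('<html><head><title>My Page</title></head><body>'
        '<div id="nav">menu</div>'
        '<div id="content"><p>Hello <b>world</b> &amp; more</p>'
        '<div>inner</div><br></div><p>footer</p></body></html>')
extractor = PageExtractor()
extractor.feed(page)
extractor.close()
print(extractor.title)        # My Page
print(extractor.body_html())  # <p>Hello <b>world</b> &amp; more</p><div>inner</div><br>
print(legacy_alias("http://example.com/directory/tags/mypage.html"))  # directory/tags/mypage.html
```

The navigation and footer are dropped because only the container's contents are copied. Whether the extracted HTML and alias end up in a node through Feeds mappings or a custom import script is left open here.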

Unfortunately, this isn't quite what we're looking for. Ideally, the workflow would go something like this:

  1. Enter the base URL (e.g. example.com).
  2. Click the import button.
  3. The importer follows each link it finds and checks it against the base URL.
  4. Links that are relative or that fall under the base URL are imported as nodes.
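The ideal workflow above amounts to a same-site crawler. Below is a minimal Python sketch of the link-discovery part, assuming the base URL is given with its scheme (`http://example.com/` rather than `example.com`) and that "part of the base URL" means "on the same host". It resolves each `href` against the page it came from, so relative links become absolute. It then drops `#fragment` parts, skips other hosts and non-HTTP schemes such as `mailto:`, and returns the internal page URLs in breadth-first order. `crawl`, `LinkCollector` and `fetch_url` are hypothetical names. The function only collects URLs: turning each page into a node would still be up to Feeds or a custom importer.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_url(url):
    """Download a page as text (assumes UTF-8, replacing undecodable bytes)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(base_url, fetch=fetch_url, limit=1000):
    """Breadth-first crawl from base_url; returns same-host page URLs in visit order."""
    base_host = urlparse(base_url).netloc
    seen = {base_url}
    queue = deque([base_url])
    pages = []
    while queue and len(pages) < limit:
        url = queue.popleft()
        pages.append(url)
        collector = LinkCollector()
        collector.feed(fetch(url))
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links against the current page, then drop "#fragment".
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if parts.scheme not in ("http", "https") or parts.netloc != base_host:
                continue  # mailto:, javascript:, or another site
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Because `fetch` is a parameter, the crawler can be tried offline against a dict of saved pages, e.g. `crawl("http://example.com/", fetch=saved_pages.__getitem__)`. The sketch compares hosts only; a stricter reading of step 4 would use `link.startswith(base_url)`. It also does not skip links to PDFs or images, nor merge duplicates such as `/` and `/index.html`, which a real import would need to handle.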

So we're reaching out to the Drupal community in the hope that someone has run into the same scenario.

---

Richir Outreach is dedicated to helping advocacy organizations achieve their goals via online outreach and web design. Shop our products at http://shop.richiroutreach.com.