Time for something a little more practical…
Adding a third-party newsfeed to a CMS is something many people will eventually want to do. Newsfeeds with fresh daily content can add a lot of SEO value to a website. The job basically involves downloading a formatted XML file (e.g. from a company like Adfero) and merging the articles into your CMS, so that they display in your lovely news archives and on your front page (and wherever else you want to see them).
This is the second time I have added Adfero news feeds to a system, so you can learn from my mistakes without having to make them yourself.
The previous project included a pass-through system, where links were dynamically inserted into the Adfero copy on the fly. The links were matched to keywords and phrases. Clients were given a new URL from which to pick up the modified newsfeed, which now contained hyperlinks embedded in the original articles.
The current project is a bit more straightforward: merging a newsfeed into a CMS. It builds on what I learned from the previous project.
The secret to merging newsfeeds into a CMS is “DO NOT merge the feed directly into the CMS database”. Honestly, there are so many things that can change (possibly even your entire CMS) that you want to keep this process separated into discrete, reproducible, testable steps.
Stage 1 processing:
- Run 1-4 times a day, depending on how lively your newsfeed is.
- Download the Newsfeed file to a local file.
- Also allow a specific local file to be processed (so you can restore from XML news archive files).
- Extract a hierarchy of articles, categories and photos
- Add new articles, categories and photos to the News tables
- Update changed articles, categories and photos in the News tables
- Ignore any completely unchanged articles
The end result of all this is a local database (usually 3 tables) of all past and current articles. Ready for you to do with what you wish!
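The add/update/ignore part of stage 1 can be sketched roughly as below. This is only a minimal illustration, not the code from either project: it uses SQLite and a single articles table (the real thing would also have category and photo tables), and the names `news_article`, `feed_id`, `content_hash` and `store_article` are all made up for the example. The idea is to keep a hash of each article's content alongside the article, so an unchanged article can be spotted cheaply.

```python
import hashlib
import sqlite3


def content_hash(title: str, body: str) -> str:
    """Fingerprint of the fields we care about, used to detect changes."""
    return hashlib.sha256(f"{title}\x00{body}".encode("utf-8")).hexdigest()


def open_news_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local news database (hypothetical one-table schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS news_article (
               feed_id TEXT PRIMARY KEY,
               title   TEXT NOT NULL,
               body    TEXT NOT NULL,
               hash    TEXT NOT NULL
           )"""
    )
    return conn


def store_article(conn: sqlite3.Connection, feed_id: str, title: str, body: str) -> str:
    """Add, update or ignore one article. Returns what was done."""
    new_hash = content_hash(title, body)
    row = conn.execute(
        "SELECT hash FROM news_article WHERE feed_id = ?", (feed_id,)
    ).fetchone()

    if row is None:
        # Never seen before: add it.
        conn.execute(
            "INSERT INTO news_article (feed_id, title, body, hash) VALUES (?, ?, ?, ?)",
            (feed_id, title, body, new_hash),
        )
        result = "added"
    elif row[0] != new_hash:
        # Seen before, but the content has changed: update it.
        conn.execute(
            "UPDATE news_article SET title = ?, body = ?, hash = ? WHERE feed_id = ?",
            (title, body, new_hash, feed_id),
        )
        result = "updated"
    else:
        # Completely unchanged: ignore it.
        return "unchanged"

    conn.commit()
    return result
```

Because the function only cares about the article data it is handed, it works the same whether that data came from a freshly downloaded feed or from an old archive file you are restoring.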
Stage 2 processing (anytime after stage 1):
- Always run through the stored archive of all articles (it does not take long, even for many thousands).
- Find the correct parent for the CMS item (there may be more than one… more about this in Part 2).
- If a CMS item already exists for an article, and the article has changed, update that CMS item.
- If no CMS item exists for an article, create a new CMS item.
- If a CMS item already exists for an article, and the article has not changed, do nothing.
- If the CMS item is under the wrong parent (e.g. it has been deleted), move it back.
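The stage 2 rules above boil down to a small decision per article, which might look something like the sketch below. It deliberately does not touch any real CMS API: `CmsItem`, `plan_sync` and the action names are invented for illustration, and it assumes you can compare a stored content hash for the article against one kept on the CMS item. Keeping the decision separate from the code that actually talks to the CMS makes it easy to test, and easy to keep if the CMS ever changes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CmsItem:
    """The bits of an existing CMS item that the decision needs (hypothetical)."""
    parent_id: str      # where the item currently lives in the CMS tree
    content_hash: str   # hash of the article content it was last built from


def plan_sync(article_hash: str, correct_parent: str,
              item: Optional[CmsItem]) -> List[str]:
    """Return the list of actions needed to bring the CMS in line with one article."""
    if item is None:
        # No CMS item yet for this article: create one (under the correct parent).
        return ["create"]

    actions: List[str] = []
    if item.content_hash != article_hash:
        # The article has changed since the CMS item was last written.
        actions.append("update")
    if item.parent_id != correct_parent:
        # The item has wandered off (or been deleted): move it back.
        actions.append("move")
    return actions  # an empty list means "do nothing"
```

For example, `plan_sync("h2", "news", CmsItem("recycle-bin", "h1"))` returns `["update", "move"]`, while an item that is already up to date and in the right place gives an empty list.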
The three separate groups of data you will work with are:
- The XML Feed file
- The local news database tables
- The CMS database
It is best to take this system one chunk at a time. In Part 2 we will look at the XML feed format and describe how you can process it in a systematic way. The techniques used can be applied to many types of XML file.