Schedule Scraper
Export RoSI/ACORN timetable to iCal format

Generate an .ics file from your RoSI or ACORN timetable. The iCal file can then be easily imported into your application of choice, like iCal or Google Calendar.

Usage

Three super easy ways to use this script!

Bookmarklet (recommended)

Run the script from a bookmark. Simple and reusable!

  1. Drag and drop Schedule scraper into your Bookmarks bar.
    Chrome: You may need to push ctrl + shift + B to open your Bookmarks bar.
    Firefox: Follow Mozilla's instructions to show your Bookmarks bar.
    Safari: You may need to push cmd + shift + B to view your Bookmarks bar.
  2. Navigate to the timetable to be exported.
  3. Click the bookmark!
    Note: You may need to actually press the "Download .ics file" button that appears above the timetable in order to trigger the download. If this button does not appear then something has gone horribly wrong.

Run in address bar

  1. Navigate to the timetable to be exported.
  2. In your address bar, delete everything and type javascript:, then paste the following code.
    
        
  3. Hit enter!

Run in JavaScript console

  1. Navigate to the timetable to be exported.
  2. Open the JavaScript console with ctrl + shift + J.
  3. Paste the following script into the console.
    
            
    Firefox: You may need to type allow pasting into the console before it lets you paste. Follow the instructions if they appear.
  4. Press enter!
    Internet Explorer: Press ctrl + enter instead. And stop using Internet Explorer.

Challenges

Cross-Origin Requests

Most of the data comes from the currently loaded page at https://acorn.utoronto.ca/sws/welcome.do?welcome.dispatch#/, but the script also makes a request to http://www.apsc.utoronto.ca/timetable/fall.html - the "master" timetable. It pulls professor names and course start dates from this page, since they are not included in ACORN. Unfortunately, these two pages reside on different subdomains (and therefore different origins), so most modern browsers block this request under the Same-Origin Policy.

To resolve this issue, the data needs to be wrapped as JSONP by an external server (since you ARE allowed to request scripts cross-domain). I used http://alloworigin.com/ since it allowed HTTPS connections for free. Unfortunately, I managed to send the server into an infinite loop, but a quick reboot fixed that issue.
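
As a rough illustration of the JSONP approach described above, the sketch below injects a script tag whose response calls a temporary global function. The helper name jsonp, the callback query parameter, and the proxy URL in the usage comment are assumptions made for illustration; they are not taken from this script or from the proxy's documented API.

```javascript
// Minimal JSONP helper (a sketch, not the script's actual code).
// `doc` and `root` default to the browser globals; stand-ins can be passed instead.
function jsonp(url, doc, root) {
    doc = doc || document;
    root = root || window;
    return new Promise(function (resolve, reject) {
        // Unique global callback name that the server is expected to wrap its JSON in.
        var name = 'jsonpCallback' + Date.now() + Math.floor(Math.random() * 1e6);
        var script = doc.createElement('script');

        function cleanUp() {
            delete root[name];
            if (script.parentNode) {
                script.parentNode.removeChild(script);
            }
        }

        root[name] = function (data) {
            cleanUp();
            resolve(data);
        };
        script.onerror = function () {
            cleanUp();
            reject(new Error('JSONP request failed: ' + url));
        };

        // The `callback` parameter name is a common convention, not a standard.
        script.src = url + (url.indexOf('?') === -1 ? '?' : '&') +
            'callback=' + encodeURIComponent(name);
        doc.head.appendChild(script);
    });
}

// Hypothetical usage in a browser (placeholder proxy URL):
// jsonp('https://proxy.example/get?url=' + encodeURIComponent(timetableUrl))
//     .then(function (data) { console.log(data); });
```

Because script tags are exempt from the same-origin restriction, the browser loads and runs the response even though it comes from another origin.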

Because loading the 100 MB file through a proxy took too long and was unreliable (the server continually timed out), this feature has been temporarily disabled. I am currently looking into using Cobalt, a third-party RESTful web API for U of T data.

Unfortunately, since the maintainers of the project are currently unavailable, the script currently reads a pre-scraped file in the repository, falling back to hard-coded sessional dates if a course cannot be found.
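
The fallback described here might look roughly like the sketch below. The shape of the pre-scraped data, the function name lookupCourseDates, and all dates are made-up placeholders, since the actual file format is not shown on this page.

```javascript
// Hypothetical hard-coded sessional dates (placeholder values only).
var DEFAULT_DATES = { start: '2015-09-14', end: '2015-12-09' };

// `scraped` is assumed to map course codes to { start, end } records.
function lookupCourseDates(courseCode, scraped) {
    var found = Object.prototype.hasOwnProperty.call(scraped, courseCode);
    var entry = found ? scraped[courseCode] : null;
    if (entry && entry.start && entry.end) {
        return { start: entry.start, end: entry.end, fromDefaults: false };
    }
    // Course missing (or incomplete) in the pre-scraped file: use the defaults.
    return { start: DEFAULT_DATES.start, end: DEFAULT_DATES.end, fromDefaults: true };
}
```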

Asynchronous JavaScript

Because AJAX requests are by definition asynchronous, the work to process them has to end up in callback functions. This leads to an inversion of control, with callbacks nested deeper and deeper inside one another.

makeAjaxRequest(function(data) {
    processData(data, function(processedData) {
        displayData(processedData);
    });
});

"Inversion of control" due to nested callbacks

One way to handle this would be to implement an event queue that waits for each AJAX call to complete before starting the next. This structure would implicitly pass the output of one function to the input of the next. There is also room for an error handling function.

executeSequentially(makeAjaxRequest, processData, displayData, errorHandler);

AJAX queue runner
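
One possible shape for such a queue runner is sketched below. It assumes every step has the hypothetical signature step(input, next, fail) and that the last argument is the error handler; neither detail comes from the script itself.

```javascript
// Runs each step in order, feeding the output of one into the next.
// Assumed step signature (hypothetical): step(input, next, fail).
// The last argument to executeSequentially is the error handler.
function executeSequentially() {
    var steps = Array.prototype.slice.call(arguments);
    var errorHandler = steps.pop();

    function runStep(index, input) {
        if (index >= steps.length) {
            return; // every step has finished
        }
        try {
            steps[index](input, function next(output) {
                runStep(index + 1, output);
            }, errorHandler);
        } catch (error) {
            // Exceptions thrown by a step are routed to the error handler.
            errorHandler(error);
        }
    }

    runStep(0, undefined);
}
```

A step reports an asynchronous failure by calling fail, while thrown exceptions reach the same handler through the try/catch.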

However, the prevalent method is to use Promises.

makeAjaxRequest().then(processData).then(displayData);

AJAX calls using promises

The promises are resolved or rejected inside the callbacks, but the caller never has to worry about that. Promises can simply be chained onto one another and the results of each will be passed to the next.
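
To make the resolve/reject step concrete, here is a minimal sketch using the standard Promise constructor rather than jQuery. The send function, with its success and error callbacks, is a hypothetical stand-in for a raw AJAX call, and loadAndShow is an illustrative name only.

```javascript
// `send` is a hypothetical low-level function that takes a success callback
// and an error callback, similar in spirit to a raw AJAX call.
function makeAjaxRequest(send) {
    return new Promise(function (resolve, reject) {
        // The promise is settled inside the callbacks; callers never see them.
        send(resolve, reject);
    });
}

// Example processing step: strip surrounding whitespace from the response.
function processData(data) {
    return data.trim();
}

// Chains the steps; each result is passed to the next function in the chain.
function loadAndShow(send, displayData) {
    return makeAjaxRequest(send).then(processData).then(displayData);
}
```

If send reports an error, the chain skips the remaining then steps and the returned promise is rejected, so a single catch at the end can handle failures from any stage.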

For the purposes of this project, the Deferred objects provided by jQuery were sufficient, so I did not need to load a separate promise library.

When to execute?

Not only must the script execute its own asynchronous components at the correct time, but it must also start execution only after the document has loaded. Usually, it is easy to attach a script to window.onload so that it runs as a callback, but since we do not know when the user will run the script, this alone is not enough. If the user runs the script before the page is ready and it executes immediately, it may fail (since part of the page is missing). If the script only attached itself to window.onload and the user ran it after the page had fully loaded, the load event would already have fired and the script would never run. Instead, we use the following short snippet to run the script immediately if the page is loaded, and attach it to window.onload if not.

document.readyState === 'complete' ? run() : (window.onload = run);

Run immediately if the page is loaded, otherwise wait for the page to load.
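
A slightly longer variant of the same check is sketched below. It uses addEventListener so that any existing onload handler on the page is left in place; the function name runWhenReady and the optional doc and win parameters are illustrative assumptions, not part of the script.

```javascript
// Runs `run` now if the page has finished loading, otherwise once on `load`.
// `doc` and `win` default to the browser globals; stand-ins can be passed instead.
function runWhenReady(run, doc, win) {
    doc = doc || document;
    win = win || window;
    if (doc.readyState === 'complete') {
        run();
    } else {
        // addEventListener leaves any existing onload handler untouched.
        win.addEventListener('load', function onLoad() {
            win.removeEventListener('load', onLoad);
            run();
        });
    }
}
```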

Whitespace

When testing on my own schedule, it was easy to scrape the correct segments of HTML into JavaScript objects. However, beta testing revealed that arbitrary whitespace and line breaks could appear throughout the content for different people. This complicated the line-based scraping algorithm, which eventually had to be replaced with a tag-based one. Liberal use of .trim() also helped mitigate the effects of the additional whitespace.
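
As a small illustration of the tag-based idea, the sketch below reads text per element and normalizes whitespace before using it. The function names and the assumption that each cell exposes textContent are illustrative; the script's real selectors and data layout are not shown here.

```javascript
// Collapses runs of spaces, tabs, and line breaks into single spaces,
// then strips leading and trailing whitespace.
function normalizeWhitespace(text) {
    return text.replace(/\s+/g, ' ').trim();
}

// `cells` is assumed to be an array-like of elements exposing `textContent`,
// e.g. the result of row.querySelectorAll('td') in a browser.
function readCells(cells) {
    return Array.prototype.map.call(cells, function (cell) {
        return normalizeWhitespace(cell.textContent);
    });
}
```

Reading one element at a time means a stray line break inside a cell changes only that cell's text, not which value is matched to which field.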