Posts Tagged ‘PHP’

Datadotgc.ca – A Drupal case study: Part 2

Colin Calnan | Wednesday, June 23rd, 2010

This is the second part of a Drupal case study on integrating the CKAN data repository with Drupal 6. Part 1 covered the following:

  • What is CKAN?
  • CKAN’s API
  • The Foundation
  • The Build
  • Theming
  • Homepage Chart

Caching

API calls are expensive, particularly when they return large amounts of data. To avoid exhausting the CKAN API with requests and to keep the site responsive, I leveraged Drupal's caching mechanisms and cached pretty much everything I could, within reason. The chart, tag cloud, tag lists, Ministry lists, the list of all packages and every individual package are cached. The catch is that when a package is updated on the CKAN instance, the Drupal site needs to know about it immediately and clear the appropriate caches so that the most recent data is retrieved.

For caching, I created a table called 'cache_ckan' that stores everything I need. To create it, I reused the schema of Drupal's core cache table in the module's .install file:

/**
 * Implementation of hook_install().
 */
function ckan_install() {
  drupal_install_schema('ckan');
}
 
/**
 * Implementation of hook_uninstall().
 */
function ckan_uninstall() {
  drupal_uninstall_schema('ckan');
}
 
/**
 * Implementation of hook_schema().
 */
function ckan_schema() {
  $schema = array();
  $schema['cache_ckan'] = drupal_get_schema_unprocessed('system', 'cache');
  return $schema;
}

When the module is first installed, hook_install() calls drupal_install_schema(), which creates the cache_ckan table; uninstalling the module drops the table again.

What is stored in the cache_ckan table?

Various items are stored in the cache table, including:

  1. The Homepage chart data
  2. Tag lists
  3. Ministry lists
  4. List of all datasets

Let’s take the list of all packages as an example. I covered how I implemented the paging in my previous post. As this list is paginated, it’s important that every page be cached to keep the site fast. Since the paging mechanism is already in place, it’s just a case of creating a cache entry (ckan:all{page-number}) for each page and then checking for its existence when loading the page.

// Each page gets its own cache entry: ckan:all0, ckan:all1, ...
if(($cache = cache_get('ckan:all'.$page, 'cache_ckan')) && !empty($cache->data)) {
	// Cached data exists for this page, so no API call is needed
	$results = $cache->data;
} else {
	$ckan = ckan_ckan();
 
	$items_per_page = variable_get('ckan_items_per_page', 4);
	// Skip the records shown on earlier pages (0 on the first page)
	$offset = $page * $items_per_page;
	// Limit to the number of items per page
	$limit = $items_per_page;
 
	try {
		$results = $ckan->advancedSearch(array('groups' => 'canadagov', 'all_fields' => '1', 'offset' => $offset, 'limit' => $limit));
	} catch (Exception $e) {
		return $e->getMessage();
	}
 
	// The API call worked: log it and cache this page of results
	watchdog('ckan', 'Called CKAN API for list of all packages');
	cache_set('ckan:all'.$page, $results, 'cache_ckan');
}
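Stripped of the Drupal calls, the per-page caching above is the cache-aside pattern: look for an entry, and only on a miss call the API for that page's slice of records and store the result. The sketch below is a minimal illustration in plain PHP, not the module's code; cached_page() is a hypothetical name, the $cache array stands in for the cache_ckan table, and the $fetch callable stands in for the advancedSearch() call.

```php
<?php
/**
 * Hypothetical sketch of per-page cache-aside.
 * $cache stands in for the cache_ckan table, $fetch for the CKAN API call.
 */
function cached_page(array &$cache, int $page, int $items_per_page, callable $fetch) {
  // One cache entry per page: ckan:all0, ckan:all1, ...
  $key = 'ckan:all' . $page;
  if (!empty($cache[$key])) {
    // Cache hit: no API call at all.
    return $cache[$key];
  }
  // Cache miss: fetch only the slice of records this page needs.
  $offset = $page * $items_per_page;
  $results = $fetch($offset, $items_per_page);
  $cache[$key] = $results;
  return $results;
}
```

Because the key includes the page number, each page is cached independently, and a repeated request for a page that is already cached never reaches $fetch.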

This approach is simple and effective: once a page has been cached it loads lightning fast, and each API call retrieves only one page of data.

How does the cache get cleared or updated?

Datasets/Packages change all the time on the CKAN instance, so how do you make sure that the Drupal site has the most current data? This module handles it in two ways.

1. Using hook_form to redirect to CKAN

Because the CKAN nodes on the Drupal site are created on the fly and hold very little information, there is no real need for their edit form. Instead, whenever an admin user clicks the Edit tab on a CKAN node, they are automatically redirected to the corresponding package editing screen on CKAN. This works because hook_form is called to build the form displayed when someone creates or edits a node; for the CKAN content type, the hook redirects the user to the CKAN instance instead.

/**
 * Implementation of hook_form
 *
 * Redirect the user to ca.ckan.net package edit screen on edit
 */
function ckan_form(&$node, $form_state) {
  if($node->type == 'ckan') {
  	drupal_goto('http://ca.ckan.net/package/edit/'.$node->body);
  }
}

When the package form is submitted on CKAN, CKAN redirects back to a specific URL on the Drupal site, which tells Drupal to call CKAN again, fetch the package information and update the node. To clarify, the process is:

  1. Redirect http://www.datadotgc.ca/node/X/edit to http://ca.ckan.net/package/edit/{name of X}
  2. On save of CKAN Package, redirect to http://www.datadotgc.ca/{special_url}/{name_of_X}
  3. Load the node with {name_of_X}
  4. Call CKAN to get the (updated) data for Package {name_of_X}
  5. Save the node with updated data
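Steps 3 to 5 can be sketched as a single function. This is a hypothetical outline rather than the module's actual callback: ckan_refresh_package() is an invented name, nodes and packages are plain arrays, and the three callables stand in for the real node loading, CKAN request and node saving code. Copying only the title is a placeholder for whatever fields a real implementation needs.

```php
<?php
/**
 * Hypothetical sketch of steps 3 to 5 of the CKAN round trip.
 * The callables stand in for Drupal's node loading/saving and the CKAN request.
 */
function ckan_refresh_package(string $name, callable $load_node, callable $fetch_package, callable $save_node): bool {
  // Step 3: load the node that represents package $name.
  $node = $load_node($name);
  if ($node === null) {
    return false;
  }
  // Step 4: ask CKAN for the (updated) package data.
  $package = $fetch_package($name);
  if ($package === null) {
    return false;
  }
  // Step 5: copy the fresh data onto the node and save it.
  $node['title'] = $package['title'];
  $save_node($node);
  return true;
}
```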

2. Using Cron and an Atom Feed

CKAN provides an Atom feed of recent updates to its Packages. On every cron run, the module fetches this feed and compares its MD5 hash with the hash stored on an earlier run. If the feed has changed, we know there have been updates and we clear all of the CKAN caches.

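/**
 * Implementation of hook_cron()
 *
 * Clears the CKAN caches whenever CKAN's Atom feed of revisions changes.
 **/
function ckan_cron() {
	// Get the md5sum of the current atom feed
	$current_feed = md5_file('http://ca.ckan.net/revision/list?format=atom');
	if ($current_feed === FALSE) {
		// The feed could not be fetched; try again on the next cron run
		watchdog('ckan', 'Could not retrieve the CKAN Atom feed', array(), WATCHDOG_WARNING);
		return;
	}
	watchdog('ckan', 'Current feed md5: @md5', array('@md5' => $current_feed));
	// Retrieve the previously stored md5sum (NULL on the very first run).
	// Defaulting to $current_feed here would mean the variable is never
	// set, so a change would never be detected.
	$previous_feed = variable_get('ckan_atom_feed_md5', NULL);
	watchdog('ckan', 'Previous feed md5: @md5', array('@md5' => $previous_feed));
 
	// If there have been changes
	if ($current_feed != $previous_feed) {
		watchdog('ckan', 'ATOM feed has updated, clearing caches and deleting nodes');
		// Flush all the caches
		cache_clear_all('*', 'cache_ckan', TRUE);
		// Store the new md5sum for comparison on the next cron run
		variable_set('ckan_atom_feed_md5', $current_feed);
	}
}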
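The comparison at the heart of this cron job can be pulled out into a plain PHP function that is easy to test. This is a minimal sketch with a hypothetical name, ckan_feed_changed(); the caller would still be responsible for fetching the feed, clearing the caches and storing the returned hash (for example with variable_set()).

```php
<?php
/**
 * Hypothetical helper: has the feed changed since the hash stored last time?
 *
 * @param string      $feed_body     Raw contents of the Atom feed.
 * @param string|null $previous_hash Hash saved on an earlier run, or null if none.
 * @return array [bool $changed, string $current_hash]
 */
function ckan_feed_changed(string $feed_body, ?string $previous_hash): array {
  $current_hash = md5($feed_body);
  // With no stored hash (first run) treat the feed as changed, so the
  // caller clears the caches once and stores a hash to compare against.
  $changed = $previous_hash === null || $previous_hash !== $current_hash;
  return [$changed, $current_hash];
}
```

Whatever form the check takes, the stored hash has to be written at some point; if it never is, every run compares the feed against itself and no change is ever seen.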

Tag cloud creation

I borrowed some code from the Tagadelic module to build the tag cloud:

/**
 * Build a tag cloud based on the settings provided
 *
 * @return	String	A themed list of weighted tags
 */
function ckan_tag_cloud() {
	// If there is cached data, use it (cache_get() unserializes it for us)
	if(($cache = cache_get('ckan:tags', 'cache_ckan')) && !empty($cache->data)) {
		$results = $cache->data;
	} else {
		$ckan = ckan_ckan();
		// Each row holds a tag name and its count
		$results = $ckan->getTagCount();
		watchdog('ckan', 'Called CKAN API for tag cloud');
		// cache_set() serializes arrays automatically
		cache_set('ckan:tags', $results, 'cache_ckan');
	}
 
	// Let's sort them by weight (tag count), highest first
	$weight = array();
	foreach ($results as $key => $row) {
		$weight[$key] = $row[1];
	}
	array_multisort($weight, SORT_DESC, $results);
 
	// Now let's get the top X number of tags
	$results = array_slice($results, 0, variable_get('ckan_tagcloud_total', 40));
 
	// Now build the tags
	$tags = ckan_tag_build_weighted($results);
	// Sort them
	$tags = ckan_tag_sort($tags);
	// Theme them
	$output = theme('ckan_weighted_tags', $tags);
	return $output;
}
 
/**
 * Theme function that renders the HTML for the tags
 * @ingroup themable
 */
function theme_ckan_weighted_tags($tags) {
  $output = '';
  foreach ($tags as $tag) {
    $output .= l($tag['name'], 'data/tag/'.$tag['name'], array('attributes' => array('class' => "tagcloud level".$tag['weight'], 'rel' => 'tag'))) ." \n";
  }
  return $output;
}
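The weighting step itself (ckan_tag_build_weighted(), which isn't shown here) is where the Tagadelic approach comes in. As a hedged sketch of that style of algorithm, not the module's actual code: take the logarithm of each tag's count, then split the range between the smallest and largest values into a fixed number of levels, which map onto the "tagcloud levelN" classes used by the theme function. The function name weighted_tag_levels() and the default of six levels are assumptions.

```php
<?php
/**
 * Hypothetical sketch: give each tag a level from 1 to $steps using
 * logarithmic bucketing, in the style of the Tagadelic module.
 *
 * @param array $rows  List of [tag name, count] pairs (counts >= 1).
 * @param int   $steps Number of levels (CSS classes level1..level$steps).
 * @return array List of ['name' => ..., 'weight' => ...] arrays.
 */
function weighted_tag_levels(array $rows, int $steps = 6): array {
  $rows = array_values($rows);
  if (count($rows) === 0) {
    return [];
  }
  // Work on logarithms so a few very popular tags don't flatten the rest.
  $logs = [];
  foreach ($rows as $row) {
    $logs[] = log(max(1, $row[1]));
  }
  $min = min($logs);
  $max = max($logs);
  // Avoid dividing by zero when all counts are equal, and keep the
  // most popular tag at $steps (not $steps + 1) after flooring.
  $range = max(0.01, $max - $min) * 1.0001;

  $tags = [];
  foreach ($rows as $i => $row) {
    $tags[] = [
      'name' => $row[0],
      'weight' => 1 + (int) floor($steps * ($logs[$i] - $min) / $range),
    ];
  }
  return $tags;
}
```

Using logarithms keeps one very popular tag from pushing every other tag into the smallest level: for counts of 1, 10 and 100 the levels come out as 1, 3 and 6.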

Using the CKAN Search API for all lists

OK, so what’s this all about? CKAN has some nice API calls, like /api/rest/package, that return a list of Packages. However, these return ONLY the name/id of each Package. For our listings we wanted more data, such as the tags attached to each Package and a brief description.

The only way to get this data was to call the search API, /api/search/package, and pass some extra parameters: in this case all_fields=1 and department={name of Ministry}.

all_fields=1 tells the search to return all Package fields, not just the name/id, just as if you had called /api/rest/package/PACKAGE-REF.

department={name of Ministry} tells the search to return all packages that have a department of {name of Ministry}. The lovely folks at CKAN added this functionality for us on request.

What does this look like? It’s pretty simple really: call the advancedSearch() function with a few parameters and it returns all the data you need. Here’s the function itself:

public function advancedSearch($parameters){
	// Build the query string, URL-encoding each value
	$querystring = http_build_query($parameters);
	$results = $this->transfer('api/search/package?'. $querystring);
	// Treat a failed call, or a search with no results, as an error
	if (empty($results->count)){
		throw new CkanException("Search Error");
	}
	return $results;
}

And here is that function being called for the list of Ministry Packages. The offset and limit are for the paging mechanism:

// Call the function
$results = $ckan->advancedSearch(array('department' => $ministry, 'all_fields' => '1', 'offset' => $offset, 'limit' => $limit));
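To see what a call like this actually requests, here is a small sketch that builds the same kind of URL for one page of a Ministry listing. It assumes the API lives under http://ca.ckan.net/ (the CKAN instance this site uses) and that transfer() simply appends the path to that base; ministry_search_url() and the example Ministry name are hypothetical.

```php
<?php
/**
 * Hypothetical sketch: the search URL for one page of a Ministry's packages.
 * Assumes the API base is http://ca.ckan.net/.
 */
function ministry_search_url(string $ministry, int $page, int $items_per_page): string {
  $parameters = [
    'department' => $ministry,
    'all_fields' => '1',
    // Paging: skip the earlier pages, then return one page of results.
    'offset' => $page * $items_per_page,
    'limit' => $items_per_page,
  ];
  // http_build_query() URL-encodes each value, as urlencode() would.
  return 'http://ca.ckan.net/api/search/package?' . http_build_query($parameters);
}
```

For example, ministry_search_url('Health Canada', 2, 4) returns http://ca.ckan.net/api/search/package?department=Health+Canada&all_fields=1&offset=8&limit=4, so the third page of four items starts at record 8.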

There’s a lot more functionality in this module, more than I could cover in one blog post, or even five. If you’re trying to integrate Drupal with a CKAN instance and aren’t sure where to start, please leave a comment and I’ll get back to you.

Displaying Webform results in a block

Colin Calnan | Wednesday, January 14th, 2009

We use the Webform module for most of our form needs on our client sites. It’s a pretty good module that provides most of the functionality required. The most important part of Webform is the submission data: the reason for having the form in the first place is to gather data that can be analyzed and then acted on. Webform has a results area where you can view, analyze, download and export all the submitted data. It does this using some useful built-in functions such as webform_get_submissions.

However, say you want to count the submissions for a particular field in a form and display the most popular choices in a block: a little like a poll, except that the data is collected as part of a webform. In our example, the form collects a list of reasons for donating, and the block displays the top 5 reasons on the site.

One way to achieve this is with a custom module. We create a block using hook_block and then display the top 5 reasons in it (explanations are in the code comments):

function mymodule_block($op = 'list', $delta = 0, $edit = array()) {
  switch ($op) {
    // Create the block listing in the block admin page
    case 'list':
      $blocks[0]['info'] = t('Top 5 Reasons for donating');
      return $blocks;
 
    case 'view':
      $block = array();
      switch ($delta) {
        case 0:
          if (module_exists('webform')) {
 
            // Set the node id of the webform we require the results of
            $nid = 999;
            // The component id (cid) of the "reasons" field in that webform
            $cid = 1;
 
            // Load the webform node
            $node = node_load($nid);
 
            // Retrieve the list of reason choices from the webform field, split at line breaks.
            // The choices are entered in a textarea when creating a form,
            // one per line, in the format "i_have_money|I have lots of money"
            $reason_key_pair = explode("\n", $node->webform['components'][$cid]['extra']['items']);
 
            // Loop through the lines to create an array in the form
            // $submissions['i_have_money'] => 'I have lots of money'
            $submissions = array();
            foreach ($reason_key_pair as $value) {
              $new_array = explode('|', $value, 2);
              // Skip blank or malformed lines
              if (count($new_array) == 2) {
                $submissions[trim($new_array[0])] = check_plain(trim($new_array[1]));
              }
            }
 
            // Return the top 5 reasons for donating from the webform_submitted_data table,
            // counting only answers to the reasons field
            $result = db_query_range("SELECT sd.data AS reason, COUNT(sd.data) AS total
              FROM {webform_submissions} s
              LEFT JOIN {webform_submitted_data} sd ON sd.sid = s.sid
              WHERE sd.nid = %d AND sd.cid = %d AND sd.data != '%s'
              GROUP BY sd.data
              ORDER BY COUNT(sd.data) DESC", $node->nid, $cid, '', 0, 5);
 
            // Create an array to hold the reasons, which we will then display in the block
            $reasons = array();
 
            // Run through the results and match them to the choices from the webform textarea values
            while ($row = db_fetch_array($result)) {
              $safe_key = $row['reason'];
 
              // If a user-chosen option still exists in the current options, display it
              if (array_key_exists($safe_key, $submissions)) {
                $reasons[] = $submissions[$safe_key];
              }
            }
 
            // If there are no results we need to display a message
            if (count($reasons) == 0) {
              $reasons[] = array('data' => t('There are no reasons'));
            }
 
            // Theme the results as an ordered list to be styled with CSS
            $block['content'] = theme('item_list', $reasons, NULL, 'ol');
          }
          break;
      }
      return $block;
  }
}
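The string handling and ranking in this block don't depend on Drupal, so they can be sketched and tested in plain PHP. This is an illustration with hypothetical function names (parse_webform_items() and top_reasons()); the $counts array stands in for the rows returned by the query, as option key => number of submissions.

```php
<?php
/**
 * Hypothetical sketch: turn the webform "key|label" items textarea
 * into an options array of key => label.
 */
function parse_webform_items(string $items): array {
  $options = [];
  // One option per line; tolerate Windows line endings and blank lines.
  foreach (preg_split('/\r\n|\r|\n/', $items) as $line) {
    $parts = explode('|', $line, 2);
    if (count($parts) === 2) {
      $options[trim($parts[0])] = trim($parts[1]);
    }
  }
  return $options;
}

/**
 * Hypothetical sketch: labels of the $limit most chosen options that still
 * exist in the form. $counts maps option key => number of submissions.
 */
function top_reasons(array $options, array $counts, int $limit = 5): array {
  // Most popular first; arsort() keeps the keys.
  arsort($counts);
  $reasons = [];
  foreach ($counts as $key => $total) {
    // Skip answers whose option has since been removed from the form.
    if (array_key_exists($key, $options)) {
      $reasons[] = $options[$key];
    }
    if (count($reasons) === $limit) {
      break;
    }
  }
  return $reasons;
}
```

One design difference: the SQL version applies its limit before discarding options that have been removed from the form, so it can show fewer than five reasons; this sketch keeps going down the ranking until it has $limit current options.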

Enable the module, then go to the blocks admin page and your block will now be available. Simply set the title and position it where you want it.

An example of this block can be seen in the right sidebar on the BCNDP website.
