"Discovering the DiscoverLibrary user: a tour of a few ways to retrieve Primo statistics"
Dale Poulter
Vanderbilt University
Sources of Primo statistics: BIRT reports, Google Analytics, Web logs, MySQL queries of Oracle DB
Why multiple sources? Different data is collected, by different methods, at different collection times, for different uses.
What are you going to use the data for?
BIRT reports (Business Intelligence and Reporting Tools) - open source, Eclipse-based reporting tool provided in Primo back office.
Searches Oracle DB based on limited user input
Graphical
Easy for any user to use
Delivered reports focus on weekly numbers
New reports can be created, and reports can be split by institution
Provides info on Primo response times
Good data, but not always the data you really want
Google Analytics (added to Primo end-user interface by the customer):
Free
Easy to do, no SQL needed
Completely web accessible
Only need to add the GA code to a header or footer file: create a custom header/footer in the back office rather than editing the delivered header or footer file (header.html), which is overwritten during upgrades
Can create a separate GA code for each of your websites, so you can compare activity across sites apples to apples
Specify date ranges
Browsers used
Access by mobile users: useful for determining which mobile devices are used, so you don't waste time, e.g., developing for Apple devices if most users are on Android
User behaviors (time spent, loyalty; e.g., are users spending more time in Primo than in the traditional catalog?)
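To make the mobile-device decision concrete, here is a minimal sketch that turns a Google Analytics export into per-platform shares. It assumes the operating-system report has been exported as a plain CSV with a header row; the column names "Operating System" and "Visits" are placeholders to check against your own export (older GA exports also add "#" comment lines that would need stripping first).

```python
import csv
import io


def mobile_os_share(csv_text, os_col="Operating System", visits_col="Visits"):
    """Return {os_name: fraction of visits} from a GA CSV export.

    Column names are assumptions; adjust them to match your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {}
    for row in reader:
        # GA formats large numbers with thousands separators, e.g. "1,200".
        visits = int(row[visits_col].replace(",", ""))
        counts[row[os_col]] = counts.get(row[os_col], 0) + visits
    total = sum(counts.values())
    return {os_name: n / total for os_name, n in counts.items()} if total else {}
```

If Android dominates the result, that is a signal to prioritize Android testing before spending time on Apple-specific work.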
SQL queries: run from the command line with the s+ command (a GUI tool is shown later)
Able to view more detail: facets, tags, Get It, Did You Mean?, etc. Almost every click in Primo is recorded, which means there is lots of data, but you need to maintain log files and server space
Options almost unlimited
Can format data for Excel
Dale shares his favorite query for timestamp data.
SQL Maestro: inexpensive (he quotes $1,000 for a site license); a GUI environment for building SQL statements and examining table structures and linkages, with a nice interface
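Dale's actual timestamp query isn't reproduced in these notes, so the following is only a sketch of the general idea: count searches per hour of day and format the result for Excel. The table `search_events` and its columns are hypothetical stand-ins for Primo's real statistics tables, and SQLite is used so the example runs anywhere; against Oracle the hour would come from `TO_CHAR(event_time, 'HH24')` instead of `strftime`.

```python
import csv
import io
import sqlite3

# Hypothetical schema: Primo's real statistics tables are not named in the
# talk, so "search_events" and its columns are placeholders for illustration.
SCHEMA = """
CREATE TABLE search_events (
    event_time TEXT NOT NULL,  -- ISO-8601 timestamp, e.g. '2011-03-14 09:15:00'
    scope      TEXT,
    query      TEXT
)
"""

# Count searches per hour of day (SQLite syntax; Oracle would use TO_CHAR).
HOURLY_QUERY = """
SELECT strftime('%H', event_time) AS hour, COUNT(*) AS searches
FROM search_events
GROUP BY hour
ORDER BY hour
"""


def searches_by_hour(conn):
    """Return a list of (hour, count) tuples, hours as '00'..'23'."""
    return conn.execute(HOURLY_QUERY).fetchall()


def to_excel_csv(rows):
    """Format the result as CSV text that Excel can open directly."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["hour", "searches"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up rows purely to exercise the query.
    conn.executemany(
        "INSERT INTO search_events VALUES (?, ?, ?)",
        [
            ("2011-03-14 09:15:00", "all", "civil war"),
            ("2011-03-14 09:40:00", "music", "mahler"),
            ("2011-03-14 14:05:00", "all", "climate change"),
        ],
    )
    print(to_excel_csv(searches_by_hour(conn)))
```

Keeping the aggregation in SQL and only formatting in the script mirrors the workflow described here: let the database do the counting, then hand a small table to Excel.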
Weblogs:
library_server.log: having all this data in one log is both a positive and a negative, since there is a lot of information to parse. Open it up and watch Primo being used in real time.
Real-time searches, including systematic searches (robot or human harvesting, API errors, etc.)
Determine search errors
Often requires translation
User search behavior
The demo data shows all the information available: every scope and search term, which is helpful for diagnosing problems with individually scoped search boxes
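As a rough illustration of parsing the log for scopes, search terms, and errors, here is a minimal sketch. The line layout it matches (`scope=... query="..."` and an ` ERROR ` level marker) is a hypothetical placeholder; the real library_server.log format depends on the Primo version, so the regular expression would need adapting.

```python
import re
from collections import Counter

# Hypothetical line format: adapt this pattern to your actual library_server.log.
SEARCH_RE = re.compile(r'scope=(?P<scope>\S+)\s+query="(?P<query>[^"]*)"')


def summarize_log(lines):
    """Count searches per scope and per term, and collect ERROR lines."""
    scopes = Counter()
    terms = Counter()
    errors = []
    for line in lines:
        if " ERROR " in line:
            errors.append(line.rstrip("\n"))
            continue
        match = SEARCH_RE.search(line)
        if match:
            scopes[match.group("scope")] += 1
            terms[match.group("query").lower()] += 1
    return scopes, terms, errors


if __name__ == "__main__":
    # Made-up lines in the assumed format. For the real log, pass an open file,
    # e.g. open("library_server.log", encoding="utf-8", errors="replace").
    sample = [
        '2011-03-14 09:15:02 INFO scope=music query="Mahler"\n',
        '2011-03-14 09:16:11 ERROR remote search timed out\n',
    ]
    scopes, terms, errors = summarize_log(sample)
    print(scopes.most_common(), terms.most_common(5), errors)
```

Because `summarize_log` accepts any iterable of lines, it works on an open file without loading the whole log into memory.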
What to do with all this data?
Raw number of searches done, including scoping of existing searches
Every 10 minutes they harvest search terms from the log into a MySQL database of search terms
Make a search term cloud: the terms are displayed as a tag cloud (filtered and cleaned up for student display). This cloud appears in a "visual data" installation in the library's newly renovated lobby. See it here.
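The harvest-then-display pipeline can be sketched as two steps: add each 10-minute batch of terms to a running count, then map the most frequent allowed terms to font-size buckets for the cloud. This is a hypothetical sketch, not Vanderbilt's code: SQLite stands in for their MySQL database (in MySQL the upsert would be `INSERT ... ON DUPLICATE KEY UPDATE`), and the table name, blocklist, and linear size scaling are assumptions.

```python
import sqlite3

# Hypothetical blocklist of terms to hide from the public lobby display.
BLOCKLIST = {"test", "asdf"}


def store_terms(conn, terms):
    """Add one harvest batch (e.g. the last 10 minutes of searches) to the counts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_counts "
        "(term TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    for term in terms:
        term = term.strip().lower()
        if not term:
            continue
        # Portable upsert: create the row if missing, then increment it.
        conn.execute("INSERT OR IGNORE INTO term_counts (term, n) VALUES (?, 0)", (term,))
        conn.execute("UPDATE term_counts SET n = n + 1 WHERE term = ?", (term,))
    conn.commit()


def cloud_weights(conn, top=50, min_size=1, max_size=5):
    """Map the most frequent non-blocked terms to sizes min_size..max_size."""
    rows = conn.execute(
        "SELECT term, n FROM term_counts ORDER BY n DESC, term LIMIT ?",
        (top + len(BLOCKLIST),),  # over-fetch so blocked terms can be dropped
    ).fetchall()
    rows = [(t, n) for t, n in rows if t not in BLOCKLIST][:top]
    if not rows:
        return {}
    lo, hi = rows[-1][1], rows[0][1]
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {t: min_size + round((n - lo) * (max_size - min_size) / span) for t, n in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_terms(conn, ["Mahler", "civil war", "mahler", "test"])  # made-up batch
    print(cloud_weights(conn))
```

Storing running counts rather than raw searches keeps the database small even with a harvest every 10 minutes.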
Live demos:
BIRT reports: it's not that these don't give you data; there is a lot of data here. They just don't necessarily give it to you in the way you want it. There is no reason to worry about running these reports during production hours, since they have no real impact on end-user response time. They may be slow to respond, because they need to extract the contents of the Oracle DB and then have the results formatted by the Eclipse software. For big reports it is often easier to run SQL queries than to wait on BIRT. Never write to the Primo DB via MySQL, but you can read from it. Dale demonstrates SQL Maestro and exports data to Excel, then demos some Excel manipulation of the query results, which shows that their users aren't using tags.
Comparing Primo data output to catalog data output (number of searches per month, per day, etc.): Dale goes to a web interface for pulling this information from the catalog (they have a SIRSI product).
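Once monthly search counts have been pulled from both Primo and the catalog, lining them up is simple arithmetic. This minimal sketch assumes each source has already been reduced to a hypothetical `{month: searches}` dictionary; it reports Primo's share of the combined searches and treats a month missing from one source as zero.

```python
def compare_monthly(primo, catalog):
    """Line up two {month: searches} dicts and report Primo's share of the total.

    Returns a list of (month, primo_count, catalog_count, primo_share) tuples,
    sorted by month key. Months missing from one source count as zero.
    """
    rows = []
    for month in sorted(set(primo) | set(catalog)):
        p, c = primo.get(month, 0), catalog.get(month, 0)
        total = p + c
        share = p / total if total else 0.0
        rows.append((month, p, c, round(share, 3)))
    return rows
```

Using sortable month keys such as "2011-01" keeps the output in chronological order, which makes a trend in Primo adoption easy to chart in Excel.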
Google Analytics live demo for Primo pages (at Vanderbilt, Primo is branded as "DiscoverLibrary"). Dale advises thinking through how many GA codes you want before you start: at some point too many become useless because you can't compile everything together. GA may not be as detailed as MySQL queries, but it provides good snapshots and is also good for administrators who need quick, recent data for meetings.
What else can you do with this usage data? Dale shows the video (linked above) of the "visual data" display, which projects recent search terms in shapes onto the floor of the library's lobby. The shapes respond dynamically to foot traffic in the foyer. Staff can force specific topic, author, or other text to always appear in the display, which is useful for author visits and other events.
Question: Does hosted Primo allow direct queries of the Oracle DB by a MySQL client?
Dale's answer: he doesn't know; Vanderbilt's Primo installation is locally hosted.