Monday, February 13, 2012

Notes from Code4Lib 2012

Some rough notes from the Code4Lib 2012 Conference in Seattle. I'll link to the video archives as time allows.


Monday, February 6, 2012. Preconference Workshops.
Linked Data - Dan Chudnov
Corey Harper: http://www.tagasauris.com/ crowdsources photo tagging through social network sites. But how do you map "canonical" descriptions/metadata to that data…approaching "aboutness"? Then geo-tag. Recruiting others (turk it) to use Google Refine to map the data.
Devon Smith (OCLC): Moving beyond MARC. Decoupling MARC and remodeling into Linked Data. Beginning to explore the model, which vocabularies, FRBR relationships.
Dan Chudnov: Socially networked repo for linked data.
Brainstormed on DChud's idea.

Blacklight
You can see it on the cliobeta side bar.
It really took like 10 minutes at best to add it in.

Installed Blacklight on my Mac and imported a tiny set of MARC records from Voyager.

Tuesday, February 7, 2012. Keynote.
Dan Chudnov, GWU
"ketone" :)
"I love you, but we blew it! We have turned away too many people."
Retraced his family and personal history noting that "things fall apart." Retraced history and current state of Code4Lib, warning that "if we don't change, this thing will fall out from under us". People will stop trying to get in next year to attend the conference if we don't make room for them.
Non-techies want to learn to do what we do.
We must "Hack or Die"!
PyCon is a good model. 2 days of pre-conf training at all levels. Post-conf sprint days.
"Chicago's ready, are you?"

Tuesday, February 7, 2012. Session 1.
Git and Mercurial
Using version control on metadata
Mentioned in IRC: Richard Anderson at Stanford recently released a comparison of version control systems for "repository objects"
Case 2: Zephir metadata management system for HathiTrust. "Individually, this has been a fantastic decision." Everything works great, but when you're dealing with 10 million files, it's wicked slow.
If it looks like code, even if it's data, it will probably work.

Linked-Data-Ready Software for Libraries, eXtensible Catalog
Setting stage for linked data: converting MARC data to FRBR, designed schema for mapping to triples, developed platform to create linked data.
Bulk conversion of existing metadata, sync data conversion to existing systems, allow libraries to do it themselves, provide a way to experiment with data, make linked data available to developers (find out what libraries need, what developers want)
NCIP toolkit has a Voyager driver? Should we use this with VuFind and also to provide linked data.
Future plans for tools: vocabularies, expert enrichment of metadata
Scholarly Practice Participatory Program

Your catalog in Linked Data. Document at achelo.us/talks/marc2rdf
"Just take some baby steps". 
Perl script to convert MARC to RDF triples, host a working SPARQL endpoint, and publish linked dataset to endpoint
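To make the baby step concrete for myself, here is a minimal sketch in Python (not the Perl from the talk). Everything in it is an assumption for illustration: the simplified dict-based record, the example.org base URI, and the field-to-predicate mapping onto Dublin Core terms. It only shows the general shape of turning record fields into N-Triples lines, not what the speaker's script actually does.

```python
# Hypothetical mapping from MARC tag+subfield to Dublin Core predicates.
FIELD_TO_PREDICATE = {
    "245a": "http://purl.org/dc/terms/title",
    "100a": "http://purl.org/dc/terms/creator",
    "260b": "http://purl.org/dc/terms/publisher",
}

BASE_URI = "http://example.org/record/"  # assumed namespace; swap in your own


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(record):
    """Turn one simplified MARC record (a dict) into a list of N-Triples lines."""
    subject = "<%s%s>" % (BASE_URI, record["id"])
    lines = []
    for field, predicate in FIELD_TO_PREDICATE.items():
        for value in record.get(field, []):
            lines.append('%s <%s> "%s" .' % (subject, predicate, escape_literal(value)))
    return lines


# A made-up record, already parsed out of MARC into a dict of lists.
record = {
    "id": "12345",
    "245a": ["Moby Dick"],
    "100a": ["Melville, Herman"],
}
for line in record_to_ntriples(record):
    print(line)
```

The resulting lines could then be loaded into whatever triple store sits behind the SPARQL endpoint.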

HTML5 Microdata and Schema.org
HTML5 semantic elements still don't tell us anything about our content. That's where HTML5 Microdata comes in.
Google found that RDFa was too complex for page editors, thus HTML5 Microdata.
itemscope, itemtype, and itemprop make up the entire schema. Everything is an item of type itemtype that has properties itemprop.
Schema.org hosts schemas for ItemTypes
Catches: Only certain types trigger Google rich snippets, missing Types for many Objects, sparse properties for types.
Communities are just beginning to develop around defining ItemTypes, e.g. Marvel proposing Comics and Periodical Schemas.
Call to action: Try it out, report back to community
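In the spirit of trying it out, a minimal sketch of what the three attributes look like when generated from a record. The helper name render_microdata and the sample values are my own assumptions; the Book type and its name and author properties are from Schema.org. Nested items (e.g. an author as a Person item) are left out to keep it short.

```python
from html import escape


def render_microdata(item_type, properties):
    """Render a flat item as an HTML5 Microdata <div>.

    item_type  -- a Schema.org type name such as "Book"
    properties -- dict of itemprop name -> plain text value
    """
    lines = ['<div itemscope itemtype="http://schema.org/%s">' % escape(item_type)]
    for name, value in properties.items():
        # Each property becomes an element carrying an itemprop attribute.
        lines.append('  <span itemprop="%s">%s</span>' % (escape(name), escape(value)))
    lines.append("</div>")
    return "\n".join(lines)


print(render_microdata("Book", {"name": "Moby Dick", "author": "Herman Melville"}))
```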

ALL TEH METADATAS! Declan Fleming
How we used RDF to manage digital assets. 
You can't simply make up the triples -> METS as container model. MODS for descriptive metadata, PREMIS/MIX/XDRE for Admin/Rights/Technical metadata, METS for structural metadata, file locations map to storage
DAMS metadata workflow (also starting to use for research data, 5 pilot data collections)
Why RDF? Flexible. Easy to change our minds if we want. Triples expressed in any form given the proper stylesheet. Serialize RDF back to XML and onto disk.

HathiTrust Large Scale Search
Performance: phrase queries too slow. Solution: CommonGrams
For large-scale search, favor precision over recall
Reading positions index is performance hit
[Look at slides again later and read blog posts]
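A note to self on CommonGrams, as I understand the idea: phrase queries containing very common words are slow because the position lists for those words are huge, so at index time each common word is also glued to its neighbor to form a much rarer bigram term. The sketch below is a simplified illustration of that index-time step only. The word list and the underscore separator are assumptions, and it does not mirror any particular Solr/Lucene filter exactly.

```python
COMMON_WORDS = {"the", "of", "and", "to", "a"}  # assumed list, for illustration only


def common_grams(tokens, common=COMMON_WORDS):
    """Emit every token, plus a bigram for each adjacent pair that
    contains at least one common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append(tok + "_" + nxt)
    return out


print(common_grams("lord of the rings".split()))
```

A phrase query can then match on the rarer bigram terms such as "of_the" instead of walking the full position lists for "of" and "the".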

Relevance Ranking in the Scholarly Domain, Ex Libris on Primo
Being methodical
Evaluation Metrics: MAP, MRR
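For my own reference, a small sketch of the two metrics using their standard textbook definitions: MRR averages 1/rank of the first relevant result per query, and MAP averages the per-query average precision. The document IDs and relevance judgments are made up, and this is not a description of how Primo computes them.

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant document, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant document appears,
    divided by the total number of relevant documents."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Each entry: (ranked result list, set of documents judged relevant). Made-up data.
queries = [
    (["d3", "d1", "d7"], {"d1", "d7"}),
    (["d2", "d9", "d4"], {"d2"}),
]
mrr = mean(reciprocal_rank(r, rel) for r, rel in queries)
map_score = mean(average_precision(r, rel) for r, rel in queries)
print("MRR:", mrr)
print("MAP:", map_score)
```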

Kill the Search Button 2, Michael Nielsen
The Handheld Devices Are Coming
"No child in the pipeline"
Gestures for a mobile search app
Hands are a really good tool, but unfortunately current paradigm is based on glass => "we slide"
Devices don't provide clues or feedback for gestures
Alternative = Direct manipulation, gesture-driven, palpable, tactile, feedback
Mobile projects at the State and University Library in Denmark: mobile search apps: HTML5 app for searching.
App for barcode scanning to location mapping.
Large number of possible interactions with smartphones
Chose to go native and build iPhone app. Wait for better support for HTML5
Cool! Check out the demo video
There are no standard mobile gestures. They may be individual or may not be appropriate for all.

Design for Developers, Lisa Kurt, U of Nevada
Basics of graphic design
It's difficult for people to identify why they like something. Vocabulary: color, composition, typeface
Study the designs that you love and those that you hate.
Why do we care? Maintain credibility with your audience. "You don't wanna have a rainbow s*!#storm, basically." [best quote of the conference]
"Don't use clip art that looks like clip art."
Alignment, space, pleasing art, balance, restraint.
Design a little and then walk away.

The Golden Road, UC Santa Cruz
Grateful Dead Archive Online
Using Omeka
User contributions through Omeka, Archive data through ContentDM
Also using ARK identifiers from EZID.
[Collaborate with Scott on Omeka/Hydra. Slides?]
Passing METS record from Omeka to Merritt (digital object storage http://merritt.cdlib.org/). Possibly also use Fedora for backend storage.
All metadata and objects directly into ContentDM. Problems interacting with ContentDM. Resorted to running batch scripts against Web Admin client.
METS/MODS from ContentDM

Hydra Breakout
some people exploring replacing Fedora with Microservices underneath
Hydra adds access control and gated discovery. Each head provides customized functionality to solve a specific need
Blacklight is read-only discovery
at least 15 institutions actively creating Hydra heads and using Hydra
Notre Dame's Atrium project is designed to be used as just a Blacklight extension or with the full Hydra stack for collections mgmt.
IRC channel #projecthydra
Head for curating dataset is at top of priorities, but real-world solutions have fizzled.

Breakout reports
Building OSS communities. Building a Library a la Carte community. U of Oregon maintaining. Testing an Amazon cloud deployment. FOSS4Lib bringing together libraries and programmers for pitching and developing ideas.
Schema.org and Microdata. Interest in use for citation data, datasets.

Tuesday, February 7, 2012. Lightning Talks.

  1. XTF. 
  2. Save MLAK. Code4LibJapan. 
  3. Created wiki for sharing information - actually worked well. Tshirts and bags for donations.
  4. Vendors Suck. Andrew Nagy, Serials Solutions. Not really. Your library's problems aren't unique. Call your vendor and ask to talk with the project manager.
  5. Heat Maps, not just for input analysis. Let's get grad students to teach instruction sessions. Identifying times of greatest need because grad students have limited time and have to travel.
  6. ElasticSearch. Gabriel Farrell. All JSON. Clustering and sharding out of the box. All interaction is with http and JSON. JSON-based document store. 
  7. NISO wants to hear about your problems. What environment or conditions are needed for addressing problems for interoperability? Several working groups available.
  8. Finding Images in Book Page Images. PicturePages, Eric Larson. Grabbing book page images with curl, run thru ImageMagick with some crazy processing.
  9. Rock and Roll Hall of Fame Library and Archive using Blacklight against EAD/MARC.
  10. Finding Movies with FRBR and Facets. Making it easier to find movies in libraries. People want to find movies, but libraries describe *publications*. Users don't care about a lot of the stuff we do, but they do care about versions (Blu-Ray, DVD, language). FRBRized records: only one hit per movie title, with multiple versions listed under it.
  11. Web usability using terms. Bohyun Kim, UW Medical Library. Don't over-rely on context: "Images" -> "Medical Images". Terms like "mobile" can be interpreted very broadly by users: "mobile" = "off-campus access as well as mobile app". Sometimes there is no better term: "Interlibrary Loan". Brevity will cost you. [See slides]
  12. Restriction Classes, B!#@hes. Simon Spero. OWL learning time. Commonly misunderstood aspect of OWL. Attempto Controlled English
  13. Processing.js. Make visualizations, graphics work using web standards and without plugins. Other uses: learning Japanese
Get LAMP 9:00PM in Ballroom

Wednesday, February 8, 2012. Morning Session 1. 
Digital Library User Behavior with Google Analytics, Kirk Hess, U of Illinois Urbana-Champaign
Can export data using API. [Could use to map collection handles to community handles]
Some British guy adds comment about calling Analytics events from your server. Interesting. I'll have to follow up on that.

Single Search Box, Cory Lown
73% of searches start from default tab. Other tabs are ignored - even though they used the other tabs during usability testing. Yup!
Worth watching again later with staff
"We pay a lot of attention to top search terms…. If we can get these searches working really well, we'll probably make a lot of people happy."
Single search box implies that you're confident it will work really well…hmmm
Tracking stats using a redirect script from search box
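A rough sketch of how I picture the redirect-script idea: the search box submits to a small script that logs which tab and query were used, then sends the user on to the real search target. The URLs, parameter names (q, tab), and the in-memory log are all assumptions for illustration; the talk's actual script was not shown in these notes.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical search targets keyed by tab name.
SEARCH_TARGETS = {
    "all": "http://example.org/search",
    "catalog": "http://example.org/catalog",
}


def track_and_redirect(request_url, log):
    """Record the tab and query from the incoming URL, then return the
    URL the user should be redirected to."""
    params = parse_qs(urlparse(request_url).query)
    query = params.get("q", [""])[0]
    tab = params.get("tab", ["all"])[0]
    if tab not in SEARCH_TARGETS:
        tab = "all"  # fall back to the default tab for unknown values
    log.append({"tab": tab, "query": query})
    return SEARCH_TARGETS[tab] + "?" + urlencode({"q": query})


log = []
print(track_and_redirect("http://example.org/go?q=moby+dick&tab=catalog", log))
print(log)
```

In a real deployment the log would go to a file or database and the return value would become the Location header of an HTTP redirect.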

Building research applications with Mendeley, William Gunn
"Mendeley collects attention data about papers."
JISC DURA Project jisc-dura.blogspot.com. Researchers can use Mendeley desktop to dump their publications into a repository box.
This presentation could have been much better. Kind of hard to make sense of.
Give feedback on presentation
http://mnd.ly/c4l2012wg  Belgian

Wednesday, February 8, 2012. Morning Session 2.
Stack View, http://github.com/harvard-lil/stackview
Recreating records into pixels…size, height, thickness.
Heatmapping…coloring book renderings based on data: popularity, Amazon rating, whatever datapoint you want
Renderings are just HTML+CSS
Can use various APIs to get book data (catalog, Amazon, whatever)
Push out Stack View JSON and consume
Github version is only books. Harvard's production version will have other representations.
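To convince myself how little is needed for the records-to-pixels idea, here is a minimal sketch that maps a book record to an inline CSS style string. The field names (pages, height_cm, popularity), the pixel ranges, and the color scale are all assumptions of mine. This is not Stack View's actual JSON format or code, just the general mapping from data points to dimensions and heat color.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi], clamped to the range."""
    if hi == lo:
        return out_lo
    t = (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    return out_lo + t * (out_hi - out_lo)


def heat_color(score):
    """Map a 0-100 score to a CSS hsl() color running from blue (cold) to red (hot)."""
    hue = round(scale(score, 0, 100, 240, 0))
    return "hsl(%d, 70%%, 50%%)" % hue


def book_to_style(book):
    """Build a CSS style string for one book lying flat in a stack."""
    thickness = round(scale(book["pages"], 50, 1000, 12, 60))    # page count -> px tall
    length = round(scale(book["height_cm"], 15, 40, 200, 400))   # physical height -> px wide
    return "height:%dpx;width:%dpx;background:%s" % (
        thickness, length, heat_color(book["popularity"]))


book = {"pages": 600, "height_cm": 24, "popularity": 80}  # made-up record
print('<div style="%s">Moby Dick</div>' % book_to_style(book))
```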

Bib framework for digital age, Jeremy Nelson, Colorado College
LoC working group 
NoSQL & Redis for bib records
BigTable and Hadoop
Redis key-value data structures are descriptive enough to model bib records using FRBR
[check out Twitter's Bootstrap for web and presentation design framework, and other stuff listed at the end]
[Watch this again later]
875,xxx+ records in Redis
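A sketch of how key-value structures might carry FRBR relationships, so I remember the idea when I rewatch. The key naming scheme (frbr:work:1 and so on) is my own assumption, not the speaker's. To keep it runnable without a server, a tiny in-memory class stands in for the handful of Redis commands involved (HSET, HGETALL, SADD, SMEMBERS); it is not a Redis client.

```python
class TinyStore:
    """In-memory stand-in for the few Redis commands used below."""

    def __init__(self):
        self.data = {}

    def hset(self, key, field, value):
        self.data.setdefault(key, {})[field] = value

    def hgetall(self, key):
        return dict(self.data.get(key, {}))

    def sadd(self, key, member):
        self.data.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.data.get(key, set()))


store = TinyStore()

# Work: a hash of descriptive fields.
store.hset("frbr:work:1", "title", "Moby Dick")
store.hset("frbr:work:1", "creator", "Melville, Herman")

# Expression: its own hash, linked from the work through a set of keys.
store.hset("frbr:expression:1", "language", "eng")
store.sadd("frbr:work:1:expressions", "frbr:expression:1")

# Manifestation: linked from the expression the same way.
store.hset("frbr:manifestation:1", "format", "print")
store.sadd("frbr:expression:1:manifestations", "frbr:manifestation:1")


def manifestations_of(store, work_key):
    """Walk work -> expressions -> manifestations and return manifestation keys."""
    found = []
    for expr in sorted(store.smembers(work_key + ":expressions")):
        found.extend(sorted(store.smembers(expr + ":manifestations")))
    return found


print(manifestations_of(store, "frbr:work:1"))
```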

Ask Anything
Dan Chudnov looking for someone to help with Rails on Umlaut
Data management plan support (related to Research Works Act) -- Declan's DMP pilot project at UC

Wednesday, February 8, 2012. Afternoon Session
Data Storage in the Browser. Jason Casden [Rewatch his browser data storage talk]

Lies, Damned Lies, and Lines of Code Per Day, Columbia University
Read "Making Software Work" book
How do you hire the right people? 
[Maybe watch this again]

Agile at Stanford, Naomi Dushay
Sheets on Taskboard photo: Backlog, On Deck, In Progress, Done
One week a month is Dead Week = no standing meetings

Research Networks and Citation Analysis Breakout [Me]

  • BibApp, VIVO, Zotero, Mendeley, vendor products
  • Potential role in promotion and tenure? Providing services to departments, e.g. CV creation.
  • Different needs for different types of institutions and disciplines. Humanities may actually need a wider network than biomedical, science.
  • Role in open access publishing
  • Role as discovery interface for multiple content and repository types. Richer relationship graphs than most repository software.
  • Take advantage of Zotero, Mendeley, crowdsourcing, and linked data. Tom (U. of Oregon) brought up AltMetrics and possibility of relating works better to existing ontologies.

Wednesday, February 8, 2012. Lightning Talks

  1. Zotero & SHERPA/RoMEO. Scott H. Filter a collection of articles by publisher policies. Zotero Plugin?
  2. Including library resources in LMS. Basic LTI. Passing query through HTTP POST with parameters about its LMS context (course details) to some library application.
  3. FOSS4Lib, Peter Murray, Lyrasis
  4. I've Got Good News, Mark Matienzo. fiwalk progress for extracting archival metadata from digital media.
  5. ArchivesOnline at Indiana. Digitization workflow.
  6. Mashing up OPACs in Japan. Screen scraped *all* OPACs in Japan. Built web service for all 5000 libraries, 2000 OPACs.
  7. Make broadcast TV, radio available online, Denmark. Built repository, search, services
  8. Macaw. Joel, Smithsonian. Metadata collection tool for book-like things. On Google Code (php)
  9. LOD-LAM Incubator. Rachel, DLF. Practical application of linked open data in libraries, archives, and museums. Funding: Planning and Startup Grants [BibApp+Zotero/Mendeley funding?]. Kickstarter-type web site for projects.
  10. Project Shizuku. [See presentation from C4L11]. Making friends in libraries. Supporting encounters among library users

Thursday, February 9, 2012. Keynote
Bethany (@nowviskie), ScholarsLab, UVa
Lazy Consensus, How Impatient People Can Change the World
How to move world forces, even against their will. Wielding lazy consensus because it's already being wielded against you.
Lazy consensus = "Yes" becomes the default. Saying "If you can't be bothered to speak a timely "yes" or "no" then you probably don't care enough about this matter to formulate an opinion."
It may be a social contract, but it's practically a natural law.
[Watch Dark City movie]
[Watch this with staff again later]

Thursday, February 9, 2012. Lightning Talks

  1. Title. David Uspal, Villanova. Projects: Interactive Map, Tap Campus Tour, URL Manager
  2. Adding HathiTrust records to your Solr-based index (Blacklight, VuFind). Robert, UVa
  3. Django-based Discovery Layer
  4. Turbo MARC in YAZ. Dennis Schafroth, Index Data [Watch this again]

[Left early. Watch missed talks]