Archie R. Dykes Library Blog: Notes from Code4Lib 2012

Some rough notes from the Code4Lib 2012 Conference in Seattle. I'll link to the video archives as time allows.

Code4Lib2012 on Lanyrd
Code4Lib2012 Livestream video
Crowdsourcing to improve Code4Lib 2012 Video Archive

Monday, February 6, 2012. Preconference Workshops.

Linked Data - Dan Chudnov

Project ideas: http://wiki.code4lib.org/index.php/2012_Linkfest_Preconference

Corey Harper: http://www.tagasauris.com/ uses crowdsourcing on social network sites to tag photos. But how do you map "canonical" descriptions/metadata to that data… approaching "aboutness". Then geo-tag. Recruit others (turk it, i.e. via Mechanical Turk) to use Google Refine to map the data.

Devon Smith (OCLC): Moving beyond MARC. Decoupling MARC and remodeling it as Linked Data. Beginning to explore the model: which vocabularies, which FRBR relationships.

Dan Chudnov: Socially networked repo for linked data.
Brainstormed on DChud's idea.

Blacklight

https://github.com/projectblacklight/blacklight_range_limit

You can see it in the cliobeta sidebar.

It took about 10 minutes, at most, to add it in.
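
The range limit plugin shows a histogram of counts for a numeric field (typically publication year). As I understand it, it gets those counts by asking Solr for one facet query per sub-range. Here's a rough Python sketch of just the segmenting step; the field name `pub_date_i` and the even split are my assumptions, not the plugin's actual code:

```python
def range_segments(lo: int, hi: int, n: int = 10) -> list[tuple[int, int]]:
    """Split the inclusive integer range [lo, hi] into at most n contiguous segments."""
    if hi < lo:
        raise ValueError("hi must be >= lo")
    span = hi - lo + 1
    n = max(1, min(n, span))  # never produce empty segments
    step, extra = divmod(span, n)
    segments = []
    start = lo
    for i in range(n):
        size = step + (1 if i < extra else 0)  # spread the remainder over the first segments
        segments.append((start, start + size - 1))
        start += size
    return segments


def facet_queries(field: str, lo: int, hi: int, n: int = 10) -> list[str]:
    """One Solr facet.query string per segment, e.g. 'pub_date_i:[1900 TO 1924]'."""
    return [f"{field}:[{a} TO {b}]" for a, b in range_segments(lo, hi, n)]


# "pub_date_i" is a hypothetical field name
for q in facet_queries("pub_date_i", 1900, 1999, 4):
    print(q)
```

Each string would go into its own `facet.query` parameter, and the returned counts become the histogram bars.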

http://sass-lang.com/

http://coffeescript.org/

http://susy.oddbird.net/

Installed Blacklight on my Mac and imported a tiny set of MARC records from Voyager.

Tuesday, February 7, 2012. Keynote.

Dan Chudnov, GWU

"ketone" :)

"I love you, but we blew it! We have turned away too many people."

Retraced his family and personal history, noting that "things fall apart." Then retraced the history and current state of Code4Lib, warning that "if we don't change, this thing will fall out from under us." If we don't make room for them, people will stop trying to get in to attend the conference next year.

Non-techies want to learn to do what we do.

We must "Hack or Die"!

PyCon is a good model. 2 days of pre-conf training at all levels. Post-conf sprint days.
"Chicago's ready, are you?"

Tuesday, February 7, 2012. Session 1.

Git and Mercurial

Using version control on metadata

Mentioned in IRC: Richard Anderson at Stanford recently released a comparison of version control systems for "repository objects".

Case 2: Zephir metadata management system for HathiTrust. "Individually, this has been a fantastic decision." Everything works great, but when you're dealing with 10 million files, it's wicked slow.

If it looks like code, even if it's data, it will probably work.
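
My reading of that line: Git and Mercurial diff and merge line by line, so metadata serialized with one field per line gets small, readable diffs. A tiny illustration using Python's standard `difflib` (the record format here is invented):

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (removed, added) lines between two versions of a line-oriented record."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # lines starting with "  " are unchanged; "? " lines are intraline hints
    return removed, added


v1 = "title: Notes from Code4Lib\ncreator: Library Staff\ndate: 2012\n"
v2 = "title: Notes from Code4Lib 2012\ncreator: Library Staff\ndate: 2012\n"
print(changed_lines(v1, v2))
```

Only the one edited field shows up as changed; the same record crammed onto a single line would show the whole record as changed.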

Linked-Data-Ready Software for Libraries, eXtensible Catalog

Setting stage for linked data: converting MARC data to FRBR, designed schema for mapping to triples, developed platform to create linked data.

Bulk conversion of existing metadata, sync data conversion to existing systems, allow libraries to do it themselves, provide a way to experiment with data, make linked data available to developers (find out what libraries need, what developers want)

NCIP toolkit has a Voyager driver? Should we use it with VuFind, and also to provide linked data?

Future plans for tools: vocabularies, expert enrichment of metadata

Scholarly Practice Participatory Program

Your catalog in Linked Data. Document at achelo.us/talks/marc2rdf

"Just take some baby steps".

Perl script to convert MARC to RDF triples, host a working SPARQL endpoint, and publish the linked dataset to the endpoint.
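
The talk's version was a Perl script. For my own reference, here's a minimal Python sketch of the first step only: turning already-parsed MARC fields into N-Triples. The tiny tag-to-Dublin-Core mapping and the `http://example.org/bib/` base URI are my assumptions; a real conversion would use a MARC library and a much richer vocabulary mapping.

```python
# Hypothetical mapping from (MARC tag, subfield) to Dublin Core terms predicates
MARC_TO_DCTERMS = {
    ("245", "a"): "http://purl.org/dc/terms/title",
    ("100", "a"): "http://purl.org/dc/terms/creator",
    ("020", "a"): "http://purl.org/dc/terms/identifier",
}


def escape_literal(value: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def record_to_ntriples(record_id: str, fields: list[tuple[str, str, str]],
                       base: str = "http://example.org/bib/") -> list[str]:
    """Emit one N-Triples line per mapped (tag, subfield, value) field."""
    subject = f"<{base}{record_id}>"
    triples = []
    for tag, code, value in fields:
        predicate = MARC_TO_DCTERMS.get((tag, code))
        if predicate is None:
            continue  # unmapped fields are simply skipped in this sketch
        triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


fields = [
    ("245", "a", "Sample title"),
    ("100", "a", "Example, Author"),
    ("500", "a", "A general note"),  # 500 is not in the mapping, so it is dropped
]
triples = record_to_ntriples("b1", fields)
print("\n".join(triples))
```

The resulting lines could then be loaded into whatever triple store backs the SPARQL endpoint.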

HTML5 Microdata and Schema.org

HTML5 semantic elements describe page structure but still don't tell us anything about our content. That's where HTML5 Microdata comes in.

Google found that RDFa was too complex for page editors, hence HTML5 Microdata.

Three attributes, itemscope, itemtype, and itemprop, make up the entire syntax. Everything is an item (itemscope) of some type (itemtype) that has properties (itemprop).

Schema.org hosts the vocabularies to use as itemtypes.
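
To make the three attributes concrete, here's a Python sketch that pulls itemtype and itemprop values out of a snippet using only the standard library's `html.parser`. It handles only the simplest case (flat items with text values; no nesting, no `content` or `href` values), and the sample markup is made up, though `http://schema.org/Book` with `name` and `author` are real schema.org terms:

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Collect itemtype and itemprop text for simple, non-nested microdata items."""

    def __init__(self):
        super().__init__()
        self.items = []    # one {"type": ..., "properties": {...}} per itemscope
        self._prop = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        if "itemprop" in attrs and self.items:
            self._prop = attrs["itemprop"]
            self.items[-1]["properties"][self._prop] = ""

    def handle_endtag(self, tag):
        self._prop = None  # sketch: any closing tag ends the current property

    def handle_data(self, data):
        if self._prop and self.items:
            self.items[-1]["properties"][self._prop] += data.strip()


SAMPLE = """<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">Notes from Code4Lib</span>
  by <span itemprop="author">Jane Doe</span>
</div>"""

parser = FlatMicrodataParser()
parser.feed(SAMPLE)
print(parser.items)
```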

Catches: Only certain types trigger Google rich snippets, missing Types for many Objects, sparse properties for types.

Communities are just beginning to develop around defining ItemTypes, e.g. Marvel proposing Comics and Periodical Schemas.

Call to action: Try it out, report back to community

ALL TEH METADATAS! Declan Fleming

How we used RDF to manage digital assets.

You can't simply make up the triples -> METS as container model. MODS for descriptive metadata, PREMIS/MIX/XDRE for Admin/Rights/Technical metadata, METS for structural metadata, file locations map to storage

DAMS metadata workflow (also starting to use for research data, 5 pilot data collections)

Why RDF? Flexible. Easy to change our minds if we want. Triples expressed in any form given the proper stylesheet. Serialize RDF back to XML and onto disk.

HathiTrust Large Scale Search

Performance: phrase queries too slow. Solution: CommonGrams

For large-scale search, favor precision over recall

Reading the positions index is a performance hit.
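
CommonGrams (available as a Solr/Lucene filter) indexes word pairs that include very frequent words, so a phrase query like "the lord of the rings" can match a few rare bigrams instead of walking the huge positions lists for "the" and "of". A toy Python sketch of the idea, not Lucene's implementation; the common-word list is an assumption:

```python
def common_grams(tokens: list[str], common: set[str]) -> list[str]:
    """Index side: keep every unigram, plus a joined bigram wherever either word is common."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append(f"{tok}_{tokens[i + 1]}")
    return out


def common_grams_query(tokens: list[str], common: set[str]) -> list[str]:
    """Query side: use the bigrams, and keep a unigram only if no bigram covers it."""
    starts = {i for i in range(len(tokens) - 1)
              if tokens[i] in common or tokens[i + 1] in common}
    covered = starts | {i + 1 for i in starts}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"{tok}_{tokens[i + 1]}")
        elif i not in covered:
            out.append(tok)
    return out


COMMON = {"the", "of", "and", "a"}  # assumed list; in practice derived from term frequencies
tokens = "the lord of the rings".split()
print(common_grams(tokens, COMMON))
print(common_grams_query(tokens, COMMON))
```

The trade-off is a bigger index in exchange for phrase queries that avoid the worst positions lookups.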

[Look at slides again later and read blog posts]

Relevance Ranking in the Scholarly Domain, Ex Libris on Primo

Being methodical

Evaluation Metrics: MAP, MRR
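
A quick refresher for myself on the two metrics: MRR averages 1/rank of the first relevant result per query; MAP averages, per query, the precision at each rank where a relevant document appears (divided by the number of relevant documents). A small Python sketch with made-up queries:

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Sum of precision@k at each relevant rank k, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # relevant docs never retrieved count as zero


def mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# Each run is (ranked results, set of relevant document ids); data is made up
runs = [
    (["a", "b", "c"], {"b"}),
    (["x", "y", "z"], {"x", "z"}),
]
mrr = mean([reciprocal_rank(r, rel) for r, rel in runs])
map_score = mean([average_precision(r, rel) for r, rel in runs])
print(f"MRR={mrr:.3f} MAP={map_score:.3f}")
```

MRR only rewards the first good hit, while MAP also rewards getting the rest of the relevant items near the top.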

Kill the Search Button 2, Michael Nielsen

The Handheld Devices Are Coming

"No child in the pipeline"

Gestures for a mobile search app

Hands are a really good tool, but unfortunately the current paradigm is based on glass => "we slide"

Devices don't provide clues or feedback for gestures

Alternative = Direct manipulation, gesture-driven, palpable, tactile, feedback

Mobile projects at State University in Denmark: mobile search apps: HTML5 app for searching.

App for barcode scanning to location mapping.

Large number of possible interactions with smartphones

Chose to go native and build an iPhone app while waiting for better HTML5 support.

Cool! Check out the demo video.

There are no standard mobile gestures. They may be individual, or may not be appropriate for all.

Design for Developers, Lisa Kurt, U of Nevada

Basics of graphic design

It's difficult for people to identify why they like something. Vocabulary: color, composition, typeface

Study the designs that you love and those that you hate.

Why do we care? Maintain credibility with your audience. "You don't wanna have a rainbow s*!#storm, basically." [best quote of the conference]

"Don't use clip art that looks like clip art."

Alignment, space, pleasing art, balance, restraint.

Design a little and then walk away.

The Golden Road, UC Santa Cruz

Grateful Dead Archive Online

Using Omeka

User contributions through Omeka, Archive data through ContentDM

Also using ARK identifiers from EZID.

[Collaborate with Scott on Omeka/Hydra. Slides?]

Passing METS record from Omeka to Merritt (digital object storage http://merritt.cdlib.org/). Possibly also use Fedora for backend storage.

All metadata and objects go directly into ContentDM. Problems interacting with ContentDM: resorted to running batch scripts against the Web Admin client.

METS/MODS from ContentDM

Hydra Breakout

Some people are exploring replacing Fedora with microservices underneath.

Hydra adds access control and gated discovery. Each head provides customized functionality to solve a specific need

Blacklight is read-only discovery

At least 15 institutions are actively creating Hydra heads and using Hydra.

Notre Dame's Atrium project is designed to be used as just a Blacklight extension or with the full Hydra stack for collections mgmt.

IRC channel #projecthydra

Head for curating dataset is at top of priorities, but real-world solutions have fizzled.

Breakout reports

Building OSS communities. Building a Library a la Carte community. U of Oregon maintaining. Testing an Amazon cloud deployment. FOSS4Lib bringing together libraries and programmers for pitching and developing ideas.

Schema.org and Microdata. Interest in use for citation data, datasets.

Tuesday, February 7, 2012. Lightning Talks.

XTF.
Save MLAK. Code4LibJapan.
Created wiki for sharing information - actually worked well. Tshirts and bags for donations.
Vendors Suck. Andrew Nagy, Serials Solutions. Not really. Your library's problems aren't unique. Call your vendor and ask to talk with the project manager.
Heat Maps, not just for input analysis. Let's get grad students to teach instruction sessions. Identifying times of greatest need because grad students have limited time and have to travel.
ElasticSearch. Gabriel Farrell. All JSON. Clustering and sharding out of the box. All interaction is with http and JSON. JSON-based document store.
NISO wants to hear about your problems. What environment or conditions are needed for addressing problems for interoperability? Several working groups available.
Finding Images in Book Page Images. PicturePages, Eric Larson. Grabbing book page images with curl, running them through ImageMagick with some crazy processing.
Rock and Roll Hall of Fame Library and Archive using Blacklight against EAD/MARC.
Finding Movies with FRBR and Facets. Making it easier to find movies in libraries. People want to find movies, but libraries describe *publications*. Users don't care about a lot of the stuff we do, but they do care about versions (Blu-Ray, DVD, language). FRBRized records: only one hit per movie title, with multiple versions listed under it.
Web usability using terms. Boyhun Kim, UW Medical Library. Don't over-rely on context: "Images" -> "Medical Images". Terms like "mobile" can be interpreted very broadly by users: "mobile" = "off-campus access as well as mobile app". Sometimes there is no better term: "Interlibrary Loan". Brevity will cost you. [See slides]
Restriction Classes, B!#@hes. Simon Spero. OWL learning time. Commonly misunderstood aspect of OWL. Attempto-controlled English
Processing.js. Make visualizations, graphics work using web standards and without plugins. Other uses: learning Japanese
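
The display idea from the FRBR movies talk (one hit per film, versions listed under it) can be sketched as a simple grouping step. This is not how the presenters did it; real FRBRization is much harder than matching on normalized title and year, and the record fields here are assumptions:

```python
from collections import defaultdict


def group_by_work(records: list[dict]) -> list[dict]:
    """Collapse manifestation-level records into one hit per (title, year) work key."""
    works = defaultdict(list)
    for rec in records:
        # Year is part of the key so that remakes with the same title stay separate
        key = (rec["title"].strip().lower(), rec["year"])
        works[key].append(rec)
    hits = []
    for (_, year), versions in works.items():
        hits.append({
            "title": versions[0]["title"],
            "year": year,
            "versions": sorted({(v["format"], v["language"]) for v in versions}),
        })
    return hits


records = [  # made-up sample records
    {"title": "Casablanca", "year": 1942, "format": "DVD", "language": "English"},
    {"title": "Casablanca", "year": 1942, "format": "Blu-ray", "language": "English"},
    {"title": "casablanca ", "year": 1942, "format": "DVD", "language": "French"},
    {"title": "Metropolis", "year": 1927, "format": "DVD", "language": "German"},
]
hits = group_by_work(records)
for hit in hits:
    print(hit["title"], hit["year"], hit["versions"])
```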

Get LAMP 9:00PM in Ballroom

Wednesday, February 8, 2012. Morning Session 1.

Digital Library User Behavior with Google Analytics, Kirk Hess, U of Illinois Urbana-Champaign

Can export data using API. [Could use to map collection handles to community handles]

Some British guy adds comment about calling Analytics events from your server. Interesting. I'll have to follow up on that.

Single Search Box, Corey Lown

73% of searches start from default tab. Other tabs are ignored - even though they used the other tabs during usability testing. Yup!

Worth watching again later with staff

"We pay a lot of attention to top search terms…. If we can get these searches working really well, we'll probably make a lot of people happy."

Single search box implies that you're confident it will work really well…hmmm

Tracking stats using a redirect script from search box
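
A redirect script like that is simple enough to sketch. Here's a minimal Python WSGI version, assuming a search box that submits `q` and a `target` tab; the destination URLs and the in-memory `LOG` list are placeholders for a real catalog and a real log:

```python
from urllib.parse import parse_qs, urlencode

SEARCH_TARGETS = {  # hypothetical mapping from search-box tab to destination
    "catalog": "https://catalog.example.edu/search",
    "articles": "https://articles.example.edu/search",
}
LOG = []  # stand-in for a log file or database table


def search_redirect(environ, start_response):
    """Record which tab and query were used, then 302 to the real search."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    target = params.get("target", ["catalog"])[0]
    base = SEARCH_TARGETS.get(target, SEARCH_TARGETS["catalog"])
    LOG.append((target, query))  # record the search before handing it off
    start_response("302 Found", [("Location", f"{base}?{urlencode({'q': query})}")])
    return [b""]


captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = search_redirect({"QUERY_STRING": "q=grateful+dead&target=articles"}, fake_start_response)
print(captured["status"], captured["headers"]["Location"])
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, search_redirect).serve_forever()` would serve it; the log then shows which tabs people actually use.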

Building research applications with Mendeley, William Gunn

"Mendeley collection attention data about papers."

JISC DURA Project jisc-dura.blogspot.com. Researchers can use Mendeley desktop to dump their publications into a repository box.

This presentation could have been much better. Kind of hard to make sense of.

Give feedback on presentation

http://mnd.ly/c4l2012wg Belgian

Wednesday, February 8, 2012. Morning Session 2.

Stack View, http://github.com/harvard-lil/stackview

Recreating records as pixels… size, height, thickness.

Heatmapping…coloring book renderings based on data: popularity, Amazon rating, whatever datapoint you want

Renderings are just HTML+CSS
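
Since the renderings are just HTML and CSS, the core is a mapping from record data to dimensions and a color class. A Python sketch of that mapping; the scale factors, the 10-level heat scale, and the `heatN` class names are my assumptions, not Stack View's actual code:

```python
from html import escape


def book_style(pages: int, height_cm: float, score: float, max_score: float,
               px_per_page: float = 0.1, px_per_cm: float = 4.0, levels: int = 10) -> dict:
    """Map record data to spine dimensions and a heat class."""
    thickness = max(5, round(pages * px_per_page))  # keep very thin books visible
    length = round(height_cm * px_per_cm)
    ratio = 0.0 if max_score <= 0 else min(max(score / max_score, 0.0), 1.0)
    heat = 1 + min(int(ratio * levels), levels - 1)  # heat classes run 1..levels
    return {"thickness_px": thickness, "length_px": length, "css_class": f"heat{heat}"}


def render_book(title: str, style: dict) -> str:
    """Render one book as an HTML list item; colors for .heatN live in the stylesheet."""
    return (f'<li class="{style["css_class"]}" '
            f'style="height:{style["thickness_px"]}px;width:{style["length_px"]}px">'
            f"{escape(title)}</li>")


print(render_book("Notes & Queries", book_style(320, 24, 45, 100)))
```

Swapping the `score` input (circulation counts, ratings, whatever) is all it takes to heatmap by a different datapoint.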

Can use various APIs to get book data (catalog, Amazon, whatever)

Push out Stack View JSON and consume

GitHub version is only books. Harvard's production version will have other representations.

Bib framework for digital age, Jeremy Nelson, Colorado College

LoC working group

NoSQL & Redis for bib records

BigTable and Hadoop

Redis key-value data structures are descriptive enough to model bib records using FRBR
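
To get a feel for the idea, here's a Python sketch of FRBR entities stored as Redis-style hashes linked by sets. It uses a tiny in-memory stand-in whose method names mirror redis-py's `incr`, `hset`, `hgetall`, `sadd`, and `smembers`, so it runs without a server; the `frbr:Work:1` key scheme is my own invention, not the talk's:

```python
class FakeRedis:
    """In-memory stand-in for the handful of Redis commands used below."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def hset(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        return dict(self.data.get(key, {}))

    def sadd(self, key, *members):
        self.data.setdefault(key, set()).update(members)

    def smembers(self, key):
        return set(self.data.get(key, set()))


def add_entity(r, entity_type, **fields):
    """Store an FRBR entity as a hash under a key like 'frbr:Work:1' (hypothetical scheme)."""
    key = f"frbr:{entity_type}:{r.incr(f'frbr:{entity_type}:counter')}"
    r.hset(key, mapping=fields)
    return key


def link(r, parent, relation, child):
    """Record an FRBR relationship as a set of child keys, e.g. 'frbr:Work:1:expressions'."""
    r.sadd(f"{parent}:{relation}", child)


r = FakeRedis()
work = add_entity(r, "Work", title="Hamlet")
expression = add_entity(r, "Expression", language="English")
manifestation = add_entity(r, "Manifestation", publisher="Example Press", date="2012")
link(r, work, "expressions", expression)
link(r, expression, "manifestations", manifestation)
print(work, r.hgetall(work), r.smembers(f"{work}:expressions"))
```

Against a real server, `redis.Redis(decode_responses=True)` should accept the same calls (without it, values come back as bytes).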

[check out Twitter's Bootstrap for web and presentation design framework, and other stuff listed at the end]

[Watch this again later]

875,xxx+ records in Redis

Ask Anything

Dan Chudnov looking for someone to help with Rails on Umlaut

Data management plan support (related to Research Works Act) -- Declan's DMP pilot project at UC

Wednesday, February 8, 2012. Afternoon Session

Data Storage in the Browser. Jason Casden [Rewatch his browser data storage talk]

Lies, Damned Lies, and Lines of Code Per Day, Columbia University

Read the book "Making Software: What Really Works, and Why We Believe It"

How do you hire the right people?

[Maybe watch this again]

Agile at Stanford, Naomi Dushay

Sheets on Taskboard photo: Backlog, On Deck, In Progress, Done

One week a month is Dead Week = no standing meetings

Research Networks and Citation Analysis Breakout [Me]

BibApp, VIVO, Zotero, Mendeley, vendor products
Potential role in promotion and tenure? Providing services to departments, e.g. CV creation.
Different needs for different types of institutions and disciplines. Humanities may actually need a wider network than biomedical, science.
Role in open access publishing
Role as discovery interface for multiple content and repository types. Richer relationship graphs than most repository software.
Take advantage of Zotero, Mendeley, crowdsourcing, and linked data. Tom (U. of Oregon) brought up AltMetrics and possibility of relating works better to existing ontologies.

Wednesday, February 8, 2012. Lightning Talks

Zotero & SHERPA/RoMEO. Scott H. Filter a collection of articles by publisher policies. Zotero Plugin?
Including library resources in LMS. Basic LTI. Passing a query through HTTP POST, with parameters about its LMS context (course details), to some library application.
FOSS4Lib, Peter Murray, Lyrasis
I've Got Good News, Mark Matienzo. fiwalk progress for extracting archival metadata from digital media.
ArchivesOnline at Indiana. Digitization workflow.
Mashing up OPACs in Japan. Screen scraped *all* OPACs in Japan. Built web service for all 5000 libraries, 2000 OPACs.
Make broadcast TV, radio available online, Denmark. Built repository, search, services
Macaw. Joel, Smithsonian. Metadata collection tool for book-like things. On Google Code (php)
LOD-LAM Incubator. Rachel, DLF. Practical application of linked open data in libraries, archives, and museums. Funding: Planning and Startup Grants [BibApp+Zotero/Mendeley funding?]. Kickstarter-type web site for projects.
Project Shizuku. [See presentation from C4L11]. Making friends in libraries. Supporting encounters among library users
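
I didn't catch how Scott's Zotero filter works, so this is only a sketch of the filtering logic, using the RoMEO colour categories (green: pre-print and post-print, blue: post-print, yellow: pre-print, white: archiving not formally supported). The publisher-to-colour table is made up; real data would come from SHERPA/RoMEO:

```python
# RoMEO colour categories and which draft each allows authors to archive
ARCHIVABLE = {
    "green": {"preprint", "postprint"},
    "blue": {"postprint"},
    "yellow": {"preprint"},
    "white": set(),
}

# Made-up publisher -> colour table; real values would come from SHERPA/RoMEO
ROMEO_COLOUR = {"Publisher A": "green", "Publisher B": "yellow", "Publisher C": "white"}


def can_deposit(item: dict, version: str) -> bool:
    """True if the item's publisher colour allows archiving this version."""
    colour = ROMEO_COLOUR.get(item["publisher"])  # None for unknown publishers
    return version in ARCHIVABLE.get(colour, set())


def filter_collection(items: list[dict], version: str = "postprint") -> list[str]:
    """Titles in a collection that could be deposited in the given version."""
    return [item["title"] for item in items if can_deposit(item, version)]


items = [
    {"title": "Paper 1", "publisher": "Publisher A"},
    {"title": "Paper 2", "publisher": "Publisher B"},
    {"title": "Paper 3", "publisher": "Publisher C"},
    {"title": "Paper 4", "publisher": "Unknown Press"},
]
print(filter_collection(items))
print(filter_collection(items, "preprint"))
```

Unknown publishers are treated as not archivable, which seems like the safe default for a deposit workflow.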

Thursday, February 9, 2012. Keynote

Bethany (@nowviskie), ScholarsLab, UVa

Lazy Consensus, How Impatient People Can Change the World

How to move world forces, even against their will. Wielding lazy consensus because it's already being wielded against you.

Lazy consensus = "Yes" becomes the default. Saying: "If you can't be bothered to speak a timely 'yes' or 'no', then you probably don't care enough about this matter to formulate an opinion."

It may be a social contract, but it's practically a natural law.

[Watch Dark City movie]

[Watch this with staff again later]

Thursday, February 9, 2012. Lightning Talks

Title. David Uspal, Villanova. Projects: Interactive Map, Tap Campus Tour, URL Manager
Adding Hathi Trust records to your Solr-based index (Blacklight, VuFind). Robert, UVa
Django-based discovery layer
Turbo MARC in YAZ. Dennis Schafroth, Index Data [Watch this again]

[Left early. Watch missed talks]

Monday, February 13, 2012
