Adventures in Data - An Update

Getting my bearings on the data road

Source: Photo by Sharefaith from Pexels, CC0

I wanted to give an update on my data journey. I spent a lot of time in the past three months continuing the Google Data Analytics Professional Certification course. I am now on Course 5 out of 8, with the 8th course being the capstone portfolio project. And the 7th course will be on the programming language R and RStudio, which I am excited for. I never had a good excuse to fully dive into R, since I knew how...

New Adventures in Data

Setting off into the evil data forest Source: Photo by Carolina Roepers from Pexels, CC0

It’s been an interesting few months for me. I have been close to securing some academic librarian jobs, but can’t quite make it over the hump. Although the aggregate job market is currently very short on labor, that hasn’t seemed to filter into libraries. I think last year they went into downsize / hiring freeze mode and might not ever swing back the other way. I’ve also been headhunted by some firms, and chuckled when they apologized that the salary would only...

COVID and Keeping Busy

A lot has happened in the past year of my life. As my last post detailed, I was not promoted in my position and I worked the final 6 months of my job from home due to COVID. I also started working a second job as a part-time instructor for Syracuse University, teaching an Introduction to Metadata course. Amid the craziness of the summer of 2020, I also uprooted my family from San Diego and moved back to Syracuse. While living in this multi-generational house, my wife’s grandmother declined in her health, dealing with dementia and cancer, and finally passed away...

Academic Libraries' Toxic Leadership

Wow. It’s been a while. I haven’t written a blog post in years. Also, this post has been in a “should I post this or not?” purgatory for half of a year. But the #lismentalhealth hashtag and some particularly personal and devastating accounts encouraged me to post this.

A big reason for my blog absence has been the turmoil in my professional life. Over these past 3 years I have experienced bullying, intimidation, and sabotage, and have been forced to work in a toxic environment. I have gone to my superiors hoping for mediation, sought out mentors, worked...

Metadata Experimentation in Jupyter Notebooks

Below is just a quick copy and paste from a Jupyter notebook I made last week. It experiments with the rdflib Python library to parse RDF and then serialize it into JSON-LD.

I really think that stuff like this should be done more in libraries. Not just because it can lead to rapid experimentation and thus new ideas, but because it would immediately let us help students with their notebooks. I know of a few universities that are already on top of this, but this could become a part of the curricula of Library Carpentry and other initiatives.

...

Geospatial Metadata Reconciliation

At UC San Diego, we are currently in the midst of our subject reconciliation project for the DAMS. Nearly all of our more than 12,000 local subject authorities (topics, people/corporations, places, species, etc.) will have linked data URIs added. I’ve outlined the process in previous posts, but it generally follows these steps:

Clean the existing labels via some simple regular expressions (Note: Rawson and Muñoz have a wonderful post about how we should be more precise about the word ‘cleaning’ with regard to data)
Reconcile to FAST via an OpenRefine reconciliation service (which will get us...

Beef With BIBFRAME

Now that BIBFRAME 2.0 has slowly been rolling out, it feels as if we are moving from the alpha phase to the beta phase. And in that regard, we can begin to critique BIBFRAME and have it not seem unfair. Believe me, it was hard to hold back over the many, many glaring problems with BIBFRAME 1.0. As I’ll go over in this post, although 2.0 makes a lot of steps in the right direction, I still think that BIBFRAME has a FRBR-sized hole in it, and that this hole is driving a lot of the remaining problems that could keep BIBFRAME...

Why Use the Command Line in Libraries?

tldr: Navigating on the command line and using common CLI (Command Line Interface) utilities can save tremendous amounts of time and can reduce human error in certain processes. You’re also almost guaranteed to learn things about computer science and scripting, and to gain experience in OSX/Linux environments. And lastly, sometimes it is the only way, as some programs or abilities simply do not have a GUI (Graphical User Interface).

Looking at a recent metadata job posting, it included the following:

and [the librarian] is expected to write scripts (e.g. Python, Perl) for repurposing existing metadata

It’s unavoidable that scripting...

Tilting at Windmills of Linked Data

Source: By Internet Archive Book Images [No restrictions], via Wikimedia Commons

“What giants?” Asked Sancho Panza.
“The ones you can see over there,” answered his master, “with the huge arms, some of which are very nearly two leagues long.”
“Now look, your grace,” said Sancho, “what you see over there aren’t giants, but windmills, and what seems to be arms are just their sails, that go around in the wind and turn the millstone.”
“Obviously,” replied Don Quixote, “you don’t know much about adventures.”

...

Reconciling Metadata to FAST Linked Data

Last year, our unit decided to tackle an issue we’ve been meaning to address for a while. That is: our subjects are bad. What do I mean by ‘bad’?

First of all, our subjects are LCSH strings. As anyone versed in data entry or metadata knows, typed strings are error-prone. This resulted in a ‘Subject Browse’ of our repository that was ugly. Multiple subjects with typos, subjects that varied only slightly, and straight up messes: it was clear browsing by subjects was not doable. And of course, these problems still affected a normal object view.
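To make the "subjects that varied only slightly" problem concrete, here is a small sketch of one way to surface such variants. It is an assumption-laden illustration, not the method used in our repository: the `fingerprint` and `group_variants` helpers and the sample headings are hypothetical, and the normalization is only similar in spirit to the key-collision clustering that OpenRefine offers.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(label: str) -> str:
    """Reduce a subject string to a rough comparison key (hypothetical helper)."""
    # Lowercase and strip diacritics so "Muñoz" and "Munoz" compare equal.
    text = unicodedata.normalize("NFKD", label.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation (including the LCSH "--" separator) with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Sort and de-duplicate tokens so spacing and ordering differences vanish.
    return " ".join(sorted(set(text.split())))


def group_variants(labels):
    """Group labels that share a fingerprint."""
    groups = defaultdict(list)
    for label in labels:
        groups[fingerprint(label)].append(label)
    return dict(groups)


# Made-up headings showing spacing, case, and punctuation variation.
subjects = [
    "Libraries--History",
    "libraries -- history.",
    "Libraries --History",
    "Library science",
]

groups = group_variants(subjects)
for key, variants in groups.items():
    if len(variants) > 1:
        print(key, "->", variants)
```

This only catches differences in case, spacing, punctuation, and token order; real typos would need a fuzzier comparison. Because sorting tokens ignores subdivision order, each group should be treated as a candidate for human review rather than an automatic merge.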

While having something like...

Fallout 4 - The Apex of the Open World RPG

Whenever a massively hyped game like Fallout 4 comes out, the jaded gamer can forget that there are a ton of people who have never played a Fallout game. Further, there are many who have only played Fallout 3, which is the ‘modern’ Fallout game. Everything from the lore, to what is ‘normal’, to the legacy of past game systems is taken for granted by series veterans.

So before I explain why this game represents the best that Open World RPGs can offer, I’ll respond to the most prevalent criticisms of Fallout 4:

There’s no real tutorial

This is true....

An OpenRefine Tutorial, Part 3

A Quick Analysis of Intentions

This part of the tutorial is all about the scripting and more complex aspects of OpenRefine, but since this aspect is a potential rabbit hole, we should figure out what it is we want to ‘do’. In general, I want to avoid throwing everything advanced into the scary ‘programming’ box because this isn’t really programming, simply determining which functions of the tool (and its extensions) to harness.

If you want to simply continue with the data cleanup aspects, there will be some scripting-like things (specifically using GREL) that will come in handy that...

An OpenRefine Tutorial, Part 2

Facets are the main method of using OpenRefine on data. They allow you to take temporary slices of the data, and from there you can perform a variety of actions on that subset of data, all while being able to see that slice visually.

Facets are basically ‘views’ of the data that you can isolate, and as you close facets, you will return to the original overview of the data. Even if you do make changes at this point, changes have a clear trail, so that you can revert to previous states of the data.
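For readers who think in code, a facet can be loosely pictured as a count of the distinct values in a column plus a filter that selects a temporary slice. The sketch below is only an analogy in plain Python, not how OpenRefine works internally; the rows and the `text_facet` and `select` helpers are invented for the example.

```python
from collections import Counter

# Made-up rows standing in for a small tabular dataset.
rows = [
    {"title": "Map of San Diego", "type": "map"},
    {"title": "Harbor photograph", "type": "photograph"},
    {"title": "Campus photograph", "type": "Photograph"},
    {"title": "Field notes", "type": "text"},
]


def text_facet(rows, column):
    """Count each distinct value in a column, like a facet's value list."""
    return Counter(row[column] for row in rows)


def select(rows, column, value):
    """Return a temporary slice; the original rows are left untouched."""
    return [row for row in rows if row[column] == value]


facet = text_facet(rows, "type")
print(facet)  # the inconsistent "Photograph" shows up as its own value

subset = select(rows, "type", "Photograph")
print(subset)
```

Dropping `subset` is the equivalent of closing the facet: the full set of rows is still there, unchanged.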

An OpenRefine Tutorial, Part 1

Introduction to OpenRefine

There are a lot of ways to clean tabular data files. Excel has formulas, Find & Replace, and other useful features. Programming scripts, utilities like csvkit, and code libraries can accomplish similar tasks with text manipulation using regular expressions, and have the additional advantage of iterative looping and conditional statements (if/then, for/each, etc.). But if you want to combine powerful scripting functions with a visual interface for data cleanup, as well as the ability to reconcile metadata to Linked Data sources, OpenRefine is an excellent tool. It was previously known as Google Refine, but Google has since...

Software Tools That I Use

Whenever I talk to non-MARC metadata people outside of my library, invariably we come to a point where we get specific about what software we use. Well, let me back up: first we might complain about Windows and how much better Mac OSX and Linux are, but then we move on to software. I realized that there are so many choices and personal decisions involved that I would just post a list of what I use and maybe that will help someone, or at least pique their curiosity.

For reference, I am running Linux Mint Rebecca, which affects some of the...

Resurrecting Game Worlds

For a few years now, I’ve taken an interest in video game preservation. Despite the growing number of video game archives and increasing awareness amongst gamers themselves, it’s one of those areas in digital preservation that hasn’t really developed or matured yet. I think most of the problem derives from the nature of video games: interactivity. That is, there are very real differences between simple storage, archiving, and providing ‘access’ to a game. This ‘access’ is not just viewing, too: it is fully expected that a game will be provided to play, as the Internet Archive’s Internet Arcade does....

Adding Structured Data With APIs, Reconciliation, and Entity Recognition

Recently at work I was tasked with presenting OpenRefine to my unit. Since it’s such a complicated, dense piece of software, I had to prioritize what I would present. I wanted to present aspects that would benefit our unit right away. The most obvious benefits are the many data cleanup features.

When it came to the structured data and enrichment features, I decided to focus on what we would most likely use the most often: reconciliation and named-entity extraction.

These two processes are part of what have been termed the “low hanging fruits of Linked...

Programming Stuff I'm Working On

In my perpetual quest to start actually programming, I am starting to feel like I’m making progress. In the Fall, I completed yet another Python course, but this time I felt like the material sank in, and the things I was taught are applicable to things I encounter at work. Like manipulating text files, regular expressions, and using Python libraries in… well, libraries. Using code libraries and working with APIs are probably the most important things I can learn at this point in my career. I rarely would be expected to make a full program, and if I did it...

Libraries and Surveillance

Source: Image CC0 Wikimedia Commons

This post is inspired by Kade Crockford of Privacy SOS and Alison Macrina of Library Freedom Project. They’re doing good work, so support them and show up whenever they host stuff!

For a long time, I have been deeply interested in how libraries can counteract, or at the very least, educate people on how they are being spied upon. This of course flared up in a big way after the Snowden and PRISM revelations. The latest news seems...

My Favorite Games of 2014

A lot of the gaming press is lamenting 2014, calling it a slow, disappointing year. I don’t agree, but maybe that’s because I lived through gaming years in the N64, Sega CD, and 3DO eras. Those were lean times, my friend.

The PC continues to be a magical platform for gaming. While Steam is still the storefront of choice to buy digital PC games, it no longer has a virtual chokehold on the PC game market. EA’s Origin is a disaster, but it is required to play any Electronic Arts PC games. More positively, Good Old Games has gained momentum this year...

Toward a Metadata Pipeline

The metadata version of this Source: Image CC0 Pixabay

Now that I’ve been in my first librarian gig for almost 5 months (whoa), I’ve got a solid grasp on how things are done with regard to our digital repository and how metadata gets “made”. I also of course bring baggage with me, both good and bad, from library school and elsewhere. This baggage contains thoughts, theories, dreams, ambitions of where I’ve been and where I want the metadata to go. Where the present of “how we do this now” and the baggage...

Quality Assurance in Metadata

I know, I know: quality assurance isn’t a sexy topic. Neither is metadata. When the two meet, it is an unholy alliance of boring. But metadata work is heavily reliant on QA. The best data model, the best workflows, the perfect Linked Data strategy all fail if a 1 or 0 or non-escaped character is where it shouldn’t be.

Rather by chance, I have Quality Assurance experience. I was a video game tester with the publisher THQ (R.I.P.) for just over a year in 2006. No, Grandma’s Boy was not an accurate portrayal of my existence, but it was...

My Move from Drupal to GitHub Pages

This post will try to walk through the steps I went through to move from a hosted Drupal 7 blog to this current GitHub Pages blog. It took me about 3 days to complete, but would take almost no time if you were a Linux user with a solid grasp of GitHub. The bonus is that, besides the ease of use GH Pages provides over Drupal (or other CMSes like WordPress), it taught me a lot of the basics of GitHub. Which is nice.

So far, GitHub Pages has a lot going for it. There’s no database to worry about...

Support the Ada Initiative

When I first applied for a job at UC San Diego, I was asked to complete a Diversity Statement. It was somewhat awkward to write, because I had no experience designing inclusive programs. I was greatly inspired by Black Girls Code and The Ada Initiative, but my neck of the woods is metadata, not coding.

I would love to do something similar, except designed for libraries and metadata. Because, despite librarianship being overwhelmingly female, it is very white, and the more technical the position, the more likely it is occupied by a man. I’d like to change that.

...

Welcome

So, after spending a lot of time tweaking Jekyll and GitHub Pages to get them working on Windows (hint: it sucks!), I decided to just fork an existing Jekyll installation that was Windows-friendly and was themed. I also tried my hand at manually creating HTML, which was nice, but as always, the CSS part bored me to death. I will get you one day, CSS.

I’m trying to figure out if it’s even worth it to migrate things from my existing Drupal blog. Probably not.

See you!
