Post-NLP class notes

This morning I took a 4-hour O'Reilly course on natural language processing, taught by Bruno Gonçalves: Natural Language Processing (NLP) for Everyone. (ramshackle notes)

(Side note: I've tried several O'Reilly events this year, for various business and software topics, and they've all been good. We get access free through my company, and it's safe to say that O'Reilly access is my favorite company perk.)

NLP is one of the things that I've been wanting to dig into for years, so I signed up for this one. (In short: NLP is machine learning applied to comparing, categorizing, and creating human language text.) It's a casual superpower that I'd like to know more about, and I'd like to build some tools with it for my own use.

My main interest in it is language translation. When I'm trying to find something to read in Chinese, whether to gather vocabulary or to read for fun or information, I have a hard time discerning which articles are actually good and which are not, which authors are good, etc. I'm not good enough at the language to tell the difference. Much of the time I spend with new (to me) Chinese text goes to breaking the article down into the smallest parts that I can understand (often individual words or characters) and clumsily building up from there. If some friends could give me examples of good ("good") things that they've read, then I could use those as a body of good examples against which to compare other articles I might read. It's not as good as making the decision myself based on understanding, but who knows when I'll have the skill to do that.
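A rough sketch of what "compare against a body of good examples" could look like, using only the Python standard library. This is not from the course. It assumes a deliberately simple technique: count overlapping character pairs (bigrams), which sidesteps Chinese word segmentation, and compare a candidate article to the pooled examples with cosine similarity. The function names and sample strings are made up for illustration, and the score only measures surface similarity in wording, not quality.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Count overlapping character pairs, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(a, b):
    """Cosine similarity between two Counters of bigram counts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_against_examples(candidate, good_examples):
    """Score a candidate text against the pooled bigram profile of the examples."""
    profile = Counter()
    for text in good_examples:
        profile.update(char_bigrams(text))
    return cosine(char_bigrams(candidate), profile)


# Hypothetical sample data: two short "good" texts and two candidates.
good = ["今天天气很好", "我喜欢读书"]
print(score_against_examples("今天天气不错", good))  # shares some bigrams
print(score_against_examples("abcdef", good))        # shares none
```

A higher score just means the candidate uses character sequences similar to the examples. A real version would probably want word segmentation and far more example text.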

Another idea I've had in the back of my mind is being able to create a service where I could find recipes that match whatever food I have on hand at home. It seems like I should be able to crawl some group of websites with recipes, build a library of recipes, and be able to input a list of ingredients and find the best matches. ("Building", not "sharing", the library of recipes—just use it for the analysis, and pass the user on to the real thing on the real site, not steal the content.)
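The matching step could be sketched like this, assuming the crawling is already done and each recipe has been reduced to a plain list of ingredient names. The recipe data, the function name `rank_recipes`, and the scoring rule (the fraction of a recipe's ingredients already on hand) are all assumptions for illustration.

```python
def rank_recipes(pantry, recipes, top_n=3):
    """Rank recipes by the fraction of their ingredients found in the pantry.

    Returns a list of (coverage, recipe_name, missing_ingredients) tuples,
    best match first.
    """
    on_hand = {item.strip().lower() for item in pantry}
    scored = []
    for name, ingredients in recipes.items():
        needed = {item.strip().lower() for item in ingredients}
        if not needed:
            continue  # skip recipes with no ingredient list
        coverage = len(needed & on_hand) / len(needed)
        missing = sorted(needed - on_hand)
        scored.append((coverage, name, missing))
    # Highest coverage first; ties broken alphabetically by recipe name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


# Hypothetical library; a real one would come from the crawl.
RECIPES = {
    "tomato egg stir-fry": ["egg", "tomato", "scallion", "oil"],
    "fried rice": ["rice", "egg", "scallion", "soy sauce", "oil"],
    "pancakes": ["flour", "egg", "milk", "butter"],
}

for coverage, name, missing in rank_recipes(["Egg", "tomato", "oil", "rice"], RECIPES):
    print(f"{name}: {coverage:.0%} on hand, missing {missing}")
```

Scoring by coverage of the recipe, rather than by overlap in both directions, means a well-stocked kitchen isn't penalized for having extra food. The hard part in practice is normalizing ingredient names ("scallions" vs. "green onion"), which this sketch ignores.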

One final idea, related to work: finding information at work is a huge headache. On one hand, we have an internal library and an internal search engine. The library works best if you can search on title or author or keyword, and I don't know what the search engine is doing, but it's not very useful for me. That's the formal information. The informal information is the most relevant for day-to-day work: emails, meeting notes (people should take these more), files on the server, information in databases, Jira, Confluence, on and on and on. This sort of information finds its way into hidden corners and folders, effectively lost forever, and even if you do find it, there simply isn't time to open it all and make sense of it. But what if you could scrape that information, categorize it, and make it findable when it's relevant? That would be a superpower: being able to find anything and everything that a project created, not just the things that are collectively remembered. (I would like to hold up Evernote's context feature as an example, which showed the six notes most related to the currently open note at the bottom, but they've removed that feature in their newest versions... bring it back, bring it back...)
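A "most related notes" feature along those lines could be sketched with TF-IDF weighting and cosine similarity. This is a guess at one reasonable approach, not a description of how Evernote did it. The tokenizer, the smoothed IDF formula, the function names, and the sample notes are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into runs of letters and digits (English-only sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build a {doc_name: {term: weight}} mapping with smoothed IDF."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(docs)
    return {
        name: {
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, tf in term_counts.items()
        }
        for name, term_counts in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(name, docs, k=6):
    """Return up to k (other_name, similarity) pairs, most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (other, cosine(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical notes standing in for scraped emails, wiki pages, and so on.
NOTES = {
    "migration-plan": "database migration plan for billing service",
    "schema-checklist": "billing database schema and migration checklist",
    "lunch": "team lunch menu and parking notes",
}
print(related("migration-plan", NOTES, k=2))
```

The default of `k=6` mirrors the six related notes mentioned above. Getting the text out of email, Jira, and Confluence in the first place is the bigger job, and is left out here.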

There's an advanced version of the class coming up on 27 January that I've signed up for: NLP with Deep Learning for Everyone.
