seventeen things tagged “today i learned”

Stop Words

In computing, stop words are words which are filtered out before or after processing of natural language data (text). Though “stop words” usually refers to the most common words in a language, there is no single universal list of stop words used by all natural language processing tools, and indeed not all tools even use such a list. Some tools specifically avoid removing these stop words to support phrase search.

Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as the, is, at, which, and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as “The Who”, “The The”, or “Take That”. Other search engines remove some of the most common words—including lexical words, such as “want”—from a query in order to improve performance.

Wikipedia

I was hitting Algolia’s search limits and had to remove words I didn’t care about searching, like “and”, “only”, “there”, or “I’ve”, in an attempt to shrink the size of the posts on this site in the search index. There are quite a few lists on the internet, and I ended up using a few of them for significant (> 65% average) size reductions in the search corpora.
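Here’s a minimal sketch of the idea in Python. It is not the code this site actually uses: the `STOP_WORDS` set is a tiny illustrative sample (real lists run to hundreds of words), and `strip_stop_words` is a hypothetical helper name.

```python
import re

# Illustrative sample only; published stop word lists are much longer.
STOP_WORDS = {"and", "only", "there", "i've", "the", "is", "at", "which", "on"}

def strip_stop_words(text: str) -> str:
    """Return the text lowercased with stop words removed (for indexing only)."""
    # Normalize curly apostrophes so that "I’ve" matches "i've" in the list.
    normalized = text.lower().replace("\u2019", "'")
    # Keep runs of letters, digits, and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9']+", normalized)
    return " ".join(t for t in tokens if t not in STOP_WORDS)

post = "There is only one thing I’ve learned, and it is on the index."
slim = strip_stop_words(post)
print(slim)
print(f"{1 - len(slim) / len(post):.0%} smaller")
```

Only the copy sent to the search index gets stripped; the post itself stays as written.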

The Scunthorpe Problem

The Scunthorpe problem (or the Clbuttic Mistake) is the unintentional blocking of websites, e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that appear to have an obscene or otherwise unacceptable meaning.

Wikipedia

Examples would be: shitake mushrooms, Herman I. Libshitz, magna cum laude, Arun Dikshit.
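A rough sketch of why this happens, assuming a filter that does plain substring matching. The `BLOCKED` list and both function names are made up for illustration.

```python
import re

BLOCKED = ["shit", "cum"]  # hypothetical blocklist, for illustration only

def naive_filter(text: str) -> bool:
    """Flag text if a blocked string appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(bad in lowered for bad in BLOCKED)

def word_filter(text: str) -> bool:
    """Flag text only when a blocked string appears as a whole word."""
    pattern = r"\b(?:" + "|".join(map(re.escape, BLOCKED)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

for sample in ["shitake mushrooms", "Herman I. Libshitz", "magna cum laude", "Arun Dikshit"]:
    print(sample, naive_filter(sample), word_filter(sample))
```

The naive filter flags all four examples. Matching on word boundaries clears three of them, but “magna cum laude” still trips it, which suggests that better string matching alone doesn’t make the problem go away.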

Corpsing

Corpsing is British theatrical slang for unintentionally laughing during a non-humorous performance or when a role in a humorous performance is intended to be played “straight”. In North American TV and film, this is considered a variation of breaking character or simply “breaking”.

Wikipedia

Here’s some further examination by Ricky Gervais and crew. Features Sir Ian McKellen and Daniel Radcliffe.

Schrodinger’s Douchebag

One who makes douchebag statements, particularly sexist, racist or otherwise bigoted ones, then decides whether they were “just joking” or dead serious based on whether other people in the group approve or not.

Urban Dictionary

They’re always “just joking.” About pandemic response, about requesting foreign interference in their country’s elections, injecting disinfectants to treat disease, asking for more police brutality, mocking the disabled, treason, dangling pardons like a mob boss, asking foreign governments to investigate political opponents, calling oneself “The Chosen One”, calling a former president the founder of a terrorist organization, or condoning violence against journalists. Just look at your face, bro 😆

And then there’s Schrodinger’s Asshole:

A person who decides whether or not they’re full of shit by the reactions of those around them.

Via Mark.

Tampography

Using a Bloopy Thing to print on all sorts of materials is called “Pad Printing” or tampography. Here’s a Big Bloopy Thing printing a very beautiful pattern onto a bowl (and here’s two of them going at the same time).

You can use the same technique on all sorts of things: plastic bins, pens, keychains, golf balls, pills. The silicone/bloop conforms to the shape of the object to print on rather easily.

Quality Logo Products has a quick explanation.

Resolution versus Magnification

Was looking for a portable iPhone microscope and came across this one on Amazon. Doubted the “50x-1000x magnification” claim and landed on this video by Oliver Kim (here’s his channel on YouTube) on how to make sense of that feature.

The key idea here is to think of a microscope as a device that resolves hitherto unseen things. It’s not simply a magnifier of small things our eyes cannot resolve. This is the difference between the 2x optical zoom on your iPhone and the 10x digital zoom you can slide it up to.

So I don’t doubt that the product on Amazon can do 50x, which is fine for my needs. I just don’t believe that the upper bound (1000x) can provide anything useful.
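One way to put numbers on that hunch is the Abbe diffraction limit, d = wavelength / (2 × NA). This is my own back-of-the-envelope sketch, not something from the video or the product listing. The numerical aperture (0.25) and the roughly 100 µm the naked eye can resolve up close are assumptions; the listing gives neither.

```python
def abbe_limit_um(wavelength_nm: float, numerical_aperture: float) -> float:
    """Smallest resolvable separation in micrometres: d = wavelength / (2 * NA)."""
    return wavelength_nm / (2 * numerical_aperture) / 1000

def useful_magnification(limit_um: float, eye_limit_um: float = 100.0) -> float:
    """Magnification that enlarges the optical limit to what the eye can resolve.

    eye_limit_um is an assumed figure for the unaided eye at reading distance.
    """
    return eye_limit_um / limit_um

# Assumed values: green light (550 nm) and a guessed NA of 0.25.
d = abbe_limit_um(wavelength_nm=550, numerical_aperture=0.25)
print(f"resolves ~{d:.2f} um; useful up to ~{useful_magnification(d):.0f}x")
```

Under those assumptions, magnifying much past that point only makes the same blur bigger.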

Optotypes

Optician Sans is a free optotype¹ based on the Sloan letters².

  1. I had no idea this was a thing. There’s a lot to learn here about the eye charts I see once a year. The earliest chart appears to go all the way back to 1862 (!). Those little "C"s are called Landolt Cs. The one most optometrists use these days is called a LogMAR chart, which measures visual acuity as the base-10 logarithm of the smallest visual angle your eyes can resolve. ↩︎

  2. Which you can download here with “noncommercial research” in mind. ↩︎
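The LogMAR idea from the first footnote fits in a few lines. This sketch assumes the usual convention that the minimum angle of resolution (MAR) is measured in arcminutes and that a Snellen fraction like 20/40 corresponds to a MAR of 40/20 = 2 arcminutes. The function name is made up.

```python
import math

def logmar_from_snellen(numerator: float, denominator: float) -> float:
    """Convert a Snellen fraction (e.g. 20/40) to a LogMAR score."""
    # MAR in arcminutes: 20/20 vision resolves 1 arcminute, 20/40 resolves 2.
    mar_arcmin = denominator / numerator
    return math.log10(mar_arcmin)

for snellen in [(20, 20), (20, 40), (20, 200)]:
    print(snellen, round(logmar_from_snellen(*snellen), 2))
```

So 20/20 scores 0.0 and 20/200 scores 1.0: lower is better.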