Monday, November 7, 2011

Bare Singulars and Bare Plurals

Ever since I wrote my final paper for our Advanced Semantics Seminar on Plurality, generics, bare singulars (including incorporated nouns), and bare plurals have been near and dear to my heart.

Naturally, a post on generic comparisons on Language Log quickly got my attention. Liberman argues that using generic plurals toys with the gap between statistically significant generalizations and grammatical genericity, so that the results are presented in a way that misleads the public; in some cases, the use of generic plurals seems to mislead the scientists themselves.
He cites a number of examples from Sarah-Jane Leslie on "Generics and Generalization":
  • "Ticks carry Lyme Disease", although only a minority of ticks do so (14% in one study).
  • "Mosquitoes carry West Nile Virus", though the highest infection rate found in the epicenter of a recent epidemic was estimated at 3.55 per thousand (and the rate was essentially zero outside of the epicenter).
  • "Ducks lay eggs" and "Lions have manes", though in each case the prevalence is at most 50% (only female ducks lay eggs, and only male lions have manes).

Sunday, November 6, 2011

Fighting the Unicode Fight

Almost any time I have to build a new corpus, the Unicode Fight returns. I lived many Unicode-Fight-free years once Linux became 100% Unicode, but now I'm using Mac OS X.

The default file.encoding on the Mac is MacRoman. I tried all kinds of Google searches to find the right keywords for the proper way (using System Preferences) to set the default to UTF-8, to no avail. I really hate Google's new (about 6 months old) search algorithm, which tries to guess what we mean to ask and doesn't include all the keywords in our query. It makes it nearly impossible to find anything long-tail-ish.

This is when it started working in Java/Groovy:

  • created a file /etc/launchd.conf and put this into it:

 setenv JAVA_TOOL_OPTIONS -Dfile.encoding=UTF-8

For general purposes:
  • added this to my ~/.vimrc
set encoding=utf-8
set fileencoding=utf-8
  • added this to my /etc/bashrc
export LC_CTYPE=en_CA.UTF-8
export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
  • Changed Terminal > Preferences > Encodings to UTF-8 only, and Terminal > Preferences > Settings > Advanced > International to UTF-8 (I also set the font to Menlo)
  • For good measure I changed all my text input to Inuktitut (that ought to force Unicode for good :)

I'm pretty sure this will have dastardly side-effects for any of my Java programs (I'm most curious about Eclipse and GATE)... we'll see.
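One way to depend less on whatever default ends up in effect is to name the charset explicitly wherever corpus files are read or written. The sketch below is only an illustration of that idea, not what I actually changed: the class name Utf8CorpusIo, its helper methods, and the temp-file name are all made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reads and writes corpus files as UTF-8 no matter
// what -Dfile.encoding (or the platform default) happens to be.
public class Utf8CorpusIo {

    /** Write the lines to the file, always encoding them as UTF-8. */
    static void writeLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** Read all lines from the file, always decoding them as UTF-8. */
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    /** Write the lines to a temporary file and read them back again. */
    static List<String> roundTripThroughFile(List<String> lines) {
        try {
            Path file = Files.createTempFile("corpus-sample", ".txt");
            try {
                writeLines(file, lines);
                return readLines(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Inuktitut syllabics, written as Unicode escapes so the example
        // does not depend on the encoding javac assumes for this source file.
        List<String> lines = Arrays.asList("\u1403\u14C4\u1483\u144E\u1450\u1466", "plain ascii");
        List<String> back = roundTripThroughFile(lines);
        System.out.println("lossless round trip: " + lines.equals(back));
    }
}
```

This only covers code I control; tools that open files without naming a charset still fall back on the default, which is what the JAVA_TOOL_OPTIONS setting is for.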


Now groovy picks up the flag and sets the encoding to UTF-8:
$ groovy -e "println System.properties.'file.encoding'"
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
UTF-8
And Inuktitut now prints in the GroovyConsole instead of ?????. GroovyConsole also no longer saves my Groovy code containing Inuktitut as MacRoman, replacing all the Inuktitut with ?. Ah... finally.
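For anyone wondering where the question marks come from, here is a rough Java sketch of the same effect. It is an illustration under assumptions, not a trace of what GroovyConsole does internally: the class name EncodingCheck and its roundTrip helper are invented for the example, and ISO-8859-1 stands in for MacRoman because it is guaranteed to exist in every JVM and, like MacRoman, has no syllabics.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the JVM's default encoding and what a
// save/reload does to Inuktitut text under a legacy charset versus UTF-8.
public class EncodingCheck {

    // Inuktitut syllabics as Unicode escapes, so the example does not
    // depend on the encoding javac assumes for this source file.
    static final String SAMPLE = "\u1403\u14C4\u1483\u144E\u1450\u1466";

    /** Encode the text with the given charset and decode it again, as a save followed by a reload would. */
    static String roundTrip(String text, Charset charset) {
        // getBytes substitutes the charset's replacement byte ('?' for
        // ISO-8859-1) for every character the charset cannot represent.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Java counterpart of the groovy one-liner above.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // ISO-8859-1 (standing in for MacRoman) loses every syllabic character.
        System.out.println("legacy charset: " + roundTrip(SAMPLE, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent them all, so the text survives unchanged.
        System.out.println("UTF-8 lossless: " + SAMPLE.equals(roundTrip(SAMPLE, StandardCharsets.UTF_8)));
    }
}
```

The loss happens at save time: once the ? bytes are on disk, reopening the file as UTF-8 cannot bring the original characters back.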

Tuesday, November 1, 2011

Bootstrapping Android Best Practices

Today I'll be giving a talk at Android Montreal:

Bootstrapping Android Best Practices



Like human languages, programming languages are:
  • One part syntax, 
  • One part vocabulary, and 
  • One part culture/socio-linguistics. 
Too often when learning a new language we focus on syntax and vocabulary, but not enough on culture/best practices. Sure, in our courses we might learn that the French like wine and baguettes, and wear berets, but on the ground it's not really that simple (n'est-ce pas?). In this tutorial we "immerse" ourselves in the culture of two projects to simultaneously learn syntax, vocab and best practices for getting things done in Android development.

We have selected a few repositories: 2 that show best practices, and 2 sets of pidgins vs. best practices, where the pidgins show Android development that is not yet fully formed.


Beginner-Friendly Best Practice Learning Grounds

MyTracks

Replica Island

Pidgins vs. Best Practices Pairs

Two page-curl repos

Three Blogger clients

Google IO Sched 2011
  • Advanced: using fragments for phones and tablets (warning: this creates many layers of abstraction, so it's hard to navigate as a beginner)
If you want to use MyTracks you will need to follow their steps to set up your Eclipse environment, and update your AVD manager.