Tuesday, September 19, 2017

Stubs, Mocks, and Spies in Rspec

There are a number of different terms we use in testing to describe how we're setting up the tests. Are we stubbing? Mocking? Spying? All of the above? What do they even mean?

1. Stubs

These are just canned responses. You stub out methods so that when they're called, they just return something (anything you want). If they're not called, nothing happens. 

Rspec stubbing: https://relishapp.com/rspec/rspec-mocks/v/2-14/docs/method-stubs

Example

allow(obj).to receive(:message).and_return(:value)

You can do this with both real objects and doubles.

2. Spies

These are objects that you set up with canned responses (like stubs) that also record information about the calls made to them (Was it called? What was it called with? How many times?). 

Example

allow(Invitation).to receive(:deliver) 

or

invitation = spy('invitation')

Followed by the assertion

expect(invitation).to have_received(:deliver)

3. Mocks

These are objects that have already been instrumented with expectations. They're like spies in that calls are recorded, but they go a step further and automatically verify the expected behavior at the end of the test.

Example

logger = double("logger")
account = Account.new logger

expect(logger).to receive(:account_closed)

account.close

State verification vs Behavior verification 

Verification is the act of verifying that something occurred in some manner. In testing, we rely on either state verification or behavior verification to ensure that objects are behaving in the way we expect them to. 

In the testing world, state verification is verifying the test through the state of an object. What's the current value of variable X? In other words, what's the current status of this object? Does it match up with expectations? 

Behavior verification takes a different approach. Behavior verification is not concerned with what state the object is in - it's only concerned with the fact that certain actions occurred in some manner (regardless of how they changed the object's state). 

According to Martin Fowler, only mocks insist on behavior verification. In other words, unlike stubs and spies, mocks are preprogrammed to perform behavior verification. With mocks, you don't really have the option of doing state verification. 
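
To make the distinction concrete, here's a hedged RSpec sketch reusing the logger/account example from above (the closed? predicate on Account is an illustrative assumption, not from the original example):

# State verification: exercise the object, then assert on its resulting state.
account.close
expect(account).to be_closed  # assumes Account exposes a closed? predicate

# Behavior verification (mock style): set the expectation up front, exercise,
# and RSpec automatically verifies the call when the example finishes.
expect(logger).to receive(:account_closed)
account.close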






Sunday, September 10, 2017

Spinning up a rails app using Heroku in < 5 min

It's really amazing how you can have an app running in production in < 5 minutes using Heroku. Below is the minimal set of steps required to serve a basic static page from Heroku.

Pre-requisites
  • Heroku Account
  • Heroku CLI
  • Rails CLI
Create and commit the app

rails new app-0 -d postgresql && cd app-0

rails generate controller welcome

touch app/views/welcome/index.html.erb && echo 'Hello World' > app/views/welcome/index.html.erb

Add root 'welcome#index' to the config/routes.rb file 
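
For reference, a minimal sketch of what config/routes.rb looks like after that change:

Rails.application.routes.draw do
  # Serve the welcome controller's index view at the root path
  root 'welcome#index'
end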

git add . && git commit -m 'first commit'

Deploy app to Heroku

heroku login

heroku create

git push heroku master

heroku ps:scale web=1

heroku open

Topics:

  • GIT
  • PostgreSQL
  • Rails
    • Routing
    • Code generation
  • Heroku
    • create
    • ps
    • open
    • Default Rails Server: WEBrick
      • Single threaded server vs multithreaded server


Sunday, August 20, 2017

Rspec doubles - normal double, instance double, class double

Test doubles are objects that are supposed to stand in for and represent real objects during testing. Rspec offers three types of doubles - ordinary doubles (plain double), instance doubles, and class doubles.

All doubles are strict by default. That means that if any un-allowed or unexpected method is invoked, the test will fail. For example:

x = double()
x.bar # error because `bar` was not allowed

However, you can make any double loose by calling `as_null_object` on it.

Now let's look at the differences.

Ordinary doubles

x = double()

These doubles are super barebones. You can allow any message on these doubles.

Instance doubles

x = instance_double('ClassName')

These doubles are aware of the instance methods of the class 'ClassName' - you can only allow messages that correspond to instance methods actually defined on that class.

Class doubles

x = class_double('ClassName')

These doubles are aware of the class / module methods of any class or module named 'ClassName'. Just like instance doubles, only defined messages are allowable.

In short, instance and class doubles go a step farther than ordinary doubles. Besides recording what was called, they verify that the messages you allow or expect actually exist on the real class (is 'X' a thing that this object does?).
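
Here's a small sketch of the difference (the 'User' class is illustrative):

# Ordinary double: any message can be allowed, even one no real class defines.
loose = double('user')
allow(loose).to receive(:not_a_real_method)   # fine

# Instance double: only messages that User actually defines can be allowed.
strict = instance_double('User')
allow(strict).to receive(:name)               # fine, provided User defines #name
allow(strict).to receive(:not_a_real_method)  # fails - User does not implement it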

When do you use one over the other?

I see ordinary doubles as good for creating dummy objects during a test - for example, when you need to fill a parameter for a method where you know the object isn't actually used. However, you generally want to use instance and class doubles for the stricter check.

Saturday, August 12, 2017

New Smoothie I'm trying this week

So the avocado coconut smoothie I mentioned last week is more complicated to make than I prefer.

This week I'm going to give this a shot: https://iquitsugar.com/recipe/liver-detox-smoothie/

Ingredients

 small green apple, diced and frozen.
1 zucchini, diced and frozen.
1/4 avocado, diced and frozen.
1 cup mixed greens like broccoli florets, watercress, beetroot greens, silverbeet, kale or spinach.
1/4 cup coriander or parsley (or both!).
1 teaspoon chia seeds.
1/4 teaspoon turmeric, ground.
1/2 lemon, juiced.
2 cups coconut water.

My Grocery List

  • 1 - 2 green apples
  • Zucchini
  • Avocados
  • broccoli florets 
  • Almond Milk

Reflections on Sublime so far - quotes and projects

I switched over to using Sublime from Vim this past year and it's been an absolute joy to use. In fact, I liked it so much I even made a cheatsheet for others to get the most out of this editor. However, there were still a couple of actions that have been a source of frustration: changing quote types and navigating across multiple projects. In this post I'll discuss what they are, the solutions I've adopted to address them, and how I feel about those solutions after a week of use.

Problem 1: Quotes


Replacing single quotes with double quotes (or double quotes with single quotes) using multiple selection. If you select both quotes, typing in a single quote will simply quote the double quotes themselves. 

For example:

"hello world" becomes '"'hello world'"' when what I really want is 'hello world'. This isn't a problem if I'm trying to actually quote selections, but very rarely do I (or anyone) want to be quoting the quotation marks. 

Solution

The most promising solution I found to this problem is the ToggleQuotes sublime plugin. I just installed it and it works pretty well. I'm considering adding tests to this and adding support for quotes around multi-line strings. 

Problem 2: Project navigation


When I have Sublime open for more than one project, it's difficult to go from one to another. Right now I just use the basic Mac application switcher, which works fine for two projects, but once I have three or more applications open it becomes a nightmare.

Solution

Turns out Sublime has a built-in Projects feature to deal with this issue. I just learned that every window you open is either a named project or an anonymous project (if you open Sublime in a directory not associated with a project). You can define project-specific Sublime settings and switch between projects quickly using the Projects feature. You can also add folders from other projects into the current project.

Update after a week of using ToggleQuotes and sublime projects features

  • The project feature for sublime is absolutely indispensable. I've been able to switch from one project to another seamlessly and this has been a tremendous boost for my workflow. I will also say that I continue to advocate for keeping the project files in the same place - no need to pollute your repos if you don't have to. 
  • Toggle quotes is awesome. Thank you @spadgos.






Saturday, July 29, 2017

Smoothie breakfast diet

I'm currently experimenting with a low-sugar, nutrient-dense breakfast smoothie diet that's easy to make. This is a list of three smoothies that I will be having throughout the week along with a grocery list of ingredients you'll need to buy to make them. 


Banana Peanut butter 

http://www.youmustlovefood.com/banana-peanut-butter-cinnamon-smoothie/
http://allrecipes.com/recipe/221261/peanut-butter-banana-smoothie/
  • Peanut butter 
    • has protein as well as potassium — which lowers the risk of high blood pressure, stroke and heart disease. It also contains fiber for your bowel health, healthy fats, magnesium to fortify your bones and muscles, Vitamin E and antioxidants.
  • Bananas
    • They contain several essential nutrients, and have benefits for digestion, heart health and weight loss.
    • Bananas are among the most popular fruits on earth.
    • Bananas contain a fair amount of fiber, as well as several antioxidants.
Grocery List 
  • banana 2x
  • milk (2 cups)
  • peanutbutter (1/2 cup)
  • Cinnamon powder
  • Cacao powder 
    • https://iquitsugar.com/raw-cacao-vs-cocoa-whats-the-difference/
Apple Strawberry

http://gimmedelicious.com/2015/07/16/apple-strawberry-smoothie/
  • Apples are 
    • extremely rich in important antioxidants, flavanoids, and dietary fiber. The phytonutrients and antioxidants in apples may help reduce the risk of developing cancer, hypertension, diabetes, and heart disease. This article provides a nutritional profile of the fruit and its possible health benefits.
  • Strawberries are no exception to this rule; 
    • in addition to antioxidants, they have many other nutrients, vitamins, and minerals that contribute to overall health. These include folate, potassium, manganese, dietary fiber, and magnesium. It is also extremely high in vitamin C!
Groceries
  • Apple (1x)
  • Frozen strawberries (1 cup)
  • Milk (1/2 cup)
  • Strawberries / banana (1 cup)


Avocado Coconut

https://iquitsugar.com/recipe/avocado-coconut-dreamboat-smoothie/
  • Avocados are 
    • Also known as an alligator pear or butter fruit, the versatile avocado is the only fruit that provides a substantial amount of healthy monounsaturated fatty acids (MUFA). Avocados are a naturally nutrient-dense food and contain nearly 20 vitamins and minerals.
  • Coconut oil
    •  is high in natural saturated fats. Saturated fats not only increase the healthy cholesterol (known as HDL cholesterol) in your body, but also help convert the LDL “bad” cholesterol into good cholesterol. By increasing the HDL in the body, it helps promote heart health and lower the risk of heart disease.
Groceries
  • Coconut oil (2 tablespoons)
  • Blueberries (1/2 cup)
  • Banana (frozen) (1) 
  • Avocado (1)
  • Coconut cream (500 ml)
Toppings
  • Goji berries
    • https://www.downtoearth.org/health/vitamins-supplements/ways-to-use-goji-berries
  • Chia seeds
  • Cacao nibs 
  • Mulberries 
Final grocery list
  • 2 Apples
  • 4-5 Bananas 
  • Cacao powder
  • Coconut oil 
  • Coconut cream (2 cups)
  • 1 Avocado 
  • 1 container Blueberries
  • Strawberries 
  • Frozen strawberries
  • Almond milk 

Tuesday, May 16, 2017

How to run Cinemania 96 on Windows 10


  1. Install Virtualbox 
  2. Create a Windows 7 VM using the free IE8 Windows 7 virtual machine from Microsoft. Give it at least 1GB of RAM so it's not super slow.
  3. Download the VirtualBox Guest Additions and Extension Pack
  4. Add the Extension Pack to VirtualBox
  5. Start the Windows 7 VM and install Guest Additions. Reboot.
  6. Connect the CD drive to a USB port
  7. Add a USB device in the VM
  8. Copy the files over to the hard disk 
  9. Run cinemania.exe

Sunday, April 30, 2017

Career Development Plan

In our last post, we established the key skills that make up the value of an engineer working in an organization. However, knowing the generalities is not very actionable. 

In this post, we'll delve into the key sub-skills that make up the top-level skills, along with specific, actionable tasks that you can start implementing immediately to increase your skillfulness in each area and thereby increase your overall value as an engineer. 

Technical Expertise (clean and efficient code, language mastery, editor mastery, etc)
  • Core CS Fundamentals
    • Algorithms and Data structures
  • System Design
    • Learn Programming Paradigms 
      • Object Oriented 
      • Functional 
  • Programming Language
    • Learn Ruby
  • Text editor
    • Learn Sublime Editor (for host machine development tasks)
    • Learn Vi / Vim / Emacs (for general editing tasks)
  • Shell
  • Testing
  • Debugging
  • Source Control
    • Learn GIT
Qualities (leadership ability, general problem solving ability, communication ability, maturity level, etc)
  • Communication
    • Writing and Speaking
  • Analytical / Problem Solving Ability
  • Maturity 
    • Growth Mindset
    • Honors commitments
    • Seeks and Integrates Feedback
    • Treats others with respect
    • Actively Identifies problems
    • Self awareness
    • Proactive about future / career
Execution (planning ability, ability to get shit done, hitting goals)
  • Estimation
  • Project Planning / XP
  • Time Management
  • Getting Unblocked
  • Simplest thing that could possibly work

Scope (area of impact, sub-component, component level, sub-system, system, business unit value)

As your skill set grows (technical, quality, and execution), so will your scope. If it doesn't, then you need to seek out more responsibility yourself. 

Dedicate entire branches of improvement for each area and track your progress over time. Identify sticking points (tasks that are on B) and master them (move them to C). 

Saturday, April 29, 2017

Framework for Assessing value of Software Engineers

How does the industry differentiate between a junior software engineer from a senior software engineer? What are the set of attributes we use to assess the value of an engineer so that they're compensated appropriately? 

These are important questions with answers that seem to vary wildly, but having some method of approaching them is crucial for arriving at a mutual agreement between the hiring manager and the engineering employee on the value that the engineer brings to the organization. That shared understanding is really important for initial salary negotiations and for maintaining an amicable business relationship over time. 

As an engineer, even if the companies you're applying to or currently working at have no structure for compensation, it's still in your best interest to know what your metrics are for measuring your value so you can improve more effectively. For example, if "leadership quality" is a valuable attribute in your system, then you can very intentionally seek out ways to acquire that trait. 

But what does this value structure for engineers look like? What are the metrics? 

It's easy to come up with a list of things that you think are most important, but they're not helpful if they don't align with what a company or the industry as a whole deems most important. You can't approach companies asking to be paid a million dollars a year just because you can recite methods in the C++ standard library. Impressive? Maybe. Valuable? Not necessarily. 

One way to approach this question is to simply ask hiring managers at top companies. Luckily, many great engineering teams have shared their views on this publicly. After looking at the engineering ladders / compensation structures for these companies and identifying commonalities, I arrived at this general structure for engineers:
  • Technical Expertise (clean and efficient code, language mastery, editor mastery, etc)
  • Qualities (leadership ability, communication ability, maturity level, etc)
  • Execution (planning ability, ability to get shit done, hitting goals)
  • Scope (area of impact, sub-component, component level, sub-system, system, business unit value)
Some companies also mentioned "experience" and "public artifacts" such as Github projects. However, I left those out because they only serve as signals for the traits we're looking for. When you look at someone's experience, you're trying to get a sense of where they're at in terms of things like leadership ability or technical expertise. 

The specifics of each and how they're prioritized will vary by company and by industry. You don't have the time to evaluate every attribute that could fall under technical expertise, so as a hiring manager you need to identify which ones you value and come up with strategies to measure how different candidates stack up on that metric.

I highlighted scope in a different color because it's not so much a skill as it is an area of responsibility. Skillfulness in the other areas does lead to an expansion in an individual's scope of responsibility within an organization through promotions, but scope is still quite distinct from the rest since it's not inherently a skill. Nonetheless, it's a key metric because increasing scope has a significant and direct effect on your value. You can have top marks in all three areas of technical expertise, qualities, and execution but still work on a relatively small system, with very little business value, compared to other members of the organization. 

How this generic structure helps engineers

You can use this list as a framework for thinking about your skill set in the context of your domain / industry. How do you think you rank on technical expertise? On your personal qualities like communication ability? Do you want to become knowledgeable about other critical parts of the system to expand your area of responsibility? If you find a weak area, start working on it. 

Admittedly, this structure isn't very actionable if you can't fill in the specifics. One particular area where finding the set of important skills seems daunting is technical expertise. A million things can qualify as technical expertise. For example, if you work in embedded systems, knowledge of Angular.JS isn't very valuable even though it certainly qualifies as technical expertise. The solution here is to focus on durable skills. 

Durability is a trait that exists for each skill and across skills. First, we'll look at an example of two different skills that differ in durability.

1. Knowledge of classic computer algorithms
2. Knowledge of the API of a brand new AWS web service

#1 is not likely to change. #2 is more likely to change. Whether or not #1 is directly useful to you is a different story. But it is knowledge that is more durable because the most efficient sorting algorithms don't change in a matter of weeks or even years. 

Lets take one skill: editor mastery. 

Durable skill: Mastery over a popular, mature, productive editor that is available across systems. 
Non-durable skill: Mastery over an editor that is buggy and works on a single platform.

Every time you decide to learn something, keep durability in mind. You're making an investment and you want to make sure you're investing in something that will pay dividends for years to come. 



Wednesday, March 22, 2017

Sublime Selection

Select a character

SHIFT + LEFT / RIGHT

Select from cursor to end of word

SHIFT + OPTION + RIGHT

Select from cursor to beginning of word

SHIFT + OPTION + LEFT

Select a word

CMD + D

Select a line

CMD + L

Select between parenthesis

CTRL + SHIFT + SPACE

Select everything in file

CMD + A

Sublime Cursor Motion

Here are the basic cursor motion commands you should know.

Move by character

UP / DOWN / LEFT / RIGHT

Move by word 

OPTION - LEFT / RIGHT

Move to the end of the line

CMD + RIGHT

Move to the beginning of the line

CMD + LEFT

Move to a matching brace

CTRL + M

Move to the top of the file

CMD + UP

Move to the bottom of the file

CMD + DOWN

Go to a line

CTRL + G (preferred) or CMD + T + :

Go to a symbol

CMD + R

Touch Typing Numbers and Special Keys

When I learned to touch type, I learned how to type letters properly but never properly learned numbers and special characters.

Here's how the keys map to fingers:

left hand
1 - pinky
2 - ring
3 - mid
4 - index
5 - index

right hand
6 - index
7 - index
8 - mid
9 - ring
0 - pinky

tab - left pinky
caps lock - left pinky
shift - left pinky
fn - left pinky
control - left ring finger
option / alt - left mid
command - left thumb

http://apple.stackexchange.com/questions/47293/what-fingers-do-i-need-to-use-for-hitting-control-option-and-command-buttons-on
http://www.typing-lessons.org/preliminaries_4.html

Tuesday, March 21, 2017

Database Indexes

Premature optimization is the root of all evil.

Database indexes are all about optimization. Using indexes prematurely is unnecessary in most cases. However, knowing what they are and how to use them will save your ass. 

So lets talk indexing.

When you do things to some set of data in a relational database, the db has to retrieve that data. And if that data is retrieved based on some value (get every row where column C equals 5), then the db has to go through every single row in a table containing that data and decide whether or not that row should be used. 

Unless you use indexes. 

If you use indexes, the db does NOT have to go through every single row. Instead, it will find the rows it needs by using a special data structure. That special data structure is the index. 

Most database indexes are B-trees, which allow for logarithmic-time operations. In other words, an index can change an O(n) lookup into an O(log n) lookup. If you have a lot of data, that can be a huge difference. Instead of looking at every row and checking whether a specific column equals a certain value, the database can do an O(log n) lookup for the value in the B-tree and then follow the pointer to the row! 
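
As a concrete sketch (the table and column names here are illustrative), adding an index is a single SQL statement, and queries filtering on that column can then use the B-tree instead of scanning every row:

CREATE INDEX index_patients_on_doctor_id ON patients (doctor_id);

SELECT * FROM patients WHERE doctor_id = 5;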

Using an index is not always an optimization.

There are times when using an index can actually hurt you. An index is just used to look up rows, but if a large fraction of the table's rows is being returned, the index mostly adds overhead: after the matching rows are found through the index lookups, they still need to be read!


Monday, March 20, 2017

Relational Databases

So there's data that we want to store, access, and manipulate. Relational databases are tools that enable us to do just that.

With relational databases, you have to define the structure of the data before you can do anything with it. This structure is expressed as tables that have columns. And those columns represent a specific type of data (numerical, date, text, etc).

Tables are the types of things that you have information about. Columns are the pieces of information about those types of things. Rows are information about the actual things. For example, if I'm interested in working with medical data and I need a database of patient (a type of thing) information, I might have a table called patients with columns like name, social security number, and insurance.

The blueprint of a database, which is just a set of table definitions, is called a schema.

Here's an example:

CREATE TABLE Doctor (
   id SERIAL PRIMARY KEY,
   name CHAR(20)
);

CREATE TABLE Patient (
   name CHAR(20),
   social_security_number CHAR(9) PRIMARY KEY,
   doctor_id INTEGER REFERENCES Doctor (id)
);

Tables can also have keys which are a special type of information that is unique to every row in the table. In the case of patients, that might be their social security number since no two people will have the same social security number.

Using SQL to manipulate data in a relational DB

Now, for this database to be useful, it needs to contain data. We can create a db and manipulate the data in it using SQL (Structured Query Language) - a language the database understands that lets us describe what we want to do with the data.

> CREATE DATABASE example_hospital;

Adding a doctor

> INSERT INTO Doctor (name) VALUES ('Dr. Phil')

Get all the doctors!

> SELECT * FROM Doctor

Get the names of all of the patients of the doctor named "Dr. Phil" (this involves data from more than one table!)
To select related data that lives in multiple tables, we have to join those tables.

> SELECT patient.name FROM Patient, Doctor WHERE patient.doctor_id = doctor.id AND doctor.name = 'Dr. Phil'

This is known as an inner join: a row only appears in the result when its key value matches the key value of a row in the corresponding table. In other words, it excludes rows from both tables that don't link up.

Left outer - retain all the rows in the left table even when they don't match anything on the right; the missing right-hand values are replaced with NULL. This would include all the patients even if they don't have a doctor seeing them.

Right outer - retain all the rows in the right table. This would include all the doctors even if they don't have patients.

Full outer - retain rows in both tables! This would include all the patients and doctors.
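
For example, the left outer join above can be written like this (with Patient as the left table); patients without a doctor still appear, with NULL in the doctor column:

> SELECT patient.name, doctor.name FROM Patient LEFT OUTER JOIN Doctor ON patient.doctor_id = doctor.id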

Get the number of patients

> SELECT count(*) from Patient

Get the number of patients per doctor!

> SELECT doctor.name, count(patient.social_security_number) AS num_patients FROM Patient, Doctor WHERE patient.doctor_id = doctor.id GROUP BY doctor.name

Maintaining Data Consistency

Databases also prevent you from trying to do things they think are nonsensical. For example, if you say that a patient has a doctor and you insert a bunch of patients that reference doctors, then you can't just delete those doctors, because there would be foreign keys in the patients table pointing to nothing. That violates referential integrity (which says that every foreign key in a table must point to an actual row in another table).

DB's also support features like transactions, where you can specify a series of operations that are treated as atomic. The changes only persist if every operation succeeds. Otherwise, no changes persist.
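
As a sketch of what a transaction looks like in SQL (the account table and its columns here are illustrative):

BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;

If anything fails between BEGIN and COMMIT, a ROLLBACK leaves both rows exactly as they were.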


Sunday, March 19, 2017

Jalapeno Popper Sandwich with Bacon


Ingredients
  • bacon strips
  • sliced bread
  • jalapeno peppers
  • cream cheese 
  • shredded cheddar cheese
Instructions

Pepper prep
  1. cut the jalapeno peppers in half
  2. broil the peppers
    1. preheat to 400 and let it cook for 10-15 minutes until slightly charred
  3. let it cool for 10 - 15 minutes because you'll need to touch them next
  4. remove the skin and seeds
Cheese prep
  1. mix the cheese in a bowl
  2. or not. up to you.
Bacon prep
  1. if frozen, put it in the fridge and let it thaw
  2. bake the bacon
    1. preheat to 450 and let it cook for 15 minutes. longer if you like crispier bacon.
Bread prep
  1. toast the bread!
Assembly
  1. Put the cheese on the bread. Then the peppers. Then the bacon. 
Microwave for 1 - 2 minutes (to melt the cheese into everything else mmmmm) and serve :) 


Saturday, March 18, 2017

Migrating a wordpress.com site will take you longer than thirty minutes

A couple of weeks ago my girlfriend told me that she wanted to install Google Analytics on her wordpress.com blog. Unfortunately, you can't install analytics without upgrading to the business plan which is a whopping $24.92 a month. She had plans to monetize her site in the future (ads?), which isn't possible with the personal plan.

So I told her that she should consider a self-hosted site. In fact, it's so easy I could do it for her in less than an hour.

I just finished.

It took over a week.

Here's what I had to do:

  1. Create an account on a new hosting service 
  2. Install wordpress through cPanel using a temporary domain name
  3. Import content from old wordpress into new wordpress
  4. Install Jetpack after realizing that none of the shortcodes were working
  5. Install Google Analytics (FREE)
  6. Transfer current domain name registrar over to new registrar (This process takes FIVE days)
  7. Buy a SSL certificate (got it for a dollar/year thanks to discount holla)
  8. Update DNS nameservers for domain (Takes about a day to propagate) to point to new host
  9. Update hosting plan to use current domain name and turn on SSL
  10. Replace WP database references to temporary domain using 
  11. Optimize the site with the help of Google PageSpeed because the site was slow AF. Went from a page speed score of 40 (ah!) to 96 (yay) by using a combination of plugins:
    1. autoptimize
    2. wp super cache
    3. wp asset cleanup 
    4. speed up javascript to footer
    5. speed up optimize css delivery 
Finding the right combination of plugins took a bit of trial and error. Some claimed to work but didn't, so I had to keep inspecting the page source and re-running PageSpeed to test whether the plugins were actually making any difference to the site's performance. 


Done! 

Design Patterns

Humans are great at seeing patterns. When a programmer solves a lot of problems, he starts to see solutions that seem generally applicable to a wide category of problems. When those problems are design problems (how to manage the objects and the relationship between objects), those solutions are known as "design patterns".

There are different types of patterns:
  • Creational
    • ways to manage the creation of objects
  • Behavioral 
    • ways to manage the communication of objects
  • Structural 
    • ways to organize objects
A common structural pattern is known as the decorator or wrapper pattern. You use it by wrapping an object with another object that has the same interface which then modifies the behavior of the inner object. You're basically creating an onion :)

Decorator pattern in the wild

If you've ever built a Rails application, you've probably run into situations where the controller is filled with code responsible for formatting your data for presentation. There's a library called draper that lets you write decorator models that wrap your models with the functions that do the formatting, which removes an extra responsibility from your controllers.

Why not just use inheritance?

Less flexible. With decorators, you can decorate your objects at run time by wrapping them with other objects. With inheritance, you lose that flexibility.
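
Here's a minimal Ruby sketch of the decorator idea (the Coffee and MilkDecorator classes are illustrative, not taken from draper):

class Coffee
  def cost
    2.0
  end
end

class MilkDecorator
  def initialize(drink)
    @drink = drink
  end

  # Same interface as the wrapped object, with modified behavior.
  def cost
    @drink.cost + 0.5
  end
end

MilkDecorator.new(Coffee.new).cost                    # => 2.5
MilkDecorator.new(MilkDecorator.new(Coffee.new)).cost # => 3.0, decorators stack at run time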


Object Oriented Programming

OOP is just one approach to creating computing abstractions that's very popular because it's based on how we already think and perceive. We see the world in terms of things (objects) doing things (methods), and thus a programming language that allows us to define computing processes using that way of thinking feels much more familiar than, say, writing ones and zeros. OOP languages are all just variants of how to formally define those abstractions to a computer. 

When I learned about object oriented programming in college using Java, I first learned a bunch of weird buzzwords like encapsulation, inheritance, and polymorphism, and that you need to make classes of things before they can do things. Oh, and that in order to share one class's set of behaviors in the definition of another, you have to use the "extends" keyword. So boring.  

This kind of introduction to OOP, one that starts with the unique terms of OO and language-specific keywords, makes it harder to grasp the essence of the "why" behind OOP. Students don't need to know the difference between an abstract class and an interface to "get" OOP. 

Regardless of which OO language you use, you'll always be dealing with objects and data structures. They're both data, but objects are a higher-level abstraction that also comes with actions, whereas data structures are just pure data. All the other language-specific terms surrounding OO are just the ways that a particular language lets you define the behavior of those objects and their relationships with other objects. 
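
As a rough Ruby illustration of that distinction (the names here are made up for the example): a Struct is close to pure data, while an object bundles data with behavior and hides the details.

Point = Struct.new(:x, :y)   # data structure: just fields, no real behavior

class Circle
  def initialize(radius)
    @radius = radius         # hidden detail
  end

  def area                   # behavior exposed as a method
    Math::PI * @radius**2
  end
end

Circle.new(2).area # => ~12.57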

Concepts like "polymorphism" and "encapsulation" are just things that are made easier by OO programming, but are not exclusive to OO programming languages. I think it's much more effective to teach how to model processes with OO first, then introduce more specialized concepts once the basic big picture understanding is cemented. Things like "encapsulation" starts to make perfect sense once you start seeing the benefits of hiding the details of how a thing does something. 


Friday, March 17, 2017

Concurrency

Why do one thing at a time if you can do many things at a time? That's what multi-core processors allow computers to do - many things in parallel. Even on a single processor, from the user's perspective it still seems as if things are running in parallel because operating systems are so damn good at context switching.

A thread is the fundamental unit of execution in a computer. It starts somewhere and ends somewhere. Once you introduce multiple threads of execution, you can open up a whole can of synchronization problems if you don't synchronize your threads!

For example, let's say Sally and Bob share a bank account. The way the system works is that there are a number of ATMs in different locations that are connected to the central banking system. When someone initiates a transaction, a new thread of execution is started in the program that runs on the central system.

In this program, let's say there are accounts (one of which is shared by Bob and Sally).

The code involved in updating the balance as a result of a withdrawal is as follows:

newBalance = userBalance - amount (the calculation)
userBalance = newBalance (the update)

Now let's say Bob and Sally both attempt to withdraw at the same time, where Thread A is the thread initiated by Bob and Thread B is the thread initiated by Sally.

1. Bob and Sally both initiate a withdrawal of $100 from a starting balance of $500.
2. Thread A calculates the new balance to be $400 and then gets suspended while Thread B is run. At this point, the user balance is still $500.
3. Thread B ALSO calculates the new balance to be $400 and then goes to completion.
4. Thread A finishes with a balance of $400.

They both withdraw $100, but the final balance is $400 instead of $300 :)

Luckily, there are several ways to synchronize threads so that things like this don't happen. A common construct is a semaphore which basically protects shared resources. So if one was used in this situation, it would be used to protect the account resource. If someone is already using the account to withdraw or deposit, don't let anyone (any other thread) access it.
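
Here's a minimal Ruby sketch of the account example, with a Mutex (Ruby's basic lock) standing in for the semaphore; the class and method names are illustrative:

class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
    @lock = Mutex.new
  end

  def withdraw(amount)
    # Only one thread at a time can run the calculation + update below.
    @lock.synchronize do
      new_balance = @balance - amount  # the calculation
      @balance = new_balance           # the update
    end
  end
end

account = Account.new(500)
[Thread.new { account.withdraw(100) }, Thread.new { account.withdraw(100) }].each(&:join)
puts account.balance # => 300, not 400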

How should you sort in this situation?

A master directory server receives a list of accounts, ordered by user ID, from each of several departmental directory servers. What's the best approach for this server to create a master list combining all the accounts ordered by user ID?

Questions you should ask:

  1. Are the list of accounts from each individual server already sorted?
  2. Can all the id's fit in memory?
Scenarios:
  • Individual lists are not sorted 
    • extra memory available
      • You can pretty much just pick a sorting algorithm based on speed. Quick sort will do.
    • extra memory not available
      • You're still fine with most sorting algorithms as long as they're in-place (such as quicksort or selection sort). 
  • Individual list are sorted
    • extra memory available
      • Merge sort can be very efficient here since the merge operation is O(n). Since we have extra memory, the auxiliary memory it needs may not be an issue. 
    • extra memory not available
      • You can still use merge sort if you lazy load the sublists. So instead of loading O(N) records in memory that will require an additional O(N) extra memory for the temporary buffer, you'll just have the O(N) temporary buffer and read the values of the sublists as you need them from either the disk or server. 
As you can see, there's no such thing as the "best" general sorting algorithm because it depends on what the constraints are!

Sorting Algorithms

There are only two reasons to sort a list:

1. So that you can present it in a way that's more readable. Say, a list of state names in alphabetical order.
2. To increase the efficiency of operations on that list. For example, repeatedly retrieving and removing the max-value element in the list (for some reason...) 

There's a lot of ways to achieve both of those goals. And they all force you to understand the property of the list you want to sort and the characteristics of the sorting algorithm you use. 
  • How much data are you trying to sort? 
  • Is that data partially sorted?
  • Can all that data fit in memory? 
  • Does the algorithm you use maintain the relative position of equal values? (stability) 
  • Does the algorithm require extra memory?
If you're a programmer, you should at least know the basic properties of some of the most common sorting algorithms such as ...
  • Bubble sort (O(n^2))
    • Go through the list looking at pairs of adjacent elements. If the left one is greater than the right one, swap their positions. Keep making passes through the list until no swaps are needed.
  • Selection sort (O(n^2))
    • Start at the first element. Now find the minimum and swap the minimum with the first element and then move on to the second element and find the minimum from the second element to the last and swap. 
  • Insertion sort (O(n^2))
    • Start at the second element and check if it's smaller than the first. If so, insert it before the first (the first becomes the second). You now have a partially sorted list! Now, you move on to the third element and insert it in the right position in that partially sorted list and continue until you have a fully sorted list.  
  • Quick sort  (O(n log n))
    • Pick a pivot and swap the elements so that all the numbers smaller than or equal to the pivot is on the left and those larger are on the right. Do this recursively!
  • Merge sort (O(n log n))
    • Split the list into individual elements, then merge them together (first by pairs) in order. The merge algorithm works by creating a new list and inserting the correct elements into the lists through comparisons of the two lists. 
Most general purpose languages include sorting functions in their core libraries so you shouldn't ever have to write one yourself :)
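
Still, as a sketch of the idea, here's roughly what merge sort from the list above might look like in Ruby:

def merge_sort(list)
  return list if list.size <= 1
  mid = list.size / 2
  merge(merge_sort(list[0...mid]), merge_sort(list[mid..-1]))
end

def merge(left, right)
  merged = []
  until left.empty? || right.empty?
    # Take the smaller of the two front elements; <= keeps the merge stable.
    merged << (left.first <= right.first ? left.shift : right.shift)
  end
  merged + left + right
end

merge_sort([5, 2, 4, 1, 3]) # => [1, 2, 3, 4, 5]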

Telephone Words

Write a function that takes a seven-digit telephone number and prints out all of the possible "words" or combinations of letters that can represent the given number. 

Recursive Solution
  • If we've gone past the 7th digit, print out our current word. Otherwise, loop through each letter associated with the current digit, save that letter in the current position, and then recurse.
If you want to return the values instead of merely printing them, you would maintain a running list of all the words that grows as the function calls "unwind". 

Iterative Solutions
  1. Just use 7 for loops :)
for ...
    for ...
        for ...
            :(
  2. If you write out a bunch of combinations for a number by going from left to right, you'll notice that as the last digit cycles, the digit to its left also cycles. 
For example, lets say we're only dealing with three sets of characters: [[a,b,c],[d,e,f],[g,h,j]]. Our pattern will be as follows:

a,d,g
a,d,h
a,d,j
a,e,g
a,e,h
a,e,j
a,f,g
a,f,h
a,f,j
b,d,g
...

As you can see, after we go through [g,h,j] in the last position once, the letter in the previous position advances to its next value (d to e). We can take advantage of this simply by starting out with a word, say "a,d,g". Then we'll change "g" to "h" and print. Then change "h" to "j" and print. And then, since we've gone through a full cycle, we'll reset the last position back to its first letter and advance its neighbor to the next value ("e"). The trick here is updating the counters for each position correctly (for example, when [d,e,f] completes a full cycle, its neighbor is updated). 
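
Here's a hedged Ruby sketch of the recursive approach described above (using the classic keypad mapping; 0 and 1 just map to themselves):

LETTERS = {
  '0' => %w[0], '1' => %w[1],
  '2' => %w[a b c], '3' => %w[d e f], '4' => %w[g h i],
  '5' => %w[j k l], '6' => %w[m n o], '7' => %w[p r s],
  '8' => %w[t u v], '9' => %w[w x y]
}.freeze

def telephone_words(digits, prefix = '', words = [])
  return words << prefix if prefix.length == digits.length
  # Try every letter for the current digit, then recurse on the next position.
  LETTERS[digits[prefix.length]].each do |letter|
    telephone_words(digits, prefix + letter, words)
  end
  words
end

telephone_words('23') # => ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"]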

Sunday, March 12, 2017

Recursion

"If you already know what recursion is, just remember the answer. Otherwise, find someone who is standing closer to Douglas Hofstadter than you are; then ask him or her what recursion is."                                                                                                                                                                                                                                 - Andrew Plotkin
Informally defined, recursion is simply the process of repeating self similar elements. The easiest way to visualize a recursive event is by standing in front of a mirror with a mirror. You’ll see an infinite repetition of the same reflection.

In math and computing, recursion is more formally defined.

In mathematics, a recursive definition is one in which the function being defined is applied within its own definition. A common example is the definition of the Fibonacci sequence. A recursive definition has two properties:

1. The base case or base cases(a non-recursive definition).
2. A set of rules that reduces all other cases to the base case(s).

In computation, a recursive function is one that invokes itself. In most programming languages, the definition of recursive functions end up closely resembling mathematical definitions of recursive functions.
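
For example, a recursive Fibonacci function in Ruby maps directly onto those two properties:

def fib(n)
  return n if n < 2          # base cases: fib(0) = 0, fib(1) = 1
  fib(n - 1) + fib(n - 2)    # rule that reduces every other case toward the base cases
end

fib(10) # => 55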

Daylight Saving

To understand the "why" of daylight saving, we have to dig into the past.

In ancient times, the start and end of labor was determined by the rising and setting of the sun. The notion of having to abide by a specific time schedule is a product of the industrial revolution, when manufacturing factory owners began enforcing time schedules. Be at work at 7AM. Eat at 12PM. Go home at 8PM.

As people of industrial society, we’re accustomed to living by a strict time-based schedule. Nowadays, the starting and stopping in the operation of nearly all of our institutions are based on fixed, precise measurements of time. A store opens at an agreed upon moment in time and is opened for some measure of time. Since so many of our activities are based on some agreed upon measure of time, what if we change the time? What effect will that have on the activities?

In the late 19th century, a man by the name of George Vernon Hudson pondered those exact questions, and that led to the idea of "daylight saving". Hudson was working a shift job that gave him leisure time to collect insects, which led him to value after-hours daylight.

George had later shifts (not uncommon amongst industrial workers), so he wished for the sun to still be out by the time his boss let him off! But if his boss lets him off really late, then he wouldn’t be able to collect insects. Of course, he could always ask his boss to let him out earlier, but most factory owners weren’t that nice - and they certainly weren’t going to change their hours of operation just for a bunch of insect collectors!

Now if your boss won't let you off early, who should you appeal to? Well, Hudson went to the government. If the government moved the clocks forward, the sun would still be out when his boss let him off, and he could go enjoy it! And this concept of moving the clocks forward to get more sun in the evening is what is now called Daylight Saving.

Unfortunately for George, the government laughed at him. And when this idea of changing the time was finally enacted in 1916, he was already dead.

Sunday, March 5, 2017

Learning about the Y-Combinator

Mike Vanier wrote an awesome article on the Y-Combinator function that I've been working my way through. It's a fascinating read. Here are my notes on it so far.

  • An explicit recursive definition is a recursive function definition whose body contain the name of the recursive function. For example, (define (hello a) (hello a)).
  • It's possible to generate the recursive version of a function without using an explicit recursive definition by use of higher-order functions. Holy shit, right? 
  • The Y-Combinator is one such higher-order function: it can generate a recursive version of a function from its non-recursive version. 
  • In functional programming, you can create the non-recursive version of a function by abstracting out the recursive call. So instead of referencing itself, it will reference a function that's to be passed in as an argument by a higher order function. Common technique in FP.
  • Pass an identity function as an argument to the non-recursive function to get a function that computes the factorial of 0 only. For example, (define factorial-zero (almost-factorial identity)). This will fail for N > 0. However, passing factorial-zero back into almost-factorial gives you factorial-one, which works for N <= 1 (see the sketch after this list). This can be proven, although why it works is still a mystery to me. 
  • By performing the above process an infinite number of times, you can get the factorial of all numbers up to infinity. The function that can do that for all numbers up to infinity is the factorial function! But how the heck do you define this chain of functions up to infinity?
  • To be continued ...
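
To keep the idea concrete, here's a hedged Ruby translation of the almost-factorial trick from the notes (Ruby lambdas standing in for the Scheme functions):

# almost_factorial never references itself; the function to call is passed in.
almost_factorial = ->(f) { ->(n) { n == 0 ? 1 : n * f.(n - 1) } }

identity    = ->(n) { n }
factorial_0 = almost_factorial.(identity)     # correct only for n = 0
factorial_1 = almost_factorial.(factorial_0)  # correct for n <= 1
factorial_2 = almost_factorial.(factorial_1)  # correct for n <= 2

factorial_2.(2) # => 2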



Thursday, February 16, 2017

Two definitions of programming

Over the past year I've finally begun to realize that writing machine-executable code is not the essence of programming, and these two definitions have helped me articulate what the essence really is.

"Well it seems to me the most succesful programmers I’ve encountered don’t craft software; they write software in order to move information around, in order to get something done. Information is the real deal – the software just defines the space that it moves around in. For those programmers, success is about getting information from point A where it’s currently languishing to point B where it’s going to actually be useful, as quickly and effectively as they can." - Dan North
"Programming, by definition, is about transforming data: It’s the act of creating a sequence of machine instructions describing how to process the input data and create some specific output data." - Noel

The movement and transformation of information. How effectively you do this to create value in a domain will determine your worth as a programmer in that domain.

Wednesday, February 8, 2017

Sublime line editing

One of the most common code editing actions I do is at the individual line level. I copy lines, delete lines, move lines. The shortcuts that sublime provides are great. Here's some of the main ones I use.

Selecting

Select a line - CMD + L

Deleting

Delete from cursor to end of the line - CTRL + K
Delete from cursor to the beginning of the line - CMD + Delete
Delete entire line - CTRL + SHIFT + K (Really annoying to use)
Cut line - CMD + X (delete entire line and copy)

Moving

Move a line up - CTRL + CMD + UP
Move a line down - CTRL + CMD + DOWN

Copying

Duplicate a line - CMD + SHIFT + D


Then there are a couple more that are not about manipulating lines themselves but are based on lines:

Insert cursor to line before current line - CMD + SHIFT + ENTER
Insert cursor to line after current line - CMD + ENTER


Monday, January 23, 2017

GIT in 5 Minutes from the top down

  • You have directories and files on your computer which make up your file system
  • GIT is a version control system that allows you to manage changes to any part of that file system over time
  • A working tree is any directory on your computer that has a GIT repository
  • A repository is a collection of commits
  • A commit is a type of GIT object that represents a snapshot of a working tree
  • A snapshot of a working tree is represented by a GIT tree object
  • A tree object is a tree of blobs (and other tree objects)
  • The three fundamental types of GIT objects are commit, tree, and blob
  • blobs hold file contents, and thus the structure of blob trees reflects the structure of the working tree
    • blobs track file size and content. Two files with the same content will be represented by the same blob ID by GIT even if they're on two different machines. Why? Because they have the same content
  • Fundamentally, using GIT means manipulating the shape of blob trees
In summary, commits point to trees of blobs, which are snapshots of your working tree, so manipulating different commits means manipulating different snapshots of your working tree. If you know how to do this effectively for a project, you have a great deal of power over the changes made at one point in the project versus the changes made at another.

But what about stuff like HEAD, branches, tags, rebase, merge ... etc. Ok, ok. One more minute.
  • A HEAD points to the currently checked out commit
    • If you checked out a branch, it's tracking that branch
    • If you check out a commit directly, it's not tracking new commits. This is called a "detached HEAD" because it's not tracking a branch, so it doesn't move with the branch
  • branches and tags are just named commits
    • a branch (a line of development) points to the latest commit on that line and moves forward as new commits are made on top of it
    • a tag represents a specific commit and does not move
  • merge and rebase are different ways of bringing together different snapshots of your working tree together
    • merge tells git to "just bring shit from main branch A into feature branch B, then create a single new merge commit that represents the new working tree that resulted in the changes you made"
    • rebase tells git to "bring each commit from main branch A, let me decide if I want to keep them, and then bring the ones I want to keep into branch B one by one as if I made the changes on branch B"

Thursday, January 12, 2017

Slightly more advanced sublime shortcuts

There are more specialized text manipulation shortcuts that can help you make the changes you want more quickly than relying on basic selection shortcuts:

Create a macro
CTRL + Q to begin recording, CTRL + Q to stop recording, and CTRL + SHIFT + Q to execute the macro.

This is super useful when you need to make the same change across a number of different files.

Shift lines up and down
CMD + CTRL + UP and CMD + CTRL + DOWN

I have to move lines of code around very often and this is a significantly more efficient (and less error-prone) way of doing it than cutting the line and then pasting it. You can also shift multiple selected lines!

Repeat the last command
CMD + Y

If you need to repeat commands that take way too many keys like CTRL + SHIFT + Q (remember what that does?) or CMD + CTRL + DOWN, you can just type this to repeat it. Really handy!

In the next post, I'll be covering basic search shortcuts.

How do inactive credit cards affect your credit score?

"One of the most important factors in your credit score is the credit utilization ratio (often referred to as the debt-to-credit ratio). This measures the existing balance on your cards relative to your total spending limit, and reporting agencies use it to assess how well you handle credit.

That said, there is one way that maintaining a zero balance could hurt your score. Card issuers have been known to cancel cards after sustained inactivity, and if your account is closed, then you lose that portion of your total spending limit."

- http://thepointsguy.com/2016/02/should-i-use-credit-cards-monthly/

"If you’re holding onto a credit line that you’re not using, your credit card company would rather take it away from you and give it to another customer that will."

"Although the CARD Act of 2009 states that creditors must provide customers with 45 days notice of major changes to the terms of their accounts, courts have decided that a card cancellation caused by inactivity doesn’t count as one of those major changes."

- https://www.nerdwallet.com/blog/credit-cards/credit-card-cancelled-due-inactivity/


"Utilization contributes toward the amounts-owed portion of your FICO score, a scoring model commonly used by lenders. The amounts-owed factor counts for 30 percent of your score."

"The score wants to see some kind of activity"

- http://www.bankrate.com/finance/credit-cards/does-card-inactivity-hurt-credit-score.aspx

In summary:

  • You must use credit cards and pay them off as agreed (making at least the minimum payments) to build your credit score.
  • Credit utilization contributes to the amounts-owed portion of your FICO score, which counts for 30% of the score. The lower your utilization, the better - but only to a point!
    • No activity on some cards can lead to those cards being canceled, which increases your utilization.
    • No credit activity at all can hurt because zero activity = unpredictable = high risk.
So, not having any credit card activity is not ideal. If you have many cards and you want to stop using them, avoiding them permanently will cause them to be canceled by the issuer. Keep the ones you want active just by using them once every few months. As for the ones you don't want, cancel them after zeroing out your balance to avoid a hit to your utilization ratio. In general, having a long and positive credit history (healthy ratio) is more important than how much total credit you have at any given moment. 

Sunday, January 8, 2017

Trying a new architecture for managing my basic finances

Back when I was a student with almost no money and no credit cards, managing my finances was simple. The only recurring expense I had was my tuition and room and board and my parents took care of that once a year. I didn't own any credit cards so all the money I had to use was whatever I had in my bank account from my on-campus jobs. 

As far as I was concerned, the financial architecture of my life looked like this:




Sometimes I wanted to buy stuff on Amazon.com and sometimes I would go with my friends out to the city to eat at a restaurant. I had no bills to worry about month to month. Since I didn't have credit cards (zero debt), I knew exactly how much money I had just by looking at my checking account. It was very, very low stress. 

Nowadays, things are a bit more complicated. I have lots of bills to pay, I use a number of services that charge me periodically, and I have multiple credit cards.

My financial architecture now looks like this:



My parents are no longer handling my fixed expenses and the fixed expenses that I deal with now exist on different payment cycles. As a college student, there was one giant payment in the beginning of the school year that pretty much covered my ass for the whole year (food, living, club activities). Now, there's a bunch of individual fixed expenses that charge monthly - and at different times of the month!

I can no longer just look at my checking account and know how much money I have to spend each month because:
  • I pay for services using my credit card and credit card balances aren't due until the following month. So if I spend $1000 right now on a plasma television, I won't see that amount reflected on my account this month since it goes towards what's due the next month. 
  • I pay for services using multiple credit cards, which means I can't know how much I have to spend until I analyze the statements of all of my credit cards!
  • Some services (like my building maintenance fee) are fixed and recurring but can't be paid using a credit card! And at the same time, they bill in the middle of the month. This complicates matters even more. 
I don't like this at all. It's stressful and on more than one occasion it's caused me to overdraw my account because I don't know how much I actually have to spend. Sure, I can just start tallying how much I spend every single day but I really don't want to do daily, manual accounting.

Here's the new system I'm attempting to move towards: 



In this system, I only use a single credit card for all of my fixed, monthly payments. The benefit of using a credit card for these services is that I don't have to worry about when in the month they bill me. It could be the 2nd. Mid-month. 23rd. It doesn't matter. All I need to be concerned about is:
  • When my credit card payment is due
  • Whether I have enough money to cover that payment
The credit card is essentially an interface for payment between me and all the monthly services I have to pay for! Also, you can continue building your credit and get cash back rewards for using the card.

The second key feature is switching my primary and secondary checking accounts. The secondary checking is now responsible for all outbound payments - to credit cards and fixed services that don't use credit cards like my building maintenance. The key benefit of this is that it gives me complete control over when I want to deduct the expenses from my primary account. If I want to do it at the beginning of the month, I can set up a system to move X amount of money over. 

With these two changes, I have full control over when to pay and when to get charged. Sure, they need to be synchronized but having to synchronize two dates is a vast improvement over having to synchronize N where N is the number of services AND credit cards. 

How to set up this system
  • Set up a fixed payment checking and fixed payment credit card
    • Create a second checking account if you don't already have one.
    • Choose a single credit card to pay for fixed expenses (it doesn't have to be all of them).
    • Update as many of your fixed cost, recurring services to draw from your credit card. 
    • Use your second checking to pay off your credit card and remaining services not on your credit card.
  • Sync schedules
    • Update credit card to charge balance at the end of the month. 
    • At the beginning of the month (or close to the date you're paid), move the money over to your fixed payment checking. 
Migrating

If you're migrating from a different system, it's easiest to pay off your total balance for all of your credit cards before adopting this new system. Once everything is set up and your cards are paid off for the current month, set up the scheduling for the next month!


Friday, January 6, 2017

How I currently manage my finances vs how I want to manage my finances

My current system is as follows:
  • Three credit cards
    • Discover One
    • AMEX Blue
    • Chase Amazon rewards
  • One debit card
    • Charles Schwab Bank
Subscription bills go to my discover card. I use AMEX for groceries and my Amazon card for all other purchases. My paycheck gets directly deposited into my Charles Schwab debit card and all credit card payments are drawn from the same checking account. I set a monthly budget for myself, but I have a hard time sticking to it.

The problem isn't that I have a lack of discipline. The problem is that this system is too fucking complex. Monthly fixed expenses like entertainment subscriptions or phone bills get deducted at different times during the month, and the credit card balance accumulated in the current month isn't due until the following month.

I can't simply look at a single balance and say, "can I afford to buy this thing I'm about to buy right now?".

In an excellent post by Ramit Sethi on automating your finances, he says
"Most people neglect one thing when automating: dates. If you set automatic transfers at weird times, it will inevitably necessitate more work, which will make you resent and eventually ignore your personal-finance infrastructure. For example, if your credit card is due on the 1st of the month, but you don’t get paid until the 15th, how does that work? If you don’t synchronize all your bills, you’ll have to pay things at different times and that will require you to reconcile accounts. Which you won’t do."
That's exactly what I've experienced! Because my bills were not synchronized, I felt like I could no longer trust the balance that was in my account to the point of ignoring it altogether.

Ramit offers some great advice for avoiding this problem: get your bills on the same schedule. Get paid on the 1st? Get all your bills paid at around that time automatically. There's a couple of problems (which he addresses) that makes this a bit tricky. First, not everyone gets paid on the 1st (or even once a month). I currently get paid twice a month - on the 6th and on the 22nd. Second, it's a lot of work to try to get companies / services to charge you on the date of your choosing and keeping that up to date as your income schedule changes.

I really like his idea, but I think we can set up an even simpler system. One that is less dependent on when you get paid or charged.

The system I'm about to propose is based on the following key assumptions:

1. You're less likely to overspend if you know how much you can actually spend as early as possible. How much you can actually spend = (your income - fixed expenses). You can't budget if you don't know how much you're working with.
2. Simply knowing how much you can spend is not enough if your account balances do not accurately reflect it. For example, you may know that you're left with $356 after paying your fixed expenses every month. However, your expenses don't kick in on your account until long after you've received your paycheck. Looking at your bank account, it looks like you have a lot more to spend than you really do!
3. Trying to keep track of all the charges (the synchronization challenge Ramit refers to) in your head is really hard and will cause you to hate life.
4. Asking all the services you use to charge you on the same schedule is a lot of work and requires a great deal of effort and discipline to maintain.
5. Paying your bills for all the things you're charged for all at once is significantly better than splitting your payments because you have a bi-weekly paycheck schedule. Simple is good.

Ramit's system is based on the first three assumptions. You want to know how much you got to spend early, and you do that by calling companies and getting them to charge you as close to when you receive your paycheck.

I don't think it's a good idea to request every service you use to adhere to a specific schedule for several reasons. First, this may not always be possible. The universe does not revolve around you. What if one of those charges happens to be a monthly loan repayment for your friend who needs the money on the first of the month? Second, your income schedule may change. People change jobs all the time now. What happens then? Do you call every company again? Lastly, you have to do this for every new service you pay for. Just bought an ESPN subscription? Better go and make sure that they bill you on the 1st! That's going to get annoying real fast.

Ramit suggests a different system for people who get paid twice a month: split your payments. I really don't like this idea. My biggest problem with it is that it ties when you pay to when you get paid. You have to constantly re-evaluate the math every time either one changes. New service added? Oh, well does that get paid in the first or second pay check?

My proposal

  1. "Pay" all of your fixed charges on the first of the month by moving it to a separate checking account. 
  2. Update your charges to go on a single credit card and schedule your payments to be at the end of the month or anytime after you've received your total income for the month. 
What's great about #2 is that you can allow services to charge you anytime of the month. No need to manage when you need to pay them since that's taken care of by your credit card. With #1, you can reap all the benefits of having your bills charged on the first without having to actually reschedule any of your charges (thanks to #2). And if you don't have enough to pay it all upfront (because you get paid twice a month), splitting your charges in half is far easier this way than modifying payment schedules so that roughly half gets taken out.