Title

Interview Cheat Sheet


Technology
    <p><p>INTERVIEW CHEAT SHEET</p></p>



    <p></p>



    <p><ul>

  • Non-Tech Discussions
  •     <p><pre><code>- Thunken Projects
    

        <p><pre><code>    - Cobalt Metrics
    

        <p><pre><code>    - Ironsift
    

        <p><pre><code>    - Clients (Nanomolard, Digital Science, European Science Foundation)
    

        <p><pre><code>    - Legal Citation Analysis
    

        <p><pre><code>    - UrlSniffer
    

        <p><pre><code>- CTG Projects
    

        <p><pre><code>    - NCISC Federal Contract
    

        <p><pre><code>- Adverse Decisions
    

        <p><pre><code>    - Law: client wanted to settle early; instead
    

        <p><pre><code>    - Thunken:
    

        <p><pre><code>        - Legal Citation analysis
    

        <p><pre><code>        - UrlSniffer
    

        <p><pre><code>            - J
    

        <p><pre><code>    - CTG
    

        <p><pre><code>        - Celebrity Biometric Competition
    

        <p><pre><code>            - elasticsearch allowed aggregations and fuzzy search out of box (whereas mongo does not)
    

        <p><pre><code>            - caching on file system vs amazon buckets
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><ul>
    

  • Algorithms
  •     <p><ul>
    

  • Sort
  •     <p><pre><code>- Merge Sort
    

        <p><ul>
    

  • Search
  •     <p><ul>
    

  • Data Structures
  •     <p><pre><code>- Stack
    

        <p><pre><code>- Queue
    

        <p><pre><code>- LinkedList (single and double)
    

        <p><pre><code>- Binary Search Tree
    

        <p><ul>
    

  • Object Oriented Programming
  •     <p><pre><code>- Inheritance
    

        <p><pre><code>- Polymorphism
    

        <p><pre><code>- Abstract Class vs. Interface
    

        <p><ul>
    

  • Java
  •     <p><pre><code>- equals vs ==
    

        <p><pre><code>- Generics
    

        <p><pre><code>- Final, Finalize, Finally
    

        <p><ul>
    

  • Java Spring
  •     <p></p>
    
    
    
        <p><ul>
    

  • Patterns
  •     <p><pre><code>- well-defined solution to a specific problem in a particular context (i.e., documentation)
    

        <p><pre><code>- Types of Patterns
    

        <p><pre><code>    - Singleton
    

        <p><pre><code>    - Factory
    

        <p><pre><code>    - Proxy
    

        <p></p>
    
    
    
        <p><ul>
    

  • SQL
  •     <p><pre><code>- JOIN
    

        <p><pre><code>    - inner join
    

        <p><pre><code>    - outer join
    

        <p><pre><code>    - full join
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Behaviroal Questions</h1></p>
    
    
    
        <p></p>
    
    
    
        <p><ol>
    

  • Name a time you went against the grain and how did it go.
  •     <p><pre><code>- Law Example: Trademark infringment case; motion in limine leading to big settlement
    

        <p><pre><code>- Thunken Example:
    

        <p><pre><code>    - Legal Citation Analysis Project
    

        <p><pre><code>-CTG Example:
    

        <p><pre><code>    - Federal Contract Coding Competition; Celebrity Database
    

        <p></p>
    
    
    
        <p><ol>
    

  • Describe projects you worked on.
  •     <p><ul>
    

  • Thunken
  •     <p><pre><code>- Cobaltmetrics
    

        <p><pre><code>- Ironsift
    

        <p><pre><code>- Legal Citation Analysis
    

        <p><ul>
    

  • Capital TG
  •     <p><pre><code>- Celebrity App for Coding Competition
    

        <p><pre><code>- Chronos
    

        <p><ul>
    

  • Uplaw
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><hr /></p>
    
    
    
        <p></p>
    
    
    
        <p><pre><code>                    ALGORITHMS
    

        <p></p>
    
    
    
        <p><hr /></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Algorthms</h1></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Algorithm Analysis</h2></p>
    
    
    
        <p><ul>
    

  • how long will my program take?
  •     <p><ul>
    

  • why does my program run out of memory?
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Asymptotic notation</strong>--tool for measuring the growth rate of functions, which for program design usually means the way in which the time or space costs of a prgoram scale with the size of the input.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Mathematical models.</strong> The total running time of a program is determined by two factors:</p></p>
    
    
    
        <p><ol>
    

  • the cost of executing each statement
  •     <p><ol>
    

  • the frequency of execution of each statement
  •     <p><ul>
    

  • Tile approcimations--throw away low order terms
  •     <p><pre><code>- f(N) represents any function that when divided by f(N) approaches 1 as N grows.
    

        <p><pre><code>- g(N) - f(N) indicates that g(N)/ f(N) approaches 1 as N grows.
    

        <p><ul>
    

  • Order-of-growth classification.-- Most often, we work with tilde approximations of the form g(N) ~ a f(N) where f(N) = N^b log^c N and refer to f(N) as the The order of growth of g(N).
  •     <p><pre><code>- constant, logarithmic, linear,
    

        <p></p>
    
    
    
        <p><pre><code>- linear and n log n are alogorithms we strive for
    

        <p><pre><code>    - they scale well
    

        <p></p>
    
    
    
        <p><p><strong>Power Law</strong>--funcitonal relationship b/w two quantiies, wehre a relative change in one quantity results in a propportional relative change in the other quantity, independent of the intial size of those quantiies: one quantity varies as a power of another.</p></p>
    
    
    
        <p><pre><code>- e.g., considering the area of a square in terms of the length of its side, if the length is doubled, the area is multipled by a factor of four
    

        <p><pre><code>- aN^b
    

        <p></p>
    
    
    
        <p><p><strong>Big O to rescue</strong></p></p>
    
    
    
        <p><ul>
    

  • idea is to replace complex running time forulae like cnlog n or c1n(n+1)/2 + c2n with an asymptotic growth rate O(nlog n) or O(n2)
  •     <p><pre><code>- these asymptotic growth rates omit the specific details of exactly how fast our algorithms run and concentrate solely on how the cost scals as the size of hte input n becomes larger
    

        <p><pre><code>    - solves two problems:
    

        <p><pre><code>        1. computers are different speeds to need something where we can measure without worrying about hardware
    

        <p><pre><code>        2. performance on larger inputs is more important than performance on small inputs, since programs running on small inputs are usually prety fast.
    

        <p><ul>
    

  • The idea of ’‘’asymptotic notation’’’ is to consider the shape of the worst-case cost T(n) to process an input of size n. Here, worst-case means we consider the input that gives the greatest cost, where cost is usually time, but may be something else like space
  •     <p></p>
    
    
    
        <p><p>-Big O Complexity (from worst to best)</p></p>
    
    
    
        <p><pre><code>1. O(1)--Constant
    

        <p><pre><code>2. O(log n)--logatirhmic
    

        <p><pre><code>3. O((log n)^k)--polylogatithmic
    

        <p><pre><code>3. O(n)--linear
    

        <p><pre><code>4. O(n log n)--quasilinear ?
    

        <p><pre><code>5. O(n^2)--polynomial
    

        <p><pre><code>6. O(2^n)--exponential
    

        <p><pre><code>7. O(n!)--factorial
    

        <p></p>
    
    
    
        <p><p>NAME    NOTATION    EXAMPLE CODE FRAGMENT</p></p>
    
    
    
        <p><p><strong>Constant  O(1) (statement)</strong></p></p>
    
    
    
        <p><ul>
    

  • e.g., array access, arithmetic operation..function call
  •     <p><ul>
    

  • op();
  •     <p></p>
    
    
    
        <p><p><strong>Logarithmic   O(logn) (divide in half)</strong></p></p>
    
    
    
        <p><ul>
    

  • e.g., binary search in a sorted array, insert in a binary heap, search in a red–black tree
  •     <p><p>```</p></p>
    
    
    
        <p><p>for (int i = 1; i &lt;= n; i = 2*i)</p></p>
    
    
    
        <p><pre><code>op();
    

        <p><p>```</p></p>
    
    
    
        <p><p><strong>Linear    O(n) (loop)</strong></p></p>
    
    
    
        <p><ul>
    

  • e.g., sequential search, grade-school addition, BFPRT median finding
  •     <p><p>```</p></p>
    
    
    
        <p><p>for (int i = 0; i &lt; n; i++)</p></p>
    
    
    
        <p><p>op();</p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p><p><strong>Linearithmic  O(nlogn) (divide and conquer)</strong></p></p>
    
    
    
        <p><ul>
    

  • e.g., mergesort, heapsort, fast Fourier transform
  •     <p><p>```</p></p>
    
    
    
        <p><p>for (int i = 1; i &lt;= n; i++)</p></p>
    
    
    
        <p><p>for (int j = i; j &lt;= n; j = 2*j)</p></p>
    
    
    
        <p><pre><code>  op();
    

        <p><p>```</p></p>
    
    
    
        <p><p><strong>Quadratic O(n^2) (double loop)</strong></p></p>
    
    
    
        <p><ul>
    

  • e.g., enumerate all pairs, insertion sort, grade-school multiplication
  •     <p><ul>
    

  • running time grows much faster than input size
  •     <p><ul>
    

  • not really acceptable for big problems
  •     <p><pre><code>- quadractic algorithms don't scale well: as computers get bigger, quadratic algorithms gets slower
    

        <p><p>```</p></p>
    
    
    
        <p><p>for (int i = 0; i &lt; n; i++)</p></p>
    
    
    
        <p><p>for (int j = i+1; j &lt; n; j++)</p></p>
    
    
    
        <p><pre><code>  op();
    

        <p><p>```</p></p>
    
    
    
        <p><p><strong>Cubic O(n^3) (triple loop)</strong></p></p>
    
    
    
        <p><ul>
    

  • e.g., enumerate all triples, Floyd–Warshall, grade-school matrix multiplication
  •     <p><p>```</p></p>
    
    
    
        <p><p>for (int i = 0; i &lt; n; i++)</p></p>
    
    
    
        <p><p>for (int j = i+1; j &lt; n; j++)</p></p>
    
    
    
        <p><pre><code>  for (int k = j+1; k &lt; n; k++)
    

        <p><pre><code>     op();
    

        <p><p>```</p></p>
    
    
    
        <p><p><strong>Polynomial O(n^C()</strong></p></p>
    
    
    
        <p><p>**Exponential 2^O(n^C) (exhaustive search)</p></p>
    
    
    
        <p><ul>
    

  • e.g., check all subsets
  •     <p></p>
    
    
    
        <p><p><strong>Memory Usage</strong>. To estimate how much memory our program uses, we can count up the number of variables and weight them by the number of bytes according to their type. For a typical 64-bit machine,</p></p>
    
    
    
        <p><ul>
    

  • Primitive types. the following table gives the memory requirements for primitive types.
  •     <p><p>memory requirements for primitive types</p></p>
    
    
    
        <p><ul>
    

  • Objects. To determine the memory usage of an object, we add the amount of memory used by each instance variable to the overhead associated with each object, typically 16 bytes. Moreover, the memory usage is typically padded to be a multiple of 8 bytes (on a 64-bit machine).
  •     <p><ul>
    

  • References. A reference to an object typically is a memory address and thus uses 8 bytes of memory (on a 64-bit machine).
  •     <p><ul>
    

  • Linked lists. A nested non-static (inner) class such as our Node class requires an extra 8 bytes of overhead (for a reference to the enclosing instance).
  •     <p><p>_ Arrays_. Arrays in Java are implemented as objects, typically with extra overhead for the length. An array of primitive-type values typically requires 24 bytes of header information (16 bytes of object overhead, 4 bytes for the length, and 4 bytes of padding) plus the memory needed to store the values.</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>1. Algorithms Fundamentals</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><p>Algorithm--method for solving a problem</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>Data Structure--method to store information.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Primitive Data Types</strong></p></p>
    
    
    
        <p><ul>
    

  • long (64 bit)
  •     <p><ul>
    

  • Integers (int) (32 bit)
  •     <p><ul>
    

  • Real Numbers (double) (32 bit)
  •     <p><ul>
    

  • Booleans
  •     <p><ul>
    

  • Characters (16 bit)
  •     <p><ul>
    

  • bytes (8 bits)
  •     <p></p>
    
    
    
        <p><p><strong>Statements</strong></p></p>
    
    
    
        <p><ul>
    

  • Declarations create variables of specified types names with identifiers.
  •     <p><ul>
    

  • java stronly typed because Java compiler checks for consistency
  •     <p><ul>
    

  • Assignment
  •     <p><ul>
    

  • Initializing Declaration
  •     <p><ul>
    

  • Implicit argument (i++, ++i)
  •     <p><ul>
    

  • conditional
  •     <p><ul>
    

  • loop
  •     <p><ul>
    

  • call
  •     <p><ul>
    

  • return
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Arrays</h2></p>
    
    
    
        <p><ul>
    

  • An array stores a sequences of values
  •     <p><ul>
    

  • aliasing-if we assign one array name to another, both refer to the same array
  •     <p><ul>
    

  • 2d array: a[i][j]
  •     <p></p>
    
    
    
        <p><p><strong>Static Method.</strong></p></p>
    
    
    
        <p><ul>
    

  • sequences of statements that are executed
  •     <p><ul>
    

  • called functions in many programming languages
  •     <p><ul>
    

  • Properties of Methods
  •     <p><ul>
    

  • Method Overloading--same name; different signatures.
  •     <p><ul>
    

  • single return value, but may have multiple return statements
  •     <p><ul>
    

  • "side effects"--method returns void, but may have side effects
  •     <p><p><strong>Recursion</strong>--A recursive method is a method that calls itself either directly or indirectly.</p></p>
    
    
    
        <p><pre><code>- 3 rules of thumb:
    

        <p><pre><code>    1. base case
    

        <p><pre><code>    2. recursive calls must address subproblems that are smaller in some sense, so that recursive calls converge to the base case.
    

        <p><pre><code>    3. recursive calls should not address subproblems that overlap.
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Object Oriented Programming</h3></p>
    
    
    
        <p><ul>
    

  • programming based on building data types
  •     <p><ul>
    

  • revovles around an object--an entity that holds a data type value
  •     <p><pre><code>- primitive types largely restricted to opearting on numbers
    

        <p><pre><code>- reference types allow strings, pictures, sounds or many othe abstractions
    

        <p><ul>
    

  • Data Type-set of values and operations on those values
  •     <p><ul>
    

  • Abstract Data Type--data type whose internal representation is hidden from the client
  •     <p><ul>
    

  • Objects--an entity that can take on a data-type value.
  •     <p><pre><code>- 3 properties:
    

        <p><pre><code>    1. state--value from object's data type
    

        <p><pre><code>    2. identity--distinguihes one object from another
    

        <p><pre><code>    3. behavior--effect of data-type operations
    

        <p><pre><code>-"reference"--mechanism for accessing an object
    

        <p></p>
    
    
    
        <p><p><strong>Application Programming Interface</strong>--used to specify the behavior of an abstract data type</p></p>
    
    
    
        <p><ul>
    

  • list of constructors and instance methods (operations), with an informal description of the effect of each
  •     <p></p>
    
    
    
        <p><p><strong>Client</strong>--program that uses the data type</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Implementation</strong>--code that implements the data type specified in an API</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Abstract Class vs Interface</strong></p></p>
    
    
    
        <p><ul>
    

  • Abstract Class--allow creating blueprint for concrete classes (but the inheriting class should implement the abstract method)
  •     <p><pre><code>- can be subclassed; cannot be instantiated
    

        <p><pre><code>-  methods: static, abstract, or non-abstract methods that subclasses are forced to provide an implementation for
    

        <p><pre><code>- variables: final, static , or class member variables
    

        <p><pre><code>- visibility: private, protected, or public
    

        <p><pre><code>- use: when subclasses share state or use common functionality, or you require to declare non-static, non-final fields or need access modifiers other than public
    

        <p><pre><code>- Inheritance: only one abstract class
    

        <p><ul>
    

  • Interface--blueprint used to implement a class--no concrete mehtods(i.e., no code)--all methods abstract (i.e., Pure Abstract)
  •     <p><pre><code>- methods: static, abstract, and default methods
    

        <p><pre><code>- variables: final and static (never contain instance varaiable)
    

        <p><pre><code>- visibility: public
    

        <p><pre><code>- inheritance: allows multiple inheritance
    

        <p><pre><code>- use: if you expect unrelated classes would implement your interface; where multiple inheritances of type is desired
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Inheritance</strong> :</p></p>
    
    
    
        <p><p><strong>Using Abstract Data Types</strong>)</p></p>
    
    
    
        <p><ul>
    

  • a client does not need to know hw a data type is implemented in order to be able to use it
  •     <p><ul>
    

  • create object with new and contructor
  •     <p><ul>
    

  • instance methods used to operate on objects' data type values
  •     <p><ul>
    

  • in Java, execept for primitives, all data types are objects
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Implementing Abstract Data Types</h3></p>
    
    
    
        <p><ul>
    

  • Instance Variables--declare instance variables to define data-type values
  •     <p><ul>
    

  • Constructors--establishes an object's identity and intializes the instances variables.
  •     <p><ul>
    

  • Instance Methods--specify the data-type operations.
  •     <p><ul>
    

  • Scope
  •     <p><pre><code>- parameter variables (specified in method signture and intialized with client values).
    

        <p><pre><code>    - scope: entire method
    

        <p><pre><code>- local variables (declared and intialized w/in method body).
    

        <p><pre><code>    - scope: the block where they are defined
    

        <p><pre><code>- instance variables hold data-type values for objects in a class.
    

        <p><pre><code>    - scope: entire class
    

        <p></p>
    
    
    
        <p><h3>Designing Abstract Data Types</h3></p>
    
    
    
        <p><ul>
    

  • Encapsulation--hidden data--enables encapsualting data types within their implemntations
  •     <p><pre><code>- enables modular programming
    

        <p><ul>
    

  • Interface inheritance-
  •     <p><pre><code>- subtyping
    

        <p><pre><code>- Comparable - &gt; comparator
    

        <p><pre><code>- Iterable -&gt; iterator
    

        <p><ul>
    

  • Implementation inheritance--
  •     <p><pre><code>- subclassing
    

        <p><pre><code>- enables programmer to change behavior and add functionality without rewriting a entire class from scratch
    

        <p><pre><code>- sub class inherits instance methods and instance variables from another class
    

        <p><ul>
    

  • String Converstions
  •     <p><ul>
    

  • Wrapper Types (for primitive types)
  •     <p><ul>
    

  • Equality
  •     <p><pre><code>- `(a==b)` where a and b are reference variables of the same type
    

        <p><pre><code>    - testing whetehr they have they same identity--i.e., whether they're references are equal
    

        <p><pre><code> - `equals()`--test whether data-type values (object state) are the same
    

        <p><ul>
    

  • Memory Managemnet
  •     <p><ul>
    

  • Immutability
  •     <p><pre><code>- immutable data type has the property that the value of an object never changes one consrtucted
    

        <p><pre><code>- mutable-object values intended to change
    

        <p><ul>
    

  • Exceptions and Errors
  •     <p><ul>
    

  • Asserts
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>1.5: Union Find</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Dynamic Connectivity</strong>. The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret the pair <code>p q</code> as meaning <code>p</code> is connected to <code>q</code>. (assume "is connected" is an <em>equivalence relationship</em>).</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Equivalence Relationship</strong>. An equivalence relation partitions the objects into equivalence cclasses or conneced components.</p></p>
    
    
    
        <p><pre><code>- _symmetric_: if p is connected to q, then q is connected to p
    

        <p><pre><code>- _transitive_: if p is connected to q an q is connected to r, then p is connected to r.
    

        <p><pre><code>_ reflexive_: p is connected to p.
    

        <p><pre><code>- connected componenets: maximal set of objects that are mutually connected.
    

        <p></p>
    
    
    
        <p><p><em>Note</em>: quadratic algorithms (n^2) don't scale</p></p>
    
    
    
        <p><pre><code>- as computers get faster and bigger, quadratic algorithms get slower
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>1.3. Bags Queues, And Stacks</h1></p>
    
    
    
        <p></p>
    
    
    
        <p><p><em>Generics</em>--enables ADT's to use any type of data</p></p>
    
    
    
        <p><ul>
    

  • e.g., public class Stack<T>
  •     <p></p>
    
    
    
        <p><p><em>Autoboxing</em>--automatically casting primitve to a wrapper type--</p></p>
    
    
    
        <p><ul>
    

  • type parameters have to be instantiated as reference types, so Java automatically converts b/w a primitive type and its corresponding wrapper type in assignments, method arguments, and expressions (e.g., int -> Integer and vice versa)
  •     <p><ul>
    

  • allows using generics with primitive types
  •     <p></p>
    
    
    
        <p><p><em>Iterable Collections</em>--allows iterating through collections</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><em>Bags</em>--a collection where removing items is not supported</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><em>FIFO Queue</em></p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><em>Pushdown Stack</em></p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><em>Array and resizing array implementations of collections</em></p></p>
    
    
    
        <p><ul>
    

  • Fixed Capacity Stack of Strings
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><em>Linked lists</em>--recursive data structure that is eitehr empty (null) or a reference to a node having a generic item and a reference to a linked list.</p></p>
    
    
    
        <p><ul>
    

  • implemntation
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>2. Stacks and Queues</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Array</strong></p></p>
    
    
    
        <p><ul>
    

  • array compared to linked list:
  •     <p><pre><code>- array inserting and removing more expensive
    

        <p><ul>
    

  • resizing array
  •     <p><pre><code>- create new array of capacity; copy current array into first half of new array;
    

        <p><pre><code>- time proporitinal to n, not n^2 b/c ammoritzed time
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Stack</strong>--LIFO (last-in-first-out)</p></p>
    
    
    
        <p><ul>
    

  • API:
  •     <p><pre><code>- _push_ (insert a new elemnt on the top of the stack)
    

        <p><pre><code>- _pop_ (remove and return the element at the top of the stack.)
    

        <p><pre><code>- _peek_ (return top of stack)
    

        <p><pre><code>- _isEmpty_() (return true if stack is empty)
    

        <p><ul>
    

  • Implementations
  •     <p><ul>
    

  • unlike arrays, stacks do NOT offer contant-time access to the ith item. However, it does allow contsant-time adds and removes, as it doesn't require shifting elements aroudnd
  •     <p><ul>
    

  • stacks often useful for recursive algorithms
  •     <p><pre><code>- sometimes you need to push temporary data onto a stack as you recurse, but then remove it as you backtrack and a stack offers an intuitive way to achieve this
    

        <p><pre><code>- stack can also be used to implement a recursive alforithm  iteratively
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Queue, O(n), O(n), O(1)</strong>FIFO</p></p>
    
    
    
        <p><ul>
    

  • API
  •     <p><ul>
    

  • items removed from the data structure in the same order they are added
  •     <p><ul>
    

  • can be implemented with linked list, but if we want O(1) time for both enqueue (push) and dequeue (pop) operations, we must keep around pointers to both ends of the linked list)
  •     <p><p>```java</p></p>
    
    
    
        <p><p>//queue implementation linked list</p></p>
    
    
    
        <p><p>public class LinkedQueueOfStrings</p></p>
    
    
    
        <p><p>{</p></p>
    
    
    
        <p><p>private Node first, last;</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>private class Node{}</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>public boolean isEmpty(){</p></p>
    
    
    
        <p><pre><code>return first == null;
    

        <p><p>}</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>public void enqueue(String item){</p></p>
    
    
    
        <p><pre><code>Node oldLast = last;
    

        <p><pre><code>last = new Node();
    

        <p><pre><code>last.item = item;
    

        <p><pre><code>last.next = null;
    

        <p><pre><code>if (isEmpty()) first == last;
    

        <p><pre><code>else oldLast.next = last;
    

        <p><p>}</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>public String dequeue() {</p></p>
    
    
    
        <p><pre><code>String item = first.item;
    

        <p><pre><code>first = first.next;
    

        <p><pre><code>if (isEmpty()) last = null;
    

        <p><pre><code>return item;
    

        <p><p>}</p></p>
    
    
    
        <p><p>}</p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>```java</p></p>
    
    
    
        <p><p>//queue implementeated with linked list</p></p>
    
    
    
        <p><p>//todo from week 2 algorithms</p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p><p><strong>Singly-Linked List (singly and doubly), O(n), O(n), O(1)</strong></p></p>
    
    
    
        <p><ul>
    

  • basic idea is store each item in its own node/struct, and each node/structs includes a pointer to the next node/struct (with null pointer to indicate there are no more elements). Follow the pointers to eventually reach all elements.
  •     <p><ul>
    

  • linked list compared to array:
  •     <p><pre><code>- linked list inserting and removing cheaper
    

        <p><ul>
    

  • linked list good for implementing stack b/c linked list allows cheap insert/remove
  •     <p></p>
    
    
    
        <p><p><strong>Hash Table, access n/a, O(n), O(n)</strong></p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Binary Search Tree, O(n), O(n), O(n)</strong></p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>```java</p></p>
    
    
    
        <p><p>public class MyStack<T> {</p></p>
    
    
    
        <p><p>private static class StackNode<T> {</p></p>
    
    
    
        <p><pre><code>private T Data;
    

        <p><pre><code>private StackNode&lt;T&gt; next;
    

        <p></p>
    
    
    
        <p><pre><code>public StackNode(T data){
    

        <p><pre><code>  this.data = data;
    

        <p><pre><code>}
    

        <p><p>}</p></p>
    
    
    
        <p><p>private StackNode<T> top;</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>public T pop(){</p></p>
    
    
    
        <p><pre><code>if (top == null) throw new EmptyStackException();
    

        <p><pre><code>T item = top.data;
    

        <p><pre><code>top = top.next;
    

        <p><pre><code>return item;
    

        <p><p>}</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>public void push(T item){</p></p>
    
    
    
        <p><pre><code>StackNode&lt;T&gt; t = new StackNode&lt;T&gt;(item);
    

        <p><pre><code>t.next = top;
    

        <p><pre><code>top = t;
    

        <p><p>}</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>public T peek(){</p></p>
    
    
    
        <p><p>if (top == null) throw new EmptyStackException();</p></p>
    
    
    
        <p><p>return top.data;</p></p>
    
    
    
        <p><p>}</p></p>
    
    
    
        <p><p>public boolean isEmpty(){</p></p>
    
    
    
        <p><pre><code>return top == null;
    

        <p><p>}</p></p>
    
    
    
        <p><p>}</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>3. Merge Sort</h2></p>
    
    
    
        <p><ul>
    

  • n log n guarantee; stable (e.g., keeps sort when sorting on multiple fields/when indexes are equal)
  •     <p><ul>
    

  • algorithm:
  •     <p><pre><code>1. divide array into two halves
    

        <p><pre><code>2. Recrusively sort each half
    

        <p><pre><code>3. merge two halves
    

        <p><pre><code>- e.g.,
    

        <p><pre><code>    - input [M E R G E E  |  E X A M P L E]
    

        <p><pre><code>    - sort left   [E E E G M R] | [E X A M P L E]
    

        <p><pre><code>    - sort right  [E E E G M R] | [A E E L M P X ]
    

        <p><pre><code>    - compare left and right (if = keep left)
    

        <p><pre><code>    - results [A E E E E E G L M M P R X]
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>4. Priority Queues</h2></p>
    
    
    
        <p><p>-Randomized Queue--remove random item</p></p>
    
    
    
        <p><ul>
    

  • Priority queue--remove the largest (or smallest item)
  •     <p><p>-API</p></p>
    
    
    
        <p><pre><code>-MaxPQ
    

        <p><pre><code>-isEmpty()
    

        <p><pre><code>-insert()
    

        <p><pre><code>-deleteMax()
    

        <p><pre><code>-swim()
    

        <p><pre><code>-sink()
    

        <p><pre><code>-less()
    

        <p><pre><code>-exchange()
    

        <p></p>
    
    
    
        <p><p><strong>Binary Heap</strong></p></p>
    
    
    
        <p><ul>
    

  • log N
  •     <p><ul>
    

  • can implement priority queue using heap data structure as an array
  •     <p><pre><code>- i     0 1 2 3 4 5 6 7 8 9 10 11
    

        <p><pre><code>- a[i]  - T S R P N O A E I H G
    

        <p><ul>
    

  • Properties
  •     <p><pre><code>- Largest key is a[1], which is root (e.g., T)
    

        <p><pre><code>- can use array indices to move through tree
    

        <p><pre><code>    - parent of node at k is k/2
    

        <p><pre><code>        - e.g., parents of a[10] = H and a[11]=G is a[5]=N
    

        <p><pre><code>    - children of node at k are at 2k and 2k+1
    

        <p><pre><code>- Promotion: Peter principle: promotoe to incompetence
    

        <p><pre><code>- Demotion: power struggle--better subordinate promoted
    

        <p><pre><code>- Insertion: add node at end, then swim it up
    

        <p><pre><code>- Remove Maximum: exchange root with node at end, then sink it down
    

        <p></p>
    
    
    
        <p><p><strong>Heap Sort</strong></p></p>
    
    
    
        <p><ul>
    

  • n log n; in place
  •     <p><ul>
    

  • Algoirthm
  •     <p><pre><code>- start with array of keys in arbitraty order
    

        <p><pre><code>- create max-heap with all N keys
    

        <p><pre><code>    - e.g., reverse alphabetical, so Z is highest and A is lowest at arr[1]
    

        <p><pre><code>- repeatedly remove the maximum key
    

        <p><ul>
    

  • not used much in practice
  •     <p><pre><code>- inner loop is bigger than merge sort
    

        <p><pre><code>- doesn't have local memory reference (looks at further place away as it moves down tree)
    

        <p><pre><code>- not stable
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>5. Balanced Search Tree</h2></p>
    
    
    
        <p><p><strong>2-3 Tree</strong>--allow 1 or 2 keys per node</p></p>
    
    
    
        <p><ul>
    

  • 2-node: one key, two children
  •     <p><ul>
    

  • 3-node: two keys, three children
  •     <p><ul>
    

  • Symmetric order: inorder traversal yields keys in ascending order
  •     <p><ul>
    

  • Perfect balance: every path from root to null link has same length
  •     <p></p>
    
    
    
        <p><p><strong>Red-Black BST</strong></p></p>
    
    
    
        <p><ul>
    

  • 2 lg N guaranteed
  •     <p></p>
    
    
    
        <p><p><h2>Geometric Applications of BSTs</h2></p></p>
    
    
    
        <p><h3>Line Segment Intersection</h3></p>
    
    
    
        <p><h3>Kd-Trees</h3></p>
    
    
    
        <p><h3>Interval Search Trees</h3></p>
    
    
    
        <p><p>-</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>TODO@@ Week5 part 2 symbol tables</p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>6. HashTable</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Hashing</strong></p></p>
    
    
    
        <p><ul>
    

  • logarithmic (lg n)
  •     <p><ul>
    

  • save items in a key-indexed table
  •     <p><ul>
    

  • hash function -- method for computing array index from key
  •     <p><ul>
    

  • issues:
  •     <p><pre><code>- computing hash function
    

        <p><pre><code>- equality test for keys
    

        <p><pre><code>- collision resolution
    

        <p><ul>
    

  • Classic space-time tradeoff
  •     <p><pre><code>- if space unlimited: trivial hash function w/ key as index
    

        <p><pre><code>- if time unlimited: trivial collision resolution with squential search
    

        <p></p>
    
    
    
        <p><p>Copmuting Hash Function:</p></p>
    
    
    
        <p><ul>
    

  • idealistic goals: scarmble the keys uniformly to produce a table index
  •     <p><pre><code>- efficiently computable
    

        <p><pre><code>- each table index equally likely for each key
    

        <p></p>
    
    
    
        <p><p>Hashing Preferred over red-black tree when:</p></p>
    
    
    
        <p><ul>
    

  • short keys that are easy to compute (hash)
  •     <p><ul>
    

  • no need for order
  •     <p></p>
    
    
    
        <p><p>Red Black tree has better performance in practice</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Seperate Chaining Hash Table</strong></p></p>
    
    
    
        <p><ul>
    

  • lg N
  •     <p><ul>
    

  • average constant run time for a random search miss ()
  •     <p></p>
    
    
    
        <p><p><strong>Linear Probing (Open Addressing)</strong></p></p>
    
    
    
        <p><ul>
    

  • lg N
  •     <p><ul>
    

  • average contstant time for delete
  •     <p><ul>
    

  • just use array instead of linked list
  •     <p><pre><code>- array size (M) must be bigger than number of expected keys (N) (use empty slots in array)
    

        <p><pre><code>- array needs to stay roughly half empty (so must resize array)
    

        <p><ul>
    

  • insert: put in index i if free; if not free, try i+1, i+2 etc
  •     <p><ul>
    

  • search: same thing as insert: search hash value at index i; if not there then try i+1, i+2
  •     <p><pre><code>- if you hit an empty index, then that means then search value is not in the table (b/c if it was in table linear prbing would've hit it)
    

        <p><ul>
    

  • implement delete: find and remove the key-value pair and then reinsert all of the key-value pairs in the same cluster that appear after the deleted key-value pair
  •     <p><pre><code>- if table doesn't get too full, the expected number of key-value paris to reinsert will be a small constant
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Graphs</h2></p>
    
    
    
        <p><p><strong>Graph</strong>--Set of vetices connected pairwise by edges</p></p>
    
    
    
        <p><p><strong>Path</strong>--sequence of vertices conneted by edges.</p></p>
    
    
    
        <p><p><strong>Cycle</strong>--path whose first and last vertices are the same</p></p>
    
    
    
        <p><p><strong>Connected</strong>--two vertices are connected if there is a path b/w them.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Graph API</h3></p>
    
    
    
        <p><ul>
    

  • API
  •     <p><pre><code>- `Graph (int V)`
    

        <p><pre><code>- `Graph (In inputStream)`
    

        <p><pre><code>- `void addEdge (int v, int w)`
    

        <p><pre><code>- `Iterable&lt;Integer&gt; adj(int v)`
    

        <p><pre><code>- `int V()`
    

        <p><pre><code>- `int E()`
    

        <p><ul>
    

  • Representations
  •     <p><pre><code>- Array
    

        <p><pre><code>    - index is vertic and value is adjacent vertix
    

        <p><pre><code>- Adjaceny Vertix Graph
    

        <p><pre><code>    - NxN with 0 or 1 at each vertix
    

        <p><pre><code>- Adjacenty list Graph
    

        <p><pre><code>    - for every vertix, maintain a list of vertices adjacent to it
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Depth-First Search</h3></p>
    
    
    
        <p><ul>
    

  • put unvisited vertices in stack
  •     <p><ul>
    

  • in a graph G represetned usin gthe adjacency-lists representation, depth-first search marks all vertices connected to s in time proportional to the sum of the degrees of the vertices in the connected component containing s.
  •     <p></p>
    
    
    
        <p><h3>Breadth-First Search</h3></p>
    
    
    
        <p><ul>
    

  • put unvisited vertices on a queue
  •     <p><ul>
    

  • solves shortest path problem (find path from s to i that uses fewest number of edges)
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Unidrected Graphs</h3>
    

    h3

        <p><ul>
    

  • E + V is the order of growth of the memory used to represent a graph with V vertices and E edges using the adjacent-lists representation
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Directed Graphs</h3></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Minimum Spanning Trees</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Shortest Paths</h2></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Sorting</h1></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Elementary Sorts</h2></p>
    
    
    
        <p><ul>
    

  • 2 categories regarding memory:
  •     <p><pre><code>1. in place sorting algorithms (no extra memory)
    

        <p><pre><code>2. copying algorithms that require extra memory to hold another copy of the array to be sorted
    

        <p></p>
    
    
    
        <p><p><strong>Callbacks</strong>--reference to executable code</p></p>
    
    
    
        <p><ul>
    

  • client passes array of objects to sort() function.
  •     <p><ul>
    

  • the sort() function calls back object's compareTo() method as needed
  •     <p></p>
    
    
    
        <p><p><strong>Selection Sort</strong></p></p>
    
    
    
        <p><ul>
    

  • (1/2)n^2
  •     <p><ul>
    

  • n exchanges; quadratic in best case
  •     <p><ul>
    

  • repeatedly select the smallest remaining item.
  •     <p><ul>
    

  • pseudo:
  •     <p><pre><code>- find smallest item in array and exchange it with the first entry.
    

        <p><pre><code>- Then find the next smalledst item and exchange it with the second entry.
    

        <p><pre><code>- Continue till array is sorted.
    

        <p><ul>
    

  • uses n^2/2 compares and n exchanges to sort an array of length n
  •     <p></p>
    
    
    
        <p><p>```java</p></p>
    
    
    
        <p><p>public static void selectionSort(Comparable[] a){</p></p>
    
    
    
        <p><p>int n = a.length;</p></p>
    
    
    
        <p><p>for (int i =0; i&lt; n; i++){</p></p>
    
    
    
        <p><pre><code>int min = 1;
    

        <p><pre><code>for (int j = i+1; j &lt; n; j++){
    

        <p><pre><code>  if less(a[j], a[min])) min =j;
    

        <p><pre><code>}
    

        <p><pre><code>exch(a, i, min);
    

        <p><pre><code>assert isSorted(a, 0, i)
    

        <p><p>}</p></p>
    
    
    
        <p><p>assert isSorted(a);</p></p>
    
    
    
        <p><p>}</p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Insertion Sort</strong>.</p></p>
    
    
    
        <p><ul>
    

  • (1/2)n^2
  •     <p><ul>
    

  • used for small or partially- sorted array
  •     <p><ul>
    

  • psuedo:
  •     <p><pre><code>- consider each item one at a time
    

        <p><pre><code>- insert item in proper place among those already considered (keeping them sorted)
    

        <p><pre><code>- make space by moving larger item one position to the right
    

        <p><ul>
    

  • n^2/2 compares and N^2/2 exchange
  •     <p></p>
    
    
    
        <p><p>```java</p></p>
    
    
    
        <p><p>public static void insertionSort(Comparable[] a){</p></p>
    
    
    
        <p><p>int n = a.length;</p></p>
    
    
    
        <p><p>for (int i = 1; i &lt; n; i++){</p></p>
    
    
    
        <p><pre><code>for (int j = i; (j &gt; 0 &amp;&amp; less(a[j], a[j-1]); j--) {
    

        <p><pre><code>  exch(a, j, j-1)
    

        <p><pre><code>}
    

        <p><pre><code>assert isSorted(a, 0, i);
    

        <p><p>}</p></p>
    
    
    
        <p><p>assert isSorted(a);</p></p>
    
    
    
        <p><p>}</p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Bubble Sort</strong></p></p>
    
    
    
        <p><ul>
    

  • (1/2)n^2
  •     <p><ul>
    

  • rarley useful (use insertion sort instead)
  •     <p></p>
    
    
    
        <p><p><strong>Shell Sort</strong>.</p></p>
    
    
    
        <p><ul>
    

  • c n^(3/2)
  •     <p><ul>
    

  • tight code; subquadratic
  •     <p><ul>
    

  • extension of insertion sort that gains speed by allowing exchanges of entries that are far apart, to produce partially sorted array that be efficiently sorted, eventually by insertion sort.
  •     <p><ul>
    

  • pseudo:
  •     <p><pre><code>- rearrange the array to give it the property that taking every hth entry (starting anywhere) yields a sorted sequence. Such an array is said to be h-sorted.
    

        <p><ul>
    

  • by h-sorting for some large values of h, we can move entries in the array long distances and thus make it easier to h-sort smaller values of h.
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p><p>public static void shellSort(Comparable[] a) {</p></p>
    
    
    
        <p><pre><code>    int n = a.length;
    

        <p></p>
    
    
    
        <p><pre><code>    // 3x+1 increment sequence:  1, 4, 13, 40, 121, 364, 1093, ...
    

        <p><pre><code>    int h = 1;
    

        <p><pre><code>    while (h &lt; n/3) h = 3*h + 1;
    

        <p></p>
    
    
    
        <p><pre><code>    while (h &gt;= 1) {
    

        <p><pre><code>        // h-sort the array
    

        <p><pre><code>        for (int i = h; i &lt; n; i++) {
    

        <p><pre><code>            for (int j = i; j &gt;= h &amp;&amp; less(a[j], a[j-h]); j -= h) {
    

        <p><pre><code>                exch(a, j, j-h);
    

        <p><pre><code>            }
    

        <p><pre><code>        }
    

        <p><pre><code>        assert isHsorted(a, h);
    

        <p><pre><code>        h /= 3;
    

        <p><pre><code>    }
    

        <p><pre><code>    assert isSorted(a);
    

        <p><pre><code>}
    

        <p><p>```</p></p>
    
    
    
        <p><h2>QuickSort</h2></p>
    
    
    
        <p><ul>
    

  • (1/2)n^2
  •     <p><ul>
    

  • n log n probabilistic guarantee; fastest in practice
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>MISC NOTES TO ADD</h2></p>
    
    
    
        <p><ul>
    

  • architecture
  •     <p><pre><code>--style, design
    

        <p><pre><code>--standards and consistency
    

        <p><pre><code>--design is layer of granularity below architecture
    

        <p></p>
    
    
    
        <p><ul>
    

  • SOLID
  •     <p><pre><code>- _Single Responsibility Principle_--a class should only have a single responsibility, that is, only changes to one part of the software's specification shoulld be able ot affect the specification of the class.
    

        <p><pre><code>- _Open-closed Principle_--Software entities should be open for extension, but closed for modification
    

        <p><pre><code>- _Liskov Substitution Principle/Design By Contract_--objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.
    

        <p><pre><code>- _Interface Segregation Principle_--many client specific interfaces are better than one general purpose interaces.
    

        <p><pre><code>- _Dependency Inversion Principle_--one should depend upon abstractions, not concretions.
    

        <p></p>
    
    
    
        <p><p><strong>Types</strong></p></p>
    
    
    
        <p><p>--static (compile time) vs dynamic (run time)</p></p>
    
    
    
        <p><pre><code>--java is hybrid
    

        <p><p>--strong (strict on what operations can do) vs weak (basically unchecked)</p></p>
    
    
    
        <p><ul>
    

  • java is strongly typed because java compiler checks for consistency
  •     <p><p>--functions types defined by parameter and return typw</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>-types of dbs:</p></p>
    
    
    
        <p><pre><code>-db is basically just a container you dumb data into
    

        <p><pre><code>    -low level db is filesystem (everything is a file)
    

        <p><pre><code>-relational (columns x rows; PKs and FKs)
    

        <p><pre><code>-hiearchial (xml; root node)
    

        <p><pre><code>-object (graph)
    

        <p><pre><code>-nosql (json--&gt;combine this with above?)
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>Microservice--</p></p>
    
    
    
        <p><ul>
    

  • single responsiblity (only does one thing)
  •     <p><ul>
    

  • easily understood
  •     <p><ul>
    

  • e.g., queue process for messaging, serving resource (user, article)
  •     <p><ul>
    

  • scalable
  •     <p></p>
    
    
    
        <p><p><strong>Concurrency</strong></p></p>
    
    
    
        <p><ul>
    

  • does more than one thing at same thing
  •     <p><ul>
    

  • multi-tasking
  •     <p><ul>
    

  • multithread
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Edge Computing</strong></p></p>
    
    
    
        <p><ul>
    

  • client machine is geographically close to server
  •     <p><ul>
    

  • 2 pros:
  •     <p><pre><code>1. latency (how fast the request is from a to b)
    

        <p><pre><code>2. bandwidth (how much data can go through at one time)
    

        <p><ul>
    

  • 2 ways:
  •     <p><pre><code>1. independent machines/virtual machines/containers (e.g., AWS)
    

        <p><pre><code>2. Isoprotos (Cloudfare's big idea)
    

        <p><pre><code>    - use less resources
    

        <p></p>
    
    
    
        <p><h1>Data Structures and Programming Techniues</h1></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Javascript</h1></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>React</h2></p>
    
    
    
        <p><ul>
    

  • View Library--used to build complex UIs
  •     <p><ul>
    

  • Components--make complex trees of elements or wrap simple elements
  •     <p><pre><code>- similar (but very different) to web components
    

        <p><ul>
    

    • Comoponents make programing in react declarative
  •     <p><ul>
    

      • rendered/mounted into single node on DOM
  •     <p><ul>
    

      • like writing html in javascript
  •     <p><ul>
    

      • compare with web components being very imperative
  •     <p><ul>
    

  • React manages all state for you
  •     <p><ul>
    

  • feels like managing server side applications but on client side
  •     <p></p>
    
    
    
        <p><h2>Redux</h2></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Misc Javascript</h2></p>
    
    
    
        <p><p><code>==</code> vs <code>===</code></p></p>
    
    
    
        <p><p><code>==</code> (equality or abstract comparison operator)--converts varaiables values to the same type before performing comparison</p></p>
    
    
    
        <p><p><code>===</code> (identity or strict comparison operator)--does not do type conversion and returns true only if both values and type are identical</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>--apply method</p></p>
    
    
    
        <p><p>--react form (how it works to pass data to backend)</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Java</h1></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>String, StringBuilder, StringBuffer</strong></p></p>
    
    
    
        <p><ul>
    

  • String--
  •     <p><ul>
    

    • immuatable (so if you change a string, the string object will reference another place in the memory and the previous object will be avialble for the garbage collector)
  •     <p><ul>
    

  • StringBuilder--mutable, non-synchronized (i.e., not thread-safe, so two threads can call the methods of StringBuilder simultaneously)
  •     <p><ul>
    

    • more efficeint than StringBuffer
  •     <p><ul>
    

  • StringBuffer--mutable, synchnoized (i.e., thread-safe, so cannot call the methods of StringBuffer simultaneously)
  •     <p><ul>
    

    • less efficient that StringBuilder
  •     <p></p>
    
    
    
        <p><p><strong><code>==</code> vs <code>.equals</code></strong></p></p>
    
    
    
        <p><ul>
    

  • ==--reference (address) comparison
  •     <p><ul>
    

    • check if both objects point to the same memory location
  •     <p><ul>
    

  • .equals()--content comparison
  •     <p><ul>
    

    • evaluates to the comparison of values in the objects
  •     <p></p>
    
    
    
        <p><p>```java</p></p>
    
    
    
        <p><p>String s1 = new String("Hello");</p></p>
    
    
    
        <p><p>String s2 = new String("Hello");</p></p>
    
    
    
        <p><p>s1==s1 //false b/c different memory location</p></p>
    
    
    
        <p><p>s1.equals(s2) //true b/c same value</p></p>
    
    
    
        <p><p>```</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Final, Finally, Finalize</strong></p></p>
    
    
    
        <p><ul>
    

  • final--
  •     <p><ul>
    

  • finally - code after try clause that is executed no matter what
  •     <p><pre><code>- still exected after `return`
    

        <p><ul>
    

  • finalize--called by garbage collector
  •     <p></p>
    
    
    
        <p><p><strong>Overloading vs Overriding</strong></p></p>
    
    
    
        <p><ul>
    

  • Overloading (different signatures)--two or more methonds in one class have the same method name but different parameters
  •     <p><pre><code>- compile-time
    

        <p><ul>
    

  • Overriding (different implementations)--two methods--one in parent class and other in child class--with the same method name and parameters (i.e., same signature), but child class provides a different implementation from parent class
  •     <p><pre><code>- example of polymorphism
    

        <p><pre><code>- run-time
    

        <p></p>
    
    
    
        <p><p><strong>Generics</strong></p></p>
    
    
    
        <p><ul>
    

  • "type erasure"
  •     <p><ul>
    

  • Generics--enables ADT's to use any type of data
  •     <p><ul>
    

  • e.g., public class Stack<T>
  •     <p><ul>
    

  • avoids multiple implementaitons for many different types of data
  •     <p><ul>
    

  • castings (runtime error=bad) kinda tried to fix this, but client code had to do casting and was buggy (i.e., client code means disocering mistakes at runtime)
  •     <p><pre><code>- generics allows discovering mistakes at _compile_ time
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Lambdas</strong></p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>JEE</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Spring</h2></p>
    
    
    
        <p><ul>
    

  • Container, Dependency, Inversion of Control
  •     <p><ul>
    

  • Light: no need for application server; not invasive
  •     <p><ul>
    

  • Container: application objects don't have to look for their dependencies; handles objects life cycle
  •     <p><ul>
    

  • Framework: eases integration and communication w/ third party libraries
  •     <p><p><strong>Spring</strong></p></p>
    
    
    
        <p><ul>
    

  • alternative app framework (alternative to J2EE)
  •     <p><ul>
    

  • 3 parts
  •     <p><pre><code>- Dependency Injetion
    

        <p><pre><code>    - i.e., objects referencing each other and setters
    

        <p><pre><code>- Aspect Oriented Programming
    

        <p><pre><code>- Abstract Service Layer/Templates
    

        <p></p>
    
    
    
        <p><p>-Spring intializes <em>beans</em> as it starts up. It looks for various annotations like @Bean, @Service, @Component, and @Repository. It then attempts to create an instance of classes that have these annotations</p></p>
    
    
    
        <p><ul>
    

  • the container running the application takes a look at the class, sees what is annotated and autowired and creates those and passes them to objects before creating the object
  •     <p></p>
    
    
    
        <p><p><strong>Bean</strong></p></p>
    
    
    
        <p><ul>
    

  • Define a bean: @Bean
  •     <p><ul>
    

  • Define a bean: @Component
  •     <p><pre><code>- @Component Specializations
    

        <p><pre><code>    - @Service--heart of an app
    

        <p><pre><code>    - @Repository--handles data persistence
    

        <p><pre><code>    - @Controller--handles requests and responses
    

        <p></p>
    
    
    
        <p><p><strong>Bean Scopes</strong></p></p>
    
    
    
        <p><ul>
    

  • Singleton--(default)--single bean instance
  •     <p><ul>
    

  • prototype--a bean instance per usage
  •     <p><ul>
    

  • request--a bean instance per HTTP request
  •     <p><ul>
    

  • session-- a bean instance per HTTP session
  •     <p><ul>
    

  • application--a bean instance per Servlet Context
  •     <p><ul>
    

  • websocket--a bean instance per websocket
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Inversion of Control</strong></p></p>
    
    
    
        <p><ul>
    

  • control of objects or portions of a program is transferred to a container of framework
  •     <p><ul>
    

    • instead of objects specifically reaching out and obtaining the dependencies they want, they are injected by an exteral process (often based on configuration or other means); thus, an object can define its dependencies, and something else can work to fulfill them based on what is available
  •     <p><ul>
    

      • Autowring is an extra layer on top of this that allows the container's facilities to look at what a particular service needs/wants, and figures out automatically what to give it from what is available based on type hinting and other cues.
  •     <p><ul>
    

        • this reduces the manual configuration burden, and allows you to stand up various services very quickly by simply type hinting what you need and relying on the fact that container will set it up with all of that
  •     <p><ul>
    

  • enables a framework to take control of the flow of a program and make calls to our custom code
  •     <p><ul>
    

        • autowiring not always possible--sometimes the requirement is too ambigous for the container to fulfill automatically.
  •     <p><pre><code>- contrast w/ traditional programming where custom code makes  calls to a library
    

        <p><ul>
    

  • advantages:
  •     <p><pre><code>- decoupling the execution of a task from its implementation
    

        <p><pre><code>- easier to switch b/w different implementations
    

        <p><pre><code>- greater modularity of a program
    

        <p><pre><code>- easier testing by isolating a component or mocking its dependcies and allowing components to communicate through contracts
    

        <p><pre><code>- achieved many different ways:
    

        <p><pre><code>    - dependecy injection, factory pattern, strategy design pattern, service locator pattern
    

        <p></p>
    
    
    
        <p><p><strong>Dependency Injection</strong></p></p>
    
    
    
        <p><ul>
    

  • pattern through which to implement Inversion of Control, where the control being inverted is the setting object's dependencies
  •     <p><pre><code>- act of connecting objects w/ other objects, or "injecting" objects into other objects, is done by an assembler rather than by the objects themselves.
    

        <p><ul>
    

  • Spring container "injects" objects into other objects or "dependencies"
  •     <p><pre><code>- allows for loose coupling of componenets and moves the responsibility of manageing components onto the container
    

        <p><ul>
    

  • @Autowired
  •     <p><pre><code>- class field
    

        <p><pre><code>- setter
    

        <p><pre><code>- constructor (preferred for easy testing)
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Spring IoC Container</strong></p></p>
    
    
    
        <p><ul>
    

  • represented by the interface ApplicationContext
  •     <p><ul>
    

  • responsible for instantiating, configuring, and assembling objets known as beans, as well as managing their lifecycle
  •     <p><ul>
    

  • Spring framework provides several implementations of the Application Context interface
  •     <p><pre><code>- `ClassPathXmlApplicationContext` and `FileSystemXmlApplicationContext` for standalone applications, and `WebApplicationContext` for web applications.
    

        <p><ul>
    

  • to assemble beans, the container uses configruation metadata, which can be in the form of XML configuraitons or annotations.
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Aspect Oriented Programming</strong></p></p>
    
    
    
        <p><ul>
    

  • modularizing cross-cutting concerns
  •     <p><ul>
    

  • "cross-cutting concerns"
  •     <p><ul>
    

  • "advice"
  •     <p><ul>
    

  • necessary because OOP cannot solve cross-cutting concerns well, so AOP allows functional solutions
  •     <p></p>
    
    
    
        <p><h3>Spring Core</h3></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Spring Boot</h3></p>
    
    
    
        <p><ul>
    

  • opiniated library used to create Spring applications
  •     <p><ul>
    

  • convention over configuraiton--design paradigm that attempts to decrease the number of decisions that a devceloper using the framework is required to mae w/out losing flexibiility.
  •     <p><pre><code>-i.e., developer only needs to specify unconventional aspets of the application (e.g., "sales" in model called "sales" in database by default)
    

        <p><ul>
    

  • Autoconfiguration
  •     <p></p>
    
    
    
        <p><h3>Hibernate</h3></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Eager/Lazy Loading (design patterns)</strong></p></p>
    
    
    
        <p><ul>
    

  • eager: data initialization occurs on the spot
  •     <p><pre><code>- loads up associated data and stores everything in memory
    

        <p><ul>
    

  • lazy: defer intialization of the object for as long as possible
  •     <p><pre><code>- data not intialized or loaded into memory until an explicit call is made to it
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Machine Learning</h1></p>
    
    
    
        <p><p>First Term</p></p>
    
    
    
        <p><p>: This is the definition of the first term.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>What Algorithm to Use</h3></p>
    
    
    
        <p><h4>Classify Images (Answers qustions like "What does this image represent?")</h4></p>
    
    
    
        <p><pre><code>- Dense Net
    

        <p></p>
    
    
    
        <p><h4>Two-Class Classification: Predict between Two Categories</h4></p>
    
    
    
        <p><pre><code>- (answers simple two-choice questions like yes or no, true or false)
    

        <p><pre><code>- **Two-class Support Vector Machines** (under 100 features linear model)
    

        <p><pre><code>- **Two-class Averaged Perception** (fast training, linear model)
    

        <p><pre><code>- **Two-class Decision Forest** (accurate, fast training)
    

        <p><pre><code>- **Two-class Logistic Regression** (fast training, linear model)
    

        <p><pre><code>- **Two-class Boosted Decision Tree** (accurate, fast training, large memory footprint)
    

        <p><pre><code>- **Two-class Neual Network** (accurate, long training time)
    

        <p></p>
    
    
    
        <p><ul>
    

  • Multi-Class Classification: Predict between Several Categories
  •     <p><ul>
    

  • Extract Information from Text
  •     <p><ul>
    

  • Find Unusual Occurences
  •     <p><ul>
    

  • Discover Structure
  •     <p><ul>
    

  • Predict Values
  •     <p><ul>
    

  • Generate Recommendations
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p>|What do you want to do?| Extract Information from Text (Text Analytics) (derives information from text; answers "What info is in text?") |  Regression (Predict Continuous Values; makes forecast by estimating the relationship b/w values; answers "How Much or How Many?") |  Two-Class Classification (2 possible answers; e.g., yes/no, true/false) | Multi-Class Classification (Predict b/w several categories; answers questions w/ multiple possible answers) | Image Classification (classify images with popular networks; answers question "what does this image represent?") | Recommenders (generate recommendations) (predicts what someone will be interested in) | Clustering (discover structure) (separate similar data points into intuitive groups; answers "How is this organized?") | Anomaly Detection (find unusual occurences) (identifies and predicts rare or unusual data points; answers "is this weird?" |</p></p>
    
    
    
        <p><p>|---|---|---|---|---|---|---|---|---|</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong> | Extract N-Gram Features from Text (create a dictionary of n-gram features from a column of free text) | Fast Forest Quantile Regression (Predicts distribution) | Two-Class Support Vector Machine (under 100 features, linear model) | Multiclass Logistic Regression (fast training, linear) | DenseNet (high accuracy, better efficeincy) | SVD Recommender (collarborative filtering, better performance w/ lower cost by reducing dimensionality) | K-Means (unsupervised learning) | One Class SVM (under 100 features, aggressive boundary) |</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong>| Feature Hashing (Converts text data to interger encoded features using the Vowpall Wabbit library  | Poisson Regression (predicts event counts)  | Two-Class Averaged Perceptron (fast training, linear model) | Multiclass Neural Network (accuracy, long training times)  |   |   |   | PCA-Based Anonmaly Detection (fast training times)  |</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong>|  Preprocess Text (performas a cleaning operations on text, llike removal of stop -words, case normalization | Linear Regression (fast training, linear model) |  Two-Class Decision Forest (accurate, fast training) | Multiclass Decision Forest (accuracy, fast training) | |   |   |   |</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong>| Word2Vector (converts words to values for use in NLP tasks, like recommender, named entity recongintion , machine translation | Bayesian Linear Regression (Linear Model, small data sets) |  Two-Class Logistic Regression (fast training, linear)  | One-vs-All Multiclass (depends on the two-class classifier used)  |   |   |   | |</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong>|  | Decision Forest Regression (Accurate, fast training times) | Two-Class Boosted Decision Tree (accurate, fast training, large memory footprint) | Multiclass boosted Decision Tree (non-perametric, fast, scalable)  |   |   | |  |</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong> |  | Neural Network Regression (accurate, long training times) |  Two-Class Neural Network (accurate,long training)  |   |   |   |   |   |</p></p>
    
    
    
        <p><p>| <strong>Algorithm</strong> |  | Boosted Decision Tree Regression (accurate, fast training, large memory footprint)  |   |   |   |   |   |   |</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>ML Interview Questions</h2></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Patterns</h1></p>
    
    
    
        <p><p>-A pattern is a problem in a specific context with a well-settled/well-known solution.</p></p>
    
    
    
        <p><pre><code>-A pattern is basically documentation--i.e., solution/schema to solving a problem
    

        <p><p>-Patterns are everywhere: software, hardware, research, writing, architecture, relationships etc.</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Types of Patterns</h2></p>
    
    
    
        <p><p><strong>Singleton</strong></p></p>
    
    
    
        <p><p><strong>Factory</strong></p></p>
    
    
    
        <p><p><strong>Proxy</strong></p></p>
    
    
    
        <p><p><em>Convention over Configuration</em></p></p>
    
    
    
        <p><p><em>Explicit over Implicit</em></p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>REST REpresentational State Transfer</h2></p>
    
    
    
        <p><p><strong>REST API</strong>--collections  of individually-addressable resources (nouns) manipulated using a small number of methods (verbs)</p></p>
    
    
    
        <p></p>
    
    
    
        <p><h3>Resource Oriented Design</h3></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>REST API</strong>--collections  of individually-addressable resources (nouns) manipulated using a small number of methods (verbs)</p></p>
    
    
    
        <p><ul>
    

  • compare with RPC APIs, which are designed in terms of interfaces and methods (eventually adds up and becomes overwhelming)
  •     <p><ul>
    

  • resource names naturally map to URLs and methods map to HTTP methods POST, GET, PUT, PATCH, and DELETE
  •     <p><pre><code>- results in fewer things to learn, as developers can focus on the resources and their relationship
    

        <p><ul>
    

  • REST is stateless--the server has no state (or session data)
  •     <p></p>
    
    
    
        <p><p><strong>Standard API Methods</strong></p></p>
    
    
    
        <p><ul>
    

  • List
  •     <p><ul>
    

  • Get--request a resource
  •     <p><ul>
    

  • should not contain request body
  •     <p><ul>
    

  • Create
  •     <p><ul>
    

  • Update
  •     <p><ul>
    

  • Delete
  •     <p><ul>
    

  • custom methods are also available for functionality that doesn't easily map to one of the standard methods (e.g., database transactions)
  •     <p></p>
    
    
    
        <p><p><strong>Resources</strong></p></p>
    
    
    
        <p><ul>
    

  • resource-oriented API modeled as a resource hierarchy, where each node is either a simple resource or a collection resource.
  •     <p><pre><code>- _collection_--contains a list of resources of the same type (e.g., user has a collection of contacts)
    

        <p><pre><code>- _resource_--has some state and zero or more sub-resources (each subresources can be either a simple resource or a collection resource).
    

        <p><ul>
    

  • note: while there is some conceptual alighment b/w storage sytems and REST APIs, a service with a resource-oriented API is not necessarily a database
  •     <p></p>
    
    
    
        <p><p><strong>Methods</strong></p></p>
    
    
    
        <p><ul>
    

  • key characteristic of resource-oriented API is that it emphasizes resources (data model) over the methods performed on teh resources (functionality).
  •     <p><ul>
    

  • typical API exposes a large number of resources w/ small number of mehtds
  •     <p><ul>
    

  • e.g.,
  •     <p><ul>
    

  • API service: gmail.googleapis.com
  •     <p><pre><code>A collection of users: `users/*`. Each user has the following resources.
    

        <p><pre><code>    A collection of messages: `users/*/messages/*`.
    

        <p><pre><code>    A collection of threads: `users/*/threads/*`.
    

        <p><pre><code>    A collection of labels: `users/*/labels/*`.
    

        <p><pre><code>    A collection of change history: `users/*/history/*`.
    

        <p><pre><code>    A resource representing the user profile: `users/*/profile`.
    

        <p><pre><code>    A resource representing user settings: `users/*/settings`.
    

        <p></p>
    
    
    
        <p><pre><code>- API service: `spanner.googleapis.com`
    

        <p><pre><code>A collection of instances: `projects/*/instances/*`.
    

        <p><pre><code>    A collection of instance operations: `projects/*/instances/*/operations/*`.
    

        <p><pre><code>    A collection of databases: `projects/*/instances/*/databases/*`.
    

        <p><pre><code>    A collection of database operations: `projects/*/instances/*/databases/*/operations/*`.
    

        <p><pre><code>    A collection of database sessions: `projects/*/instances/*/databases/*/sessions/*`.
    

        <p></p>
    
    
    
        <p><p><strong>HTTP REST Methods</strong></p></p>
    
    
    
        <p><ul>
    

  • GET--request a resource.
  •     <p><ul>
    

  • POST--creates and returns (or updates) new/modified resource
  •     <p><ul>
    

    • not idempotent--invoking POST multiple times creates multiple new resources
  •     <p><ul>
    

  • PUT--creates/update resource at particluar URI
  •     <p><ul>
    

    • idempotent--invoking it multiple time will not affect resources
  •     <p><ul>
    

  • DELETE--removes the resource
  •     <p><ul>
    

  • OPTIONS--indicates what techniques are supported
  •     <p><ul>
    

  • HEAD-- about the request URL it reutnrs meta informaiton.
  •     <p></p>
    
    
    
        <p><h1>Databases</h1></p>
    
    
    
        <p><p><strong>Database</strong></p></p>
    
    
    
        <p><p>: logically modled clusters of information</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Database Management system (DBMS)</strong></p></p>
    
    
    
        <p><p>: a computer program that interacts with a database.</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Database Management system (RDBMS)</strong></p></p>
    
    
    
        <p><p>: a DMBS that employs the relational data model.</p></p>
    
    
    
        <p><pre><code>- data are organized into tables/relations--a set of tuples/rows in a tables, with each tuple sharing a set of attribuites/columns
    

        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>SQL</h2></p>
    
    
    
        <p><p><strong>(Inner) Join</strong>: returns records that have matching values in both tables.</p></p>
    
    
    
        <p><p><code>SELECT column_name FROM table1 INNERJOIN table2 on table1.column_name = table2.column_name;</code></p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Left (Outer Join)</strong>: returns all recrods from the left table andmatching records from the righ table.</p></p>
    
    
    
        <p><p><code>SELECT column_name FROM table1 LEFT JOIN table2 ON table1.column_name = table2.column_name;</code></p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Right (Outer) Join</strong>: returns all records from the right table and matching records from the left table.</p></p>
    
    
    
        <p><p><code>SELECT column_name FROM table1 RIGHT JOIN table2 ON table1.column_name = table2.column_name;</code></p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Full (Outer) Join</strong>: returns all records when there is a match in either left or right table.</p></p>
    
    
    
        <p><p><code>SELECT column_name FROM table1 FULL OUTER JOIN table2 ON table1.column_name = table2.column_name WHERE condition;</code></p></p>
    
    
    
        <p></p>
    
    
    
        <p><h1>Glossary</h1></p>
    
    
    
        <p></p>
    
    
    
        <p><h2>Terms</h2></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>: the fraction of predictions that a classification model got right</p></p>
    
    
    
        <p><ul>
    

  • multiclass: accuracy = correct predictions/total number of examples
  •     <p><ul>
    

  • binary: TP + TN / total number of examples
  •     <p></p>
    
    
    
        <p><p><strong>activation function</strong></p></p>
    
    
    
        <p><p>: a function (e.g., ReLU or sigmoid) that takes in the weighted sum of all inputs from the previous layer and then generates and passes an output value (typically nonlinear) to the next layer</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>feature/attribute</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Area Under the ROC Curve (AUC)</strong></p></p>
    
    
    
        <p><p>: an evaluation metrics that consiers all possible classification thresholds</p></p>
    
    
    
        <p><ul>
    

  • the Area under the ROC Curve is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example
  •     <p></p>
    
    
    
        <p><p><strong>backpropagation</strong></p></p>
    
    
    
        <p><p>: the primary algorithm for performing gradient descent on neural networks. First, the output values of each mode are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is caluclated in a backward pass through the graph</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>batch</strong></p></p>
    
    
    
        <p><p>: set of examples used in one iteration (i.e., one gradient update) of model training</p></p>
    
    
    
        <p><ul>
    

  • see also batch size
  •     <p></p>
    
    
    
        <p><p><strong>batch normalization</strong></p></p>
    
    
    
        <p><p>: normalizing the input/output of the activation functions in a hidden layer. Batch normalizations can prvoide the following benefits.</p></p>
    
    
    
        <p><ul>
    

  • make neural networks more stable by protecting against outlier weights
  •     <p><ul>
    

  • enable higher learning rates
  •     <p><ul>
    

  • reduce overfitting
  •     <p><p>-</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Bayesian Neural Network</strong></p></p>
    
    
    
        <p><p>: A probabilistic neural network that accounts for uncertainty in weights and outputs.</p></p>
    
    
    
        <p><ul>
    

  • A standard neural network regression model typically predicts a scalar value--e.g., a model predicts a house price of 853,000.
  •     <p><ul>
    

  • On the other hand, a Bayesian neural network predicts a distribution of values--e.g., a model predicts a house price of 853,000 with a standard deviation of 67,200. A bayesian neural network can be useful when it is important to quantify uncertainty, such as in models related to pharmaceuticals.
  •     <p><ul>
    

  • Bayesian neural networks also help prevent overfitting.
  •     <p></p>
    
    
    
        <p><p><strong>bias (ethics/fairness)</strong></p></p>
    
    
    
        <p><p>: 1. Sterotypying, prejudice, or favortism towrards people/places/things.</p></p>
    
    
    
        <p><p>: 2. Systematic error introducted by a sampling or reporting procedure.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>bias (math)</strong></p></p>
    
    
    
        <p><p>: an intercept or offset from an origin. Bias (aka as bias term) is referred to as <em>b</em> or <em>w0</em></p></p>
    
    
    
        <p><ul>
    

  • not to be confused with bias in ethics or prediction bias
  •     <p></p>
    
    
    
        <p><p><strong>binary classification</strong></p></p>
    
    
    
        <p><p>: classification task outputs on of two mutually exclusive classes</p></p>
    
    
    
        <p><ul>
    

  • e.g., ml model that evaluates email messages and outputs either "spam" or "not spam"
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>categorical/discrete data</strong></p></p>
    
    
    
        <p><p>: features having discrete possible values</p></p>
    
    
    
        <p><ul>
    

  • e.g., house_style: Tudor, ranch, colonial
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>class</strong></p></p>
    
    
    
        <p><p>: one of a set of enumerated target values for a label</p></p>
    
    
    
        <p><ul>
    

  • e.g., spam or not spam; poodle, beagle, pug, pitbull
  •     <p></p>
    
    
    
        <p><p><strong>classification model</strong></p></p>
    
    
    
        <p><p>: a type of machine learning model for distinguishing among two or more discrete classes.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>clustering</strong></p></p>
    
    
    
        <p><p>: grouping related examples, particualrly during unsupervised learning</p></p>
    
    
    
        <p><ul>
    

  • e.g., k-means
  •     <p></p>
    
    
    
        <p><p><strong>collaborative filtering</strong></p></p>
    
    
    
        <p><p>: making predictions about the interests of one user based on the interestes of many other users.</p></p>
    
    
    
        <p><ul>
    

  • often used in recommendation systems
  •     <p></p>
    
    
    
        <p><p><strong>confusion matrix</strong></p></p>
    
    
    
        <p><p>: a NxN table that summarizes how successful a classification model's predictions were; that is, the correlation between the label and the model's classification</p></p>
    
    
    
        <p><ul>
    

  • one axis if the label that the model predicted and the other axis is the actual label. N represents the number of classes.
  •     <p><pre><code>- e.g., in binary classification problem, N=2
    

        <p></p>
    
    
    
        <p><p><strong>continuous feature</strong></p></p>
    
    
    
        <p><p>: floating-point feature with an infinite range of possible values.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>convergence</strong></p></p>
    
    
    
        <p><p>: informally, often refers to a state rached during taining in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations.</p></p>
    
    
    
        <p><ul>
    

  • i.e., a model reaches convergence when additional training on the current data will not improve the model.
  •     <p><ul>
    

  • note, in DL, loss values sometimes stay constant or nearly so for many iterations before finally descending, temporarily producting a false sense of convergence.
  •     <p></p>
    
    
    
        <p><p><strong>convultional neural network (CNN)</strong></p></p>
    
    
    
        <p><p>: a neural network in which at least one layer is a convolutional layer. A typical convolutional neural network consists of some combination of the following layers:</p></p>
    
    
    
        <p><ul>
    

  • convulational layers
  •     <p><ul>
    

  • pooling layers
  •     <p><ul>
    

  • dense layera
  •     <p><p>: CNNs great at image recognition</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>convultional operation</strong></p></p>
    
    
    
        <p><p>: two step math operation</p></p>
    
    
    
        <p><ol>
    

  • element-wise multiplication of the convolutional filter and a slice of an input matrix. (The slice of the input matrix has the same rank and size as the convolutional fitler)
  •     <p><ol>
    

  • summarion of all the values in the resulting product matrix.
  •     <p><p>: a convultional layer consists of a series of convultional operations, each acting on a different slice of the input matrix</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>cost</strong></p></p>
    
    
    
        <p><p>:  synonum for loss</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>cross-entropy</strong></p></p>
    
    
    
        <p><p>: a generalization of Log Loss to multi-class classification problems. Cross-entropy quantifies the difference between two probability distributions.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>DataFrame</strong></p></p>
    
    
    
        <p><p>: popular datatype for representing datasets in pandas.</p></p>
    
    
    
        <p><ul>
    

  • analogous to a table
  •     <p></p>
    
    
    
        <p><p><strong>decision tree</strong></p></p>
    
    
    
        <p><p>: a model represented as a sequence of branching statements.</p></p>
    
    
    
        <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>deep model/deep neural network</strong></p></p>
    
    
    
        <p><p>: neural network containing multiple hidden layers</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>discrete feature</strong></p></p>
    
    
    
        <p><p>: a feature with a finite set of possible values</p></p>
    
    
    
        <p><ul>
    

  • e.g., animal, vegetable, mineral
  •     <p></p>
    
    
    
        <p><p><strong>discriminative model</strong></p></p>
    
    
    
        <p><p>: model that predicts labels from a set of one or more features. More formally, discriminative models define the conditional probability of an output given the features and weights; that is p(output | features, weights)</p></p>
    
    
    
        <p><ul>
    

  • mod supervised learning models are discriminative
  •     <p><ul>
    

  • e.g., model that predicts whether an email is spam
  •     <p></p>
    
    
    
        <p><p><strong>embeddings</strong></p></p>
    
    
    
        <p><p>: categorical feature represented as a continuous-valued feature. Typically, an embedding is a translation of a high-dimensional vector into a low-dimensional space. For exmaple, you can represent the words in an English sentence in either of the following two ways:</p></p>
    
    
    
        <p><ul>
    

  • Sparse Vector--millions of elements where all elements are integers.
  •     <p><pre><code>- each cell in the vector represents a separate English word; the value in a cell represents the number of times taht th eword appears in a sentence
    

        <p><pre><code>    - b/c a setenence contains ~50 words, nearly all cells are 0. the few that aren't 0, usually have a low interger like 1
    

        <p><ul>
    

  • Dense Vector--several hundred-element vector where each element holds a floating-point value b/w 0 and 1
  •     <p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>epoch</strong></p></p>
    
    
    
        <p><p>: a full training pass over the entire dataset such that each example has been seen once. Thus, an epoch represetns N/batch size training iterations, where N is the total number of examples.</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>accuracy</strong></p></p>
    
    
    
        <p><p>:</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Cross Validation</strong></p></p>
    
    
    
        <p><p>:   Cross-validation is a technique to evaluate predictive  models by partioning the original sample into a training set to train the model, and a validation set to evaluate it. (e.g., a k-fold cross validation divides the data into k folds (or paritions), trains on each k-1 fold, and evaluate on the remaining 1 fold. This results to k models/evaluations, which can be averaged to get a overall model performance).</p></p>
    
    
    
        <p></p>
    
    
    
        <p><p><strong>Feature Importance</strong></p></p>
    
    
    
        <p><p>In linear models, feature importance can be calculated by the scale of coefficients. In tree-based methods (such as random forest), important features are likely to appear closer to the root of the tree. We can get a feature's importance for random forest by computing the averaging depth at which it appears across all trees in the forest.</p></p>
    
    
    
        <p></p>
    
    
    
    </div>
    

    Previous PostPost with a slider and lightbox
    Next PostPost with YouTube Video
    Comments (2)
    John Doe
    Posted at 15:32h, 06 December Reply

    Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

    John Doe
    Posted at 15:32h, 06 December Reply

    It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal

    John Doe
    Posted at 15:32h, 06 December Reply

    There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text.

    John Doe
    Posted at 15:32h, 06 December Reply

    The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham.

    Leave a Comment