Skip to content

Jane Street Tech Talk @ Tsinghua

Jane Street is a quantitative trading firm with a unique focus on technology. There will be a tech talk in Tsinghua University this Thursday.

Time: 10月30日下午3:00-5:00
Location: 清华东主楼10区103会议室

We hire people over all grades, from freshman to post-doc, from full-time to summer and winter internship. Check out our website to know more about our company and our culture, https://www.janestreet.com/culture/inside-a-day/

Below is the information about the tech talk.

Lambda, the Ultimate Config Format
by Tianyi Cui (Software Developer, Jane Street)

Complicated systems require expressive configuration languages. But language design is hard: It’s no surprise that many applications have either limited configurability or an unwieldy configuration format with complex semantics.

At Jane Street, we have seen this problem enough times that we decided to start writing our configs the same way that we write our code, in OCaml. In this talk, we’ll discuss our experiences using ocaml-plugin[1], a library we developed to embed OCaml within an application, providing a configuration language that is both expressive and familiar.

We’ll also discuss some of the potential problems of using a Turing-complete language for configuration, as well as how to capturesome of the benefits of a simpler and more constrained configuration system without giving up the power of a programming language.

[1] https://github.com/janestreet/ocaml_plugin

二十四岁的随想

今天下班回家时,故意放慢了脚步,走入干诺道行人天桥,踏上半山自动扶梯,密闭的耳塞将我和苏豪区一间间酒吧隔开,耳边响起的是《合唱》的第五乐章。我情不自禁,手舞足蹈……

前天的早上发现 iTunes 里能找到我喜欢过的几乎所有古典CD,就心情大好的买了十几张巴赫、莫扎特、贝多芬、肖邦……整个周末一张一张的换着放,不是戴着耳机就是开着音响,不亦乐乎,不管是在喧嚣的大街还是静静的夜里。很早以前就有过这样的想法,喜欢过的一切书籍、音乐、影视、软件,哪怕之后已经用不到了,也应该在自己能力允许的时候,买上一份正版。现在就从喜欢听的古典CD开始吧。

昨天去看房,打算在香港租一个住上较长时间的房子。先是看了看上环附近,然后又回到住的半山这边看,一共看了十四间房子,看得心情大好。之前一直各种各样奇怪的理由抗拒去看房、选房这件事,一直拖到把现在住的这个贵的离谱的酒店式公寓续租了第二个月,觉得再不能这样糟蹋钱了,才跑出去找中介。结果发现,自己住的家里楼下的中介服务就很赞,目前给我的感觉相当professional,看到了好几间价格和各方面条件都满意的房子,最迟这周末就会定下来。

7月1日入职,开始在 Jane Street 正式上班已经第八个星期了。目前为止,我非常享受。每天都在用着很是趁手的工具,以感到舒服的方式,和各自都超厉害的同事一起,做着对我很有挑战性的事情。不知不觉,英语变得稍微不那么磕磕绊绊了,写代码时不用把大部分的时间花在查文档上了,开始知道一点我们公司在交易的那些金融产品大概是怎么一回事了,虽然不大但却对整个公司都挺重要的小项目可以独立承担了,还有开始review实习生的工作以及担任电面面试官的角色。不过,我想要提高和完全可以提高的地方仍然有很多,我对自己这一个多月以来的工作的质与量仍然不满意。且不说公司对我的要求和同事对我的期待如何,我非常清楚,自己在工作上还远未达到自己可以达到的最佳状态。

在23岁的尾巴上,我开始恋爱了。既然大家都说秀恩爱死得快,这次我不打算透露任何细节。不过跟我熟的朋友完全可以找我八卦。我希望,自己能够不那么不成熟,能够学会关心、倾听和理解,能够用心去照顾一个人,同时了解自己的内心,能够学会爱。

说来挺有意思,我上学的地方名叫“浙大”,然后出了校门以后,上班的那座大厦和大厦楼下的街道都名为“遮打”,我还是在上班好几个星期之后才发现这个有趣的谐音。每天出入的还是一个叫“Zhe Da”的地方,就好像一切都没变,但真的一切都变了。

Programming Collective Intelligence 读书总结

  • Making Recommendations (Collaborative Filtering)
    • User-based
      • Finding similar users
        • User as vector based on item score
          • Euclidean distance
          • Pearson correlation
        • Reverse users and items, we can find similar items to a given item
      • Sort and recommend items based on
        • sum(user similarity * user’s item score) for each other user
    • Item-based
      • Find item similarities
        • These results can be cached and periodically updated
      • Sort and recommend items based on
        • sum((item similarity * user’s item score) / sum(item similarity)) for each user’s item
      • Significantly faster and better for sparse dataset
  • Discovering Groups (Clustering)
    • Supervised Learning
      • use example inputs and outputs
      • neural networks, decision trees, support-vector machines, and Bayesian filtering
    • Word Vectors of texts
    • Hierarchical Clustering
      • choose two nearest vectors to combine
      • results in binary tree
    • Can cluster articles or words
      • transpose the matrix
    • Dendrogram drawing
    • K-Means clustering
      • randomly place k centroids
      • assign every item to the nearest centroid, and move the centroid to the average location of all items assigned to them
  • Searching and Ranking
    • word index stored in relational database
    • ranking
      • content-based
        • various metrics: word frequency, document location, word distance
      • use inbound links
        • simple count
        • PageRank algorithm
          • random walk
          • sparse matrix multiplication iterations
        • use link text
      • learning from clicks
        • click-tracking neuro-network (multilayer perception network, i.e. MLP network)
          • one hidden layer
  • Optimization
    • stochastic optimization
      • numerical solution
      • cost function
    • random searching
    • hill climbing
      • increase the most promising dimension of a vector
    • simulated annealing
      • variable: temperature, starts very high and gradually gets lower
      • worse solution being accepted depending on temperature
    • generic algorithms
      • mutate, crossover, …
  • Document Filtering (to be expanded…)
    • use words as features
    • naive Bayesian classifier
    • the Fisher method
  • Modeling with Decision Trees
    • Algorithm: CART (Classification and Regression Trees)
      • choose the best split from all possible splits
        • Gini impurity
        • information entropy
          • sum of p(x)log(p(x))
      • recursively build the whole tree
      • then can be used to classify new observations
      • pruning the tree
        • when it becomes overfitted
        • checking pairs of nodes that have a common parent to see if merging them would increase the entropy by less than a specified threshold
    • Dealing with
      • missing data
        • use both branches
      • numerical outcomes
        • use variance instead of entropy
  • Building Price Models
    • k-nearest neighbors (kNN)
      • weighted
      • may need scaling or normalizing
      • to estimate the probability density
    • cross-validation
      • divide data into training sets and test sets
  • Advanced Classification: Kernel Methods and SVMs
    • basic linear classification
      • using dot-products to determine distance
    • kernel methods
      • define another dot-product == move the points into different space
    • support-vector machines
      • find the line that is as far away as possible from classes
  • Finding Independent Features
    • non-negative matrix factorization
      • factor the article-word matrix into two matrix
        • the features matrix: row for features, column for words
        • the weight matrix: row for articles, column for features
  • Evolving Intelligence
    • creating an algorithm that creating algorithms
    • mutation, crossover/breeding
    • use trees to represent algorithm to enable evolving
      • use to guess numerical functions or, game AI
  • Algorithm Summary
    • Supervised Learning
      • Bayesian Classifier
      • Decision Tree Classifier
      • Neural Networks
      • Support-Vector Machines
    • Unsupervised Learning
      • k-Nearest Neighbors
      • Clustering
      • Multidimensional Scaling
      • Non-Negative Matrix Factorization
    • Optimization

《背包问题九讲》2.0 RC1

下载地址在 http://love-oriented.com/pack/pack2rc.pdf

LLVM笔记(3):LLVM的语言(下)

(本文是 http://llvm.org/releases/2.9/docs/LangRef.html 的阅读笔记,前作为《LLVM笔记(2):LLVM的语言(中)》。)

  • Instruction Reference
    • terminator instruction
      • indicates which block should be executed after the current block is finished
      • yields ‘void’ value,
      • ret (return)
        • ret <type> <value>
        • ret void
      • branch
        • br i1 <cond>, label <iftrue>, label <iffalse>
        • br label <dest> ; Unconditional branch
        • switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]
        • indirectbr <somety>* <address>, [ label <dest1>, label <dest2>, ... ]
      • invoke
        • invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs] to label <normal label> unwind label <exception label>
        • unwind
        • unreachable
    • binary instruction
      • add, fadd, sub, fsub
      • mul, fmul, udiv, sdiv, fdiv, urem, srem, frem
    • bitwise binary instructions
      • shl, lshr, ashr
      • and, or, xor
    • memory instructions
      • Vector Operations
        • extractelement
          • <result> = extractelement <n x <ty>> <val>, i32 <idx> ; yields <ty>
        • insertelement
          • <result> = insertelement <n x <ty>> <val>, <ty> <elt>, i32 <idx> ; yields <n x <ty>>
        • shufflevector
          • <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> ; yields <m x <ty>>
      • Aggregate Operations
        • extractvalue
          • <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
        • insertvalue
          • <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx> ; yields <aggregate type>
      • Memory Access and Addressing Operations
        • alloca
          • <result> = alloca <type>[, <ty> <NumElements>][, align <alignment>] ; yields {type*}:result
        • load
          • <result> = load <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]
          • <result> = volatile load <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]
          • !<index> = !{ i32 1 }
        • store
          • store <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields {void}
          • volatile store <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields {void}
        • getelementptr
          • <result> = getelementptr <pty>* <ptrval>{, <ty> <idx>}*
          • <result> = getelementptr inbounds <pty>* <ptrval>{, <ty> <idx>}*
      • Conversion Operations
        • trunc .. to (truncate value type)
        • zext .. to (zero ext)
        • sext .. to (sign ext)
        • fptrunc .. to (float point truncate)
        • fpext .. to
        • fptoui .. to (float point to unsigned int)
        • fptosi .. to
        • uitofp .. to
        • sitofp .. to
        • ptrtoint .. to (pointer to int)
        • inttoptr .. to
        • bitcast .. to
    • other instructions
      • ‘icmp’ Instruction
        • <result> = icmp <cond> <ty> <op1>, <op2> ; yields {i1} or {<N x i1>}:result
  • Intrinsic Functions
    • Variable Argument Handling Intrinsics
      • llvm.va_start
      • llvm.va_end
      • llvm.va_copy
    • Accurate Garbage Collection Intrinsics
      • llvm.gcroot
      • llvm.gcread
      • llvm.gcwrite
    • Code Generator Intrinsics
      • llvm.returnaddress
        • declare i8 *@llvm.returnaddress(i32 <level>)
      • llvm.frameaddress
        • declare i8* @llvm.frameaddress(i32 <level>)
      • llvm.stacksave
        • declare i8* @llvm.stacksave()
      • llvm.stackrestore
        • declare void @llvm.stackrestore(i8* %ptr)
      • llvm.prefetch
        • declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>)
      • llvm.pcmarker
        • declare void @llvm.pcmarker(i32 <id>)
      • llvm.readcyclecounter
        • declare i64 @llvm.readcyclecounter()
    • Standard C Library Intrinsics
      • llvm.memcpy llvm.memmove llvm.memset.*
      • llvm.sqrt.* llvm.powi.* llvm.sin.* llvm.cos.* llvm.pow.*
    • Bit Manipulation Intrinsics
      • llvm.bswap.*
        • swap high and low
      • llvm.ctpop.*
        • counts the number of bits set
      • llvm.ctlz.*
        • count the number of leading zeros
      • llvm.cttz.*
        • count the number of trailing zeros
      • llvm.sadd.with.overflow.*
      • llvm.uadd.with.overflow.*
      • llvm.ssub.with.overflow.*
      • llvm.usub.with.overflow.*
      • llvm.smul.with.overflow.*
      • llvm.umul.with.overflow.*
    • Half Precision Floating Point Intrinsics
      • llvm.convert.to.fp16
      • llvm.convert.from.fp16
    • Debugger Intrinsics
      • start with llvm.dbg. prefix
    • Exception Handling Intrinsics
      • start with llvm.eh. prefix
    • Trampoline Intrinsic
      • llvm.init.trampoline
    • Atomic Operations and Synchronization Intrinsics
      • llvm.memory.barrier
      • llvm.atomic.cmp.swap.*
      • llvm.atomic.swap.*
      • llvm.atomic.load.add.* llvm.atomic.load.sub.*
      • llvm.atomic.load.and.* llvm.atomic.load.nand.* llvm.atomic.load.or.* llvm.atomic.load.xor.*
      • llvm.atomic.load.max.* llvm.atomic.load.min.* llvm.atomic.load.umax.* llvm.atomic.load.umin.*
    • Memory Use Markers
      • llvm.lifetime.start
      • llvm.lifetime.end
      • llvm.invariant.start
      • llvm.invariant.end
    • General Intrinsics
      • llvm.var.annotation
      • llvm.annotation.*
      • llvm.trap
      • llvm.stackprotector
      • llvm.objectsize