• Hi!
    I'm Zifei.

About Me

Who Am I?

I am excited to bridge the gap between research and industry. Over the past few years, I have been building language technologies into different products with the goal of improving human lives.

I was one of the founding engineers (#4) of Lattice Data, a Silicon Valley startup in the field of data intelligence. Our mission was to turn dark data into structured knowledge bases. The startup was acquired by Apple, where I then worked on machine learning. I now work at Google AI Language.

I obtained my Master's in Computer Science from Stanford University. At Stanford, I was a research assistant with Professor Christopher Ré in the InfoLab. I was one of the major developers of DeepDive, a scalable probabilistic inference engine for information extraction, which was later commercialized as Lattice Data.

I got my Bachelor's in CS from Peking University. I was a visiting student at Technion--Israel Institute of Technology on a Research Exchange Program. I was also a research intern at Toshiba in Tokyo and a software engineering intern at Tableau Software.

For my full profile, see my LinkedIn page.


My Experience

Engineer & Researcher at Google AI Language 2019 - Present

  • Language Understanding with Deep Learning.
  • Multilingual Entity Linking at scale.

Machine Learning at Apple, China 2018 - 2019

Engineering Manager at Apple, California 2017 - 2018

Founding Engineer at Lattice Data 2015 - 2017

  • Employee #4 and Staff Engineer at Lattice Data, acquired by Apple for ~$200M.
  • Expert in transforming unstructured “dark data” to structured knowledge.
  • Architected inference engine, information extraction modules, pattern language for distant supervision, and distributed execution frameworks.
  • Built many applications that turned into shipped data products.
  • Led a team of 6 engineers building tools for the machine learning platform.
  • Built workflows for doing error analysis and improving data quality, and mentored teammates.

Research Assistant at Stanford 2013 - 2015

At Stanford, I was a research assistant advised by Professor Christopher Ré, working on building and using knowledge bases. I was one of the active contributors to DeepDive, a scalable probabilistic inference engine well received in both academia and industry.

Research Intern at Toshiba, Japan 2015 - 2015

Led a project in patent analysis. Extracted patent claim structures using probabilistic relation extraction techniques. Built a system that visualizes, compares, and searches patents using the extracted structures. Published the work at SIGIR 2017.

Collaborated on constructing a knowledge base of semiconductor material attributes from research papers, using machine reading and relation extraction techniques.

Software Engineering Intern at Tableau Software 2014 - 2014

Worked on backend functionality for the new generation of Tableau Server, including user and group logic, as well as general internal tools.


Selected Publications

  • One paper submitted to EMNLP 2020.
  • Jeffrey Ling, Nicholas FitzGerald, Zifei Shan et al. Learning Cross-Context Entity Representations from Text. arXiv
  • Okamoto Masayuki, Zifei Shan, Ryohei Orihara. Applying Information Extraction for Patent Structure Analysis. SIGIR 2017: 989-992
  • Christopher Ré, Amir A. Sadeghian, Zifei Shan et al. Feature engineering for knowledge base construction. IEEE Data Eng. Bull. 37(3): 26-40 (2014)
  • Zifei Shan, Shiyingxue Li, Yafei Dai: GameRank: Ranking and Analyzing Baseball Network. Social Informatics 2012: 244-251


2013-2015 | Machine Learning


I was among the major contributors to the Stanford DeepDive project, a scalable probabilistic inference engine that was later commercialized as Lattice Data.

2015 | Information Extraction

Patent Claim Structure Extraction

Led a research project as a Toshiba intern, using DeepDive to extract patent claim structures for analysis and comparison.

Aug 2012 | Network Theory

Ranking Baseball Network

Introduced GameRank, an algorithm for ranking pitchers and batters in baseball networks.

Dec 2014 | Machine Learning

Capital Crunch

Predicting Investments in Tech Companies.

Nov 2014 | Network Theory


An interactive ideation system that assists people in brainstorming by automatically suggesting new ideas.

Jun 2014 | Speech Recognition


A speech recognition decoding system that uses DeepDive to integrate knowledge into the recognition process.

Jan 2015 | Machine Learning

Authorship Attribution in Multi-Author Documents

Predicting the authors of scientific publications in a multi-author setting, using writing style features.

Aug 2012 | Baseball & Network Theory

MLB Illustrator

Visualizing MLB player networks with player statistics, using the GameRank algorithm.

2010-2011 | Computer Games

2D Shooting Game

I wrote a shooting game on my own in object-oriented C++, using a very low-level library called Haaf's Game Engine.

Designed the "barrages" (bullet movements) using a force model.

There are 3 stages with many enemies. It was so hard that I could rarely win in "hard" mode.

Get in Touch


zifeishan AT Gmail