Purpose
I focus on issues around data science and especially natural language processing at scale. I have domain knowledge in medical/clinical informatics and oil/gas data analysis and signal processing at scale.
Nothing makes me happier than math, statistics, coding and distributed computing.
Professional Experience
Hortonworks, San Jose CA
Principal Architect - November 2012 to Present
As a Principal Architect in the consulting services organization at Hortonworks, I provide mentorship and guidance in operationalizing Hadoop and its ecosystem to solve business problems for our clients. I work with our clients to reduce risk and time to market by providing expertise in big data.
- Provide proofs of concept to reduce engineering churn
- Give extensive presentations about the Hadoop ecosystem, best practices, and data architecture in Hadoop
- Provide mentorship and guidance to other architects to help them become independent
- Provide review and feedback for existing physical architecture, data architecture and individual code
- Debug and solve issues with Hadoop as an on-the-ground subject matter expert. This can include everything from patching components to post-mortem analysis of errors.
I focus on issues around data science and especially natural language processing at scale. I have domain knowledge in medical/clinical informatics and oil/gas data analysis and signal processing.
Explorys, Cleveland OH
“Big Data” Architect - January 2011 to November 2012
I was a “big data” architect and, prior to that, a senior software engineer on the platform team at Explorys. The team was responsible for the creation, care and maintenance of the high performance indexing infrastructure. My job required a deep understanding of the Hadoop ecosystem. I designed the next generation data architecture for the unstructured data at Explorys, and I wrote, debugged, and analyzed the performance of many MapReduce jobs to realize that architecture.
- Devised and led the implementation of the next generation architecture for more efficient data ingestion and processing.
- Proficiency in mentoring and onboarding engineers who are new to Hadoop and getting them up to speed quickly.
- Experience with being a technical lead of a team of engineers.
- Proficiency with modern natural language processing and general machine learning techniques and approaches
- Extensive experience with Hadoop and HBase, including multiple public presentations about these technologies.
- Experience with hands-on data analysis and performing under pressure.
- Designed and wrote a layer on top of MapReduce to make the task of writing MapReduce jobs easier and safer for junior engineers.
- Contributed much of the code in our open source project.
Game Communication, Mayfield Heights OH
Senior Engineer - November 2009 to January 2011
Designed, implemented and integrated into legacy code a scalable network infrastructure layer for an instant messaging and VOIP network
- Created a software routing layer and implemented an ordered, reliable protocol on top of the raw network layer
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for entities on the network
- Created low-level network libraries in C++ and C# to send messages across the network
- Spearheaded institution of pair-programming and semi-agile practices
ION Geophysical, Houston TX
Research Geophysicist - October 2008 to November 2009
Member of the time processing R&D team, tasked with designing, prototyping and implementing production-ready algorithms that use applied math and signal processing to reduce noise in seismic data. This was a telecommuting position from Cleveland, OH.
- Implemented multithreaded C++ applied linear algebra library consisting of efficient STL-compliant data structures and algorithms to assist in signal processing
- Assisted in the design and implementation of an internal flow-based map-reduce system for doing distributed scientific computation
- Used Tokyo Cabinet to provide an efficient on-disk index of seismic metadata
- Implemented a multidimensional least squares adaptive filter that decreased wall-clock time for our most common seismic task by an order of magnitude
- Developed a fixed-precision algorithm in C for the embedded PowerPC 405 platform which uses first-order statistics to differentiate noise from initial signal in an accurate, efficient and robust way
- Spearheaded institution of pair-programming and semi-agile practices in a geographically dispersed environment
Oracle, Cleveland OH
Senior Member of Technical Staff - October 2005 to October 2008
Member of the Oracle Enterprise Repository team. We provided an enterprise J2EE application to organize and manage assets and their metadata.
- Worked as part of a geographically distributed team using agile practices
- Designed and implemented an enterprise build system around Maven, porting an existing heterogeneous build system from Ant and shell scripts to Maven
- Implemented multiple performance and scalability improvements through better caching strategies and algorithms with more favorable CPU and memory complexity characteristics.
- Managed summer interns
Certifications
- Cloudera Certified Hadoop Developer - August 2012
- Cloudera Certified HBase Developer - August 2012
Education
Texas A&M University, College Station TX
Master of Science in Mathematics - Spring 2005
- Emphasis in Computational Complexity and Theoretical Computer Science
- Advised by Dr. J. Maurice Rojas
University of Louisiana at Monroe, Monroe LA
Bachelor of Science in Mathematics and Computer Science - Spring 2002
Selected Publications, Patents & Talks
- “Fast Map Reduce over HBase”, Strata/Hadoop World Apache HBase Meetup, NYC, Invited Talk
- “New Complexity Thresholds for Sparse Real Polynomials”, Sixth International Joint Meeting of AMS and SMM, Invited Talk
- “SYSTEM AND METHOD FOR USING AN EDITABLE LIFECYCLE EVENT DISTRIBUTION LIST WITH A SERVICE METADATA REPOSITORY”, Patent Issued March 27, 2012, Patent number 8145680, Oracle Corporation
- J. Maurice Rojas, Frederic Bihan, and Casey Stella, “Faster Real Feasibility via Circuit Discriminants”, In Proceedings of ISSAC 2009, pp. 39-46, ACM Press, 2009
Honors and Awards
- Outstanding Teaching Assistant - 2005
- National Science Foundation VIGRE Fellowship - 2004
- AUF & Regents Fellowships for Outstanding Academic Achievement - 2003