
Horia Coman

Posted on • Originally published at horia141.com on

Friday Blast #57

A short guide to hard problems (2018) - most folks are familiar with the P and NP complexity classes, at least via the P vs NP problem at the core of computer science. Here a bunch of extra complexity classes are covered - BPP, BQP, PH, PSPACE and EXPTIME, as well as the relationships between them. Sometimes the relationships are clear cut, other times we have strong reasons to believe two classes are the same, and yet other times the question is open.
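For orientation, here is a rough summary of the containments that are standard textbook results. This is my own sketch and not taken from the linked guide. Every inclusion written with $\subseteq$ is known to hold but is not known to be strict; the last line is the one separation that is proven (via the time hierarchy theorem).

```latex
\begin{align*}
\mathrm{P} &\subseteq \mathrm{NP} \subseteq \mathrm{PH} \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME} \\
\mathrm{P} &\subseteq \mathrm{BPP} \subseteq \mathrm{BQP} \subseteq \mathrm{PSPACE} \\
\mathrm{P} &\subsetneq \mathrm{EXPTIME}
\end{align*}
```

The strong-belief case is P vs BPP (widely conjectured to be equal), while P vs NP is the famous open one.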

Introduction to model trees from scratch (2018) - one of the cooler aspects of computer science inspired ML models (decision trees and neural networks[1]) is that they’re oftentimes “composable”. Here the space partitioning approach used in decision trees is combined with linear regressions at the leaves, to produce a more powerful model than the “classic” tree, which predicts a single constant at each leaf.
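To make the idea concrete, here is a minimal sketch of a depth-1 model tree (a single split with a linear regression in each leaf) on a one-dimensional feature. It is not the linked article's code: the function names `fit_linear`, `fit_model_stump` and `predict`, the `min_leaf` parameter and the exhaustive threshold search are all my own assumptions for illustration.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares line y ~ a*x + b. Returns (coeffs, sum of squared errors)."""
    A = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ coeffs - y) ** 2))
    return coeffs, sse


def fit_model_stump(x, y, min_leaf=3):
    """Depth-1 model tree: try every split point, fit a line on each side,
    and keep the split with the lowest total squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]

    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical feature values
        left, left_sse = fit_linear(x[:i], y[:i])
        right, right_sse = fit_linear(x[i:], y[i:])
        total = left_sse + right_sse
        if best is None or total < best[0]:
            threshold = (x[i - 1] + x[i]) / 2.0
            best = (total, threshold, left, right)

    if best is None:
        raise ValueError("not enough distinct points to split")
    _, threshold, left, right = best
    return threshold, left, right


def predict(model, x):
    """Route each point to a leaf by the threshold, then apply that leaf's line."""
    threshold, left, right = model
    x = np.asarray(x, dtype=float)
    return np.where(x < threshold,
                    left[0] * x + left[1],
                    right[0] * x + right[1])


if __name__ == "__main__":
    # A V-shaped target: no single line fits it, but two lines do.
    xs = np.linspace(-1.0, 1.0, 41)
    ys = np.abs(xs)
    model = fit_model_stump(xs, ys)
    print("split threshold:", model[0])
    print("max abs error:", np.max(np.abs(predict(model, xs) - ys)))
```

A classic regression stump would store only the mean of `y` in each leaf. Swapping that constant for a fitted line is the whole composition trick, and a full model tree applies the same step recursively.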

AWS, MongoDB, and the economic realities of open source (2019) - as usual, a good piece from Stratechery on the business side of tech, this time with a focus on open source. It’s been a thing for the last 2-3 years that tech companies in the open source infrastructure space were in for a tough time, because cloud providers were offering their systems in a more convenient and cheaper way. Docker, Elasticsearch, Redis, MongoDB, Hadoop and their backing companies were prime examples. This article focuses on MongoDB and MongoDB Inc., but it’s telling for the others too. Companies in this space need to move up the value chain if they’re going to succeed. I plan on writing my own bit on this shortly.

A primer on database replication (2017) - the various approaches to data replication in the database context. It’s nice that it looks at both SQL and NoSQL solutions and how each tackled the problem.

Billions of messages a day - Yelp’s real-time data pipeline (2016) - a description of the impressive architecture Yelp has deployed to organize its systems. A Kafka-based central data repository is not that future-tech-y, and I’ve posted about such an approach several times here. But the central schema store with built-in tracking of streams, documentation and integration with code is pretty sweet.


[1] As opposed to statistics-inspired ones - GLMs, SVMs etc.
