By Kaarel Moppel - Compare MySQL and PostgreSQL - First off - I'm not trying to kindle any flame wars here, just trying to broaden my (your) horizons a bit, gather some ideas (maybe I'm missing out on something cool, it's the most used Open Source RDBMS after all) and to somewhat compare the two despite being a difficult thing to do correctly / objectively. Also, I'm leaving aside performance comparisons and I'm looking at just the available features, general querying experience and documentation clarity as this is I guess the most important for beginners. So this is just a list of points I made for myself, grouped in no particular order.
Disclaimer: The last time I used MySQL for some personal project is 10 years ago, so basically I'm starting from zero and it only took one and a half days to get to know it - thus if you see that I've gotten something screamingly wrong then please do leave a comment and I'll change it. Also, my bias in this article probably tends to favour Postgres...but I'm pretty sure a MySQL veteran with good knowledge of pros and cons can write up something similar also on Postgres, so my hope is that you can leave this aside and learn a thing or two about either system.
To run MySQL, I used the official Docker image, 8.0.14. Under MySQL the default InnoDB engine is meant.
|
docker run --rm -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root mysql:8 |
"mysql" CLI ("psql" equivalent) and general querying experience
*When the server requires a password why doesn't it just ask for it?
|
mysql -h 0.0.0.0 -u root # adding '-p' will fix the error ERROR 1045 (28000): Access denied for user 'root'@'172.17.0.1' (using password: NO) |
- Very poor tab-completion compared to "psql". Using "mycli" instead makes much more sense. I'm myself 99% of time on CLI-s, so it's essential.
- Lot less shortcut helpers to list tables, views, functions, etc...
- Can't set to "extended output" (columns as rows) permanently, only "auto" and "per query".
- One does not need to specify a DB to connect to - I actually find it positive as it's easy to forget those database names and when once in, one can call "show databases".
- No "generate_series" function...might seem like a small thing...but with quite a costly (in time sense) impact when trying to generate some test data. there seems to be an alternative function on github but first you'd need to create a table so not quite the same.
- CLI help has links to web e.g. "help select;" shows "URL: http://dev.mysql.com/doc/refman/8.0/en/select.html" at the end of syntax description. That is great.
- If some SQL script has errors "mysql" immediately stops, whereas "psql" would continue unless a bit cryptic "-v ON_ERROR_STOP=1" flag set. I think "mysql" default behaviour is more correct here.
- No SQL standard "TABLE" syntax support. It's a nice shortcut so I use it a lot for Postgres when testing out features / looking at config or "system stats" tables.
- MySQL has index / optimizer hints, which might be a good thing to direct some queries in your favour. Postgres has decided not to implement this feature as it can also cause problems when queries are not updated when data magnitudes changes or new/better indexes are added. There's an extension though for Postgres (as usually).
- Some shorthand type casting ("::" in Postgres) seems to be missing. A small thing again sure, but a lot of small things will make out a big one.
- Some "pgbench" equivalent missing. A tiny and simple tool that I personally appreciate a lot in Postgres, really handy to quickly gauge server performance and OS behaviour under heavy load.
MySQL positive findings
- Much more configuration options (548 vs 282), allowing possibly to get better performance or specific behaviour. A double-edged sword though.
- Threaded implementation, should give better total performance for very large numbers (hundreds) of concurrent users.
- Good JSON handling features, like array range indexers for example: "$[1 to 10]" and JSON Path.
- More performance metrics views/tables in the "performance_schema". Not sure how useful the information is in there though.
- There is an official clustering product option (commercial)
- Built-in support for tablespace and WAL encryption (needs 3rd party stuff for Postgres).
- MySQL workbench, a GUI tool for queries and DB design, is way more capable (and visually nicer) as "pgadmin3/4". There's also a commercial version with even more capabilities (backup automation, auditing).
MySQL downsides
- Seems generally more complex as Postgres for a beginner - quite some options and exceptions for different storage engines like MyISAM. Options are not bad, but remember looking as a beginner here.
- Documentation provides too many details at once, making it hard to follow - moving some corner-case stuff (exceptions about old versions etc) onto separate pages would do a lot of good there. Maybe on the plus side: there physically almost 2x more documentation, so chances are than in case of some weird problems you have higher chances for finding some explanations to it.
- From documentation it seems that besides bugfixes, also features are added to minor MySQL versions quite often...which a Postgres user would find confusing.
- Less compliant with the SQL standard. At least based on sources I found googling: 1, 2, 3.
- Importing and exporting data. There's something equivalent to COPY but more complex (some specific grant and config setting involved for loading files located on the DB server) so that a separate tool for importing data, called "mysqlimport". Also found an interesting passage from the docus that points to some implicit change in transaction behaviour, depending in which way you load data:
|
With LOAD DATA LOCAL INFILE, data-interpretation and duplicate-key errors become warnings and the operation continues because the server has no way to stop transmission of the file in the middle of the operation. |
- EXPLAIN provides less value on trying to understand why a query is slow. Also, there is no EXPLAIN ANALYZE - that's a bit of a bummer as workaround with "trace" is already a bit arcane. "EXPLAIN FORMAT=JSON" provides a bit more detail to estimate the costs though.
- Full-text search is a bit half-baked. Built-in configurations seems to be tuned for English only and there is no stemming (Postgres has 15 biggest western languages covered out of the box).
- Some size limits seem arbitrary (64TB on tablespace size, 512GB on InnoDB log files [WAL I assume]). Postgres leaves those to the OS / FS (a single table/partition size is limited to 32TB though).
PostgreSQL architectural/conceptual/platform advantages
- 100% all ACID, no exceptions. MySQL has gotten a lot better with version 8, but it's not quite there yet with DDL for example.
- More advanced extension-system. MySQL has a plugins system also though, but not as generic to enable for example stored procedures in Python.
- More different index types available (6 vs 3) - for example it's possible to index strings also for regex search and there are lossy indexes for Big-data. Also, MySQL doesn't seem to support partial indexes.
- Simpler standby replica building / management. From PG10+ it's a single command on replication host side with no special config group setup.
- Synchronous replication support.
- Closer to Oracle in terms of features, SQL standard compatibility and stored procedure capabilities. Also, there are some extensions that add some Oracle string/date processing functions etc.
- Couple of more authentication options available out of the box. MySQL has also LDAP and pluggable authentication though.
- More advanced parallel query execution. Postgres is a couple of years ahead in development here since version 10, MySQL just got the very basic (select count(*) from tbl) support out with the latest 8.0.14.
- JIT (Just-in-time) compilation, e.g. "tailored machine-code" for tuple extraction and filtering. Massive savings for Data Warehouse and other row-intensive type of queries.
- Multiple storage engines. Something similar in works also for Postgres.
- Less bloat due to use of "UNDO"-based row versioning model. Work-in-progress for Postgres though.
- Threads vs processes should give a boost at high session numbers
- Built-in support for multi-master (Multi-Primary Mode) replication. There are caveats as always (CAP theorem still stands), and very few people needs something like that actually, but definitely reassuring that it's in the "core" - for Postgres there's a 3rd-party extension providing similar, but as I've understood the plan is to get it into the "core" also.
- Built-in "event scheduling". Postgres again needs a 3rd party extension or custom C code to employ background workers.
- "REPEATABLE READ" is the default transaction model, providing consistent reads throughout a transaction out of the box, saving novice RDBMS developers possibly from quite some head-scratching.
Things I found weird in MySQL
A table alias in an aggregate can break a query:
|
mysql> select count(d.*) from dept_emp de join departments d on d.dept_no = de.dept_no where emp_no < 10011; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '*) from dept_emp de join departments d on d.dept_no = de.dept_no where emp_no <' at line 1 |
- Couldn't find a list of built-in routines from the system catalog. After some googling found that:
- For ease of maintenance, we prefer to document them in only one place: the MySQL manual. Well OK, kind of makes sense, but why not to create some catalog view where one could at least have the function names and do something like "df *terminate*". Very handy in Postgres.
- One needs to always specify an index name! I personally leave it to Postgres as life has shown that it's super hard to enforce a naming policy, even when the team consists of a...single developer (yes, I'm looking at myself).
- TIMESTAMP min value is '1970-01-01 00:00:01.000000' and the more generous DATETIME starts with ’1000-01-01 00:00:00′ but doesn't know about time zones...
- The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all VARCHAR columns).
- It is not possible to (easily, w/o CASE WHEN workaround) specify if you like your NULL-s first or last, which is very weird...as this is specified in SQL Standard 2003, 15 years ago :/ By the way, default "ASC(ENDING)" mode the behaviour is also contrary to Postgres, which has NULLS LAST. Has to do with that part not specified in the SQL standard.
- No FULL OUTER JOIN. Sure, they're quite rarely used, but most of the "competitors" have them and shouldn't be too hard to implement if having LEFT JOIN etc.
- Only "Nested Loop" joins and it's variations. Postgres has additionally also "Hash" and "Merge" join which help a lot when joining millions of rows.
Things I found very weird in MySQL
- CAST() function does not support all data types :/. For example, "int" is available when declaring tables but:
|
mysql> select cast('1' as int) x; -- will work when cast to 'unsigned' ERROR 1064 (42000): You have an error in your SQL syntax |
- Some DDL (e.g. dropping a table) is not transactional! New tables are also immediately visible (empty though) to other transactions, when declared from a not-yet-committed transaction. Not ACID enough MySQL.
- CHECK constraints can be declared but they are silently ignored!
- FOREIGN KEY-s declared with the shorter REFERENCES syntax (at the end of column definitions) are not enforced and there are even no errors when the referenced table/column is missing! One needs to use the longer FOREIGN KEY + REFERENCES syntax.
- "Truncation of excess trailing spaces from values to be inserted into TEXT columns always generates a warning, regardless of the SQL mode." i.e. data is silently chopped despite of the "STRICT MODE" which is the default. For indexes truncating data would be OK (Postgres does it also), but not for data.
MySQL cool features that I would welcome in Postgres
- Implicit session variables. In PG it's also possible, but in a tedious way with "set"/set_config() + current_setting() functions. There's a patch in circulation also for Postgres but not yet in core.
|
'select @a := 42; select @a;' |
- Built-in "spatial" support. MySQL GIS functions fall short of Postgres equivalent PostGIS but having it in "contrib" and officially supported would make it a lot more visible and provide more guarantees for potential developers on lookout for a GIS platform, in result aiding the whole Postgres project.
- Generated columns. In Postgres you need views currently, but some work on that is luckily in progress already.
- Resource groups to prioritize/throttle some workloads (users) within an instance. Currently only CPU can be managed.
- "X Protocol" plugin. A relatively new thing that allows asynchronous calls from a single session!
- Auto-updated TIMESTAMP columns (ON UPDATE CURRENT_TIMESTAMP) when row is changed. In Postgres similar works only on initial INSERT and needs a trigger otherwise
- A single "SHOW STATUS" SQL command that gives a nice overview of global server status for both server events and normal query operations - Connections, Aborted_connects, Innodb_num_open_files, Bytes_received / sent, "admin commands" counter, object create/drop counters,pages_read/written, locks, etc. For Postgres it's only possible with continuous pg_stat* monitoring and/or continuous log file parsing.
- RESTART (also SHUTDOWN) - a SQL command that stops and restarts the MySQL server. It requires the SHUTDOWN privilege.
- Real clustered (index-organized) tables (PRIMARY KEY implementation). In Postgres clustering is effective only for a short(ish) time.
- There's a dead simple tool on board that auto-generates SSL certs both for server and clients.
- Fresh 8.0.14 version permits accounts to have dual passwords, designated as primary and secondary passwords. This enables smooth password phaseouts.
My verdict on MySQL 8 vs PostgreSQL 11
First again, the idea of the article is not to bash MySQL - it has shown a lot of progress recently with the latest version 8. Judging by the release notes, a lot of issues got eliminated and cool features (e.g. CTE-s, Window functions) added, making it more enterprise-suitable. There's also much more activity happening on the source code repository compared to Postgres (according to www.openhub.net), and even if it's a bit hard to acknowledge for a PostgreSQL consultant - it has much more installations and has very good future prospects to develop further due to solid financial backing, which is a bit of a topic for Postgres as it's not really owned by any company (which is a good thing in other aspects).
In summary, I recommend PostgreSQL for 99% of users needing a relational database, thanks to its lightweight nature, data integrity, and fewer surprises. The remaining 1% may require MySQL for global start-up scaling due to its multi-master support. PostgreSQL's constraints ensure data integrity, while MySQL requires constant vigilance from developers, who may overlook important safeguards under pressure. Additionally, PostgreSQL offers over 100 Foreign Data Wrappers for diverse data integration needs.
Hope you found something new and interesting for yourself, thanks for reading!
Check out the latest blogs concerning the important topic of PostgreSQL security in our security blog spot.
Big new feature in mys v8 is role system....
Great post! Very informative and balanced, which is a rarity when people compare things close to their heart like databases or text editors 🙂
I've been using MySQL for almost 20 years now, but I really like PostgreSQL too and try to keep up with what's going on with it as much as I can. I actually got started with MySQL because PG was "too correct": There was no way to disable WAL and back in 2000 or 2001 I needed to do bulk imports of data for warehouse-style queries while nothing else was writing to that database, so the WAL overhead was too much when if anything went wrong I could easily wipe the day's data and start the import again. Funny how a small accident in requirements can dictate your career, as had there been a way to disable WAL (or had I been working on OLTP instead of OLAP) I may have ended up spending 20 years on PG instead 🙂
A couple of comments on your MySQL points:
- I think the closest to pgbench in our part of the world is 'mysqlslap', which should come bundled when you install the mysql-client packages. For more advanced uses there's sysbench (which also supports PG), but I think mysqlslap would get you most if not all of what pgbench can give.
- The clustering of MySQL is complex, from a product naming / licensing point of view. This is not made easier by the fact that there's a (n Open Source) product called MySQL Cluster, but which is not the only way to do clustering with MySQL ... But my point is, there are Open Source clustering options, even from Oracle:
- MySQL Cluster, as mentioned, uses a different storage engine (NDB) and while mostly a niche product, it's a fantastic fit for its niche (it does automatic sharding and routing of queries, for example).
- MySQL Group Replication / Innodb Cluster is a way to do HA (but not sharding, at least for now) with Innodb, is official from Oracle, and is Open Source. I have not seen it in the wild much yet but it should be gaining traction. I think some features may require an Enterprise license but you can get a cluster running using only Open Source software.
- There are a few Galera-based clustering solutions that are functionally very similar to Group Replication, though more mature, as that has been around for a few years already. It is not available on the official Oracle Community release (for obvious reasons, I guess), but can be obtained as Open Source Software under a few names (at least, Percona XtraDB Cluster, MariaDB Galera Cluster, Galera Cluster).
- If sharding and scale-out are needed, TiDB is a new database that's MySQL protocol-compatible, uses RocksDB, and does sharding/ha/routing for you. In fact, while there are important architectural differences, as a PostgreSQL user you could think of it of MySQL's CockroachDB and you wouldn't be too far off IMHO 🙂
Sorry for the long rant and cheers!
Great post! For scaling, though, you can look in our own backyard. You noted, "The remaining 1% percent would be then for cases where some global start-up scaling would be required, due to native multi-master support."
You may want to check out a little company called Pex. Their requirements were so big that they blew past the limits of MongoDB, Hadoop, Cassandra, and HBase. They are using PostgreSQL with the CitusDB extension; they use 20 nodes, 1280 cores, and 2.4TB of RAM to update 80B rows per day and ingest 60k rows per second, all while being responsive to ad-hoc queries. Microsoft also has a similarly-sized Citus cluster that handles all Windows telemetry data.
Thanks for the interesting article.
How do you use pgbench?
I'd like to translate this impressive article https://www.cybertec-postgresql.com/looking-at-mysql-8-with-postgresql-goggles-on/ into Japanese and publish on our tech blog https://techracho.bpsinc.jp/ for sharing it. Is it OK for you?
I make sure to indicate the link to original, title, author name in the case.
Best regards,