This is one thing I wish I knew while learning PostgreSQL: Don’t ever use the VARCHAR column type. I have never encountered a situation where that would have been superior to using TEXT.
The problem with VARCHAR column type is that it can be a fixed-width column that doesn’t allow text longer than a certain number of characters. This maximum character length can be introduced by the person who creates the table and adds the column.
The maximum character length can also be created by a GIS program (specifically, QGIS and the ogr2ogr command line tool) as well as some online tools that can convert spreadsheets to SQL and even write the table definition and “CREATE TABLE” statement. In both cases, the program tried to guess the column type and seemed to default to VARCHAR(y character length) instead of TEXT.
The maximum character length is a problem because trying to insert more text than allowed into that field will encounter an error and the INSERT or UPDATE statement will fail. I think it’s also a bad idea to create these columns because it makes it expensive to change your mind later about what length of text will go into that field.
Other things I wish I knew earlier
- Using transactions to test your query and its affect on your data so that you can easily undo any changes.
- How and when to use indexes beyond the “btree” index type (for example, “gin” for array and JSONB columns).
- Add “time_created_at” and “time_updated_at” columns to basically every dataset. This makes it easy to sort things later; sometimes when creating a new dataset it’s not always obvious how you might want to display or manipulate the data until days, weeks, or months later. Sorting data by “when was the information changed last” is a common use case for me.