It’s worth noting that a lot of people will never need to create their own database. You may never create your own tables or other data structures either. You may only ever run backups and restores and manipulate the security on the system and let application installs create databases for you. That’s completely understandable and perfectly in line with the needs of many businesses and many accidental DBAs. However, it’s a good idea to understand what this stuff is and how it works as part of understanding SQL Server.
A Database Is Actually Files
You need to store information that you want to be able to retrieve later. It’s necessary that you organize that information. If you were working with a word processing program, you store different documents in different files. You really wouldn’t put all of your documents into a single, large file. SQL Server functions in a very similar manner. While you have a server, you’re not going to simply store all the various types of information necessary to run your business in one large pile within that server. Instead you’re going to organize that information. The initial organizational mechanism for SQL Server is the database. A database allows you to keep sets of information in separate storage areas. Further it allows you to isolate the security for those different sets of information so that you can control who gets to see or modify that data.
Within a database there are these things called tables. Tables are how the relational storage gets defined. The information that you’ve organized for storage in a given database will be added to these tables. You will be able to add or remove data to the database by way of these tables. You’ll also be able to retrieve the information from the database from these tables.
In addition to tables in the database, there are a number of different constructs that will help you manage the information within your database that are also stored within the database. These include a number of different types of objects that help you manipulate the data such as views, stored procedures and functions. You also get a full set of security objects such as roles and users.
Databases are actually made up of files that are stored somewhere on the Windows file system. These files are in a proprietary, binary format and cannot be read directly except through SQL Server. Writing data to these files is one of the most expensive and time consuming operations within SQL Server (although in most cases, the time involved is measured in microseconds). Because of this expense, it’s important to understand that your databases are defined by files and that it matters where they are stored as part of your understanding of the database.
Two Types of Files
You could simply store your data in a text file or a spreadsheet and plenty of people do. But when you need multiple people to access it at the same time, updating some of the date, deleting some, inserting new information, all at the same time, those other storage mechanisms become very problematic. That’s why you need to use a database. Because SQL Server has to do so much different kinds of work with the data that it’s storing, it’s come up with some different mechanisms for performing those actions. Databases are structured around two different types of files that store different kinds of information . You can refer to this figure as we go through the two types.
NOTE: There are actually several other file types that can be added to a database, but we’re talking fundamentals here, so we’ll keep it straight forward and simple for now.
Data File
The first type of file needed to define your database is the data file. The data file is easy to understand and explain. This type of file is where the information that you write to your database is going to be stored. Any given database can be made up of multiple data files. The data files can be placed on more than one hard drive with your system. If you have more advanced storage mechanisms, such as Storage Area Networks (SAN), you may have other constructs than simple hard drives on a server, but they will be mapped to the Windows operating system as drives and SQL Server can use those for storage of the data files as well.
Log File
The second type of file needed to create a database is the log file. A log file is a slightly more complex thing to understand than the data file. A log file records every transaction that takes place within the database. A transaction occurs when data is manipulated in some way within the system. These manipulations can be updates to existing data, adding new data by inserting it into a table, or deletes of data from tables. All these actions cause information to be written to the log file. There are a number of other functions also associated with the log file. These files are usually much smaller than the data files because the entries into the log file only need to be kept around until all the data has been successfully written to the data file. Because writing to the data file is subject to different kinds of failures, log files are kept to help deal with those failures. Once information has been written to the data file, the log entry can be removed. The process to clean up log files will be discussed in detail in another post.
When you’re initially creating the files you can adjust their size. You can adjust that size, both up and down, after creating the files as well. Each file can also be set to grow automatically and this is complicated enough so it’s worth spending a little more time understanding exactly what it means.
Autogrowth settings
Managing files on databases can be a lot of work if you have a lot of databases. You’ll have to check for the available space and then set the files to larger sizes when there is enough. One way around all this manual work is to use the autogrow property settings on the database.
A word of caution: Be very careful using this setting. You can fill a drive and cause your server to go offline.
Setting a database to autogrow means that the database will automatically adjust the file size upwards when it starts to run out of room. Many people use this setting and many applications set it to on when they’re installed. You should set an upper growth limit on the files in order to avoid that problem. You can set the files to grow by a percentage of the database or in fixed sizes. For smaller databases, growing by percentages can work, but as the database expands in size, growing by percentages becomes a longer and longer process. The best practice is to set the growth to a particular value instead of a percentage. The details on how to set all these will be covered when databases get created.
Where to place your files
When you install SQL Server, you have the option of defining where your database files will be placed. You can also adjust this through the Server Properties window. It’s always best to know where you are going to be placing files when you create a database in order to be sure that you have enough space on that drive. To see this location, connect to your server as was outlined in Database Fundamentals #2. Once connected, in the Object Explorer window, right click on the server name itself. This will bring up a context menu. Select the “Properties†menu choice at the bottom of that menu. This will open the Server Properties window and you will be in the default, “Generalâ€, tab. Select the “Database Settings†tab and you will see something very similar to this:
The interesting area to look at here is at the bottom of the window in the section titled Database default locations. You’ll see three different directories listed, one each for Data, Log and Backup. By clicking on the ellipsis at the right side of the entry you can bring up a default file browser window to make a change to the default location of your log or data files. You can also modify them by typing directly into the text box with either a physical path as shown above or using a Universal Naming Convention (UNC) path.
The goal for placing files should be to attempt to separate functionality as much as you can. If you have the capability, you’re better off separating your operating system files from your data and log files. This means placing the data and log files on something other than the C:\ drive, if possible. Further, you should at least consider storing the data and log files on completely separate drives as well. This allows for the maximum throughput of data to the files on the drives.
Once you’ve determined that the storage location is appropriate, if you’ve made changes, click on the OK button to save those changes. If you haven’t made any changes, or you don’t wish to save the changes you made, click the Cancel button. Either of these clicks will close the Server Properties window.
Each of the two file types have several properties in addition to the physical location that will matter when you create your databases. Part of the physical location will be the file name and extension. These can be any valid Windows operating system name and extensions. By default the extensions will usually be .mdf for the data file and .ldf for the log file. You can change these if you want, but it could lead to confusion since most people use the defaults as a best practice. There is a logical file name that allows you to refer to the file for operations within the database without having to refer back to the full file location including the drive. The logical name does not have to match the physical name at all, but usually they do.
Conclusion
That introduces the guts of the file storage for databases. Next post we’re going to actually create a database on the server, which is a heck of a lot easier than this post makes it feel like.
Interesting, but how files are stored into the file system?
Few days ago I imported a .txt file of 200MB size and once imported into SQL Server the size was up to 500MB.
Why this difference?
The files on the operating system are just files. Nothing special. As to how your text file went from 200mb to 500mb, without a lot more details, I’m only going to offer some speculation. First up, data types. Let’s say you have CHAR(500) defined (just making stuff up) and the actual data largely only consists of 30-50 characters in length. You’ll have an additional 450 characters stored. Another possible data type issue, let’s say you’re defining NVARCHAR instead of VARCHAR (pretty common). NVCHAR stores twice as much in size as the actual data, so 200mb can easily become 400mb. Also, you’re not talking about whether that data storage is just the data or the data and the log. A transaction can be as large as the data that’s moved through, larger even sometimes. All of that can add up to show more data than is strictly observed just by the size of some text in a text file. There are lots of other possible causes, so don’t get hung up on these examples.