Which columns should be indexed when all may be used in different search queries?
Background
I'm working on a website for a movie theather chain currently located in four different cities (might expand in the future). They use the same, single-database website for all cities, which means I have to have a column in certain tables which holds the ID of the city that each row belongs to.
Right now I have three different tables:
Cinemas
- Contains each city's cinema (ID and name).
Movies
- Contains all movies that has been/will be shown at the cinema.
Showtimes
- Contains all showtimes for all movies in all cities.
The structure of the Showtimes
table is the following:
Column Name | Column Type | Description
--------------+--------------+---------------
ID | BIGINT | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID | TINYINT | Foreign key bound to Cinemas.ID
MovieID | BIGINT | Foreign key bound to Movies.ID
Showtime | DATETIME | At what date and time the movie will show
(will contain multiple rows for each movie, i.e. one row for each showtime)
How this table will be used
A user of the website must be able to:
View all current/upcoming movies and showtimes (sorted by date) in the selected city.
Example query (backend):
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
Select a single movie and view all showtimes for that specific title only (in the selected city).
Example query:
SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
Select a single day and view all movies and showtimes for that day only (in the selected city).
Example query:
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM])
So naturally I decided that I needed to create indexes for the columns.
Problem
What I'm having trouble with is deciding/determining how to index the columns properly. One index for each column seems quite expensive[1][2] so I started looking into composite indexes, which seems to be the right choice but also led to even more confusion.
From my understanding (based on what I've read) you should add the columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?) column the first in the composite index[3] (in my case that would be the Showtime
column). The only problem with that is that the index can only be used by the database if the first column is included in the search query[4][5], which it currently isn't in either of my queries.
Question
What kind of index(es) should I apply to my columns in order to cover all usage scenarios? (the last scenario may be omitted, but the first two are required)
Should I use a composite index on all columns, for some columns, or do I need a separate index for each column?
This table is updated at most a few times per week to add new showtimes.
Footnotes
1MySQL indexes - what are the best practices?
2Indexing every column in a table
3How important is the order of columns in indexes? (question)
4How important is the order of columns in indexes? (#2 top-voted answer)
5When should I use a composite index?
mysql database-design index innodb mariadb
New contributor
add a comment |
Background
I'm working on a website for a movie theather chain currently located in four different cities (might expand in the future). They use the same, single-database website for all cities, which means I have to have a column in certain tables which holds the ID of the city that each row belongs to.
Right now I have three different tables:
Cinemas
- Contains each city's cinema (ID and name).
Movies
- Contains all movies that has been/will be shown at the cinema.
Showtimes
- Contains all showtimes for all movies in all cities.
The structure of the Showtimes
table is the following:
Column Name | Column Type | Description
--------------+--------------+---------------
ID | BIGINT | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID | TINYINT | Foreign key bound to Cinemas.ID
MovieID | BIGINT | Foreign key bound to Movies.ID
Showtime | DATETIME | At what date and time the movie will show
(will contain multiple rows for each movie, i.e. one row for each showtime)
How this table will be used
A user of the website must be able to:
View all current/upcoming movies and showtimes (sorted by date) in the selected city.
Example query (backend):
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
Select a single movie and view all showtimes for that specific title only (in the selected city).
Example query:
SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
Select a single day and view all movies and showtimes for that day only (in the selected city).
Example query:
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM])
So naturally I decided that I needed to create indexes for the columns.
Problem
What I'm having trouble with is deciding/determining how to index the columns properly. One index for each column seems quite expensive[1][2] so I started looking into composite indexes, which seems to be the right choice but also led to even more confusion.
From my understanding (based on what I've read) you should add the columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?) column the first in the composite index[3] (in my case that would be the Showtime
column). The only problem with that is that the index can only be used by the database if the first column is included in the search query[4][5], which it currently isn't in either of my queries.
Question
What kind of index(es) should I apply to my columns in order to cover all usage scenarios? (the last scenario may be omitted, but the first two are required)
Should I use a composite index on all columns, for some columns, or do I need a separate index for each column?
This table is updated at most a few times per week to add new showtimes.
Footnotes
1MySQL indexes - what are the best practices?
2Indexing every column in a table
3How important is the order of columns in indexes? (question)
4How important is the order of columns in indexes? (#2 top-voted answer)
5When should I use a composite index?
mysql database-design index innodb mariadb
New contributor
"one row for each showtime" -- What about a cinaplex that has multiple screens with the same movie at the same time?
– Rick James
8 hours ago
@RickJames : This chain (which is relatively small) currently only have one screen in each theater. But in case they'd get a second, would it be a problem to add another column for that without an index if we don't need to search for it (only retreive it)?
– Visual Vincent
8 hours ago
2
(Upvote for a well-written Question.)
– Rick James
7 hours ago
add a comment |
Background
I'm working on a website for a movie theather chain currently located in four different cities (might expand in the future). They use the same, single-database website for all cities, which means I have to have a column in certain tables which holds the ID of the city that each row belongs to.
Right now I have three different tables:
Cinemas
- Contains each city's cinema (ID and name).
Movies
- Contains all movies that has been/will be shown at the cinema.
Showtimes
- Contains all showtimes for all movies in all cities.
The structure of the Showtimes
table is the following:
Column Name | Column Type | Description
--------------+--------------+---------------
ID | BIGINT | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID | TINYINT | Foreign key bound to Cinemas.ID
MovieID | BIGINT | Foreign key bound to Movies.ID
Showtime | DATETIME | At what date and time the movie will show
(will contain multiple rows for each movie, i.e. one row for each showtime)
How this table will be used
A user of the website must be able to:
View all current/upcoming movies and showtimes (sorted by date) in the selected city.
Example query (backend):
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
Select a single movie and view all showtimes for that specific title only (in the selected city).
Example query:
SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
Select a single day and view all movies and showtimes for that day only (in the selected city).
Example query:
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM])
So naturally I decided that I needed to create indexes for the columns.
Problem
What I'm having trouble with is deciding/determining how to index the columns properly. One index for each column seems quite expensive[1][2] so I started looking into composite indexes, which seems to be the right choice but also led to even more confusion.
From my understanding (based on what I've read) you should add the columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?) column the first in the composite index[3] (in my case that would be the Showtime
column). The only problem with that is that the index can only be used by the database if the first column is included in the search query[4][5], which it currently isn't in either of my queries.
Question
What kind of index(es) should I apply to my columns in order to cover all usage scenarios? (the last scenario may be omitted, but the first two are required)
Should I use a composite index on all columns, for some columns, or do I need a separate index for each column?
This table is updated at most a few times per week to add new showtimes.
Footnotes
1MySQL indexes - what are the best practices?
2Indexing every column in a table
3How important is the order of columns in indexes? (question)
4How important is the order of columns in indexes? (#2 top-voted answer)
5When should I use a composite index?
mysql database-design index innodb mariadb
New contributor
Background
I'm working on a website for a movie theather chain currently located in four different cities (might expand in the future). They use the same, single-database website for all cities, which means I have to have a column in certain tables which holds the ID of the city that each row belongs to.
Right now I have three different tables:
Cinemas
- Contains each city's cinema (ID and name).
Movies
- Contains all movies that has been/will be shown at the cinema.
Showtimes
- Contains all showtimes for all movies in all cities.
The structure of the Showtimes
table is the following:
Column Name | Column Type | Description
--------------+--------------+---------------
ID | BIGINT | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID | TINYINT | Foreign key bound to Cinemas.ID
MovieID | BIGINT | Foreign key bound to Movies.ID
Showtime | DATETIME | At what date and time the movie will show
(will contain multiple rows for each movie, i.e. one row for each showtime)
How this table will be used
A user of the website must be able to:
View all current/upcoming movies and showtimes (sorted by date) in the selected city.
Example query (backend):
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
Select a single movie and view all showtimes for that specific title only (in the selected city).
Example query:
SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
Select a single day and view all movies and showtimes for that day only (in the selected city).
Example query:
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM])
So naturally I decided that I needed to create indexes for the columns.
Problem
What I'm having trouble with is deciding/determining how to index the columns properly. One index for each column seems quite expensive[1][2] so I started looking into composite indexes, which seems to be the right choice but also led to even more confusion.
From my understanding (based on what I've read) you should add the columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?) column the first in the composite index[3] (in my case that would be the Showtime
column). The only problem with that is that the index can only be used by the database if the first column is included in the search query[4][5], which it currently isn't in either of my queries.
Question
What kind of index(es) should I apply to my columns in order to cover all usage scenarios? (the last scenario may be omitted, but the first two are required)
Should I use a composite index on all columns, for some columns, or do I need a separate index for each column?
This table is updated at most a few times per week to add new showtimes.
Footnotes
1MySQL indexes - what are the best practices?
2Indexing every column in a table
3How important is the order of columns in indexes? (question)
4How important is the order of columns in indexes? (#2 top-voted answer)
5When should I use a composite index?
mysql database-design index innodb mariadb
mysql database-design index innodb mariadb
New contributor
New contributor
edited 7 hours ago
Visual Vincent
New contributor
asked 8 hours ago
Visual VincentVisual Vincent
1236
1236
New contributor
New contributor
"one row for each showtime" -- What about a cinaplex that has multiple screens with the same movie at the same time?
– Rick James
8 hours ago
@RickJames : This chain (which is relatively small) currently only have one screen in each theater. But in case they'd get a second, would it be a problem to add another column for that without an index if we don't need to search for it (only retreive it)?
– Visual Vincent
8 hours ago
2
(Upvote for a well-written Question.)
– Rick James
7 hours ago
add a comment |
"one row for each showtime" -- What about a cinaplex that has multiple screens with the same movie at the same time?
– Rick James
8 hours ago
@RickJames : This chain (which is relatively small) currently only have one screen in each theater. But in case they'd get a second, would it be a problem to add another column for that without an index if we don't need to search for it (only retreive it)?
– Visual Vincent
8 hours ago
2
(Upvote for a well-written Question.)
– Rick James
7 hours ago
"one row for each showtime" -- What about a cinaplex that has multiple screens with the same movie at the same time?
– Rick James
8 hours ago
"one row for each showtime" -- What about a cinaplex that has multiple screens with the same movie at the same time?
– Rick James
8 hours ago
@RickJames : This chain (which is relatively small) currently only have one screen in each theater. But in case they'd get a second, would it be a problem to add another column for that without an index if we don't need to search for it (only retreive it)?
– Visual Vincent
8 hours ago
@RickJames : This chain (which is relatively small) currently only have one screen in each theater. But in case they'd get a second, would it be a problem to add another column for that without an index if we don't need to search for it (only retreive it)?
– Visual Vincent
8 hours ago
2
2
(Upvote for a well-written Question.)
– Rick James
7 hours ago
(Upvote for a well-written Question.)
– Rick James
7 hours ago
add a comment |
3 Answers
3
active
oldest
votes
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime)
.
These 3 columns uniquely identify each row, and so having a separate ID
column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime)
.
Why these indexes?
A good way to think about indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheed with (CinemaID, MovieID, Showtime)
that is sorted by each column successively.
All your queries have CinemaID
present, which means you can quickly find the "section" of the spreadsheet for that CinemaID
. Then, for your query that searches by MovieID
, you can easily find the "subsection" in the 2nd column, where MovieID
matches the searched for value.
As the 3rd column of Showtime
is also sorted, you can imagine how quickly and easily it is to find all those show times for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID
and then use Showtime
in some manner. They also need the MovieID
in their results.
So, the (CinemaID, Showtime)
index has you covered there. Again, the CinemaID
easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID
, that column is included for all secondary indexes after the defined columns (at least for MySQL InnoDB - others engines too, but not necessarily all.)
Think of that is a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
1
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN addScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.
– Willem Renzema
7 hours ago
add a comment |
WHERE CinemaID = ? ORDER BY Showtime -- and
WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM]) -- need:
INDEX(CinemaID, Showtime)
WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime -- needs:
INDEX(CinemaID, MovieId, Showtime) -- or
INDEX(MovieId, CinemaID, Showtime)
Assuming that the the triple (MovieId, CinemaID, Showtime) is unique, I suggest getting rid of id
and having
PRIMARY KEY(CinemaID, MovieId, Showtime)
INDEX(CinemaID, Showtime)
Are there cases where the main part of the WHERE
is MovieID=...
?
Cinemas - A list of each city and its cinema (ID and name):
SELECT Cinema, CinemaID FROM Cimemas; -- (no index needed)
Movies - A list of movies that has been/will be shown at the cinema.
SELECT DISTINCT MovieID FROM ShowTimes WHERE CinemaID=...
INDEX(CinemaID, MovieID) -- already handled by my proposed PK
Showtimes - A list of all showtimes for all movies in all cities. -- This is a huge output; rethink the requirement for it. That is, think about what the Client will do with it.
Most of these indexes could be deduced from studying indexing cookbook and composite indexing
Note how I did not suggest indexes (single-column vs composite, plus specifics) until I saw the SELECTs
.
"columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?)" -- No. Selectivity is not the key for designing a composite index. Start with all the =
columns, in any order. (My Cookbook dwells on this topic.)
"index can only be used by the table if the first column is included in the search query" -- Mostly true. Notice how I recommended 2 indexes (remember: the PK is an index). There are cases where an index can be used for GROUP BY
or ORDER BY
, ignoring the WHERE
; but those are rare.
"first in the composite index ... Showtime" -- It is usually counterproductive to put a DATETIME
first in a composite index. In particular, your 3rd query can use both columns of INDEX(CinemaID, Showtime)
, but not both columns of INDEX(Showtime, CinemaID)
. It is easy to see this: Think of writing out two lists of cinemas and showtimes. Have one list sorted first on cinema (a la INDEX(cinema, time)
; have the other sorted on time. Think about which one would have all the rows for a particular cinema over a time range clumped ("clustered") together.
If there is also a ScreenID
due to two screens showing the same movie at the same time, tack it onto the PK. However, all the indexes will need rethinking.
(Sorry, Willem, I wrote my Answer before reading yours -- We say almost the same stuff.)
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checkingMovieID
or when it's checkingMovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)
– Visual Vincent
7 hours ago
1
@VisualVincent - The order of columns in an index is important. If, for example,ScreenID
is stuck in the middle of the index, but not used in theWHERE
, the rest of that index is useless for thatWHERE
.
– Rick James
7 hours ago
1
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
1
@VisualVincent - Yes,ScreenID
last.
– Rick James
7 hours ago
|
show 6 more comments
First I would definitely not define the primary key as a composite key of (CinemaID, MovieID, Showtime)
. Think of what that means: if you want to link an entry you always need those data instead of just an id.
When you create a database I would recommend you to stick to the rules of normalization. They were made because of a reason.
I understand that you want to create it as fast and memory-saving as possible, but we have no idea yet were the bottle-neck will be.
Just create the scheme like you would do it perfectly normalized. Then fill it with test data (a few million entries) and see if the performance seriously is affected.
When you do that you also have some numbers like "this query takes 2 times more memory than the others", and then you can have a look at this.
Currently this smells of premature optimization .
Performance in software is like taste in food: you can guess if it is good, but in the end you have to try it anyway.
New contributor
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Visual Vincent is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e) {
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom)) {
StackExchange.using('gps', function() { StackExchange.gps.track('embedded_signup_form.view', { location: 'question_page' }); });
$window.unbind('scroll', onScroll);
}
};
$window.on('scroll', onScroll);
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f229064%2fwhich-columns-should-be-indexed-when-all-may-be-used-in-different-search-queries%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime)
.
These 3 columns uniquely identify each row, and so having a separate ID
column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime)
.
Why these indexes?
A good way to think about indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheed with (CinemaID, MovieID, Showtime)
that is sorted by each column successively.
All your queries have CinemaID
present, which means you can quickly find the "section" of the spreadsheet for that CinemaID
. Then, for your query that searches by MovieID
, you can easily find the "subsection" in the 2nd column, where MovieID
matches the searched for value.
As the 3rd column of Showtime
is also sorted, you can imagine how quickly and easily it is to find all those show times for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID
and then use Showtime
in some manner. They also need the MovieID
in their results.
So, the (CinemaID, Showtime)
index has you covered there. Again, the CinemaID
easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID
, that column is included for all secondary indexes after the defined columns (at least for MySQL InnoDB - others engines too, but not necessarily all.)
Think of that is a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
1
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN addScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.
– Willem Renzema
7 hours ago
add a comment |
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime)
.
These 3 columns uniquely identify each row, and so having a separate ID
column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime)
.
Why these indexes?
A good way to think about indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheed with (CinemaID, MovieID, Showtime)
that is sorted by each column successively.
All your queries have CinemaID
present, which means you can quickly find the "section" of the spreadsheet for that CinemaID
. Then, for your query that searches by MovieID
, you can easily find the "subsection" in the 2nd column, where MovieID
matches the searched for value.
As the 3rd column of Showtime
is also sorted, you can imagine how quickly and easily it is to find all those show times for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID
and then use Showtime
in some manner. They also need the MovieID
in their results.
So, the (CinemaID, Showtime)
index has you covered there. Again, the CinemaID
easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID
, that column is included for all secondary indexes after the defined columns (at least for MySQL InnoDB - others engines too, but not necessarily all.)
Think of that is a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
1
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN addScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.
– Willem Renzema
7 hours ago
add a comment |
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime)
.
These 3 columns uniquely identify each row, and so having a separate ID
column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime)
.
Why these indexes?
A good way to think about indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheed with (CinemaID, MovieID, Showtime)
that is sorted by each column successively.
All your queries have CinemaID
present, which means you can quickly find the "section" of the spreadsheet for that CinemaID
. Then, for your query that searches by MovieID
, you can easily find the "subsection" in the 2nd column, where MovieID
matches the searched for value.
As the 3rd column of Showtime
is also sorted, you can imagine how quickly and easily it is to find all those show times for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID
and then use Showtime
in some manner. They also need the MovieID
in their results.
So, the (CinemaID, Showtime)
index has you covered there. Again, the CinemaID
easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID
, that column is included for all secondary indexes after the defined columns (at least for MySQL InnoDB - others engines too, but not necessarily all.)
Think of that is a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime)
.
These 3 columns uniquely identify each row, and so having a separate ID
column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime)
.
Why these indexes?
A good way to think about indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheed with (CinemaID, MovieID, Showtime)
that is sorted by each column successively.
All your queries have CinemaID
present, which means you can quickly find the "section" of the spreadsheet for that CinemaID
. Then, for your query that searches by MovieID
, you can easily find the "subsection" in the 2nd column, where MovieID
matches the searched for value.
As the 3rd column of Showtime
is also sorted, you can imagine how quickly and easily it is to find all those show times for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID
and then use Showtime
in some manner. They also need the MovieID
in their results.
So, the (CinemaID, Showtime)
index has you covered there. Again, the CinemaID
easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID
, that column is included for all secondary indexes after the defined columns (at least for MySQL InnoDB - others engines too, but not necessarily all.)
Think of that is a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
edited 8 hours ago
answered 8 hours ago
Willem RenzemaWillem Renzema
1,173168
1,173168
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
1
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN addScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.
– Willem Renzema
7 hours ago
add a comment |
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
1
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN addScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.
– Willem Renzema
7 hours ago
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
Thanks for your answer and the thorough explanation! Currently the theather chain has only one screen in each theater. But would it, in case they got a second, be a problem to add another column for that without an index, if we don't need to search for/sort by the screen ID (only retreive it)?
– Visual Vincent
8 hours ago
1
1
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of
(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN add ScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.– Willem Renzema
7 hours ago
Adding a screenID won't be an issue UNLESS you need to have the situation where, at a given cinema, there is the same movie playing on both screens, AND starting at the same time. In that case, the primary key of
(CinemaID, MovieID, Showtime)
does not permit you to enter both ScreenIDs, because only one record for each combination will exist. However, that said, and given that you say you don't need to sort by it, you CAN add ScreenID
as the 4th column of the primary key. In that case, you can store both identical showings, and still differentiate by screen. So, either way you are fine.– Willem Renzema
7 hours ago
add a comment |
WHERE CinemaID = ? ORDER BY Showtime -- and
WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM]) -- need:
INDEX(CinemaID, Showtime)
WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime -- needs:
INDEX(CinemaID, MovieId, Showtime) -- or
INDEX(MovieId, CinemaID, Showtime)
Assuming that the the triple (MovieId, CinemaID, Showtime) is unique, I suggest getting rid of id
and having
PRIMARY KEY(CinemaID, MovieId, Showtime)
INDEX(CinemaID, Showtime)
Are there cases where the main part of the WHERE
is MovieID=...
?
Cinemas - A list of each city and its cinema (ID and name):
SELECT Cinema, CinemaID FROM Cimemas; -- (no index needed)
Movies - A list of movies that has been/will be shown at the cinema.
SELECT DISTINCT MovieID FROM ShowTimes WHERE CinemaID=...
INDEX(CinemaID, MovieID) -- already handled by my proposed PK
Showtimes - A list of all showtimes for all movies in all cities. -- This is a huge output; rethink the requirement for it. That is, think about what the Client will do with it.
Most of these indexes could be deduced from studying indexing cookbook and composite indexing
Note how I did not suggest indexes (single-column vs composite, plus specifics) until I saw the SELECTs
.
"columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?)" -- No. Selectivity is not the key for designing a composite index. Start with all the =
columns, in any order. (My Cookbook dwells on this topic.)
"index can only be used by the table if the first column is included in the search query" -- Mostly true. Notice how I recommended 2 indexes (remember: the PK is an index). There are cases where an index can be used for GROUP BY
or ORDER BY
, ignoring the WHERE
; but those are rare.
"first in the composite index ... Showtime" -- It is usually counterproductive to put a DATETIME
first in a composite index. In particular, your 3rd query can use both columns of INDEX(CinemaID, Showtime)
, but not both columns of INDEX(Showtime, CinemaID)
. It is easy to see this: Think of writing out two lists of cinemas and showtimes. Have one list sorted first on cinema (a la INDEX(cinema, time)
; have the other sorted on time. Think about which one would have all the rows for a particular cinema over a time range clumped ("clustered") together.
If there is also a ScreenID
due to two screens showing the same movie at the same time, tack it onto the PK. However, all the indexes will need rethinking.
(Sorry, Willem, I wrote my Answer before reading yours -- We say almost the same stuff.)
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checkingMovieID
or when it's checkingMovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)
– Visual Vincent
7 hours ago
1
@VisualVincent - The order of columns in an index is important. If, for example,ScreenID
is stuck in the middle of the index, but not used in theWHERE
, the rest of that index is useless for thatWHERE
.
– Rick James
7 hours ago
1
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
1
@VisualVincent - Yes,ScreenID
last.
– Rick James
7 hours ago
|
show 6 more comments
WHERE CinemaID = ? ORDER BY Showtime -- and
WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM]) -- need:
INDEX(CinemaID, Showtime)
WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime -- needs:
INDEX(CinemaID, MovieId, Showtime) -- or
INDEX(MovieId, CinemaID, Showtime)
Assuming that the the triple (MovieId, CinemaID, Showtime) is unique, I suggest getting rid of id
and having
PRIMARY KEY(CinemaID, MovieId, Showtime)
INDEX(CinemaID, Showtime)
Are there cases where the main part of the WHERE
is MovieID=...
?
Cinemas - A list of each city and its cinema (ID and name):
SELECT Cinema, CinemaID FROM Cimemas; -- (no index needed)
Movies - A list of movies that has been/will be shown at the cinema.
SELECT DISTINCT MovieID FROM ShowTimes WHERE CinemaID=...
INDEX(CinemaID, MovieID) -- already handled by my proposed PK
Showtimes - A list of all showtimes for all movies in all cities. -- This is a huge output; rethink the requirement for it. That is, think about what the Client will do with it.
Most of these indexes could be deduced from studying indexing cookbook and composite indexing
Note how I did not suggest indexes (single-column vs composite, plus specifics) until I saw the SELECTs
.
"columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?)" -- No. Selectivity is not the key for designing a composite index. Start with all the =
columns, in any order. (My Cookbook dwells on this topic.)
"index can only be used by the table if the first column is included in the search query" -- Mostly true. Notice how I recommended 2 indexes (remember: the PK is an index). There are cases where an index can be used for GROUP BY
or ORDER BY
, ignoring the WHERE
; but those are rare.
"first in the composite index ... Showtime" -- It is usually counterproductive to put a DATETIME
first in a composite index. In particular, your 3rd query can use both columns of INDEX(CinemaID, Showtime)
, but not both columns of INDEX(Showtime, CinemaID)
. It is easy to see this: Think of writing out two lists of cinemas and showtimes. Have one list sorted first on cinema (a la INDEX(cinema, time)
; have the other sorted on time. Think about which one would have all the rows for a particular cinema over a time range clumped ("clustered") together.
If there is also a ScreenID
due to two screens showing the same movie at the same time, tack it onto the PK. However, all the indexes will need rethinking.
(Sorry, Willem, I wrote my Answer before reading yours -- We say almost the same stuff.)
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checkingMovieID
or when it's checkingMovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)
– Visual Vincent
7 hours ago
1
@VisualVincent - The order of columns in an index is important. If, for example,ScreenID
is stuck in the middle of the index, but not used in theWHERE
, the rest of that index is useless for thatWHERE
.
– Rick James
7 hours ago
1
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
1
@VisualVincent - Yes,ScreenID
last.
– Rick James
7 hours ago
|
show 6 more comments
WHERE CinemaID = ? ORDER BY Showtime -- and
WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM]) -- need:
INDEX(CinemaID, Showtime)
WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime -- needs:
INDEX(CinemaID, MovieId, Showtime) -- or
INDEX(MovieId, CinemaID, Showtime)
Assuming that the the triple (MovieId, CinemaID, Showtime) is unique, I suggest getting rid of id
and having
PRIMARY KEY(CinemaID, MovieId, Showtime)
INDEX(CinemaID, Showtime)
Are there cases where the main part of the WHERE
is MovieID=...
?
Cinemas - A list of each city and its cinema (ID and name):
SELECT Cinema, CinemaID FROM Cimemas; -- (no index needed)
Movies - A list of movies that has been/will be shown at the cinema.
SELECT DISTINCT MovieID FROM ShowTimes WHERE CinemaID=...
INDEX(CinemaID, MovieID) -- already handled by my proposed PK
Showtimes - A list of all showtimes for all movies in all cities. -- This is a huge output; rethink the requirement for it. That is, think about what the Client will do with it.
Most of these indexes could be deduced from studying indexing cookbook and composite indexing
Note how I did not suggest indexes (single-column vs composite, plus specifics) until I saw the SELECTs
.
"columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?)" -- No. Selectivity is not the key for designing a composite index. Start with all the =
columns, in any order. (My Cookbook dwells on this topic.)
"index can only be used by the table if the first column is included in the search query" -- Mostly true. Notice how I recommended 2 indexes (remember: the PK is an index). There are cases where an index can be used for GROUP BY
or ORDER BY
, ignoring the WHERE
; but those are rare.
"first in the composite index ... Showtime" -- It is usually counterproductive to put a DATETIME
first in a composite index. In particular, your 3rd query can use both columns of INDEX(CinemaID, Showtime)
, but not both columns of INDEX(Showtime, CinemaID)
. It is easy to see this: Think of writing out two lists of cinemas and showtimes. Have one list sorted first on cinema (a la INDEX(cinema, time)
; have the other sorted on time. Think about which one would have all the rows for a particular cinema over a time range clumped ("clustered") together.
If there is also a ScreenID
due to two screens showing the same movie at the same time, tack it onto the PK. However, all the indexes will need rethinking.
(Sorry, Willem, I wrote my Answer before reading yours -- We say almost the same stuff.)
WHERE CinemaID = ? ORDER BY Showtime -- and
WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM]) -- need:
INDEX(CinemaID, Showtime)
WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime -- needs:
INDEX(CinemaID, MovieId, Showtime) -- or
INDEX(MovieId, CinemaID, Showtime)
Assuming that the the triple (MovieId, CinemaID, Showtime) is unique, I suggest getting rid of id
and having
PRIMARY KEY(CinemaID, MovieId, Showtime)
INDEX(CinemaID, Showtime)
Are there cases where the main part of the WHERE
is MovieID=...
?
Cinemas - A list of each city and its cinema (ID and name):
SELECT Cinema, CinemaID FROM Cimemas; -- (no index needed)
Movies - A list of movies that has been/will be shown at the cinema.
SELECT DISTINCT MovieID FROM ShowTimes WHERE CinemaID=...
INDEX(CinemaID, MovieID) -- already handled by my proposed PK
Showtimes - A list of all showtimes for all movies in all cities. -- This is a huge output; rethink the requirement for it. That is, think about what the Client will do with it.
Most of these indexes could be deduced from studying indexing cookbook and composite indexing
Note how I did not suggest indexes (single-column vs composite, plus specifics) until I saw the SELECTs
.
"columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?)" -- No. Selectivity is not the key for designing a composite index. Start with all the =
columns, in any order. (My Cookbook dwells on this topic.)
"index can only be used by the table if the first column is included in the search query" -- Mostly true. Notice how I recommended 2 indexes (remember: the PK is an index). There are cases where an index can be used for GROUP BY
or ORDER BY
, ignoring the WHERE
; but those are rare.
"first in the composite index ... Showtime" -- It is usually counterproductive to put a DATETIME
first in a composite index. In particular, your 3rd query can use both columns of INDEX(CinemaID, Showtime)
, but not both columns of INDEX(Showtime, CinemaID)
. It is easy to see this: Think of writing out two lists of cinemas and showtimes. Have one list sorted first on cinema (a la INDEX(cinema, time)
; have the other sorted on time. Think about which one would have all the rows for a particular cinema over a time range clumped ("clustered") together.
If there is also a ScreenID
due to two screens showing the same movie at the same time, tack it onto the PK. However, all the indexes will need rethinking.
(Sorry, Willem, I wrote my Answer before reading yours -- We say almost the same stuff.)
edited 6 hours ago
answered 8 hours ago
Rick JamesRick James
42.3k22258
42.3k22258
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checkingMovieID
or when it's checkingMovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)
– Visual Vincent
7 hours ago
1
@VisualVincent - The order of columns in an index is important. If, for example,ScreenID
is stuck in the middle of the index, but not used in theWHERE
, the rest of that index is useless for thatWHERE
.
– Rick James
7 hours ago
1
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
1
@VisualVincent - Yes,ScreenID
last.
– Rick James
7 hours ago
|
show 6 more comments
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checkingMovieID
or when it's checkingMovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)
– Visual Vincent
7 hours ago
1
@VisualVincent - The order of columns in an index is important. If, for example,ScreenID
is stuck in the middle of the index, but not used in theWHERE
, the rest of that index is useless for thatWHERE
.
– Rick James
7 hours ago
1
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
1
@VisualVincent - Yes,ScreenID
last.
– Rick James
7 hours ago
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Sorry for the confusion. The first three bullet points are just a description of what tables I have and what they contain (i.e. "A list of all showtimes for all movies in all cities" means the table contains all the showtimes across the entire website, but they're not output all at once - they are what's queried by the next three bullet points located under "On the site the user must be able to").
– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checking
MovieID
or when it's checking MovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)– Visual Vincent
7 hours ago
Edited question for (hopefully) better clarity. Now, I have a couple questions regarding your answer: 1) "Are there cases where the main part of the WHERE is MovieID=...?" - Do you mean when it's only checking
MovieID
or when it's checking MovieID
first? (if the former: Not at the moment || if the latter: Also not at the moment, how exactly does it make things different?)– Visual Vincent
7 hours ago
1
1
@VisualVincent - The order of columns in an index is important. If, for example,
ScreenID
is stuck in the middle of the index, but not used in the WHERE
, the rest of that index is useless for that WHERE
.– Rick James
7 hours ago
@VisualVincent - The order of columns in an index is important. If, for example,
ScreenID
is stuck in the middle of the index, but not used in the WHERE
, the rest of that index is useless for that WHERE
.– Rick James
7 hours ago
1
1
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
@VisualVincent - A messy query: "Show all Tuesday matinee showings of G-rated movies." The likely technique is to index some of the stuff, then work harder to filter the rest of criteria.
– Rick James
7 hours ago
1
1
@VisualVincent - Yes,
ScreenID
last.– Rick James
7 hours ago
@VisualVincent - Yes,
ScreenID
last.– Rick James
7 hours ago
|
show 6 more comments
First I would definitely not define the primary key as a composite key of (CinemaID, MovieID, Showtime)
. Think of what that means: if you want to link an entry you always need those data instead of just an id.
When you create a database I would recommend you to stick to the rules of normalization. They were made because of a reason.
I understand that you want to create it as fast and memory-saving as possible, but we have no idea yet were the bottle-neck will be.
Just create the scheme like you would do it perfectly normalized. Then fill it with test data (a few million entries) and see if the performance seriously is affected.
When you do that you also have some numbers like "this query takes 2 times more memory than the others", and then you can have a look at this.
Currently this smells of premature optimization .
Performance in software is like taste in food: you can guess if it is good, but in the end you have to try it anyway.
New contributor
add a comment |
First I would definitely not define the primary key as a composite key of (CinemaID, MovieID, Showtime)
. Think of what that means: if you want to link an entry you always need those data instead of just an id.
When you create a database I would recommend you to stick to the rules of normalization. They were made because of a reason.
I understand that you want to create it as fast and memory-saving as possible, but we have no idea yet were the bottle-neck will be.
Just create the scheme like you would do it perfectly normalized. Then fill it with test data (a few million entries) and see if the performance seriously is affected.
When you do that you also have some numbers like "this query takes 2 times more memory than the others", and then you can have a look at this.
Currently this smells of premature optimization .
Performance in software is like taste in food: you can guess if it is good, but in the end you have to try it anyway.
New contributor
add a comment |
First I would definitely not define the primary key as a composite key of (CinemaID, MovieID, Showtime)
. Think of what that means: if you want to link an entry you always need those data instead of just an id.
When you create a database I would recommend you to stick to the rules of normalization. They were made because of a reason.
I understand that you want to create it as fast and memory-saving as possible, but we have no idea yet were the bottle-neck will be.
Just create the scheme like you would do it perfectly normalized. Then fill it with test data (a few million entries) and see if the performance seriously is affected.
When you do that you also have some numbers like "this query takes 2 times more memory than the others", and then you can have a look at this.
Currently this smells of premature optimization .
Performance in software is like taste in food: you can guess if it is good, but in the end you have to try it anyway.
New contributor
First I would definitely not define the primary key as a composite key of (CinemaID, MovieID, Showtime)
. Think of what that means: if you want to link an entry you always need those data instead of just an id.
When you create a database I would recommend you to stick to the rules of normalization. They were made because of a reason.
I understand that you want to create it as fast and memory-saving as possible, but we have no idea yet were the bottle-neck will be.
Just create the scheme like you would do it perfectly normalized. Then fill it with test data (a few million entries) and see if the performance seriously is affected.
When you do that you also have some numbers like "this query takes 2 times more memory than the others", and then you can have a look at this.
Currently this smells of premature optimization .
Performance in software is like taste in food: you can guess if it is good, but in the end you have to try it anyway.
New contributor
New contributor
answered 48 mins ago
EmarciEmarci
1
1
New contributor
New contributor
add a comment |
add a comment |
Visual Vincent is a new contributor. Be nice, and check out our Code of Conduct.
Visual Vincent is a new contributor. Be nice, and check out our Code of Conduct.
Visual Vincent is a new contributor. Be nice, and check out our Code of Conduct.
Visual Vincent is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e) {
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom)) {
StackExchange.using('gps', function() { StackExchange.gps.track('embedded_signup_form.view', { location: 'question_page' }); });
$window.unbind('scroll', onScroll);
}
};
$window.on('scroll', onScroll);
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f229064%2fwhich-columns-should-be-indexed-when-all-may-be-used-in-different-search-queries%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e) {
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom)) {
StackExchange.using('gps', function() { StackExchange.gps.track('embedded_signup_form.view', { location: 'question_page' }); });
$window.unbind('scroll', onScroll);
}
};
$window.on('scroll', onScroll);
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e) {
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom)) {
StackExchange.using('gps', function() { StackExchange.gps.track('embedded_signup_form.view', { location: 'question_page' }); });
$window.unbind('scroll', onScroll);
}
};
$window.on('scroll', onScroll);
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e) {
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom)) {
StackExchange.using('gps', function() { StackExchange.gps.track('embedded_signup_form.view', { location: 'question_page' }); });
$window.unbind('scroll', onScroll);
}
};
$window.on('scroll', onScroll);
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
"one row for each showtime" -- What about a cinaplex that has multiple screens with the same movie at the same time?
– Rick James
8 hours ago
@RickJames : This chain (which is relatively small) currently only have one screen in each theater. But in case they'd get a second, would it be a problem to add another column for that without an index if we don't need to search for it (only retreive it)?
– Visual Vincent
8 hours ago
2
(Upvote for a well-written Question.)
– Rick James
7 hours ago