By Peter Bell

A SImple SQL Question . . .

I *know* this is simple, but I'm not very good with SQL . . . A user has a job. I want to return user.FirstName and job.Salary. I want to join the User to their (single) job using a left outer join (so the unemployed still show up). I want to get a list of all users where lastname like 'b%'.

In theory a user can only have one job, but because I have a versioning system, I need to only return jobs where job.LiveVersion = 1 (so if a user has three versions of a job it should only return the single LiveVersion=1 of them and if a user is joined to a job which has records but where none of them are LiveVersion=1, the user should be returned, but without a job).

Naive SQL is SELECT user.FirstName, job.Salary FROM user INNER JOIN job ON user.JobID = job.JobID WHERE user.LastName LIKE 'b%' AND job.LiveVersion = 1.

The problem is that this then excludes any users who for whatever reason are joined to a job without a live version :-<

I'm trying to solve the generalized problem as part of an ORM, but if I could solve this I think I could generalize. Is it a subquery that is required?

Any help much appreciated! I'm running MSSQL 2000/2005 if dialect makes a difference.

Comments
Put the WHERE clause logic for the live version in the ON clause of the left outer join:

SELECT
user.FirstName,
job.Salary
FROM
user
INNER JOIN
job
ON
(
user.JobID = job.JobID
AND
job.LiveVersion = 1
)
WHERE
user.LastName LIKE 'b%' AND

This way, the JOIN itself needs to bee both the user-job join AND the live version. If either of those breaks (false) then only the left table (user).
# Posted By Ben Nadel | 4/21/07 11:38 AM
Duh. Perfect. That is soooo obvious now that you say it :->

Thanks Ben!
# Posted By Peter Bell | 4/21/07 11:50 AM
What Ben said.

I will add, though, that when solving this same situation generally (ie, for DataMgr), I use subqueries. It allows me to define something like LiveVersion = 1 as a filter for that table and then use the subquery to get the value from the other table.

(note that DataMgr can't specifically do this, but I am speaking more generally about how you could write an ORM/DAL to behave)

That being said, if you will need several fields from another table, it would probably be worth the extra effort to create your ORM to use the outer join.
# Posted By Steve Bryant | 4/21/07 11:54 AM
Having a multi-part ON clause is one of the most awesome features of SQL. Not only that, but it makes the SQL more efficient as it produces smaller intermediary tables before the WHERE clause is every executed.... of course, the SQL server's optimization engine probably does a lot of that for you, but it's good practice to have (and in cases like this is really the only way to go).
# Posted By Ben Nadel | 4/21/07 11:55 AM
@ Steve, Yep. Using outer join to solve the case where A has-one B and you might want to access n-properties from B so the join makes sense. I may well use subqueries for some other parts of the ORM, however.

@Ben, Great to know - nice to have an expert handy :->
# Posted By Peter Bell | 4/21/07 1:01 PM
@Ben, Shouldn't that be an outer join? With the inner join, it looks to me like you're still excluding the unemployed.

SELECT user.FirstName, job.Salary
FROM user
LEFT JOIN job ON (user.JobID = job.JobID AND job.LiveVersion = 1)
WHERE user.LastName LIKE 'b%'

The subquery version looks like this:

SELECT user.FirstName,
(SELECT TOP 1 Salary FROM job WHERE job.JobID= user.JobID AND job.LiveVersion=1) as Salary
FROM user
WHERE user.LastName LIKE 'b%'

While this is fine, in my experience the outer join version will optimize better on most RDBMS platforms.

The only situation where I'd use the subquery would be in cases where there might be more than one job per user, but we actually only want to pull one record per user. In your case, you've stated there is only one Job per user (hopefully this is guaranteed by a unique constraint on JobID and LiveVersion in your Job table) so the left outer join will work fine.
# Posted By David C-L | 4/21/07 1:42 PM
Oh whoops. You are right. That should have been an OUTER join.
# Posted By Ben Nadel | 4/22/07 10:02 AM
@David, Hadn't even noticed Ben had put an inner - I just changed the join cancatenator (which was using left outer join still) and it worked perfectly.

Thanks for the subquery syntax too. I remember there are cases where you HAVE to use the subquery - I certainly use it when I may have one or more aggregate has-many joins.

As an aside, does anyone have any favorite concatenation code for returning "ProductIDList" where Category has-many products? The max, min, count and the like is easy but I seem to remember you need to do something a little funkier (at least in SQL server) to get a comma delimited concatenation of the values of a field as part of an aggregate subquery in the select . . .
# Posted By Peter Bell | 4/22/07 4:35 PM
I can't test it on this computer, but I think you have to do something like this:

DECLARE @id_list VARCHAR;

SET @id_list = '';

SELECT
@id_list = (@id_list + tbl.id)
FROM
[your_table] tbl

Something like that. I am sure there are other ways to do it though.
# Posted By Ben Nadel | 4/22/07 11:02 PM
Oh, and that:

@id_list = (@id_list + tbl.id)

is supposed to have some sort of comma in it:

@id_list = (@id_list + ',' + tbl.id)
# Posted By Ben Nadel | 4/22/07 11:03 PM
I was pretty excited to see this tip since I've never found a good way to handle that in SQL (I just get all the results for ProductID and generate the ProductIDList in Coldfusion, usually using CFOUTPUT with the GROUP attribute).

But unfortunately, your suggestion didn't work for me, at least in MS SQL Server.
# Posted By David C-L | 4/23/07 11:32 AM
Ben,

"smaller intermediary tables before the WHERE clause is every executed.... of course, the SQL server's optimization engine probably does a lot of that for you, but it's good practice to have (and in cases like this is really the only way to go)."

My understanding is that in general, DBMSs tend to do the select operation (which is basically evaluating the where clause) before the joins, because joins are the most expensive operation, and we'd want to do that on smaller tables. Of course, if the join would so greatly reduce the number of rows to perform the selection (where clause) then it may do that first.

But you are right, it does do a lot of that for you, calculating estimated costs of performing which operations in what order, and selecting the best one from there, based on the table sizes and everything. Much more than I would care to think about most of the time when writing a query. =)

Peter,

I still don't see why changing INNER to OUTER in your original query shouldn't work. I may be missing something, but I would think both queries produce equivalent trees...
# Posted By Sam | 4/23/07 11:35 AM
@David,

I'll post working code here when I get a chance if ben doesn't beart me to it - I KNOW I've seen and used concatenation code something like what Ben provided in the past.

@Sam, My mistake. My original SQL should have said LEFT OUTER JOIN which is actually what I'm doing as I don't want to exclude the unemployed!
# Posted By Peter Bell | 4/23/07 11:40 AM
Here is a snippet I just ran on MS SQL server:

DECLARE @temp TABLE(
   id INT
);

DECLARE @id_list VARCHAR( 1000 );


INSERT INTO @temp (
   id
)(
   SELECT 1 AS id UNION ALL
   SELECT 21 UNION ALL
   SELECT 32 UNION ALL
   SELECT 43
);


SET @id_list = '';

SELECT
@id_list = (@id_list + LTRIM( RTRIM( STR( id ) ) ) + ',')
FROM @temp

SELECT
@id_list AS id_list
# Posted By Ben Nadel | 4/23/07 11:46 AM
Hi Ben,

Wow. Any idea how that'd fit into a subquery in the select? Could you put all of that in a SELECT FirstName, LastName, (all that code) AS ConcatenatedJobIDs . . . as part of a list query?
# Posted By Peter Bell | 4/23/07 11:58 AM
Pete,

Please don't take this as definitive, as I have rarely done this... but I do not think that you can MIX concatenation AND standard data-retrieval selects in the same Select statement. So, if your eventual query was going to return more than one records (ie. N users), I am not sure you could then, for each row, have a sub-query that concatenates a list of IDs (as this would mix the two forms... not even sure how the variable would work).

But again, do not take that as the truth. If you are going to return just ONE row (ie. 1 user), no reason you couldn't create the concat list before the main query (as i demonstated) and then just select the concat list as an additional query column.
# Posted By Ben Nadel | 4/23/07 12:05 PM
Hi Ben,

Got it. Let me do some digging because I have a feeling I found SQL that allowed me to do this for a list once - but it has been a long time :-> If I find something, I'll post it.
# Posted By Peter Bell | 4/23/07 12:07 PM
That would be awesome, as I know we have all come up against a situation where this sort of solution would rock.
# Posted By Ben Nadel | 4/23/07 12:11 PM
Is LiveVersion a NULL allowable field? If so, you might want to wrap job.LiveVersion = 1 in a ISNULL() like so AND ISNULL(job.LiveVersion, 0) = 1. At a glance, it looks like that might return the result you're looking for...
# Posted By Paul marcotte | 5/2/07 9:49 PM
Hi Paul, I'm ensuring LiveVersion isn't null, so not a problem, but a good point - thanks!
# Posted By Peter Bell | 5/3/07 9:29 AM
BlogCFC was created by Raymond Camden. This blog is running version 5.005.