Wikidata talk:WikiProject every politician/United States of America/Archive/2021/01

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Useful queries

Congressional districts by affiliation for 115th House of Representatives

#defaultView:Map
SELECT DISTINCT 
  ?item ?itemLabel 
  ?district ?districtLabel 
  ?shape 
  (SAMPLE(?image) as ?image) 
  ?group ?groupLabel 
  (IF(MIN(?groupID) = "0", "Republican", IF(MIN(?groupID) = "1", "Democrat", "Other")) AS ?layer) 
WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item p:P39 ?rep.             # Item has position held statement
  ?rep ps:P39 wd:Q13218630.     # Position held statement has Position Held value of US Representative
  ?rep pq:P2937 wd:Q18740945.   # Position held statement has Parliamentary Term value of 115th US Congress
  ?rep pq:P768 ?district.       # Position held statement has Electoral District value
  ?district wdt:P3896 ?shape.   # Electoral District has Geoshape value
  ?rep pq:P4100 ?group.         # Position held statement has Parliamentary Group value
  ?item wdt:P18 ?image.         # Item has Image value
  BIND(IF(?group = wd:Q29468, "0", IF(?group = wd:Q29552, "1", "2")) AS ?groupID)
}
GROUP BY ?item ?itemLabel ?layer ?district ?districtLabel ?shape ?image ?group ?groupLabel

Every Senator from every congress

WikiProject every politician has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

Hello,

I've poked around a bit, but I'm not clear how this effort is being organized. I've pulled the data for all historical Senators from the government website and reconciled that against info in Wikidata. I'm getting familiar with OpenRefine and can probably do the upload myself, but would like to know whether anyone else is working on this. What's the best way to get this done?

You can find the data I've generated here. I plan on adding some more documentation about it shortly.

Regards, Gettinwikiwidit

P.S. FWIW, I already filled in all the members of the 116th congress. P.P.S. Should there be a separate list of participants for sub-projects? Can there be?

Gettinwikiwidit (talk) 01:31, 27 July 2020 (UTC)


@Jura1: I'm open to discuss with any and everyone. -- Gettinwikiwidit (talk) 12:30, 15 August 2020 (UTC)

Senate checks broken

Hey there,

One issue with the checks for "missing" information is that they fail to notice if the list itself is completely empty. Currently the SPARQL queries search for United States senator (Q13217683), but there appear to be no such entities at all. There are, however, lots of United States senator (Q4416090) entities, and in fact this is what you get when you auto-complete "United States senator".

I haven't figured out how the list of current senators is generated, but maybe it's good to test that the checks use the same entity somehow? Or maybe simply have a test that the complete list isn't empty.

I'm happy to fix this, but I don't quite follow how to edit the information.

Regards, Gettinwikiwidit (talk) 09:58, 27 July 2020 (UTC)


Hmm... The former redirects to the latter if you click on it, but SPARQL doesn't seem to know the difference if you run the linked queries. Gettinwikiwidit (talk) 10:07, 27 July 2020 (UTC)


It looks like United States senator (Q13217683) was merged into United States senator (Q4416090) in September 2019. Unfortunately, SPARQL queries don't follow redirects, so this broke all the queries based on the old ID. I've changed the template on Wikidata:WikiProject every politician/United States of America to fix this. Teester (talk) 11:54, 27 July 2020 (UTC)
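The two ideas in this thread (tolerating a merged item, and failing loudly when the base list is empty) can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical and not part of any project tooling, and it assumes the query service exposes a redirect as an `owl:sameAs` triple from the old ID to the new one, so that `owl:sameAs*` matches either ID.

```python
# Hypothetical helpers for illustration only.

def position_holders_query(position_qid: str) -> str:
    """Build a SPARQL query for holders of a position.

    Assumption: a merged (redirected) item appears in the query service as
    `wd:<old> owl:sameAs wd:<new>`, so `owl:sameAs*` matches the given ID
    and, if it was merged, the item it now redirects to.
    """
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        f"  wd:{position_qid} owl:sameAs* ?position .\n"
        "  ?item p:P39 ?stmt .\n"
        "  ?stmt ps:P39 ?position .\n"
        "}"
    )


def require_nonempty(rows: list, label: str) -> list:
    """Raise instead of silently reporting 'nothing missing' for an empty list."""
    if not rows:
        raise ValueError(f"base list for {label} is empty; check the item ID")
    return rows


if __name__ == "__main__":
    print(position_holders_query("Q13217683"))
```

Running a "missing information" check only after `require_nonempty` has passed on the complete list would have surfaced the merge as an error rather than as an empty report.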

List not updating

@Oravrattas: Hello, I've made some changes to the model for statements like position held (P39) United States senator (Q4416090), which I expected to require changes to the ListeriaBot table. Looking at the results, however, it appears that the table is not regenerating, since the current statements don't yet have elected in (P2715) qualifiers on them. Any advice for debugging this issue? Regards, Gettinwikiwidit (talk) 06:00, 15 November 2020 (UTC)

P.S. Does the project ping work? Gettinwikiwidit (talk) 06:00, 15 November 2020 (UTC)
@Gettinwikiwidit: It worked for me, so possibly yes. ミラP@Miraclepine 06:01, 15 November 2020 (UTC)
Hmm... I was pretty sure earlier experiments with templatizing the SPARQL query worked, but they don't seem to now. If I change it to an explicit query it works now. Sorry for the noise. Gettinwikiwidit (talk) 06:25, 15 November 2020 (UTC)
Ping worked, but I just don't have any advice. –MJLTauk 16:56, 15 November 2020 (UTC)

Current state of this data set

Hello,

I thought I'd offer a few words about this data set as I uploaded a major portion of it. Previous discussions about this data can be found here and here.

Background

In broad strokes, there was previously one position held (P39) United States senator (Q4416090) statement per senator, which left little room to specify details about each election. Because the United States Senate was designed, by means of the Senate class system, to ensure that the entire Senate isn't overturned in a given election, a new model was implemented with a single position held (P39) statement for each legislative term (Q15238777), so that a typical election results in a senator serving three consecutive terms. The advantage of this model is that it sets a common denominator across all Senate classes. That is, it prioritizes queries about which senators served at the same time over where a senator is within a particular tenure. This closely models how the data is stored in the US Senate Biographical Directory and makes it easier to cross-reference against work performed in a given Congress.

Beyond modeling which legislative Congress a given senator served in, we're left with the task of modeling the difference between the seats that the Senate class system prescribes. After some discussion it was decided to encode this in the electoral district (P768) qualifier, so that a United States Senate seat (Q101500234) for a given state would have a located in the administrative territorial entity (P131) property pointing to the U.S. state (Q35657) the seat belongs to. This leaves the position held (P39) value as United States senator (Q4416090) for simplicity and relegates this detail to a qualifier.

Where we are

All of the statements in the new model have a parliamentary term (P2937) qualifier pointing to a legislative term (Q15238777). Statements of the previous model still exist and are distinguished by lacking this qualifier. It was suggested here that those statements should be removed. Here on out, I'll focus on statements in the new model.

The following shows just how inconsistent the claims without a parliamentary term (P2937) qualifier were.
SELECT ?pred (COUNT(?pred) AS ?cnt) WHERE {
  ?sen p:P39 ?stmt;
    wdt:P31 wd:Q5.
  ?stmt ps:P39 wd:Q4416090.
  FILTER(NOT EXISTS { ?stmt pq:P2937 ?term. })
  ?stmt ?pred ?val.
}
GROUP BY ?pred
Electoral district was represented variously with represents (P1268) and of (P642). The only useful information in these statements that doesn't exist in the new model is elected in (P2715), and even this is unevenly applied; a way to calculate it is proposed below. This information can be cross-referenced against the various lists in Wikipedia. Gettinwikiwidit (talk) 22:35, 26 November 2020 (UTC)

How this data was collected

The vast majority of the data was collected from the US Senate Biographical Directory which maintains information in XML format. (e.g. Warren Robinson Austin) Mostly this contains information about which senator served in which legislative Congress, though there is a biography section with harder to parse (and inconsistent) information. In a first pass these statements were created with a start time (P580) and end time (P582) which matched the legislative term, though not all senators served the entire term of each legislative Congress they served in.

In a subsequent pass this was augmented by using this list of Senate appointments indicating when some of these terms were cut short. Note that as mentioned in this link, this only contains information since 1913 when the Seventeenth Amendment was passed to establish the direct election of senators.

Also complicating matters is the fact that the date of a Senate appointment doesn't always match the date the senator began to serve. We don't yet have a good way to model the date of the appointment, but we can use appointed by (P748) to model who made the appointment. (Most often, the governor at the time.)

Issues with the data

As hinted at above there are still a number of entries with overlapping terms. These need to be teased out one way or another. Suggestions welcome. (Perhaps DBPedia??) Here is a query to collect all the overlapping terms since 1913.

SELECT * WHERE {
  {
    SELECT DISTINCT (COUNT(?stmt) AS ?cnt) ?district ?start WHERE {
      ?sen p:P39 ?stmt;
        wdt:P31 wd:Q5.
      ?stmt ps:P39 wd:Q4416090;
        pq:P2937 ?term;
        pq:P580 ?start;
        pq:P768 ?district.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    GROUP BY ?district ?start
  }
  FILTER(?cnt > 1)
  FILTER(?start > "1913-01-01T00:00:00Z"^^xsd:dateTime)
  ?sen2 p:P39 ?stmt2;
    wdt:P31 wd:Q5.
  ?stmt2 ps:P39 wd:Q4416090;
    pq:P2937 ?term;
    pq:P580 ?start;
    pq:P768 ?district.
  OPTIONAL { ?nextterm wdt:P155 ?term. ?sen2 p:P39 [ pq:P2937 ?nextterm ]. }
}
ORDER BY (?district) (?start)


The statements are collated by district and start time to make it easier to see where the overlaps occur. In addition it lists which of the senators served in the succeeding term under the theory that that senator is most likely the senator that took over. (There are exceptions to this "rule".) Since the start times have already been modified for appointed senators, the vast majority of these are senators elected in a special election.

I have yet to do an analysis of the information before 1913 since these were not handled by direct elections. I believe most were by elections in the State houses.

With pandas it's pretty easy to extract tables from Wikidata with Python. Looking at all the lists of US senators by state, we can pull down the starts and ends of "runs" in the Senate (i.e. consecutive terms serving in the same seat). I made a table of (senator, seat, start term, start time, end term, end time) for all of these runs. Joining this info with the table generated from the SPARQL query above, I can figure out the start and end times of each senator for a seat in the given term. From there, it's pretty easy to generate commands to update Wikidata. I plan on doing that within a few days. Gettinwikiwidit (talk) 09:19, 20 November 2020 (UTC)
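A minimal sketch of the join described above, using plain Python rather than pandas so that it has no dependencies. The field names, the function name and all the sample values are hypothetical; the only idea taken from the comment is that a run's own start and end dates apply to its first and last term, while every other term in the run uses the term's own boundaries.

```python
from datetime import date


def term_dates_for_run(run, term_bounds):
    """Expand one run into per-term (senator, seat, term, start, end) rows.

    run: dict with senator, seat, start_term, start_time, end_term, end_time.
    term_bounds: dict mapping a term number to its (start, end) dates.
    """
    rows = []
    for term in range(run["start_term"], run["end_term"] + 1):
        term_start, term_end = term_bounds[term]
        # Only the first term of a run can start late ...
        start = run["start_time"] if term == run["start_term"] else term_start
        # ... and only the last term of a run can end early.
        end = run["end_time"] if term == run["end_term"] else term_end
        rows.append((run["senator"], run["seat"], term, start, end))
    return rows


if __name__ == "__main__":
    # Toy data, not real terms or senators.
    bounds = {
        1: (date(2001, 1, 1), date(2002, 12, 31)),
        2: (date(2003, 1, 1), date(2004, 12, 31)),
    }
    run = {
        "senator": "senator-A", "seat": "seat-X",
        "start_term": 1, "start_time": date(2001, 6, 1),
        "end_term": 2, "end_time": date(2004, 3, 1),
    }
    for row in term_dates_for_run(run, bounds):
        print(row)
```

Each resulting row corresponds to one position held statement in the per-term model, so the rows could be turned into update commands one at a time.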
Actually, this can be done for all senators once this info is extracted. I'll run some tests before applying it more broadly. Regards, Gettinwikiwidit (talk) 09:19, 20 November 2020 (UTC)
Everything after 1913 is now updated except for one oddball entry which corresponds to a senator who was never seated according to this wiki. Not sure how we should handle this. Regards Gettinwikiwidit (talk) 02:19, 21 November 2020 (UTC)
  • I plan on applying a similar treatment to the pre-1913 terms where there is overlap. I wish the sources of these dates could be explicitly referenced, though, since they're not coming from the US Congress Biographical Directory. Actually, the biographical directory does have dates, but they're embedded in prose and harder to parse. I guess we can expect people to update them over time if the dates used are disputed. Gettinwikiwidit (talk) 21:38, 21 November 2020 (UTC)
While matching up data uploaded from the Biographical Directory of the United States Congress with the "List of state senators" Wikipedia lists mentioned above, there were a few which didn't line up. I.e. the senator serving in the seat doesn't exist for a given term in one or the other, or else is not at either end of a run serving in a seat. Noting them here, but there should be some follow up.
SELECT ?sen (?senLabel AS ?senator) (?qdistrictLabel AS ?district) (?qtermLabel AS ?term) ?stmt WHERE {
  {
    SELECT ?sen ?senLabel ?qdistrictLabel ?qtermLabel ?stmt WHERE {
      VALUES (?sen ?stmt) {
        (wd:Q653713 wds:Q653713-39F43CF8-CF2B-4ED2-8AE5-34F88A7F7B56)
        (wd:Q202950 wds:Q202950-E892526F-B24F-4BB9-83E1-DB804269924C)
        (wd:Q1348975 wds:Q1348975-FDC6D66B-A7B0-4CC9-9E08-1F965BD0B574)
        (wd:Q883164 wds:Q883164-8E705A20-7F22-4A3A-8414-E88A4C4A115E)
        (wd:Q167795 wds:Q167795-DFE74420-2178-4CD6-BDAE-BFCB67DB767B)
        (wd:Q1283683 wds:Q1283683-BAFCA71C-737D-485C-A5AD-9CFDF4CC8979)
        (wd:Q5906536 wds:Q5906536-4E86CFD1-67A1-4E54-9C6A-6A55DA051D3B)
        (wd:Q5934173 wds:Q5934173-665EAD4A-B000-49FB-B4D0-8E88781BDC96)
        (wd:Q2622644 wds:Q2622644-024DB516-A850-4BBE-B02E-BF14F2FB5D1C)
        (wd:Q1700299 wds:Q1700299-A6EEB472-AF66-482C-9D0E-407436CA66FB)
        (wd:Q388215 wds:Q388215-172451FB-B657-451C-9069-FDE64A0A867B)
        (wd:Q273549 wds:Q273549-56B5CCDF-E60D-4EB5-BCED-68928AF85454)
        (wd:Q3068384 wds:Q3068384-FF047D96-DF25-4060-A86F-73331F16BC44)
        (wd:Q3068384 wds:Q3068384-E41804D3-F1B4-482D-9C18-07A5E9A2548C)
      }
      ?stmt pq:P768 ?qdistrict;
        pq:P2937 ?qterm.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  }
}
results
sen | senator | district | term | stmt
wd:Q202950 | John Henry | Maryland Class 1 senate seat | 5th United States Congress | wds:Q202950-E892526F-B24F-4BB9-83E1-DB804269924C
wd:Q1348975 | Simon Cameron | Pennsylvania Class 1 senate seat | 37th United States Congress | wds:Q1348975-FDC6D66B-A7B0-4CC9-9E08-1F965BD0B574
wd:Q5906536 | William S. Kenyon (Iowa politician) | Iowa Class 2 senate seat | 62nd United States Congress | wds:Q5906536-4E86CFD1-67A1-4E54-9C6A-6A55DA051D3B
wd:Q167795 | James Murray Mason | Virginia Class 1 senate seat | 37th United States Congress | wds:Q167795-DFE74420-2178-4CD6-BDAE-BFCB67DB767B
wd:Q5934173 | James Henry Lane | Kansas Class 2 senate seat | 39th United States Congress | wds:Q5934173-665EAD4A-B000-49FB-B4D0-8E88781BDC96
wd:Q2622644 | Alexander Caldwell | Kansas Class 2 senate seat | 43rd United States Congress | wds:Q2622644-024DB516-A850-4BBE-B02E-BF14F2FB5D1C
wd:Q883164 | John Taylor | South Carolina Class 2 senate seat | 14th United States Congress | wds:Q883164-8E705A20-7F22-4A3A-8414-E88A4C4A115E
wd:Q1283683 | Lafayette Young | Iowa Class 2 senate seat | 62nd United States Congress | wds:Q1283683-BAFCA71C-737D-485C-A5AD-9CFDF4CC8979
wd:Q653713 | John Eager Howard | Maryland Class 1 senate seat | 5th United States Congress | wds:Q653713-39F43CF8-CF2B-4ED2-8AE5-34F88A7F7B56
wd:Q1700299 | John H. Bankhead II | Alabama Class 2 senate seat | 79th United States Congress | wds:Q1700299-A6EEB472-AF66-482C-9D0E-407436CA66FB
wd:Q3068384 | Frank Leslie Smith | Illinois Class 3 senate seat | 70th United States Congress | wds:Q3068384-E41804D3-F1B4-482D-9C18-07A5E9A2548C
wd:Q3068384 | Frank Leslie Smith | Illinois Class 3 senate seat | 69th United States Congress | wds:Q3068384-FF047D96-DF25-4060-A86F-73331F16BC44

Connecting the sequence of Senate seat holders

Once we have unique start time (P580) qualifiers, sorting by electoral district (P768) and start time (P580) should allow us to calculate which senator replaces (P1365) and is replaced by (P1366) which other senator.
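The sort-and-link idea can be sketched as follows. This is an assumption-laden illustration, not the process actually used: the record layout and function name are hypothetical, start times are taken to be ISO date strings so that they sort chronologically, and consecutive statements by the same senator (one per term) are treated as a single tenure rather than a replacement.

```python
from itertools import groupby
from operator import itemgetter


def link_successions(statements):
    """Return (predecessor, successor, district) triples.

    statements: dicts with 'district', 'start' (ISO date string), 'senator'.
    Statements are ordered by district and start time; a link is emitted
    only where the holder of a seat changes.
    """
    links = []
    ordered = sorted(statements, key=itemgetter("district", "start"))
    for district, group in groupby(ordered, key=itemgetter("district")):
        previous = None
        for stmt in group:
            if previous is not None and previous["senator"] != stmt["senator"]:
                links.append((previous["senator"], stmt["senator"], district))
            previous = stmt
    return links


if __name__ == "__main__":
    # Toy data, not real senators or seats.
    sample = [
        {"district": "seat-1", "start": "2003-01-01", "senator": "B"},
        {"district": "seat-1", "start": "2001-01-01", "senator": "A"},
        {"district": "seat-1", "start": "2005-01-01", "senator": "B"},
        {"district": "seat-2", "start": "2001-01-01", "senator": "C"},
    ]
    print(link_successions(sample))
```

Each triple gives a replaced by (P1366) value for the predecessor's last statement and a replaces (P1365) value for the successor's first. The sketch ignores vacancies and members who were elected but never seated, so those statements would need to be excluded or reviewed by hand first.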

This should be possible now that the above task is complete. Gettinwikiwidit (talk) 00:17, 24 November 2020 (UTC)
I've gone ahead and done this. I also scraped the replaces (P1365) and replaced by (P1366) information out of all the Wikipedia pages and compared what existed against what was generated by the process described above. This revealed a couple of bugs in the data which have since been fixed as well as some oddball cases reported below. Gettinwikiwidit (talk) 23:10, 25 November 2020 (UTC)
Now that this is done we can make queries like this:
SELECT * WHERE {
  {
    SELECT ?sen ?senLabel ?replaces ?replacedBy ?start ?end WHERE {
      VALUES ?sen {
        wd:Q508752
      }
      ?sen p:P39 ?stmt.
      ?stmt ps:P39 wd:Q4416090;
        pq:P2937 ?term.
      {
        ?stmt pq:P1365 ?replaces;
          pq:P580 ?start.
      }
      UNION
      {
        ?stmt pq:P1366 ?replacedBy;
          pq:P582 ?end.
      }
      BIND(COALESCE(?start, ?end) AS ?date)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    ORDER BY (?sen) (?date)
  }
}

to generate data which can be used for Wikipedia info boxes. It needs to be enhanced just a bit to produce start times for initial holders of a seat, but the basic idea works. Gettinwikiwidit (talk) 15:13, 26 November 2020 (UTC)

Adding elected in qualifiers

I think we can assume that senators elected for the start of a given legislative Congress are elected in (P2715) in the year prior to that term. We might also assume that the subsequent legislative Congresses until the next election for that class served by the same senator were because of that same election. This should allow us to calculate a large number of elected in (P2715) values. Presumably the majority of the rest were either appointments or arrived at through special election (at least after 1913).

Does this sound like a valid assumption?
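The assumption can be made concrete with a small calculation. The sketch below is hypothetical in its names and returns only a year, not an election item. It relies on two facts: the nth Congress begins in year 1789 + 2(n - 1), and the class cycle repeats every three Congresses (Class 1 terms began with the 116th Congress, Class 2 with the 117th and Class 3 with the 118th). It deliberately ignores appointments and special elections, and before 1913 the "election" was a choice by the state legislature, so the year is only nominal there.

```python
FIRST_CONGRESS_START_YEAR = 1789

# Remainder (mod 3) of the Congress numbers in which each class begins a
# full term, derived from the modern cycle: 116 % 3 == 2 for Class 1,
# 117 % 3 == 0 for Class 2, 118 % 3 == 1 for Class 3.
CLASS_CYCLE_OFFSET = {1: 2, 2: 0, 3: 1}


def congress_start_year(congress: int) -> int:
    """Year in which the given Congress began."""
    return FIRST_CONGRESS_START_YEAR + 2 * (congress - 1)


def regular_election_year(seat_class: int, congress: int) -> int:
    """Nominal year of the regular election covering this Congress.

    Finds the most recent Congress, at or before `congress`, in which the
    class began a full term, and returns the year before it started.
    """
    offset = CLASS_CYCLE_OFFSET[seat_class]
    cycle_start = congress - ((congress - offset) % 3)
    # Initial terms in the first Congresses were shortened; clamp to the 1st.
    cycle_start = max(cycle_start, 1)
    return congress_start_year(cycle_start) - 1


if __name__ == "__main__":
    for congress in (116, 117, 118):
        print(congress, regular_election_year(1, congress))
```

For a Class 1 seat the 116th, 117th and 118th Congresses all map to 2018, which matches the idea that the later Congresses in a term follow from the same election. A statement whose start time falls after the beginning of the first Congress in the cycle probably comes from an appointment or special election and should not receive this value automatically.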

Where we go from here

The above lists a couple of projects to clean up the data currently in the store. Please feel free to open up new topics here with suggestions of how to clean up the data or descriptions of ongoing work to clean up the data with sample queries, etc. so that other people may benefit from your work and/or take off where you left off should you move on to other projects.

I've (mostly) enjoyed the work I've put in so far and hope that it provides a stepping stone to improve the data set even further.

Regards, Gettinwikiwidit (talk) 06:35, 17 November 2020 (UTC)

Individual States

Does this project include individual states? I'm going through for New Jersey and am curious if it should be tracked anywhere. Acebarry (talk) 23:11, 17 November 2020 (UTC)

Hi, @Acebarry:. Do you mean state legislatures? My contribution so far has only been to add all the info for United States senators. ( Though as mentioned above there is some clean up to do. ) On the every politician project page, you'll see that their ambitions include having local government politicians listed as well. If that's something you'd like to work on and would like some tips, I'm happy to share. I see that there is a Wikipedia page which lists state senators. If you click through to a given senator, you'll find a Wikidata item link in the sidebar. Looking at a typical one it doesn't seem to have history. There may be links to source data in the Wikipedia references. So, a bit of a rambling answer here, but I think it can be summarized as follows: The project doesn't include local governments yet but there seems to be interest in having it done. There's info out there, we just need a plan for collecting and organizing it. Hope this helps. Gettinwikiwidit (talk) 09:11, 20 November 2020 (UTC)
@Acebarry: Oh, it looks like you're adding the New Jersey state legislature entities. I guess I don't understand your question. Are you asking if anyone else is trying to do the same? I'm not, to be clear. Gettinwikiwidit (talk) 21:43, 20 November 2020 (UTC)
@Gettinwikiwidit: Exactly! I want to know if anyone is doing state legislatures and if they are if I should be tracking progress somewhere. If not would it be appropriate for me to make a subpage under this? E.g. WikiProject every politician/United States of America/New Jersey ? Acebarry (talk) 16:54, 21 November 2020 (UTC)
Pinging the project for guidance. If we don't hear back I say go with your suggestion. Thanks for your contribution! Gettinwikiwidit (talk) 21:04, 21 November 2020 (UTC)
@Gettinwikiwidit: Secondary subpages for individual states would be nice and something I support. –MJLTauk 21:22, 21 November 2020 (UTC)
@Gettinwikiwidit: What MJL said. ミラP@Miraclepine 21:46, 21 November 2020 (UTC)
Yes, a subpage for New Jersey, or any other state, would be great! --Oravrattas (talk) 04:51, 22 November 2020 (UTC)
I made the page for New Jersey. I'm going thru the current senators and adding their data. Acebarry (talk) 01:07, 23 November 2020 (UTC)

US representatives with no US Congressional Biographical Directory entries

Nearly all United States representative (Q13218630) items have a US Congress Bio ID (P1157). The few that don't are listed here:

SELECT ?item ?itemLabel ?rep ?successorLabel ?successor ?successorBIOID WHERE {
  ?item p:P39 ?rep;
    wdt:P31 wd:Q5.
  ?rep ps:P39 wd:Q13218630.
  FILTER(NOT EXISTS { ?item wdt:P1157 ?bioid})
  OPTIONAL { ?rep pq:P1366 ?successor.
            ?successor wdt:P1157 ?successorBIOID }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

I have no idea where Q4714468 came from but the rest are split into two cases: Those who were elected but never seated and those who are referred to in the Biographical Directory entries of the succeeding representative but have no entry themselves.

I'll follow up with the US House historian to see about adding the missing entries, but we should probably try to model the information in that list to be able to generate a similar table with Wikidata. I'm not planning on working on this anytime soon, but I thought I'd place a marker here for future work. Post here if you'd like to take on this task. Regards, Gettinwikiwidit (talk) 05:56, 22 November 2020 (UTC)

@Gettinwikiwidit: I checked and saw that Q4714468 had been deleted at an enwiki AFD as a hoax; while there are arwiki and arzwiki articles, the arwiki article's one source is a newspaper scan that doesn't mention Almeida at all and the arzwiki article is unsourced with three external links to the WD entry's identifiers. Also, the Freebase ID is based on the hoax and the other two (VIAF and ISNI) are for a slightly similarly named but more notable Portugal MP Artur Manuel Giesteira de Almeida (Q102302952). Hence, I think a WD:RFD is needed for Almeida, though the arwiki and arzwiki articles have to be deleted before the item can be deleted. ミラP@Miraclepine 02:06, 25 November 2020 (UTC)
@Miraclepine: I've never been through this process. I'll see how far I can get on my own. Thanks. Gettinwikiwidit (talk) 05:20, 25 November 2020 (UTC)
Spotted this just now - I've flagged it as a hoax and listed it for deletion, though it may take a while to sort out the linked pages. Andrew Gray (talk) 20:56, 26 November 2020 (UTC)

Appointed senators

It's probably worth cross checking all United States senator (Q4416090) entries against this list. In the first pass, this list was used. Gettinwikiwidit (talk) 22:19, 22 November 2020 (UTC)

Oddball replaces situation

Hey there, the Joseph R. Grundy (Q1548061) entry has replaces (P1365) set to William Scott Vare (Q6222555) but according to this Wikipedia page William Scott Vare (Q6222555) was elected but never actually seated. Should Joseph R. Grundy (Q1548061) have replaces (P1365) set to the last person to actually occupy the seat? I'll leave it for now. Gettinwikiwidit (talk) 09:34, 25 November 2020 (UTC)

Henry Johnson (Q767426) with Alexander Porter (Q166016) is another case per this Wikipedia page Gettinwikiwidit (talk) 11:39, 25 November 2020 (UTC)
James Ross (Q503524) with Albert Gallatin (Q500046) is even odder because Gallatin actually served in office until he was expelled for being ineligible. Gettinwikiwidit (talk) 11:39, 25 November 2020 (UTC)

Interesting tidbit

William Windom (Q1374474) served non-consecutive periods within a single legislative term as he left his job and was then re-elected to his own seat. I've created two position held (P39) entries for him. Gettinwikiwidit (talk) 05:54, 26 November 2020 (UTC)

This definitely sounds like the best way of doing it. I wondered about mentioning this sort of case as a possibility but foolishly assumed it would never actually happen for a senator! Andrew Gray (talk) 20:57, 26 November 2020 (UTC)

Biographical Directory of the United States Senate wrong

There are a number of places where the text found in a bioguide entry doesn't match the data. In this case Abraham Baldwin (Q329766) is marked as having served in the 10th United States Congress (Q4547180) but he had died before it started. I'll use this section to note these instances. It should probably be followed up with at the bioguide. Gettinwikiwidit (talk) 02:20, 27 November 2020 (UTC)

There are several members listed in Wikidata either without a bioid or a bioid which doesn't point anywhere.
results
itemLabel | successorLabel | successorBIOID
Isaac Bloom (Q1670880) | Daniel C. Verplanck | V000088
Jack Swigert (Q348358) | Daniel Schaefer | S000109
Thomas Tillotson (Q2427638) | Theodorus Bailey | B000049
David Scott (Q5239582) | John Murray | M001109
Thomas D. Singleton (Q2423319) | Robert Campbell | C000098
Orrin Dubbs Bleakley (Q7104081) | Earl Hanley Beshlin | B000421

Completing the list of historical House of Representatives members

I just grabbed the list of all ids for members of the House from the Biographical Directory and compared that against what's in Wikidata. There are a few entries which don't exist in the Biographical Directory but are mentioned in entries for other members (see above). I've contacted them about this. Other than those, there are 11029 members which are in Wikidata and 1778 (funny number) which are missing. If anyone has any advice on how to add new entries or any template they'd like me to use, please let me know. Otherwise, I'll try to figure it out and post back here. Regards, 23:29, 27 November 2020 (UTC)
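The comparison described above amounts to a pair of set differences over Bioguide IDs. A minimal sketch, with a hypothetical function name and placeholder IDs that do not refer to real members:

```python
def compare_id_sets(directory_ids, wikidata_ids):
    """Split two collections of IDs into shared and one-sided groups."""
    directory, wikidata = set(directory_ids), set(wikidata_ids)
    return {
        "in_both": sorted(directory & wikidata),
        "missing_from_wikidata": sorted(directory - wikidata),
        "only_in_wikidata": sorted(wikidata - directory),
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    report = compare_id_sets(
        ["X000001", "X000002", "X000003"],
        ["X000002", "X000003", "X000004"],
    )
    for key, ids in report.items():
        print(key, len(ids), ids)
```

The `only_in_wikidata` group is worth checking as well as the missing one, since it can reveal mistyped or dead US Congress Bio ID (P1157) values.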

Sadly the Biographical Directory does not have information about which districts were served. However, this site does, at least back to 1974. I'm currently scraping this data. Wikipedia itself can be scraped for older terms, but I'm unable to track down from which sources this information was derived. Gettinwikiwidit (talk) 23:26, 6 December 2020 (UTC)
Trying to reconcile the two sites against each other turned up only a handful of discrepancies, which should be easy to follow up on manually.
SELECT DISTINCT ?rep ?repLabel ?bioid ?missingDirectoryTerm ?bioURL ?congressURL ?article WHERE {
  VALUES (?bioid ?missingDirectoryTerm ) {
    ("S000847" "96")
    ("T000410" "92")
    ("H001092" "116")
    ("Z000001" "98")
    ("M000249" "109")
    ("M000388" "111")
    ("C000542" "101")
    ("S000716" "97")
    ("D000211" "80")
    ("B000966" "99")
    ("R000249" "94")
    ("D000373" "107")
    }
  ?rep wdt:P1157 ?bioid;
       p:P39 ?stmt.
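  OPTIONAL {
    # Keep the sitelink filter inside the OPTIONAL so that a member without
    # an English Wikipedia article is still listed.
    ?article schema:about ?rep;
             schema:isPartOf <https://en.wikipedia.org/>.
  }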
  
  OPTIONAL { ?stmt ps:P39 wd:Q13218630. }
  OPTIONAL { ?stmt prov:wasDerivedFrom/pr:P854 ?ref. }
  OPTIONAL { ?stmt pq:P2937 ?term. }
  OPTIONAL { ?stmt pq:P4100 ?party. }
  BIND(URI(CONCAT("https://www.congress.gov/member/william-steiger/",?bioid,"?searchResultViewType=expanded")) AS ?congressURL)
  BIND(URI(CONCAT("http://bioguide.congress.gov/scripts/biodisplay.pl?index=",?bioid)) AS ?bioURL )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?repLabel

Gettinwikiwidit (talk) 11:34, 7 December 2020 (UTC)

In addition, there are ten representatives who switched parties (one twice!!) which I haven't yet attempted to map out. Gettinwikiwidit (talk) 11:34, 7 December 2020 (UTC)
And here are 11 missing in the other direction. Gettinwikiwidit (talk) 13:12, 7 December 2020 (UTC)
Perhaps surprisingly, or more likely thanks to @Teester:, all the terms with districts found here are, with a handful of exceptions and given the caveats above, faithfully recorded in Wikidata. I've manually fixed the few outliers and will follow up on the items above. For items before 1973, I'd prefer to have a source other than Wikipedia, but I can try scraping info from pages like this. Gettinwikiwidit (talk) 23:33, 7 December 2020 (UTC)
Or rather all the terms which are in Wikidata are accurate with the caveats above. There are around 6500 claims to add. I'm arranging them now. Gettinwikiwidit (talk) 00:58, 8 December 2020 (UTC)
I've added references to the site which explicitly mentions the district. Gettinwikiwidit (talk) 23:57, 7 December 2020 (UTC)

Numbers of serving Senators - error-check

@Gettinwikiwidit: I've adapted this query which finds all cases where a specific "class" seat has two members at the same time - may be useful for sorting out the overlaps (linked not templated as it has a || and the template gets upset). It's set up to ensure that it only looks at "new-style" seats with class embedded, to avoid confusion while the old-style entries are still in place as well.

Currently 15 cases - a group of modern ones, a couple of very earlies, and a few late-c19th. Andrew Gray (talk) 21:54, 28 November 2020 (UTC)
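Since the query is only linked, here is a rough Python sketch of the same kind of check for anyone working from exported data. It is not a translation of the linked query: the record layout and names are hypothetical, dates are ISO strings, and a missing end date is treated as "still serving". Two statements for the same seat overlap when each starts strictly before the other ends, so a handover on a single day is not flagged.

```python
from itertools import combinations, groupby

OPEN_END = "9999-12-31"  # stand-in for a statement with no end date


def find_overlaps(statements):
    """Return (seat, senator_a, senator_b) for overlapping tenures.

    statements: dicts with 'seat', 'senator', 'start', 'end'
    (ISO date strings; 'end' may be None).
    """
    ordered = sorted(statements, key=lambda s: (s["seat"], s["start"]))
    overlaps = []
    for seat, group in groupby(ordered, key=lambda s: s["seat"]):
        for a, b in combinations(list(group), 2):
            if a["senator"] == b["senator"]:
                continue  # consecutive terms of one senator are expected
            a_end = a["end"] or OPEN_END
            b_end = b["end"] or OPEN_END
            if a["start"] < b_end and b["start"] < a_end:
                overlaps.append((seat, a["senator"], b["senator"]))
    return overlaps


if __name__ == "__main__":
    # Toy data, not real senators or seats.
    sample = [
        {"seat": "X", "senator": "A", "start": "2001-01-03", "end": "2003-01-03"},
        {"seat": "X", "senator": "B", "start": "2002-06-01", "end": "2005-01-03"},
        {"seat": "X", "senator": "C", "start": "2005-01-03", "end": None},
    ]
    print(find_overlaps(sample))
```

Comparing every pair within a seat, rather than only neighbours, also catches a long tenure that overlaps several shorter ones.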

@Andrew Gray: Thanks very much. I added a tweak to show the statements where the overlap occurs to make it easier to click through to follow up. Also, each of the seat classes inherit from United States Senate seat (Q101500234) to make queries like this easier. Gettinwikiwidit (talk) 22:55, 28 November 2020 (UTC)
Neat - definitely a bit smoother. Hadn't twigged the inheritance worked that way, and wasn't sure how to do it without bringing in the other seat items. Andrew Gray (talk) 23:39, 28 November 2020 (UTC)
One weird case here: James Fenner (Q880519). Neither the Wikipedia page nor the Biographical Directory lists a precise date. Gettinwikiwidit (talk) 11:34, 29 November 2020 (UTC)
Actually here's another John Taylor (Q883164). Gettinwikiwidit (talk) 11:36, 29 November 2020 (UTC)
These are mostly cleaned up, but it turned up another error, which led to this having more results than I expected. I'll clean these up. But a few of them are from people resigning on the first day of the new term. I'm not sure how these should be accounted for, but some have references to that single-day term in the Biographical Directory. Gettinwikiwidit (talk) 11:48, 29 November 2020 (UTC)
Er.. Fixed now, so that query might not paint the picture it did when I posted it, but I think you get the idea. Gettinwikiwidit (talk) 12:01, 29 November 2020 (UTC)
My feeling is that if they resign on the first day of the new term, they still served on that one day - so start and end same day? Andrew Gray (talk) 14:03, 29 November 2020 (UTC)
I don't feel strongly about it but that's how I've done it for now. Gettinwikiwidit (talk) 02:27, 30 November 2020 (UTC)
This query looks up the number of distinct (state & class) Senate seats found in each term, checks the number of states they represent, and flags up any discrepancies. Some quick sampling suggests that at least some of the discrepancies are correct, though - for example the 70th Congress (1927-29) only ever had 95 seats represented, as William Vare (PA Class 3) was never seated but no-one was elected to replace him in this period. Andrew Gray (talk) 22:50, 28 November 2020 (UTC)
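The seat-count check can be sketched in a few lines of Python for data that has already been exported. The row layout and function name are assumptions made for illustration; the rule applied is simply that every state represented in a term should contribute two distinct seats.

```python
from collections import defaultdict


def seat_count_discrepancies(rows):
    """Map each inconsistent term to (seats found, seats expected).

    rows: iterable of (term, state, seat) tuples, one per statement.
    """
    seats = defaultdict(set)
    states = defaultdict(set)
    for term, state, seat in rows:
        seats[term].add(seat)
        states[term].add(state)
    return {
        term: (len(seats[term]), 2 * len(states[term]))
        for term in seats
        if len(seats[term]) != 2 * len(states[term])
    }


if __name__ == "__main__":
    # Toy data, not real terms or states.
    sample = [
        ("T1", "A", "A-1"), ("T1", "A", "A-2"), ("T1", "B", "B-1"),
        ("T2", "A", "A-1"), ("T2", "A", "A-2"),
    ]
    print(seat_count_discrepancies(sample))
```

A flagged term is a prompt to look closer, not proof of an error, as the 70th Congress example shows. Note also that a state with both seats vacant for a whole term contributes no rows at all, so this check cannot see it.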
It's probably worth pointing out for cases like William Scott Vare (Q6222555) I've deprecated the rank for position held statements. It might be worth filtering these out, depending on what your query is after. Gettinwikiwidit (talk) 02:52, 29 November 2020 (UTC)
I think you can simply add a wikibase:rank wikibase:BestRank clause to the query to do this. Gettinwikiwidit (talk) 02:53, 29 November 2020 (UTC)
It's a little tricky, I think - wikibase:BestRank will maybe not work right if there's a different P39 with preferred status? Might be better to specifically ask for 'normal' or 'preferred', or specifically filter out 'deprecated'.
I've been wondering if deprecation is the best way to go for these "never actually made it to the Senate" options - it's not quite how we normally use deprecation. However, I can't think of a better way to do it that is consistent with how the sources handle it, unless we create a "senator-elect" position (which seems to be how the Biographical Dictionary talks about them). But that has its own complications. Andrew Gray (talk) 14:03, 29 November 2020 (UTC)
I think if you look at the definition of DeprecatedRank, this feels pretty squarely on target. I'm happy to use another position, though. Gettinwikiwidit (talk) 02:30, 30 November 2020 (UTC)
However, one person has already mistaken the meaning of DeprecatedRank, so maybe this isn't the best plan. FWIW, Jura1 suggested it.
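As a rough illustration of the "specifically filter out 'deprecated'" option discussed above, a query that walks the full statement nodes (p:P39) can read each statement's rank and drop the deprecated ones. This is a sketch under the assumption that the standard Wikidata Query Service prefixes are available; the selected variables are only for illustration.

```sparql
# Sketch only: senator position-held statements, excluding deprecated ones.
SELECT ?sen ?senLabel ?seat ?term WHERE {
  ?sen p:P39 ?stmt.
  ?stmt ps:P39 wd:Q4416090;     # Position held: United States senator
        pq:P768 ?seat;          # Seat
        pq:P2937 ?term;         # Parliamentary term
        wikibase:rank ?rank.    # Rank of this particular statement
  FILTER(?rank != wikibase:DeprecatedRank)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Unlike a wikibase:BestRank test, this keeps normal-rank statements even when another P39 statement on the same item has preferred rank, which is the concern raised above.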
Started looking at this. It looks like the Tennessee Class 2 senate seat (Q101499000) was vacant for the entire 27th United States Congress (Q4632667) per this Wikipedia page.
Albert Gallatin (Q500046) served part of the 2nd United States Congress (Q210241) but was unseated for having served illegitimately.
Joseph Lane (Q372848) is not listed as having served in the 35th United States Congress (Q4635622) per the Biographical Directory. But that seems to be an error per this page.
The South Carolina Class 3 senate seat (Q101498991) was vacant during the Civil War and hence terms 37th United States Congress (Q4635921), 38th United States Congress (Q4636059) and 39th United States Congress (Q4636192) according to this page though only the 37th shows up in this report because neither state was filled in the latter two.
Thomas Tipton (Q2427664) is not listed as serving in the 39th United States Congress (Q4636192) in the Biographical Directory, but is per this page.
Louisiana Class 3 senate seat (Q101499042) was vacant for the whole of the 43rd United States Congress (Q4637968) per this page.
George Hearst (Q561826) is simply missing from 50th United States Congress (Q4639974). It looks like an oversight, though I'm not sure how this would happen.
Matthew Quay (Q6059259) is similarly missing from 56th United States Congress (Q4640768).
L. Heisler Ball (Q1709972) is similarly missing from 57th United States Congress (Q4640851). This one at least was missing from my cache of the biographical data from when I downloaded this stuff, so it looks like it was added later.
This seat was vacant in the 56th United States Congress (Q4640768).
Pennsylvania Class 3 senate seat (Q101498986) in the 70th United States Congress (Q4643047) was William Vare mentioned above.
That seems to be all of them. FWIW, I used this query to feed into this one which made it very easy to click through and see what was happening. Regards, Gettinwikiwidit (talk) 03:35, 30 November 2020 (UTC)
I've fixed the missing entries mentioned above. Gettinwikiwidit (talk) 06:41, 30 November 2020 (UTC)

Party leaders of the Senate

Here's a table we can parse. There are textual references in the Biographical Directory but grepping for say "minority leader" only turns up a handful and only one is marked as such in Wikidata. Gettinwikiwidit (talk) 10:36, 29 November 2020 (UTC)

Replicating Wikipedia tables

I've generated a query which produces output similar to Wikipedia's list of senators by state. I plan on reconciling one against the other. The query has one row for the start of each senate "run" and one for the end of each senate "run". Collecting these by (senator, seat) pairs you can produce a table showing the start and end for each senate run similar to the Wikipedia pages. Writing the code to collect the tables revealed cases where a run had a replaced by (P1366) qualifier (i.e. the end of a run) but was missing the preceding replaces (P1365) qualifier (i.e. the start of that run). In almost all cases it was a senator replacing himself after a vacancy. I've added the extra qualifiers so that these match up now. I.e., for each seat, there is a chain of replaces (P1365) -> replaced by (P1366) qualifiers from the establishment of the seat to the current occupants. Note that the chain is not broken during vacancies, but the gap is represented in the start time (P580) and end time (P582) qualifiers. This is true even for the large gap for seats left vacant during the Civil War. Regards, Gettinwikiwidit (talk) 12:38, 1 December 2020 (UTC)
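One way to spot-check the chain described above is to look for a replaced by (P1366) qualifier whose target has no statement for the same seat pointing back with replaces (P1365). The following is a minimal sketch of that idea, not the code used for the reconciliation; it only tests that the two qualifiers mirror each other and ignores dates.

```sparql
# Sketch only: "replaced by" qualifiers with no matching "replaces"
# qualifier on the successor's statement for the same seat.
SELECT ?sen ?senLabel ?seat ?seatLabel ?successor ?successorLabel WHERE {
  ?sen p:P39 ?stmt.
  ?stmt ps:P39 wd:Q4416090;     # Position held: United States senator
        pq:P768 ?seat;          # Seat
        pq:P1366 ?successor.    # Replaced by
  FILTER NOT EXISTS {
    ?successor p:P39 ?next.
    ?next ps:P39 wd:Q4416090;
          pq:P768 ?seat;        # Same seat
          pq:P1365 ?sen.        # Replaces the original senator
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

If the chain is complete for every seat, this should return no rows. A senator who returns to the same seat after a vacancy appears as their own successor, and the same pattern covers that case.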

It's probably worth noting that the time of the establishment of the seat is not produced by this table, nor does it produce an end time for the current occupants. Gettinwikiwidit (talk) 12:40, 1 December 2020 (UTC)
@Andrew Gray: I repaired quite a few tables in Wikipedia where the rows didn't match up with the stated timelines even when the stated timelines did match what is in the Biographical Directory. Having the rows improperly lined up made doing a reconciliation against those tables difficult. Now that they're fixed, I'll try to do another round. Gettinwikiwidit (talk) 12:52, 4 December 2020 (UTC)
At this point all the end terms line up and all the start terms line up but three oddball cases: Leverett Saltonstall (Q880369), Tom Stewart (Q2440140), Nathan L. Bachman (Q1768884). For the last I think Wikipedia is just wrong (see note below). Note, this is *terms*, not times. There may be more work to do on tweaking the times. Gettinwikiwidit (talk) 14:19, 4 December 2020 (UTC)
There are about 30 runs left which have either a start or an end time off by more than a day. Gettinwikiwidit (talk) 14:29, 4 December 2020 (UTC)

Withdrawal vs. Expulsion

Just a heads up that there seems to be a distinction between the date of "withdrawing from the senate" and the date of "expulsion from the senate" and these seem to be handled inconsistently in Wikipedia. This query contains the cases that came to my attention. Gettinwikiwidit (talk) 12:33, 2 December 2020 (UTC)

FWIW, the note on Tom Stewart here expresses a strong viewpoint of when a senator's service begins. I'm not sure how authoritative it is, though. @Andrew Gray: Gettinwikiwidit (talk) 10:04, 4 December 2020 (UTC)
@Gettinwikiwidit: It seems a bit of a thorny question. Most of the Biographical Dictionary entries just say "service from...", but some of the more recent ones do split the dates out - eg Robert Menendez has "appointed on January 17, 2006 ... service began on January 17, 2006, and took the oath of office on January 18, 2006". I've also spotted an "effective" date on some appointments - Jean Carnahan was "appointed to the United States Senate on December 4, 2000, effective January 3, 2001". As far as I can work out from w:Seniority in the United States Senate, service is usually reckoned from the date of appointment, but it's not explicitly clear how that works with cases like Stewart where they need to resign from another post. This Senate list and the Biographical Dictionary strongly suggest that for a case like Stewart, he'd be counted as a Senator from when he "assumed his senatorial duties", ie January not November. But there might be a subtlety here I'm missing. Andrew Gray (talk) 22:57, 4 December 2020 (UTC)
Here's another case like it - Mark Hatfield "delayed taking oath of office until January 10, 1967, to finish term as governor; ... served from January 10, 1967,". So that seems to be another one consistent with only becoming a Senator once you resign from the conflicting post. Andrew Gray (talk) 22:06, 5 December 2020 (UTC)
  • Another weird one - James Shields (Q923522) (who represented Illinois, Minnesota and Missouri!) had a complicated first term - elected, took his seat, and election voided later in March 1849. Immediately re-elected and came back in October 1849. The BD has him only serving from October; I've left it on I think your preferred version of a short term for the March service as well. Andrew Gray (talk) 22:54, 6 December 2020 (UTC)
This same guy served as Senator for three different states! Not that there's anything wrong with that, but it's also weird. Gettinwikiwidit (talk) 21:39, 7 December 2020 (UTC)
Yeah - have to remember him as a trivia question answer :-). This is a nice cross-check for our data, incidentally - w:List of members of the United States Congress from multiple states says there have been only two multiple-state senators, him for three and Waitman T. Willey (Q1751424) for two. Our query agrees. Always felt that if we can answer the weird trivia questions, it's a good indication we'll be able to reliably answer the serious ones as well! Andrew Gray (talk) 22:57, 8 December 2020 (UTC)

Discrepancies between new and old data

@Gettinwikiwidit: I've been looking at comparing the new data with what was already in place, prior to removing the old data. First of all, checking that all the items with old-style records (no term qualifier) now have a new-style record (term & date) as well.

So far, so good. Comparing the data in the two gets a little more complicated. Most don't have any dates, but there are 97 cases where the old-style data and new-style data have different start dates, and 149 cases where they have different end dates. There are 206 items with a discrepancy, as some are on both lists. They seem to fall into a few general groups.

  1. Lots of pre-1930s terms with 3 March vs 4 March end dates.
  2. It's missing some very brief appointments. Paul Tsongas (Q283884) resigned as Senator one day early, and John Kerry (Q22316) was appointed to replace him that same day - so he served one day of the old term. The newly imported data treats Tsongas as going up until the end of his term, and then Kerry starting on the first day of the new term. In some cases there's a discrepancy on both items, on others just the one - eg Richard Nixon (Q9588) is missing his extra month as a Senator, but the dates on his predecessor Sheridan Downey (Q2278283) are correct. I think in most of these the existing data is correct, and these may be missing from the source? Should hopefully be easy to add those short terms, though.
My inclination is to completely ignore the old model data. I'm not sure I follow the example mentioned above, but I'm happy to discuss how to handle any outlier cases. Gettinwikiwidit (talk) 01:33, 3 December 2020 (UTC)
  1. Other records have discrepancies over exact start/end dates (eg Kelly Loeffler (Q76570207) and Lincoln Chafee (Q44690) just have a few days difference), presumably using different definitions,
  2. And finally a few are probably just typos on the original import from WP.

For the 3 March vs 4 March end dates, I'm not entirely sure which is correct. The Senate chronological list seems to treat older terms as running from March 4 to March 3 (one ends the day before the other starts), but since 1935 it lists them as January 3 to January 3 (starting and ending on the same day). Not sure if this is a legal change, or just one in convention.

I was just about to write a note about this. For now I've left differences of a single day to be dealt with later. I'm guessing that some people assume you can't have two senators serving on the same day and are reluctant to record them as such whereas others prefer having zero gap. I personally don't mind leaving this ambiguous because I think there's no way future consistency can be guaranteed. Gettinwikiwidit (talk) 01:31, 3 December 2020 (UTC)

I will try and find a bit of time over the next few days to fix up some of these discrepancies. Andrew Gray (talk) 00:22, 3 December 2020 (UTC)
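For reference, the old-style versus new-style start date comparison described at the top of this section could be sketched roughly as below. This is an illustration under stated assumptions, not the query actually used: "old-style" is taken to mean a senator statement with no parliamentary term (P2937) qualifier, "new-style" one that has it, and dates are compared for exact equality.

```sparql
# Sketch only: old-style start dates that match no new-style start date
# on the same item.
SELECT DISTINCT ?sen ?senLabel ?oldStart WHERE {
  ?sen p:P39 ?old.
  ?old ps:P39 wd:Q4416090;      # Position held: United States senator
       pq:P580 ?oldStart.       # Start time on the old-style statement
  FILTER NOT EXISTS { ?old pq:P2937 ?anyTerm. }   # Old-style: no term qualifier
  FILTER NOT EXISTS {                             # No new-style statement starts that day
    ?sen p:P39 ?new.
    ?new ps:P39 wd:Q4416090;
         pq:P2937 ?term;
         pq:P580 ?oldStart.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Swapping P580 for P582 gives the end date version. Because the comparison is exact, single-day differences such as the 3 March vs 4 March cases would also be reported.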

@Andrew Gray: This sounds similar to the Replicating Wikipedia tables project mentioned above which I'm just finishing up. I'm reluctant to take the old Wikidata as authority, though and have been tending to use the Biographical Directory and Wikipedia as authoritative in that order, though there have been errors discovered in both. Gettinwikiwidit (talk) 12:44, 3 December 2020 (UTC)
It's probably worth mentioning that I've been making edits this morning. I'm not sure when you did your checks, but that seems like it's worth taking into account. Gettinwikiwidit (talk) 01:31, 3 December 2020 (UTC)
Also per PC, I was planning on deleting all the old model claims by the end of this week. I'll hold off for another week or until I hear from you so as not to get in your way. Regards, Gettinwikiwidit (talk) 12:07, 3 December 2020 (UTC)
It's probably worth mentioning that at this point only a handful of start and end times differ by more than a day from what can be found in the list of senators by state Wikipedia pages. I'll package up my software and post back so people can have a look at it. I'll also post back the remaining cases in case they can't be resolved trivially. 12:43, 3 December 2020 (UTC)
@Gettinwikiwidit: All sounds great. I'll chip away at some of these dates and if you're also coming at it state-by-state, it sounds like we'll get through them pretty fast! Agree entirely that the "old data" shouldn't be taken as authoritative, but if they disagree it looks like a good prompt to double-check for anomalies.
For the "short terms" I mentioned above, this change is an example of what I mean - Tsongas resigned after the election, and Kerry was appointed on 2 January, so he served for a single day of the 98th Congress and then started a new term in the 99th. These do seem to be real service, albeit something of a legal fiction (WP suggests it's a dodge to gain seniority), so probably we should include them? Hopefully they'll all pop out naturally as we work through these.
In terms of deletion, I think you should be good to remove all the old-style terms that don't show up on the above reports (about 90% of them), but if it's easier to wait and do them all in one fell swoop then that works for me. Andrew Gray (talk) 22:34, 3 December 2020 (UTC)
@Andrew Gray: I can wait. I've also saved off all P39 claims for senators as of today as a precaution. I'll just do this next week for the whole lot. Regards, Gettinwikiwidit (talk) 22:40, 3 December 2020 (UTC)
Sounds good. I've done a bundle just now; where the old start/end date is incorrect, I've updated it so it matches the new data and thus will disappear from the queries to avoid having to look at it twice. Will keep working at it tomorrow. Andrew Gray (talk) 23:55, 3 December 2020 (UTC)
@Andrew Gray: Thanks. Just saw this edit of yours. This resource looks great. Noting it here for posterity. Gettinwikiwidit (talk) 22:52, 4 December 2020 (UTC)
Yeah, it's proving very helpful. I assume it's primarily drawn from the Biographical Dictionary, but there are a couple of discrepancies - eg the Dictionary has Carnahan leaving on 25/11/2002 and Talent starting on 23/11/2002, while the chronological list has both on the same date (23rd). Which is probably more likely to be correct! Andrew Gray (talk) 23:04, 4 December 2020 (UTC)
All start date mismatches now done except Albert Gallatin (Q500046) (who is the "not clear if he officially served as a Senator or not" case) and James Shields (Q923522) (noted above, unclear how officially to model his "first term", 5-15 March 1849). Going to try and get the rest of the end dates done tonight. Andrew Gray (talk) 19:03, 7 December 2020 (UTC)
And all end date mismatches done, except for Albert Gallatin (Q500046) (as above), and the ones set to 3 March or 4 March. I've left these alone, as these seem to be ones where we'd be wanting to standardise term start/end dates anyway. Andrew Gray (talk) 19:33, 7 December 2020 (UTC)
@Andrew Gray: So are we good to remove the old model claims? Would you like to do it? I'm waiting on a clear go ahead. Gettinwikiwidit (talk) 14:14, 8 December 2020 (UTC)
@Gettinwikiwidit: Got my wires crossed and replied on the project chat thread not this one, sorry :-). I think we're good to go assuming we're still going to do a later run to standardise on either 3 March or 4 March. I'm happy to line this up in the next day or two. This list is the one I think is OK to delete; everyone on it has been checked with one of the date queries upthread, and is either resolved or a 3/4 March issue.
This list does not include the four "weird sort-of-Senators" who don't have new-style items with terms; I figured you'd want to keep those tagged for now so we don't lose track of them. Andrew Gray (talk) 22:51, 8 December 2020 (UTC)
  • Noting here. We can clean up this edit later. This is where I left off:

Start time off by 1 day or more:

SELECT ?senLabel ?stmt ?prop (?oldStart AS ?lastChecked) ( ?newStart AS ?wikiLastChecked ) (?start AS ?current) WHERE {
VALUES (?sen ?seat ?term ?oldStart ?newStart) {
( wd:Q237220 wd:Q101498964 wd:Q4642363 "1925-01-08T00:00:00Z"^^xsd:dateTime "1924-12-17T00:00:00Z"^^xsd:dateTime )
( wd:Q271243 wd:Q101498968 wd:Q4642306 "1922-10-03T00:00:00Z"^^xsd:dateTime "1922-11-21T00:00:00Z"^^xsd:dateTime )
( wd:Q271023 wd:Q101498917 wd:Q4641029 "1907-01-29T00:00:00Z"^^xsd:dateTime "1907-01-22T00:00:00Z"^^xsd:dateTime )
( wd:Q368920 wd:Q101498920 wd:Q4646121 "1980-05-17T00:00:00Z"^^xsd:dateTime "1980-05-19T00:00:00Z"^^xsd:dateTime )
( wd:Q433351 wd:Q101498973 wd:Q2573610 "1797-12-08T00:00:00Z"^^xsd:dateTime "1797-12-11T00:00:00Z"^^xsd:dateTime )
( wd:Q1689229 wd:Q101498927 wd:Q2057259 "2002-11-23T00:00:00Z"^^xsd:dateTime "2002-11-25T00:00:00Z"^^xsd:dateTime )
( wd:Q343849 wd:Q101498941 wd:Q4643765 "1945-07-25T00:00:00Z"^^xsd:dateTime "1945-07-24T00:00:00Z"^^xsd:dateTime )
( wd:Q5997815 wd:Q101498867 wd:Q4645016 "1961-12-07T00:00:00Z"^^xsd:dateTime "1962-01-10T00:00:00Z"^^xsd:dateTime )
( wd:Q714960 wd:Q101498872 wd:Q2395126 "1800-04-03T00:00:00Z"^^xsd:dateTime "1800-05-03T00:00:00Z"^^xsd:dateTime )
( wd:Q457691 wd:Q101498984 wd:Q230796 "1796-11-09T00:00:00Z"^^xsd:dateTime "1796-12-08T00:00:00Z"^^xsd:dateTime )
( wd:Q1148970 wd:Q101498997 wd:Q4643047 "1928-04-04T00:00:00Z"^^xsd:dateTime "1928-04-05T00:00:00Z"^^xsd:dateTime )
( wd:Q1571326 wd:Q101498946 wd:Q4644951 "1960-03-16T00:00:00Z"^^xsd:dateTime "1960-03-23T00:00:00Z"^^xsd:dateTime )
( wd:Q1579558 wd:Q101499072 wd:Q4643468 "1938-02-01T00:00:00Z"^^xsd:dateTime "1938-02-11T00:00:00Z"^^xsd:dateTime )
( wd:Q323511 wd:Q101498874 wd:Q347346 "1991-05-08T00:00:00Z"^^xsd:dateTime "1991-05-09T00:00:00Z"^^xsd:dateTime )
( wd:Q925743 wd:Q101499000 wd:Q3556780 "1993-01-02T00:00:00Z"^^xsd:dateTime "1993-01-05T00:00:00Z"^^xsd:dateTime )
( wd:Q1840588 wd:Q101498950 wd:Q3556780 "1993-01-21T00:00:00Z"^^xsd:dateTime "1993-01-23T00:00:00Z"^^xsd:dateTime )
( wd:Q1000051 wd:Q101498961 wd:Q4643305 "1934-01-01T00:00:00Z"^^xsd:dateTime "1933-12-18T00:00:00Z"^^xsd:dateTime )
( wd:Q5120540 wd:Q101499054 wd:Q18740945 "2018-04-09T00:00:00Z"^^xsd:dateTime "2018-04-02T00:00:00Z"^^xsd:dateTime )
( wd:Q888132 wd:Q101498869 wd:Q168778 "2006-01-17T00:00:00Z"^^xsd:dateTime "2006-01-18T00:00:00Z"^^xsd:dateTime )
  }
  # VALUES ?sen { wd:Q1148970 }
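  ?sen p:P39 ?stmt;             # Senator has position held statement
       wdt:P31 wd:Q5.           # Senator is a human
  ?stmt ps:P39 wd:Q4416090;     # Statement is for United States senator
        pq:P768 ?seat;          # Seat (electoral district) qualifier
        pq:P2937 ?term;         # Parliamentary term qualifier
        pq:P580 ?start.         # Start time qualifier

  BIND("P580" AS ?prop)
  FILTER( ?oldStart = ?start )  # Keep only statements still at the last-checked value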
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

End date off by more than 1 day:

SELECT ?senLabel ?stmt ?prop (?oldEnd AS ?lastChecked) ( ?newEnd AS ?wikiLastChecked ) (?end AS ?current) WHERE {
VALUES (?sen ?seat ?term ?oldEnd ?newEnd) {
( wd:Q376645 wd:Q101498902 wd:Q347346 "1992-11-10T00:00:00Z"^^xsd:dateTime "1992-11-03T00:00:00Z"^^xsd:dateTime )
( wd:Q1376165 wd:Q101498859 wd:Q4632990 "1845-05-01T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q453709 wd:Q101499040 wd:Q3596857 "1996-11-05T00:00:00Z"^^xsd:dateTime "1996-11-06T00:00:00Z"^^xsd:dateTime )
( wd:Q202950 wd:Q101498973 wd:Q2573610 "1797-07-10T00:00:00Z"^^xsd:dateTime "1797-12-10T00:00:00Z"^^xsd:dateTime )
( wd:Q1374474 wd:Q101499049 wd:Q4638577 "1881-03-07T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q1374474 wd:Q101499049 wd:Q4638577 "1883-03-03T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q1672420 wd:Q101499049 wd:Q4643615 "1942-11-17T00:00:00Z"^^xsd:dateTime "1942-11-03T00:00:00Z"^^xsd:dateTime )
( wd:Q2059697 wd:Q101499057 wd:Q4646042 "1978-12-14T00:00:00Z"^^xsd:dateTime "1978-12-12T00:00:00Z"^^xsd:dateTime )
( wd:Q723444 wd:Q101498869 wd:Q4646187 "1982-12-27T00:00:00Z"^^xsd:dateTime "1982-12-20T00:00:00Z"^^xsd:dateTime )
( wd:Q457691 wd:Q101498984 wd:Q2395126 "1800-08-01T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q711521 wd:Q101498865 wd:Q4635921 "1861-03-08T00:00:00Z"^^xsd:dateTime "1861-03-06T00:00:00Z"^^xsd:dateTime )
( wd:Q932530 wd:Q101498988 wd:Q2573610 "1797-10-01T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q371165 wd:Q101498988 wd:Q1906490 "1801-05-06T00:00:00Z"^^xsd:dateTime "1801-03-05T00:00:00Z"^^xsd:dateTime )
( wd:Q880519 wd:Q101498988 wd:Q4547180 "1807-09-01T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q883164 wd:Q101498880 wd:Q4550107 "1816-11-01T00:00:00Z"^^xsd:dateTime UNDEF )
( wd:Q925743 wd:Q101499000 wd:Q3556780 "1994-12-01T00:00:00Z"^^xsd:dateTime "1994-12-02T00:00:00Z"^^xsd:dateTime )
( wd:Q11815 wd:Q101498883 wd:Q223336 "1794-05-27T00:00:00Z"^^xsd:dateTime "1794-03-27T00:00:00Z"^^xsd:dateTime )
( wd:Q1355895 wd:Q101498993 wd:Q4635921 "1861-03-28T00:00:00Z"^^xsd:dateTime "1861-07-11T00:00:00Z"^^xsd:dateTime )
}
  # VALUES ?sen { wd:Q1148970 }
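  ?sen p:P39 ?stmt;             # Senator has position held statement
       wdt:P31 wd:Q5.           # Senator is a human
  ?stmt ps:P39 wd:Q4416090;     # Statement is for United States senator
        pq:P768 ?seat;          # Seat (electoral district) qualifier
        pq:P2937 ?term.         # Parliamentary term qualifier
  OPTIONAL { ?stmt pq:P582 ?end. }  # End time qualifier, if any

  BIND("P582" AS ?prop)
  FILTER( ?oldEnd = ?end )      # Keep only statements still at the last-checked value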
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
@Andrew Gray: None of the above examples have changed. Does this mean you checked them and deemed Wikidata to be accurate? If so, then it means Wikipedia is wrong. I'll do another round of reconciling to see what it looks like now after your edits. Gettinwikiwidit (talk) 14:14, 8 December 2020 (UTC)
Sorry - I was working on the date mismatches within WD, not on these ones. I haven't been crosschecking to the WP lists, so not sure I can say anything useful about these cases. I suspect they're mostly variants on "would have started their term on Day X, but didn't resign from the other job until a few weeks later". Andrew Gray (talk) 22:51, 8 December 2020 (UTC)
  • @Gettinwikiwidit: Spotted another import issue - statehood. Most new states came in and immediately elected two Senators, but in WD these have mostly been backdated to the start of that term. I've run a report here showing the first holder of each class seat in a state and comparing it to statehood - there's a lot of negative dates. Some are positive, as the state didn't elect Senators for a while (eg Iowa stuck it out for almost two years) but a) none should be negative, and b) with a few rare exceptions, eg NY and NC, both senators are likely to have turned up on the same day. Andrew Gray (talk) 23:29, 8 December 2020 (UTC)
@Andrew Gray: I fixed all the negative ones which seemed pretty easy reading the entry in the Biographical Directory, but when trying to audit the ones which are positive there are a lot of examples of language like "elected in 1848 as a Democrat to the United States Senate as one of the first Senators from the State of Iowa; reelected in 1852 and served from December 7, 1848, to March 3, 1859", which I assume means the start time should be the time the state was admitted, but it's just not explicit. Even the old model data doesn't start from the admission to the Union, probably because of confusion from this language. This may mean adding extra terms. Here George W. Jones (Q438644) is listed as starting in 30th United States Congress (Q4634661) though Iowa was admitted to the Union during the 29th United States Congress (Q4632990). I've only eyeballed it on this pass because my concentration was waning, but there is definitely some work to do here. Gettinwikiwidit (talk) 04:43, 9 December 2020 (UTC)
@Andrew Gray: I went through your list again and I'm pretty confident that all the dates of the senators first holding the given seat are correct. Two ones of note pop out, though. Did Jean Noel Destréhan (Q435963) actually serve as senator? I'm not clear from the description. Also, for Francis E. Warren (Q882624) there is the usual ambiguity of when he started serving versus when he took the oath of office.
@Gettinwikiwidit: This all looks great. I think the very long gaps as with Iowa do indicate an actual gap in Senators, so no need for extra terms - the WP articles suggest there was a deadlock in the state government. So Iowa was entitled to two senators from 1846, but didn't actually appoint them until 1848; the missing period is thus vacant with no members. I'd go with the "served from" date quoted by the Biographical Dictionary (or the Chronological List) in the absence of anything else.
For Destréhan (excellent name!) I think he probably was a Senator. The BD describe him as one, not as "senator-elect" like they do for some others. Not clear quite what "before qualifying" means, though - taking the oath? Given the short timeframe and the fact that the Senate was in recess (and there was a war on!) it might just mean that he never turned up before resigning. Andrew Gray (talk) 22:48, 9 December 2020 (UTC)
@Andrew Gray: In my last pass I used exactly the "served from" lines to check. Only a few items didn't use that language. Gettinwikiwidit (talk) 22:52, 9 December 2020 (UTC)
This Wikipedia entry suggests a detail about James Turner (Q369751) which might not be reflected in the Biographical Directory. Or it could be wrong. Just noting it here. Gettinwikiwidit (talk) 22:58, 9 December 2020 (UTC)
  • @Gettinwikiwidit: So I think I've found the discrepancy. This query which I estimated would find all the claims we needed to remove had an error in it - "?ps2 pq:P4100 ?seat". It was meant to require that the "new-style term" had a valid seat, but I slipped up when writing the query and actually required a valid party - P4100 not P768. Switching this to P768 finds 105 claims, all of which can presumably be deleted in the same way.
This also neatly explains why the earlier run missed King and Sanders - neither had a P4100 assigned for the new-style terms, so they wouldn't be on the first query. Andrew Gray (talk) 13:52, 12 December 2020 (UTC)
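To make the fix concrete, here is a minimal sketch of the corrected pattern, reconstructed from the description above and not copied from the linked query. The ?ps2 and ?seat names follow the quoted fragment; everything else, including treating "old-style" as "no parliamentary term (P2937) qualifier", is an assumption for illustration.

```sparql
# Sketch only: old-style senator claims on items that also have a
# new-style claim with a term and a valid seat.
SELECT DISTINCT ?item ?itemLabel ?ps WHERE {
  ?item p:P39 ?ps.
  ?ps ps:P39 wd:Q4416090.                        # Old-style claim candidate
  FILTER NOT EXISTS { ?ps pq:P2937 ?oldTerm. }   # No term qualifier, so old-style
  ?item p:P39 ?ps2.
  ?ps2 ps:P39 wd:Q4416090;
       pq:P2937 ?term;                           # New-style claim has a term...
       pq:P768 ?seat.                            # ...and a seat (P768, not P4100)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Requiring P768 instead of P4100 means senators whose new-style terms lack a parliamentary group are no longer skipped.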
@Andrew Gray: Okay, I'll delete them. I just wanted to give you a chance to give them the once over. Regards, Gettinwikiwidit (talk) 19:14, 12 December 2020 (UTC)

Errors in the biographical directory

Cordell Hull is listed as serving until 3 March 1933, but his successor Nathan Bachman is listed as having started on February 28, 1933. Gettinwikiwidit (talk) 10:00, 4 December 2020 (UTC)

@Andrew Gray: FWIW, I set up a monitor of the Biographical Directory to check if there are any changes to the list of terms served (for both Senate and Representative positions) and to my surprise 10 senators have been updated since a couple of days ago. And they're not current senators either.
It seemed worth noting. Gettinwikiwidit (talk) 09:46, 5 December 2020 (UTC)
Huh! Interesting indeed. And now I look at it I see there's a version/date tag in the XML, which could be useful in future. Andrew Gray (talk) 22:44, 5 December 2020 (UTC)

Whigs

Should there be a separate item for the Whig party in the United States? We have 238 United States senator (Q4416090) with parliamentary group (P4100) pointing to Whigs (Q108700). I'm guessing they should be pointing at Whig Party (Q42183). Gettinwikiwidit (talk) 04:28, 5 December 2020 (UTC)

Agree - I think we can just flip these all straight over. Andrew Gray (talk) 22:45, 5 December 2020 (UTC)
Fixed. Gettinwikiwidit (talk) 10:38, 6 December 2020 (UTC)

Tracking addition of new Senators or Representatives

There is a page on www.congress.gov which lists senator and representative ids for members who have served since 1973. Following changes to that page should be an easy way to see when there are new members of either house. The list also includes non-voting member of the U.S. House of Representatives (Q5253588) which I haven't gotten around to reconciling the Wikidata for yet. @Andrew Gray: Gettinwikiwidit (talk) 08:10, 9 December 2020 (UTC)

Similar project tracking US legislators

I've only just discovered this git repository. I'll try to cross check against the data already uploaded here as well as reach out to them. Regards, Gettinwikiwidit (talk) 23:38, 25 December 2020 (UTC) @Andrew Gray:

Jan 2021 update

@Gettinwikiwidit: Now that Jan 3 has rolled around, looking at running some updates for the new term - are you planning to do these, or do you want me to have a shot at it? As I understand it:

  • 66 senators (in class 1/3 seats) automatically have a new term starting 3 January with the same seat details
  • 25 senators in class 2 seats were re-elected and will get new terms starting 3 January
  • 1 senator in a class 3 seat (Georgia) had an appointee, who has been re-appointed pending the election in a couple of days
  • 1 senator in a class 2 seat (Georgia again) had an appointee, who has been re-appointed pending the election in a couple of days
  • 6 senators in class 2 seats are *new*

I think this is all correct per w:2020 United States Senate elections, but please correct me if I'm wrong (which is distinctly possible with Georgia). I can queue up the edits for these tonight but happy to hold off if you're going to import from the official data. Andrew Gray (talk) 19:33, 3 January 2021 (UTC)

@Andrew Gray: I haven't started to look at this. I believe you said you'd update the end time (P582) for senators of the 116th United States Congress (Q28227688). I'm happy to leave the rest up to you as well, but will follow up as well. Feel free to leave it to me if you'd prefer. FWIW, the United States representative (Q13218630) of the 116th United States Congress (Q28227688) need to have their end time (P582) updated as well. I'll take care of this. Thanks again for your help with this. Regards, Gettinwikiwidit (talk) 23:17, 3 January 2021 (UTC)
@Gettinwikiwidit: Great. I wasn't sure where you'd got up to with the Representatives so didn't want to leap in with those.
I've closed off the existing terms, added minimal records for the six new members, and I'm just setting up a run to copy across seats/parties for the other 94 now. Andrew Gray (talk) 23:46, 3 January 2021 (UTC)
Update: all Senators now set up... except being an idiot I've listed them all as starting on 2020-1-3! Inevitable. Happens every year. I'll get that fixed now :-) Andrew Gray (talk) 00:04, 4 January 2021 (UTC)
Thanks very much! Sorry to be a pain, but would you mind adding references to these claims? Gettinwikiwidit (talk) 06:51, 4 January 2021 (UTC)
Hmm.. It looks like the Biographical Directory of the United States Congress (Q1150348) entries aren't up yet, though they're referred to here. Maybe we'll have to cycle back if this is to be the reference. They are up at that link, though. Gettinwikiwidit (talk) 07:19, 4 January 2021 (UTC)
Lastly, there's a new link for Biographical Directory of the United States Congress (Q1150348): https://bioguide.congress.gov/search/bio/A000009. Should we update US Congress Bio ID (P1157) and run that purge script? Regards, Gettinwikiwidit (talk) 07:30, 4 January 2021 (UTC)
@Gettinwikiwidit: Sorry for missing refs - I've put a placeholder "imported from English Wikipedia" ref on all of them for the moment and can change that when the IDs go live.
Re the new ID links, interesting! In terms of updating US Congress Bio ID (P1157), I wonder if we could do some general tidying-up at the same time. At the moment all the references use reference URL (P854) and then a bioguideretro URL. We could switch this over to be stated in (P248):Biographical Directory of the United States Congress (Q1150348) and US Congress Bio ID (P1157):bioguideID for each reference, which makes it a little cleaner and future-proofs it in terms of future URL changes. Plus, making these edits would purge the item anyway. Thoughts? Andrew Gray (talk) 18:23, 4 January 2021 (UTC)
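To get a sense of the scale of that tidy-up, a query along these lines could list the references that would need switching over. It is a sketch only: it assumes the affected references sit on senator position held statements and that their reference URL (P854) contains the string "bioguideretro", as described above.

```sparql
# Sketch only: references on senator statements that cite a bioguideretro
# URL and have not yet been given a "stated in" Biographical Directory value.
SELECT ?sen ?stmt ?ref ?url WHERE {
  ?sen p:P39 ?stmt.
  ?stmt ps:P39 wd:Q4416090;           # Position held: United States senator
        prov:wasDerivedFrom ?ref.     # Reference node on the statement
  ?ref pr:P854 ?url.                  # Reference URL
  FILTER(CONTAINS(STR(?url), "bioguideretro"))
  FILTER NOT EXISTS { ?ref pr:P248 wd:Q1150348. }   # Not already "stated in"
}
```

Once a reference carries stated in (P248) plus US Congress Bio ID (P1157), it drops out of the results, so the same query can double as a progress check.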
@Andrew Gray: I'm game, but I don't quite get how the future-proofing would work. Isn't the link mechanism in a claim value the same as the one in a qualifier value? Or would you use an explicit URL and avoid the link mechanism? Gettinwikiwidit (talk) 22:29, 4 January 2021 (UTC)
@Gettinwikiwidit: Using the property as a reference does have the same update-lag problems as using it as a normal claim, but since items with it would need to be purged anyway if the formatter URL for the property changes in the future, it won't make much difference (plus, who knows, that bug might get fixed sometime and it can happen automatically). The main advantage is that if the bioguideretro URLs stop working in the future, the URL in the references won't be stuck on the old format.
Is the new site now working for all IDs? If so, I think we can switch over the preferred formatter URL, and then I'll look at setting up the reference updates to go in a day or two later. Andrew Gray (talk) 23:21, 4 January 2021 (UTC)
@Andrew Gray: Thanks. It looks like it is up, but doesn't yet have the latest senators. I've updated US Congress Bio ID (P1157) and given the new URL preferred rank. Gettinwikiwidit (talk) 00:43, 5 January 2021 (UTC)
@Andrew Gray: FWIW, I noticed that the bioguide has updated. Here is the Tommy Tuberville entry. Gettinwikiwidit (talk) 09:51, 5 January 2021 (UTC)
@Gettinwikiwidit: Oh, good catch. As far as I can work out, what's happened here is that Perdue can't be appointed, as he was in a Class 2 seat, and so it's now "a term with no-one yet elected" rather than "a term where someone left and an appointee fills it". So there are 99 sitting members, and one vacant seat in Georgia; one of those members may lose her appointed seat in a couple of days depending on the outcome of the election. I've deleted Perdue's entry for this term.
Re the formatter URL, looks good. I've purged a couple as tests and they seem to be OK but will give it another day to make sure (it's sometimes a bit unpredictable), and then I'll start running everything tomorrow. Andrew Gray (talk) 19:49, 5 January 2021 (UTC)
Thanks very much. FWIW, I cross checked the 117th United States Congress (Q65089999) United States Senate (Q66096) with this website and it ties out with what's currently in Wikidata. Gettinwikiwidit (talk) 00:12, 6 January 2021 (UTC)
@Andrew Gray: Thanks. Hmm.. The oldest link is not redirecting to the new site: https://bioguide.congress.gov/scripts/biodisplay.pl?index=T000278. I'm pretty sure there was a period where this wasn't working. Now I'm confused about what best practice should be.  :-( Gettinwikiwidit (talk) 10:13, 19 January 2021 (UTC)
@Gettinwikiwidit: huh, interesting! I think the best approach is for us to use the preferred URL form set out by the service, which I think is currently the preferred one on the property. I still have not managed to crack the wb-cli references problem, so I have triggered the purge script to run through them all for now, and that'll make sure the updates have gone in. Andrew Gray (talk) 21:56, 19 January 2021 (UTC)
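
As a rough sketch of how the references discussed in this thread could be located before any clean-up, the query below lists position held (P39) statements for United States senator (Q4416090) whose reference still uses reference URL (P854) with a bioguideretro address and has no stated in (P248) pointing at Biographical Directory of the United States Congress (Q1150348). It assumes the old URLs contain the string "bioguideretro"; the LIMIT is arbitrary, and this is not the script that was actually run.

```sparql
SELECT ?item ?itemLabel ?statement ?refURL
WHERE {
  ?item p:P39 ?statement .                  # Item has a position held statement
  ?statement ps:P39 wd:Q4416090 ;           # ... whose value is United States senator
             prov:wasDerivedFrom ?ref .     # ... and which carries a reference
  ?ref pr:P854 ?refURL .                    # Reference uses reference URL
  # Assumption: old-style links can be recognised by this substring
  FILTER(CONTAINS(STR(?refURL), "bioguideretro"))
  # Skip references that already use stated in: Biographical Directory
  FILTER NOT EXISTS { ?ref pr:P248 wd:Q1150348 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

Swapping wd:Q4416090 for wd:Q13218630 should give the equivalent list for Representatives.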

Updating end times for the 116th Congress Representatives

I've been meaning to do a write-up of my progress with putting position held (P39) United States representative (Q13218630) claims on a similar footing to position held (P39) United States senator (Q4416090) claims. This won't be it, but the short story is that I believe there are claims with parliamentary term (P2937) and electoral district (P768) qualifiers for all terms since 1973 where I had a good reference. I believe I've found a good reference for the older terms, but haven't finished preparing that data. No older claims have been examined or removed.

That said, per the above topic, I've taken care of adding end time (P582) qualifiers for the entire 116th United States Congress (Q28227688) but have not yet gotten to adding claims for the 117th United States Congress (Q65089999). I did take care of the low-hanging fruit by making pages for the 117th United States Congress (Q65089999) for both the House and the Senate.

Regards, Gettinwikiwidit (talk) 10:29, 5 January 2021 (UTC)

The bioguide annoyingly doesn't have district information. I'm going to scrape this page. I generally prefer to get information from original sources if possible. Gettinwikiwidit (talk) 11:11, 5 January 2021 (UTC)
Page scraped. 117th United States Congress (Q65089999) complete, though there are vacancies for Louisiana's 5th congressional district (Q6689045) and New York's 22nd congressional district (Q2814994). Gettinwikiwidit (talk) 15:29, 5 January 2021 (UTC)
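
One possible way to double-check the end-time work described in this section is a query along these lines. It lists any United States representative (Q13218630) statements for the 116th United States Congress (Q28227688) that still lack an end time (P582) qualifier. It is only a sketch built from the properties named above, not a query that was used at the time; an empty result would suggest the qualifiers are all in place.

```sparql
SELECT ?item ?itemLabel ?district ?districtLabel
WHERE {
  ?item p:P39 ?statement .                      # Item has a position held statement
  ?statement ps:P39 wd:Q13218630 ;              # ... as United States representative
             pq:P2937 wd:Q28227688 .            # ... in the 116th United States Congress
  OPTIONAL { ?statement pq:P768 ?district . }   # Electoral district, if recorded
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }  # Keep only statements with no end time
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?districtLabel
```

Replacing wd:Q28227688 with wd:Q65089999 and dropping the FILTER line should instead list the members recorded for the 117th Congress, which can be compared against the scraped page.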

Archive this page??

@Andrew Gray: I'm thinking of archiving this page now that a lot of this work has been done and starting again with a description of the choices made and an outline of work yet to be done. Whaddaya think? Gettinwikiwidit (talk) 10:16, 19 January 2021 (UTC)

Sounds good to me! Andrew Gray (talk) 22:29, 19 January 2021 (UTC)