Wikidata:Property proposal/Baidu Baike page numeric ID

Baidu Baike page numeric ID

Originally proposed at Wikidata:Property proposal/Authority control

   Not done
Description: numerical identifier for an article or other page on Baidu Baike (Q803722)
Data type: External identifier
Allowed values: \d+
Example 1: Qu Bo (Q6059091) → 10076
Example 2: Book and Sword Chronicles (Q4942962) → 11202
Example 3: Xu Xu (Q107616098) → 8939392
Source: baike.baidu.com
Planned use: adding external links to information about Chinese topics
Number of IDs in source: 25.54 million (as of Feb 2022)
Implied notability: Wikidata property for an identifier that does not imply notability (Q62589320)
Formatter URL: https://baike.baidu.com/view/$1.htm
See also: Template:Baidu Baike (Q104804206)
Applicable "stated in"-value: Baidu Baike (Q803722)
Distinct-values constraint: yes
Wikidata project: WikiProject East Asia (Q21829890)
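
As a rough illustration of how the "Allowed values" and "Formatter URL" rows fit together, the sketch below checks a candidate ID against \d+ and substitutes it for $1. The function name format_baike_url is hypothetical, and restricting \d to ASCII digits is an assumption about how the pattern is meant to be read.

```python
import re

# Taken from the "Allowed values" and "Formatter URL" rows of the proposal.
# re.ASCII is an assumption: it limits \d to the digits 0-9.
ALLOWED_VALUES = re.compile(r"\d+", re.ASCII)
FORMATTER_URL = "https://baike.baidu.com/view/$1.htm"


def format_baike_url(page_id: str) -> str:
    """Return the formatter URL for a candidate ID, or raise ValueError."""
    # fullmatch ensures the whole string is digits, not just a prefix.
    if not ALLOWED_VALUES.fullmatch(page_id):
        raise ValueError(f"not a numeric Baidu Baike page ID: {page_id!r}")
    return FORMATTER_URL.replace("$1", page_id)
```

For Example 1, format_baike_url("10076") gives https://baike.baidu.com/view/10076.htm.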

Motivation

Baidu Baike (Q803722), one of the most prominent Chinese-language online collaborative encyclopedias, holds a lot of information that should be linked with Wikidata items. I am using the numerical page ID rather than the page name because, as I understand it, pages can be moved, while the numerical page ID is a stable identifier that redirects to the newest page name. The process for obtaining this ID is described in en:Template:Baidu Baike. In short: view the page source, search for "clickstream.setLogGlobalParam", and copy the "lemmaId" number.
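
The lookup described above could be sketched roughly as follows. This is not an official Baidu Baike API: the function name first_lemma_id is hypothetical, and the regular expression assumes that "lemmaId" appears in the page source as a key followed by a number (quoted or not) somewhere after the clickstream.setLogGlobalParam call. The real markup may differ or change.

```python
import re
from typing import Optional

# Marker named in the proposal text.
_CALL = "clickstream.setLogGlobalParam"
# Assumed shape: lemmaId: 10076, "lemmaId": 10076, or "lemmaId":"10076".
_LEMMA_ID = re.compile(r"""["']?lemmaId["']?\s*[:=]\s*["']?(\d+)""")


def first_lemma_id(page_source: str) -> Optional[str]:
    """Return the first lemmaId found after the marker call, or None."""
    start = page_source.find(_CALL)
    if start == -1:
        return None  # marker not present in this page source
    # Search only from the marker onwards, mirroring the manual steps.
    match = _LEMMA_ID.search(page_source, start)
    return match.group(1) if match else None
```

The page source itself has to be obtained separately, for example by saving it from a browser. The function only mirrors the manual search-and-copy step.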

As Baidu Baike is semi-regulated, user-generated content, the presence of a Baidu Baike page should not be used to determine notability. Because the content is encyclopedic, though, it would be very helpful as a starting point when looking for sources for many articles. With 25.54 million pages, there are many notable articles on Baike that have no Wikidata item right now.

--Habst (talk) 17:44, 23 November 2023 (UTC)

For responses to the valid copyright violation concerns raised in previous proposals, see Special:Diff/2017180633 and Special:Diff/2020221085 below. --Habst (talk) 13:44, 29 November 2023 (UTC)

Discussion

 Support as nominator.

Notified participants of WikiProject East Asia: C933103, Daniel Mietchen, FudimeZ. --Habst (talk) 17:49, 23 November 2023 (UTC)

In trying to go over some potential values, I can't seem to find a numerical page ID for this article: https://baike.baidu.com/item/%E5%86%AF%E5%BF%97%E5%BC%BA/20615042

It's for the Chinese athlete Feng Zhiqiang. I think the usual method isn't working because multiple people share his Chinese (and English) name, 冯志强: when you search the page source for clickstream.setLogGlobalParam( and take the first lemma ID, 419722, it redirects to the page for the martial artist Feng Zhiqiang, not the athlete Feng Zhiqiang. Any ideas? --Habst (talk) 22:31, 23 November 2023 (UTC)
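
One way to investigate a case like this, offered only as a sketch, is to list every distinct lemmaId value in the page source rather than just the first, and then check each candidate by hand. The helper name all_lemma_ids is hypothetical, and the pattern assumes "lemmaId" appears as a key followed by a number. Whether the athlete's ID actually appears in the source this way is not confirmed here.

```python
import re
from typing import List

# Assumed shape: lemmaId: 419722, "lemmaId": 419722, or "lemmaId":"419722".
_LEMMA_ID = re.compile(r"""["']?lemmaId["']?\s*[:=]\s*["']?(\d+)""")


def all_lemma_ids(page_source: str) -> List[str]:
    """Return distinct lemmaId values in order of first appearance."""
    # dict.fromkeys removes duplicates while keeping insertion order.
    found = dict.fromkeys(m.group(1) for m in _LEMMA_ID.finditer(page_source))
    return list(found)
```

Each returned value still needs manual verification against the intended person before it is used.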

  • Previous proposals: 2015, 2018, 2021, 2021. --GZWDer (talk) 15:21, 24 November 2023 (UTC)
    @GZWDer: Thanks for the links. Based on reading those proposals, there are valid concerns about linking to copyvio content.
    For the record, my inspiration for this proposal was to link the Wu Yuang (Q59576565) item (Chinese track and field sprinter) to its Baidu Baike page. When I was trying to look up info on this person, a machine translation of the athlete's Baidu Baike page tipped me off to a lot of useful, verifiably true pieces of info (e.g. medals won, championship records). The utility wasn't from blindly trusting the info itself, but from finding some starting points to then verify from other sources.
    There is already a Wikipedia template used for linking to these articles, en:Template:Baidu Baike, so I can see that at least some of the articles have encyclopedic value, as they are used in the English Wikipedia. As there are over 25.54 million IDs, I would try not to paint with a broad brush: if many of them have value, I think it would be worth including as an external identifier.
    Of course, as with any user-generated content website, there will be some IDs that are blatant copyright violations that we shouldn't link. But the same is true of YouTube video ID (P1651), Twitter post ID (P5933), and many other UGC websites that Wikidata has authority control properties for. We will have to handle those on a case-by-case basis so that links to copyright violations are not included on Wikidata. --Habst (talk) 21:35, 24 November 2023 (UTC)
 Oppose as the reasons for opposition to the prior proposals have not been addressed in this one. Mahir256 (talk) 19:33, 27 November 2023 (UTC)[reply]
@Mahir256, thanks for the response -- what reasons specifically do you think should be addressed? From my reading of the proposals, it seems like the primary objection was due to the idea that Baidu Baike is uniquely prone to copyright violations, which is entirely valid. However, there is also indisputably lots of original content / properly cited content on Baidu Baike as well that would be very valuable to have connected to Wikidata, even if some pages are copyright violations.
My idea to mitigate this would be to add links on a case-by-case basis, only when no copyright violations are found and there is no reason to suspect one. I think that if we disallow Baidu Baike, we would need to at least make a case that Baike is more prone to copyright violation than the sites behind existing properties such as YouTube video ID (P1651), Twitter post ID (P5933), or even Fandom article ID (P6262), which I don't think has been demonstrated yet. --Habst (talk) 13:30, 29 November 2023 (UTC)