Most web endpoints have switched to wbi and now require w_rid #631
Comments
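This |
So it can be generated? |
Does wbi require being logged in? |
No. From what I've observed, the two values returned by the user-info API are the same whether or not you are logged in. |
I don't get it. |
The data from https://api.bilibili.com/x/web-interface/nav is constant at any given moment, and it still hasn't changed after three hours.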
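The two keys are the file names at the end of `data.wbi_img.img_url` and `data.wbi_img.sub_url` in that nav response. A minimal sketch that pulls them out of an already-parsed response; `extract_wbi_keys` is a hypothetical name, and the sample payload uses made-up example.com URLs, not real keys.

```python
import posixpath
from urllib.parse import urlparse


def extract_wbi_keys(nav_json: dict) -> tuple[str, str]:
    """Return (img_key, sub_key) from a parsed /x/web-interface/nav response.

    Each key is the file name of the corresponding URL without its extension.
    """
    wbi_img = nav_json["data"]["wbi_img"]

    def stem(url: str) -> str:
        # "https://host/path/abc123.png" -> "abc123" (query string ignored)
        name = posixpath.basename(urlparse(url).path)
        return name.rsplit(".", 1)[0]

    return stem(wbi_img["img_url"]), stem(wbi_img["sub_url"])


# Hypothetical payload shaped like the nav response (placeholder keys).
sample = {
    "data": {
        "wbi_img": {
            "img_url": "https://example.com/bfs/wbi/aaaa1111.png",
            "sub_url": "https://example.com/bfs/wbi/bbbb2222.png",
        }
    }
}
print(extract_wbi_keys(sample))  # ('aaaa1111', 'bbbb2222')
```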
|
Python demo by Drelf2018 (Nemo2011/bilibili-api#290 (comment)) and z0z0r4:
import hashlib
import time
import urllib.parse
from functools import reduce
import httpx
HEADERS = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.bilibili.com"}
def getMixinKey(ae):
    # Fixed table: pick the characters of img_key + sub_key in this order
    oe = [46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49, 33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41,
          13, 37, 48, 7, 16, 24, 55, 40, 61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11, 36, 20, 34, 44, 52]
    le = reduce(lambda s, i: s + ae[i], oe, "")
    return le[:32]
def encWbi(params: dict):
    # Fetch the current img_url / sub_url from the nav endpoint
    resp = httpx.get("https://api.bilibili.com/x/web-interface/nav", headers=HEADERS)
    wbi_img: dict = resp.json()["data"]["wbi_img"]
    img_url: str = wbi_img.get("img_url")
    sub_url: str = wbi_img.get("sub_url")
    # Each key is the file name without its extension
    img_value = img_url.split("/")[-1].split(".")[0]
    sub_value = sub_url.split("/")[-1].split(".")[0]
    me = getMixinKey(img_value + sub_value)
    wts = int(time.time())
    params["wts"] = wts
    # Parameters must be sorted by key before hashing
    Ae = urllib.parse.urlencode(sorted(params.items()))
    w_rid = hashlib.md5((Ae + me).encode("utf-8")).hexdigest()
    return w_rid, wts
if __name__ == "__main__":
    w_rid, wts = encWbi({"mid": 558830935})
    print(w_rid, wts) |
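The `getMixinKey` step above is a fixed permutation: the characters of `img_key + sub_key` are reordered by the index table and the first 32 are kept. A small sketch that checks the table is a permutation of 0..63 and applies it to an arbitrary 64-character stand-in (not a real key pair). `get_mixin_key` is a hypothetical name, and the length check is an added safeguard, not part of the demo.

```python
MIXIN_KEY_ENC_TAB = [
    46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49,
    33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40,
    61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11,
    36, 20, 34, 44, 52,
]

# Every index 0..63 appears exactly once, so the table only reorders characters.
assert sorted(MIXIN_KEY_ENC_TAB) == list(range(64))


def get_mixin_key(orig: str) -> str:
    """Reorder img_key + sub_key by the table and keep the first 32 characters."""
    if len(orig) < 64:
        # Added safeguard: the table indexes up to position 63.
        raise ValueError("expected img_key + sub_key (at least 64 characters)")
    return "".join(orig[i] for i in MIXIN_KEY_ENC_TAB)[:32]


dummy = "0123456789abcdef" * 4  # arbitrary 64-character stand-in
key = get_mixin_key(dummy)
print(len(key), key[:4])  # 32 ef22
```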
Demo: https://github.com/12345-mcpython/bilibili-console/tree/main/bilibili/utils.py
# ps=5 is the parameter you want to pass to the API
"https://api.bilibili.com/x/web-interface/wbi/index/top/feed/rcmd?" + encrypt_wbi("ps=5") |
It still hasn't changed; maybe it rotates once a day? |
It has changed now. |
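Since the nav values stay fixed for hours and then change, a client can cache them rather than calling the nav endpoint before every request. A sketch of a time-based cache; the injected `fetch` function, the name `WbiKeyCache`, and the one-hour TTL are illustrative assumptions, not a documented rotation interval.

```python
import time
from typing import Callable, Optional, Tuple

Keys = Tuple[str, str]


class WbiKeyCache:
    """Cache (img_key, sub_key); refetch once the cached pair is older than ttl."""

    def __init__(self, fetch: Callable[[], Keys], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # e.g. a function that calls the nav endpoint
        self._ttl = ttl          # assumed refresh interval, in seconds
        self._clock = clock      # injectable so the cache can be tested
        self._keys: Optional[Keys] = None
        self._fetched_at = 0.0

    def get(self) -> Keys:
        now = self._clock()
        if self._keys is None or now - self._fetched_at >= self._ttl:
            self._keys = self._fetch()
            self._fetched_at = now
        return self._keys

    def invalidate(self) -> None:
        """Drop the cached pair so the next get() refetches it."""
        self._keys = None


# Usage with a fake fetcher and a fake clock.
calls = []


def fake_fetch() -> Keys:
    calls.append(1)
    return (f"img{len(calls)}", f"sub{len(calls)}")


now = [0.0]
cache = WbiKeyCache(fake_fetch, ttl=3600, clock=lambda: now[0])
first = cache.get()
cache.get()           # still fresh: served from the cache
now[0] = 3600.0
second = cache.get()  # TTL elapsed: refetched
print(first, second, len(calls))  # ('img1', 'sub1') ('img2', 'sub2') 2
```

If the API starts rejecting signatures, calling `invalidate()` forces a fresh fetch on the next request.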
|
Doesn't matter, I'll be heading back to school in two hours anyway ( |
|
The parameters need to be sorted, similar to the APP sign algorithm. |
You can ... from |
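The ordering matters because `w_rid` is the MD5 of the serialized query plus the mixin key. A self-contained sketch of the signing step, mirroring the `encWbi` reference code posted later in the thread: add `wts`, sort by key, strip `!'()*` from values, URL-encode, hash. `sign_params` is a hypothetical name; `mixin_key` and `wts` are passed in so the output is reproducible, and the values are placeholders.

```python
import hashlib
import urllib.parse


def sign_params(params: dict, mixin_key: str, wts: int) -> dict:
    """Return a new dict with keys sorted and wts and w_rid added."""
    signed = dict(params, wts=wts)
    # Sort by key and drop the characters !'()* from every value.
    signed = {
        k: "".join(ch for ch in str(v) if ch not in "!'()*")
        for k, v in sorted(signed.items())
    }
    query = urllib.parse.urlencode(signed)
    signed["w_rid"] = hashlib.md5((query + mixin_key).encode("utf-8")).hexdigest()
    return signed


mixin_key = "x" * 32  # placeholder; normally derived from img_key + sub_key
result = sign_params({"ps": 5, "mid": 1}, mixin_key, wts=1700000000)
print(list(result))  # ['mid', 'ps', 'wts', 'w_rid']
```

The hashed string here is `mid=1&ps=5&wts=1700000000` followed by the mixin key; passing `{"ps": 5, "mid": 1}` unsorted still yields the same signature.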
I wrote my code following the demo above, so why does my scraper get "账号未登录" (account not logged in)? My code:
import httpx
HEADERS = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.bilibili.com/"}
def getMixinKey(ae):
    ...
def encWbi(params: dict):
    ...
if __name__ == "__main__":
    ...
|
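@AronTK Your request headers don't include a cookie, so you are effectively scraping while logged out, and of course the API returns "账号未登录" (account not logged in). Look at a mature scraper and adapt your code. |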
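The "账号未登录" text arrives inside the JSON body rather than as an HTTP error, so it helps to check the response envelope before reading `data`. A sketch assuming the usual `{"code", "message", "data"}` shape with `code == 0` meaning success; `BiliApiError` and `unwrap` are hypothetical names, and the `-101` below is used as a typical not-logged-in code that you should confirm against real responses.

```python
from typing import Any


class BiliApiError(Exception):
    """Raised when a Bilibili JSON response reports a non-zero code."""

    def __init__(self, code: Any, message: str) -> None:
        super().__init__(f"code={code}: {message}")
        self.code = code
        self.message = message


def unwrap(payload: dict) -> Any:
    """Return payload["data"] when code == 0, otherwise raise BiliApiError."""
    code = payload.get("code")
    if code != 0:
        raise BiliApiError(code, str(payload.get("message", "")))
    return payload.get("data")


# Placeholder payloads shaped like the usual response envelope.
ok = {"code": 0, "message": "0", "data": {"list": {"vlist": []}}}
print(unwrap(ok)["list"]["vlist"])  # []

try:
    unwrap({"code": -101, "message": "账号未登录", "data": None})
except BiliApiError as err:
    print(err.code)  # -101
```

Failing loudly here makes a missing cookie show up as a clear error instead of a `KeyError` or `TypeError` further down.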
This solved a big problem for me. My scraper had run for more than half a year without issues, then the data suddenly started coming back wrong. Finding this project was a real lifesaver, and the scraper works again. I mainly followed the wbi signing reference code:
from functools import reduce
from hashlib import md5
import urllib.parse
import time
import requests
HEADERS = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.bilibili.com/"}
mixinKeyEncTab = [
    46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49,
    33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40,
    61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11,
    36, 20, 34, 44, 52
]
def getMixinKey(orig: str):
    'Shuffle the characters of imgKey + subKey using the fixed table'
    return reduce(lambda s, i: s + orig[i], mixinKeyEncTab, '')[:32]
def encWbi(params: dict, img_key: str, sub_key: str):
    'Sign the request parameters with wbi'
    mixin_key = getMixinKey(img_key + sub_key)
    curr_time = round(time.time())
    params['wts'] = curr_time  # add the wts field
    params = dict(sorted(params.items()))  # sort parameters by key
    # strip the characters "!'()*" from the values
    params = {
        k: ''.join(filter(lambda ch: ch not in "!'()*", str(v)))
        for k, v
        in params.items()
    }
    query = urllib.parse.urlencode(params)  # serialize the parameters
    wbi_sign = md5((query + mixin_key).encode()).hexdigest()  # compute w_rid
    params['w_rid'] = wbi_sign
    return params
def getWbiKeys():
    'Fetch the latest img_key and sub_key'
    resp = requests.get('https://api.bilibili.com/x/web-interface/nav', headers=HEADERS)
    resp.raise_for_status()
    json_content = resp.json()
    img_url: str = json_content['data']['wbi_img']['img_url']
    sub_url: str = json_content['data']['wbi_img']['sub_url']
    img_key = img_url.rsplit('/', 1)[1].split('.')[0]
    sub_key = sub_url.rsplit('/', 1)[1].split('.')[0]
    return img_key, sub_key
def get_query(**parameters):
    """
    Return the signed query string
    """
    img_key, sub_key = getWbiKeys()
    signed_params = encWbi(
        params=parameters,
        img_key=img_key,
        sub_key=sub_key
    )
    query = urllib.parse.urlencode(signed_params)
    return query
get_query returns the signed query string. Then, as with the project's other existing endpoints, pass the old endpoint's parameters to get_query to get the signed query substring and append it directly to the original endpoint URL. Usage demo:
def getpageinfo(page):
    query = get_query(mid=137324885, ps=30, pn=page)
    url_getvideo = f'https://api.bilibili.com/x/space/wbi/arc/search?{query}'
    print(url_getvideo)
    try:
        videoinfo = requests.get(url_getvideo, headers=HEADERS).json()  # info for the videos on the current page
        print(videoinfo)
        videoinfo = videoinfo['data']['list']['vlist']
    except Exception:
        videoinfo = None
    return videoinfo
This is how I use the endpoint that fetches all of a Bilibili uploader's videos; with the approach above, everything works again. |
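Quick question: are parameters like token and web_locate required for wbi (Nemo2011/bilibili-api#301 (comment))? Do we have to pass every single parameter the web client sends? |
Thanks for the reverse engineering and analysis. I implemented it following the docs and it passes verification. |
In my tests, parameters like token and web_locate are not required. Passing mid alone also works, as long as you compute the matching w_rid. |
Rust version, 0% test coverage: |
@poly000 I need to revise this code; there are some problems with how it's written. |
Uh... |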
Contributing a PHP version:
//--------------- Fetch user info: start ---------------
// Shuffle the characters of imgKey + subKey using the fixed table
function getMixinKey($orig) {
    $mixinKeyEncTab = array(46, 47, 18, 2, 53, 8, 23, 32, 15, 50, 10, 31, 58, 3, 45, 35, 27, 43, 5, 49,
        33, 9, 42, 19, 29, 28, 14, 39, 12, 38, 41, 13, 37, 48, 7, 16, 24, 55, 40,
        61, 26, 17, 0, 1, 60, 51, 30, 4, 22, 25, 54, 21, 56, 59, 6, 63, 57, 62, 11,
        36, 20, 34, 44, 52); // the complete mixinKeyEncTab, with every element's value
    $temp = '';
    foreach ($mixinKeyEncTab as $n) {
        $temp .= $orig[$n];
    }
    return substr($temp, 0, 32);
}
// Sign the request parameters with wbi
function encWbi($params, $img_key, $sub_key) {
    $mixin_key = getMixinKey($img_key . $sub_key);
    $curr_time = time();
    $chr_filter = '/[!\'\(\)*]/';
    $query = [];
    $params['wts'] = $curr_time; // add the wts field
    // sort parameters by key
    ksort($params);
    foreach ($params as $key => $value) {
        $filtered_value = preg_replace($chr_filter, '', (string)$value);
        $query[] = urlencode($key) . '=' . urlencode($filtered_value);
    }
    $query_string = implode('&', $query);
    $wbi_sign = md5($query_string . $mixin_key);
    return $query_string . '&w_rid=' . $wbi_sign;
}
// Fetch the latest img_key and sub_key
function getWbiKeys() {
    $url = 'https://api.bilibili.com/x/web-interface/nav';
    $json = get_Url($url);
    $data = json_decode($json, true);
    $img_url = $data['data']['wbi_img']['img_url'];
    $sub_url = $data['data']['wbi_img']['sub_url'];
    // file name after the last '/', minus a 4-character extension such as '.png'
    $img_key = substr(strrchr($img_url, '/'), 1, -4);
    $sub_key = substr(strrchr($sub_url, '/'), 1, -4);
    return ['img_key' => $img_key, 'sub_key' => $sub_key];
}
function get_bili_query($parameters)
{
    $img_key_sub_key = getWbiKeys();
    $params = [];
    foreach ($parameters as $key => $value) {
        $params[$key] = $value;
    }
    return encWbi(
        $params,
        $img_key_sub_key['img_key'],
        $img_key_sub_key['sub_key']
    );
}
//--------------- Fetch user info: end ---------------
Usage:
public function getBiliData() {
    $parameters = [
        'mid' => '1',
        'ps' => '25'
    ];
    $query = get_bili_query($parameters);
    $url = "https://api.bilibili.com/x/space/wbi/arc/search?$query";
    $json = get_Url($url);
    $jsonObj = json_decode($json, true);
    $vlist = $jsonObj['data']['list']['vlist'];
    return array_slice($vlist, 0, 3); // first 3 items
} |
/**
 * Simulate a browser request
 * @param $url
 * @return bool|string
 */
function get_Url($url) {
    $ifpost = 0;
    $datafields = '';
    $cookiefile = '';
    // fall back to a generic UA when not running under a web server (e.g. CLI)
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0';
    $v = false;
    // Build a random IP
    $ip_long = array(
        array('607649792', '608174079'), //36.56.0.0-36.63.255.255
        array('1038614528', '1039007743'), //61.232.0.0-61.237.255.255
        array('1783627776', '1784676351'), //106.80.0.0-106.95.255.255
        array('2035023872', '2035154943'), //121.76.0.0-121.77.255.255
        array('2078801920', '2079064063'), //123.232.0.0-123.235.255.255
        array('-1950089216', '-1948778497'), //139.196.0.0-139.215.255.255
        array('-1425539072', '-1425014785'), //171.8.0.0-171.15.255.255
        array('-1236271104', '-1235419137'), //182.80.0.0-182.92.255.255
        array('-770113536', '-768606209'), //210.25.0.0-210.47.255.255
        array('-569376768', '-564133889'), //222.16.0.0-222.95.255.255
    );
    $rand_key = mt_rand(0, 9);
    $ip = long2ip(mt_rand($ip_long[$rand_key][0], $ip_long[$rand_key][1]));
    // Simulated HTTP request headers
    $header = array("Connection: Keep-Alive", "Accept: text/html, application/xhtml+xml, */*", "Pragma: no-cache", "Accept-Language: zh-Hans-CN,zh-Hans;q=0.8,en-US;q=0.5,en;q=0.3", "User-Agent: " . $user_agent, 'CLIENT-IP:' . $ip, 'X-FORWARDED-FOR:' . $ip);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, $v);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $ifpost && curl_setopt($ch, CURLOPT_POST, $ifpost);
    $ifpost && curl_setopt($ch, CURLOPT_POSTFIELDS, $datafields);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $cookiefile && curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
    $cookiefile && curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60); // maximum execution time in seconds
    // note: TLS certificate verification is disabled here
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    $ok = curl_exec($ch);
    curl_close($ch);
    unset($ch);
    return $ok;
} |
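If anyone gets a "request was banned" error when calling https://api.bilibili.com/x/space/wbi/arc/search,
adding a referer field to the headers solves it. Use the format below (SESSDATA can be obtained from a browser's incognito mode):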
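A sketch of such headers, assuming a Referer of the Bilibili homepage and a single SESSDATA cookie; `make_headers` is a hypothetical helper, the SESSDATA value is a placeholder you would copy from your own browser, and whether a given endpoint needs anything beyond this is untested.

```python
def make_headers(sessdata: str = "") -> dict:
    """Browser-like headers; adds the SESSDATA cookie when one is given."""
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Referer": "https://www.bilibili.com/",
    }
    if sessdata:
        # SESSDATA copied from a browser session (placeholder value below).
        headers["Cookie"] = f"SESSDATA={sessdata}"
    return headers


headers = make_headers("your-SESSDATA-value")
# e.g. requests.get("https://api.bilibili.com/x/space/wbi/arc/search?" + query, headers=headers)
```

A plain dict works with both `requests` and `httpx`, which are the two clients used earlier in this thread.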
|
verifyString is a bit more complicated:
https://github.com/DIYgod/RSSHub/blob/2b233e878c11cc4660aa2066d4a91aa9501ea97e/lib/v2/bilibili/utils.js#LL8-L13C3
https://github.com/DIYgod/RSSHub/blob/2b233e878c11cc4660aa2066d4a91aa9501ea97e/lib/v2/bilibili/cache.js#LL11-L43