This is a scraper that runs on Morph. To get started see the documentation
This repo scrapes Hong Kong court cases from https://e-services.judiciary.hk/dcl/index.jsp?lang=en
This repository is a prototype for one of the projects pitched at the first g0vhk.io Hackathon, held on 2018-06-23 in Hong Kong. The project owner, Selina, wrote the main document, which lists the details in this Hackpad. Please go and read it.
The scraped data can be found at https://morph.io/ylchan87/HKCourtList
scraper.py
the scraper; called by morph.io, it makes the HTTP requests and parses the replies with courtParser.py
courtParser.py
parses the HTML to get the fields and saves them to the database, as defined by dataModel.py
dataModel.py
the SQLAlchemy data model for the court cases
extractor.py
utility that explodes an HTML table into a Python list of lists (i.e. a 2D array)
testTableExtract.py
test script for extractor.py
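To illustrate what "exploding" an HTML table into a list of lists can look like, here is a minimal sketch using only the standard library. It is not the code in extractor.py; the names `TableExploder` and `explode_table` are hypothetical, and the sketch assumes a single, non-nested table. Cells with `rowspan` or `colspan` are repeated into every grid slot they cover, so each row ends up with the same number of columns.

```python
from html.parser import HTMLParser


class TableExploder(HTMLParser):
    """Collect table cells into a grid, repeating cells that span rows or columns."""

    def __init__(self):
        super().__init__()
        self.grid = {}    # (row, col) -> cell text
        self.row = -1     # index of the current <tr>
        self.col = 0      # next candidate column in the current row
        self.cell = None  # (rowspan, colspan, text parts) while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.cell = (int(a.get("rowspan", 1)), int(a.get("colspan", 1)), [])

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[2].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            rowspan, colspan, parts = self.cell
            text = "".join(parts).strip()
            # Skip slots already filled by a rowspan from an earlier row.
            while (self.row, self.col) in self.grid:
                self.col += 1
            # Copy the text into every slot the cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    self.grid[(self.row + dr, self.col + dc)] = text
            self.col += colspan
            self.cell = None


def explode_table(html):
    """Return the table in `html` as a rectangular list of lists of strings."""
    parser = TableExploder()
    parser.feed(html)
    parser.close()
    if not parser.grid:
        return []
    n_rows = max(r for r, _ in parser.grid) + 1
    n_cols = max(c for _, c in parser.grid) + 1
    return [[parser.grid.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]
```

For example, a court cell that spans two hearing rows would appear in both rows of the result, which makes each row self-describing when it is later saved to the database.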
a row in the court's timetable; this is the "main" table of the SQL DB
The case no. uniquely identifies a case. A case can have many events when there are multiple hearings and trials. An event can also deal with multiple cases at the same time.
many-to-many relationship with event
many-to-many relationship with event
many-to-many relationship with event
Could be either 1. the offence nature of the event (theft, robbery, etc.) or 2. the procedure type of the event (trial, hearing, mention, summons, etc.)
many-to-many relationship with event
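The many-to-many relationships described above are typically stored through an association table that holds one row per pair of ids. The sketch below shows the idea in plain SQL via the standard `sqlite3` module rather than SQLAlchemy, to keep it dependency-free. The table names `events`, `judges` and `events_judges`, and the helper functions, are assumptions made for illustration and may differ from what dataModel.py defines; the sample rows are taken from the data previews on this page.

```python
import sqlite3

# Hypothetical table names; the columns follow the data previews on this page.
SCHEMA = """
CREATE TABLE events (id INTEGER PRIMARY KEY, category TEXT, court TEXT, datetime TEXT);
CREATE TABLE judges (id INTEGER PRIMARY KEY, name_zh TEXT, name_en TEXT);
-- Association table: one row per (event, judge) pair.
CREATE TABLE events_judges (
    event_id INTEGER NOT NULL REFERENCES events(id),
    judge_id INTEGER NOT NULL REFERENCES judges(id),
    PRIMARY KEY (event_id, judge_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding a few rows from the previews."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
        (1, "CRHPI", "No.43", "2019-03-18T09:30:00+00:00"),
        (7, "O14", "No.41", "2019-03-18T09:30:00+00:00"),
    ])
    conn.executemany("INSERT INTO judges VALUES (?, ?, ?)", [
        (1, "余敏奇聆案官", "Master R. Yu"),
        (2, "許家灝聆案官", "Master K.H. Hui"),
    ])
    conn.executemany("INSERT INTO events_judges VALUES (?, ?)", [(1, 1), (7, 2)])
    return conn


def judges_for_event(conn, event_id):
    """Return the English names of the judges linked to one event."""
    rows = conn.execute(
        "SELECT j.name_en FROM judges AS j "
        "JOIN events_judges AS ej ON ej.judge_id = j.id "
        "WHERE ej.event_id = ? ORDER BY j.id",
        (event_id,),
    )
    return [name for (name,) in rows]
```

The same join pattern would apply to the other relationships (cases, tags, lawyers), each with its own association table keyed on `event_id`.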
rows 0 / 0
id | category | court | datetime | parties | parties_atk | parties_def |
---|---|---|---|---|---|---|
1 | CRHPI | No.43 | 2019-03-18T09:30:00+00:00 | | hidden | hidden |
2 | CRHPI | No.43 | 2019-03-18T09:50:00+00:00 | | hidden | hidden |
3 | CRHPI | No.43 | 2019-03-18T10:50:00+00:00 | | hidden | hidden |
4 | CRHPI | No.43 | 2019-03-18T11:10:00+00:00 | | hidden | hidden |
5 | CRHPI | No.43 | 2019-03-18T11:30:00+00:00 | | hidden | hidden |
6 | CRHPI | No.43 | 2019-03-18T14:30:00+00:00 | | hidden | hidden |
7 | O14 | No.41 | 2019-03-18T09:30:00+00:00 | | hidden | hidden |
8 | O14 | No.41 | 2019-03-18T09:30:00+00:00 | | hidden | hidden |
9 | MCL | No.6 | 2019-03-18T09:30:00+00:00 | hidden | | |
10 | MCL | No.6 | 2019-03-18T09:30:00+00:00 | | hidden | hidden |
rows 10 / 455
id | name_zh | name_en |
---|---|---|
1 | 余敏奇聆案官 | Master R. Yu |
2 | 許家灝聆案官 | Master K.H. Hui |
3 | 雷健文聆案官 | Master Lui |
4 | 林文瀚上訴庭副庭長 | Hon Lam VP |
5 | 張澤祐上訴庭法官 | Hon Cheung JA |
6 | 鮑晏明上訴庭法官 | Hon Barma JA |
7 | 薛偉成上訴庭法官 | Hon Zervos JA |
8 | 朱芬齡上訴庭法官 | Hon Chu JA |
9 | 潘兆初上訴庭法官 | Hon Poon JA |
10 | 區慶祥上訴庭法官 | Hon Au JA |
rows 10 / 216531
id | caseNo | description |
---|---|---|
1 | HCPI1037/2018 | |
2 | HCPI1038/2018 | |
3 | HCPI1041/2018 | |
4 | HCPI1042/2018 | |
5 | HCPI866/2017 | |
6 | HCPI519/2016 | |
7 | HCA2917/2018 | 民事訴訟 |
8 | HCA2769/2017 | 民事訴訟 |
9 | HCA1185/2018 | |
10 | HCA249/2019 | 民事訴訟 |
rows 10 / 5088
id | name_zh | name_en |
---|---|---|
1 | 反對自動解除破產 | Objections to discharge |
2 | 有關無力償還的雜項申請 | Miscellaneous Insolvency Application |
3 | 簡易判決 | O.14 List |
4 | 破產呈請 | Bankruptcy Petition |
5 | 核對列表聆訊/案件管理會議 | Check List/Case Management Conference |
6 | 核對列表審核聆訊 (人身傷亡案件) | Checklist Review Hearing(PI Cases) |
7 | 公司清盤呈請 | Winding Up Petition |
8 | 勞資審裁處 | Labour Tribunal |
9 | 傳票(停止代表訴訟當事人) | Summons (To cease acting) |
10 | 法庭指示 | For Directions |
rows 10 / 1115
id | name_zh | name_en |
---|---|---|
1 | 方氏律師事務所 | FONGS |
2 | 劉陳高律師事務所 | Lau, Chan & Ko |
3 | 柯廣輝律師事務所 | Or & Partners |
4 | 梁鳳慈律師行 | Winnie Leung & Co. |
5 | 尹麗儀律師行 | Mandy Wan & Co. |
6 | 陳應達律師事務所 | Y.T. Chan & Co. |
7 | | CHIH |
8 | 蘇龍律師事務所 | So, Lung & Associates |
9 | 陳、陳律師行 | Chan & Chan |
10 | 何韋律師行 | Howse Williams |
rows 10 / 408557
event_id | judge_id |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 1 |
5 | 1 |
6 | 1 |
7 | 2 |
8 | 2 |
9 | 3 |
10 | 3 |
rows 0 / 0
event_id | case_id |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 5 |
6 | 6 |
7 | 7 |
8 | 8 |
9 | 9 |
10 | 10 |
rows 0 / 0
event_id | tag_id |
---|---|
1 | 6 |
2 | 6 |
3 | 6 |
4 | 6 |
5 | 6 |
6 | 6 |
7 | 3 |
8 | 3 |
9 | 9 |
10 | 10 |
rows 10 / 137639
event_id | lawyer_id |
---|---|
7 | 12 |
7 | 13 |
7 | 14 |
8 | 15 |
9 | 16 |
10 | 17 |
11 | 16 |
12 | 10 |
14 | 20 |
17 | 23 |
rows 10 / 68578
event_id | lawyer_id |
---|---|
1 | 1 |
2 | 3 |
3 | 5 |
4 | 7 |
5 | 9 |
6 | 11 |
13 | 18 |
16 | 21 |
18 | 23 |
19 | 23 |
rows 10 / 68578
event_id | lawyer_id |
---|---|
1 | 2 |
2 | 4 |
3 | 6 |
4 | 8 |
5 | 10 |
6 | 4 |
13 | 19 |
16 | 22 |
18 | 24 |
19 | 24 |
Average successful run time: 7 minutes
Total run time: 10 days
Total cpu time used: about 5 hours
Total disk space used: 79.7 MB