_AugmentedIntelligence Forums


[RELEASE] WikiMedia Dumps2SQL - Macdaddy4sure - 09-12-2023

_AugmentedIntelligence WikiMedia Dumps2SQL

I have written a program for parsing Wikimedia dumps from the likes of Wikipedia, Wiktionary, WikiSimple, WikiHow, etc.
Download Link: https://github.com/Macdaddy4sure/WikipediaParser3
1. Install MySQL if you have not installed it already.
    1. Go to https://www.mysql.com/downloads/
    2. Download the community version of MySQL.
    3. After the installer has been downloaded, run it.
    4. Install the community installer and the most recent version of MySQL Server. Note: I recommend also installing the Visual Studio C++ Connector if you plan to edit or mod the source of _AugmentedIntelligence, or if you wish to use your own or third-party software.
    5. Make sure MySQL Server is installed as a service.
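One quick way to confirm that the MySQL service from step 1 is actually running is to check whether something is listening on its TCP port. A minimal Python sketch, assuming the server uses MySQL's default port 3306 on the local machine; the helper name `port_is_open` is made up for illustration:

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 3306, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("MySQL port reachable:", port_is_open())
```

An open port only shows that some server is listening there. It does not prove that your credentials work.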
2. Download a Wikimedia dump from https://dumps.wikimedia.org/backup-index.html
3. Download 7-Zip if you have not installed it already.
    1. Go to https://www.7-zip.org/download.html
    2. I recommend downloading the 64-bit Windows installer (64-bit Windows x64).
    3. After the installer has been downloaded, run it to install 7-Zip.
4. Extract the Wikimedia dump XML file from the archive.
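If the dump was downloaded as a .bz2 file, step 4 can also be done with Python's standard library instead of 7-Zip. A sketch, assuming the archive is named enwiki-latest-pages-articles.xml.bz2 (chosen to match the XML name used in step 5); the function name `extract_bz2` is hypothetical:

```python
import bz2
import shutil
from pathlib import Path


def extract_bz2(archive: Path, out_path: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Stream-decompress archive to out_path without loading it all into memory."""
    with bz2.open(archive, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return out_path


if __name__ == "__main__":
    # Assumed file name; substitute the dump you actually downloaded.
    archive = Path("enwiki-latest-pages-articles.xml.bz2")
    if archive.exists():
        # with_suffix("") drops the trailing ".bz2", leaving the ".xml" name.
        print("Wrote", extract_bz2(archive, archive.with_suffix("")))
```

Make sure the target drive has enough free space, because the extracted XML is much larger than the archive.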
5. Install the following PowerShell script for splitting the dump into individual articles.
    1. Open PowerShell as an administrator.
    2. Run the following: Install-Script Split-Wikipedia -Scope CurrentUser
    3. Close PowerShell and launch it again as an administrator.
    4. Change to the directory containing the extracted XML file, then type at the PowerShell prompt: Split-Wikipedia -Path ./enwiki-latest-pages-articles.xml
    5. This process will take a number of hours.
    6. The articles are saved to ./articles
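To make clear what the splitting step does, here is a rough Python sketch of the same idea: stream through the dump and write each page element to its own file. This is not the Split-Wikipedia script's actual implementation. The output naming (page id plus .xml) and the function names are assumptions, and the real script's layout under ./articles may differ.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def split_dump(dump_path, out_dir) -> int:
    """Write every <page> element to out_dir/<page id>.xml; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        if local(elem.tag) != "page":
            continue
        # The page's own <id> is a direct child; revision ids are nested deeper.
        page_id = next((c.text for c in elem if local(c.tag) == "id"), None)
        name = page_id or f"page{count}"
        (out / f"{name}.xml").write_bytes(ET.tostring(elem, encoding="utf-8"))
        count += 1
        elem.clear()  # release the page's children to limit memory use
    return count
```

Calling `split_dump("enwiki-latest-pages-articles.xml", "articles")` would produce one file per page. On a full Wikipedia dump, expect a long run time and millions of files.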
6. Download and install Git.
    1. If you have not installed Git, go to: https://git-scm.com/download/win
    2. Download and install Git.
    3. Right-click This PC, click Properties, and open Advanced System Settings.
    4. Click on Environment Variables.
    5. Edit the Path variable and add the Git installation path or its bin directory.
    6. Save and close Environment Variables.
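After editing the Path variable, it is worth checking that git can actually be found before moving on. A small Python sketch using the standard library's `shutil.which`; the helper name `find_tool` is only illustrative:

```python
import shutil


def find_tool(name: str = "git"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    print(f"git found at {location}" if location else "git is not on PATH yet")
```

Run it from a newly opened terminal, since windows that were already open keep the old Path value.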
7. Download or clone the repository.
    1. Go to: https://github.com/Macdaddy4sure/WikipediaParser3
    2. Click on the green button toward the top right of the page.
    3. Copy the URL shown inside the menu.
    4. Open Command Prompt and navigate to the directory where you would like to store the source and executable for Wikipedia Parser.
    5. Type at the prompt: git clone https://github.com/Macdaddy4sure/WikipediaParser3.git
8. Use DirectoryUpLevel.exe to move all files inside of ./articles to another directory.
    1. DirectoryUpLevel.exe is located inside the x64 directory under the cloned repository.
    2. Open DirectoryUpLevel.exe and type in the path to the articles directory.
    3. I recommend creating a directory on the same level as the articles directory and calling it ./Done.
    4. Enter that directory at the second prompt in DirectoryUpLevel.
    5. Press Enter and the process will run.
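For readers who would rather script step 8, the following Python sketch shows one way to move every file under a nested directory tree into a single flat directory. It is an assumption about what DirectoryUpLevel.exe does, based only on the description above, not a port of it. The name-collision handling (appending a counter) and the function name `flatten` are likewise assumptions.

```python
import shutil
from pathlib import Path


def flatten(src_dir, dest_dir) -> int:
    """Move every file under src_dir (at any depth) directly into dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = 0
    # Materialize the listing first so moving files does not disturb the walk.
    for path in [p for p in src.rglob("*") if p.is_file()]:
        target = dest / path.name
        counter = 1
        while target.exists():  # avoid overwriting a file with the same name
            target = dest / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.move(str(path), str(target))
        moved += 1
    return moved


if __name__ == "__main__":
    print(flatten("./articles", "./Done"), "files moved")
```

Keep the destination outside the source tree, as recommended above, so that files already moved are not picked up again.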
9. Open Command Prompt and create a new database inside of MySQL for the wiki you are uploading.
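Step 9 does not say which commands to use. One possibility is to start the mysql command-line client from Command Prompt and run a CREATE DATABASE statement. The Python sketch below only builds that statement as text. The database name enwiki and the utf8mb4 character set are assumptions, so check what WikipediaParser3 expects before using them.

```python
import re


def create_database_sql(name: str) -> str:
    """Build a CREATE DATABASE statement for a simple, validated name."""
    # Allow only letters, digits and underscores so the name is safe to embed.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsupported database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS `{name}` CHARACTER SET utf8mb4;"


if __name__ == "__main__":
    # Paste the printed statement at the mysql> prompt (mysql -u root -p).
    print(create_database_sql("enwiki"))
```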
10. Open WikipediaParser3 and configure the program.
    1. Open WikipediaParser3.
    2. Press 0 (zero) and press Enter to open the settings menu.
    3. Use the options in the menu to configure the program.
11. For option 5, the wiki dump location should be the path of the “Done” directory.
    1. Make sure all of the path separators are forward slashes (/), not backslashes (\).
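Paths copied from Windows Explorer use backslashes, so they need converting before being entered. A short Python sketch of that conversion using the standard library's `pathlib`; the example path and the function name are illustrative only:

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Convert a Windows-style path so it uses forward slashes only."""
    return PureWindowsPath(path).as_posix()


if __name__ == "__main__":
    print(to_forward_slashes(r"C:\Wiki\Done"))  # prints C:/Wiki/Done
```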
12. Press 0 to return to the main menu.
13. Press 1 to start the process.