Thread: Offline Wiki Dumps [474-742]

  1. #1 Offline Wiki Dumps [474-742, osrs, rs3] 
    Posted by Greg

    Scraping the live wiki sites or relying on web.archive.org is no longer necessary.

    I contacted Fandom and they fixed the database dumps, which had been down for the past few years.
    Once extracted, the data totals almost 490 GB (221,342 wiki pages, 0 images), so I wrote a script to filter page revisions by date and namespace, and ran it for some of the most common revisions.
    Code:
    import java.io.File
    import java.io.FileNotFoundException
    import java.io.FileReader
    import java.io.FileWriter
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter
    import java.util.*
    import java.util.concurrent.TimeUnit
    import javax.xml.stream.*
    import javax.xml.stream.events.XMLEvent
    
    /**
     * Takes the full pages_full.xml dump (all pages with their complete history) linked from Special:Statistics,
     * filters pages by namespace and revisions by date,
     * and writes one XML file per cut-off date.
     * Once compressed with bzip2, the output can be read with [WikiTaxi](https://www.yunqa.de/delphi/products/wikitaxi/index)
     */
    object RunescapeWikiPagesFullFilter {
    
        private class Parser(val date: LocalDate, dir: String, val namespacesToDump: Set<String>, val message: Boolean) {
    
            // LocalDate.toString() is already ISO yyyy-MM-dd, so no replacing is needed
            val file = File(dir, "runescapewiki-latest-pages-articles-$date.xml")
    
            init {
                if (!file.exists()) {
                    file.createNewFile()
                }
            }
    
            val pageEvents = mutableListOf<XMLEvent>()
            val revisionEvents = mutableListOf<XMLEvent>()
            val mostRecentRevisionEvents = mutableListOf<XMLEvent>()
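            // Write explicit UTF-8 rather than the platform default charset that FileWriter would use
            val eventWriter = XMLOutputFactory.newInstance().createXMLEventWriter(file.outputStream(), "UTF-8")!!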
            val eventFactory = XMLEventFactory.newInstance()!!
            val redirectPattern = "#(?:REDIRECT|redirect) \\[\\[(.*)]]".toRegex()
    
            var validRevision = false
            var revision = 0
            var type: String? = null
            var pages = 0
            var skipPage = false
            var title: String = ""
    
            val namespaceIndex = mutableMapOf<String, Int>()
            val namespaces = mutableSetOf<Int>()
            var namespaceId: String? = null
    
            fun reset() {
                revision = 0
                type = null
                title = ""
                validRevision = false
                pageEvents.clear()
                revisionEvents.clear()
                mostRecentRevisionEvents.clear()
            }
    
            fun parse(event: XMLEvent) {
                var event = event
                if (skipPage) {
                    if (event.eventType == XMLStreamConstants.END_ELEMENT && event.asEndElement().name.localPart == "page") {
                        skipPage = false
                    }
                    return
                }
    
                when (event.eventType) {
                    XMLStreamConstants.START_ELEMENT -> {
                        type = event.asStartElement().name.localPart
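                        when (type) {
                            // Look the attribute up by name; attribute iteration order is not guaranteed
                            "namespace" -> namespaceId = event.asStartElement().getAttributeByName(javax.xml.namespace.QName("key"))?.value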
                            "revision" -> revision = 1
                        }
                    }
                    XMLStreamConstants.CHARACTERS -> {
                        when (type) {
                            "namespace" -> {
                                val id = namespaceId?.toInt()
                                if (id != null) {
                                    namespaceIndex[event.asCharacters().toString().trim()] = id
                                    namespaces.add(id)
                                    namespaceId = null
                                }
                            }
                            "timestamp" -> {
                                val timestamp = LocalDate.parse(event.asCharacters().toString(), inputFormatter)
                                validRevision = timestamp.isBefore(date)
                                type = null
                            }
                            "ns" -> {
                                val id = event.asCharacters().toString().toIntOrNull()
                                if (id != null) {
                                    if (namespaces.contains(id)) {
                                        skipPage = true
                                        reset()
                                        return
                                    } else if (message) {
                                        println(title)
                                    }
                                }
                            }
                            "title" -> {
                                val chars = event.asCharacters()
                                if (!chars.isWhiteSpace) {
                                    title = chars.toString()
                                    if (title.startsWith("Template:Signatures/") || title.startsWith("Template:Signature/") || title.startsWith("Template:Userbox/")) {
                                        skipPage = true
                                        reset()
                                        return
                                    }
                                }
                            }
                        }
                    }
                }
    
                when (revision) {
                    2 -> revision = 0
                    1 -> revisionEvents.add(event)
                    else -> pageEvents.add(event)
                }
    
                when (event.eventType) {
                    XMLStreamConstants.END_ELEMENT -> {
                        when (event.asEndElement().name.localPart) {
                            "namespaces" -> {
                                namespaces.removeAll(namespacesToDump.map { namespaceIndex[it] })
                            }
                            "revision" -> {
                                if (validRevision) {
                                    mostRecentRevisionEvents.clear()
                                    mostRecentRevisionEvents.addAll(revisionEvents)
                                }
                                revisionEvents.clear()
                                revision = 2
                            }
                            "siteinfo" -> {
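                                // Write the header once, then clear it so it is not written again with the first kept page
                                pageEvents.forEach {
                                    eventWriter.add(it)
                                }
                                pageEvents.clear()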
                            }
                            "page" -> {
                                if (!ignoreEmptyPages || mostRecentRevisionEvents.isNotEmpty()) {
                                    val pageClose = pageEvents.removeAt(pageEvents.lastIndex)
                                    var priorRedirect: String? = null
                                    pageEvents.forEach { event ->
                                        eventWriter.add(event)
                                        if (event.eventType == XMLStreamConstants.START_ELEMENT && event.asStartElement().name.localPart == "redirect") {
                                            priorRedirect = event.asStartElement().attributes.next().value
                                        }
                                    }
    
                                    var priorEventType: String? = null
                                    mostRecentRevisionEvents.forEach {
                                        var event = it
                                        // Replace redirect revision text with title from page redirect tag (fixes redirects for rs3 renames)
                                        if (priorRedirect != null && event.eventType == XMLStreamConstants.CHARACTERS && priorEventType == "text") {
                                            val text = event.asCharacters().toString()
                                            if (text.contains("#redirect", true)) {
                                                val result = redirectPattern.matchEntire(text)?.groupValues?.last()
                                                if (result != null && result != priorRedirect && !result.startsWith("$priorRedirect#")) {
                                                    event = eventFactory.createCharacters(text.replace(result, priorRedirect!!))
                                                }
                                            }
                                        }
                                        priorEventType = if (event.eventType == XMLStreamConstants.START_ELEMENT) event.asStartElement().name.localPart else null
    
                                        eventWriter.add(event)
                                    }
                                    eventWriter.add(pageClose)
                                    eventWriter.flush()
                                    pages++
                                }
                                reset()
                            }
                            "mediawiki" -> eventWriter.add(event)
                        }
                    }
                }
            }
    
            fun finish() {
                println("Total pages: $pages for $date")
                eventWriter.close()
            }
    
            companion object {
                private val inputFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH)!!
            }
        }
    
        private const val ignoreEmptyPages = true
    
        @JvmStatic
        fun main(args: Array<String>) {
            try {
                System.setProperty("entityExpansionLimit", "0")
                System.setProperty("totalEntitySizeLimit", "0")
                System.setProperty("jdk.xml.totalEntitySizeLimit", "0")
                val namespaces = setOf(
                    "",
                    "Template",
                    "Category",
                    "Update",
                    "Exchange",
                    "Charm",
                    "Calculator",
                    "Map",
                    "Transcript"
                )
                val directory = "${System.getProperty("user.home")}\\Downloads\\runescape_pages_full\\"
                val dates = mapOf(
                    474 to LocalDate.of(2007, 11, 12),
                    530 to LocalDate.of(2009, 2, 9),
                    550 to LocalDate.of(2009, 7, 7),
                    562 to LocalDate.of(2009, 9, 18),
                    592 to LocalDate.of(2010, 3, 2),
                    614 to LocalDate.of(2010, 8, 24),
                    634 to LocalDate.of(2011, 1, 31),
                    667 to LocalDate.of(2011, 10, 16),
                    718 to LocalDate.of(2012, 6, 13),
                    742 to LocalDate.of(2012, 11, 19)
                )
                val parsers = dates.values.map { date -> Parser(date, directory, namespaces, false) }
                val factory = XMLInputFactory.newInstance()
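                // Read from a byte stream so the parser picks the encoding from the XML declaration, not the platform default
                val eventReader = factory.createXMLEventReader(File("${directory}runescape_pages_full.xml").inputStream())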
                val start = System.currentTimeMillis()
                while (eventReader.hasNext()) {
                    val event = eventReader.nextEvent()
                    parsers.forEach { it.parse(event) }
                }
                parsers.forEach {
                    it.finish()
                }
                println("Took ${TimeUnit.MILLISECONDS.toMinutes(System.currentTimeMillis() - start)} mins")
            } catch (e: FileNotFoundException) {
                e.printStackTrace()
            } catch (e: XMLStreamException) {
                e.printStackTrace()
            }
        }
    }


    How to use

    The dumps can be viewed offline as they are using WikiTaxi, or you can extract them and parse out the data you want using the standard Java XML DOM.
    Note that pages still use the current RS3 names (e.g. "Dragon claw" rather than "Dragon claws"); however, redirects will get you to the right page.
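
As a rough illustration of the DOM route, here is a minimal Kotlin sketch that lists page titles and redirect targets. `WikiPage` and `readPages` are made-up names, and the inline sample is an invented two-page fixture, not real dump content. It assumes the usual MediaWiki export layout: a `<page>` holding a `<title>`, an optional `<redirect title="...">` and a `<revision>` with `<text>`. DOM loads the whole file into memory, so this only suits the small filtered dumps, not the full 490 GB export.

```kotlin
import java.io.InputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical holder for the fields this sketch reads from each <page>.
data class WikiPage(val title: String, val redirect: String?, val text: String)

// Parses the whole document into memory and returns one WikiPage per <page> element.
fun readPages(input: InputStream): List<WikiPage> {
    val document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input)
    val nodes = document.getElementsByTagName("page")
    return (0 until nodes.length).map { index ->
        val page = nodes.item(index) as Element
        val title = page.getElementsByTagName("title").item(0)?.textContent.orEmpty()
        // <redirect> is only present on redirect pages; its title attribute holds the target.
        val redirect = (page.getElementsByTagName("redirect").item(0) as? Element)?.getAttribute("title")
        // Assumes one revision per page, as in the filtered dumps.
        val text = page.getElementsByTagName("text").item(0)?.textContent.orEmpty()
        WikiPage(title, redirect, text)
    }
}

fun main() {
    // Invented sample in the shape of a MediaWiki export; not taken from a real dump.
    val sample = """
        <mediawiki>
          <page>
            <title>Dragon claws</title>
            <ns>0</ns>
            <redirect title="Dragon claw" />
            <revision><text>#REDIRECT [[Dragon claw]]</text></revision>
          </page>
          <page>
            <title>Dragon claw</title>
            <ns>0</ns>
            <revision><text>Example page text</text></revision>
          </page>
        </mediawiki>
    """.trimIndent()
    readPages(sample.byteInputStream()).forEach { page ->
        println(if (page.redirect != null) "${page.title} -> ${page.redirect}" else page.title)
    }
}
```

For a real dump, pass the extracted file's input stream to `readPages` instead of the sample string.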

    Examples of the data you could scrape from these:
    • Item/NPC/Object details
    • Combat bonuses
    • Drop tables
    • Charm drop rates
    • Grand Exchange prices
    • Calculations
    • Skill data


    And so much more...
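
Much of the data listed above lives in template calls inside each page's wikitext. The Kotlin sketch below shows one naive way to pull `|key = value` pairs out of a template block. `templateParams` is a made-up helper, and the template name and parameters in the sample are assumptions for illustration; the real names vary by wiki and by date. It does not handle nested templates, links containing `|`, or values spread over several lines.

```kotlin
// Naive sketch: reads the "|key = value" lines of the first {{templateName ...}} block.
// Stops at the first "}}", so nested templates inside the block will cut it short.
fun templateParams(wikitext: String, templateName: String): Map<String, String> {
    val start = wikitext.indexOf("{{$templateName", ignoreCase = true)
    if (start == -1) return emptyMap()
    val close = wikitext.indexOf("}}", start)
    val end = if (close == -1) wikitext.length else close
    return wikitext.substring(start, end)
        .lineSequence()
        .map { it.trim() }
        .filter { it.startsWith("|") && it.contains("=") }
        .associate { line ->
            // Split on the first "=" only, so values may themselves contain "=".
            val (key, value) = line.removePrefix("|").split("=", limit = 2)
            key.trim() to value.trim()
        }
}

fun main() {
    // Made-up wikitext; template and parameter names are placeholders.
    val text = """
        {{Infobox Item
        |name = Example item
        |tradeable = Yes
        }}
        Example page text.
    """.trimIndent()
    println(templateParams(text, "Infobox Item"))
}
```

Combined with a loop over the pages of a dump, this gives a quick first pass before writing a proper wikitext parser.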

    Downloads

    Fandom

    474 (2007-11-12) [3.0mb, 12981 pages]
    530 (2009-02-09) [6.8mb, 25882 pages]
    550 (2009-07-07) [8.1mb, 31092 pages]
    562 (2009-09-18) [8.6mb, 32568 pages]
    592 (2010-03-02) [10.3mb, 37682 pages]
    614 (2010-08-24) [12.6mb, 45452 pages]
    634 (2011-01-31) [13.7mb, 50354 pages]
    667 (2011-10-16) [16.4mb, 77837 pages]
    718 (2012-06-13) [19.6mb, 115623 pages]
    742 (2012-11-19) [27.8mb, 142785 pages]

    Official Wiki/Weird Gloop

    OSRS (2023-10-08) [49.3mb, 135796 pages]
    RS3 (2023-10-08) [149.1mb, 434507 pages]

    Each date is the day before the game update that followed the wanted revision, i.e. the last day that revision was live.

    Images
    Last edited by Greg; 10-08-2023 at 10:10 PM.


  2. #2  
    Posted by Classic
    Great, thank you!

  3. #3  
    Posted by JayArrowz
    Good release thanks!

  4. #4  
    Posted by Luke132
    Great work cheers chief.


  5. #5  
    Nice contribution

  6. #6  
    Posted by Tyluur
    Another one? amazin

  7. #7  
    Posted by Battle614
    Thanks for this


  8. #8  
    Posted by Ezvy
    Should I use winRar or winzip

    I downloaded Java/JDK.. now I need server and client files???


  9. #9  
    Posted by Corey
    Originally posted by Ezvy:
        Should I use winRar or winzip

        I downloaded Java/JDK.. now I need server and client files???

    This has nothing to do with running a server and client.

  10. #10  
    Hey,

    Thanks for the release, I was looking for something like this. How would the community feel about an online web application for viewing the data, as opposed to having to download it? If people think it would be beneficial or novel, would anyone be interested in me pursuing it? I'm guessing it would not be a very hard job, although I have never heard of WikiTaxi before and don't know how the data is structured. I can always write a program to manipulate it however the community or I want. I would prefer to have the data in something like JSON, with a nice UI to display and search through it.