Real-Time Crime Forecasting Challenge

About the Challenge
Inviting our nation’s brightest minds to develop algorithms that advance crime forecasting.

Posted By: Office of Justice Programs
Category: Scientific/Engineering
Skill: Algorithms
Interest: Science & Research
Submission Dates: 12 a.m. ET, Feb 22, 2017 - 11:59 p.m. ET, Feb 28, 2017
Winners Announced: Jul 31, 2017

This challenge seeks to harness the advances in data science to address the challenges of crime and justice. It encourages data scientists across all scientific disciplines to foster innovation in forecasting methods. The goal is to develop algorithms that advance place-based crime forecasting through the use of data from one police jurisdiction.

 

Judging Criteria

Effectiveness of the forecast

NIJ will use the Prediction Accuracy Index to measure the effectiveness of the forecasts. See NIJ.gov for details: http://nij.gov//funding/pages/fy16-crime-forecasting-challenge.aspx#judgingcriteria

Efficiency of the forecast

NIJ will use the Predictive Efficiency Index to measure the efficiency of the forecasts. See NIJ.gov for details: http://nij.gov//funding/pages/fy16-crime-forecasting-challenge.aspx#judgingcriteria

How to Enter

Entries must be submitted through the Office of Justice Programs’ Grants Management System (GMS).

Contestants must submit entries under the appropriate contestant type by selecting the applicable solicitation within GMS. Contestant types are:

  • Student: Enrolled as a full-time student in high school or as a full-time, degree-seeking student in an undergraduate program (associate or bachelor’s degree).
  • Small Team/Business: A team of 1-20 individuals, or a small business with fewer than 21 employees. Teams should enter using a team name and provide a list of all individuals on the team. A small business should enter using the name of its business.
  • Large Business: A business with more than 20 employees. A large business should enter using the name of its business.

Note that Student and Small Team/Business contestants may enter the Large Business category, but Large Businesses may not enter the Student or Small Team/Business categories.

See the Real-Time Crime Forecasting Challenge page on NIJ.gov for details on how to enter.

Prizes
Students: $200,000.00 (40 prizes of $5,000 each)
Small Teams/Businesses: $400,000.00 (40 prizes of $10,000 each)
Large Business: $600,000.00 (40 prizes of $15,000 each)

214 Discussions for "Real-Time Crime Forecasting Challenge"

  • NIJ
    Please see below for our plan to communicate scores of both the challenge winners and all contestants: Regarding the leaderboard, we are pulling together the complete leaderboard. We hope to have it posted by early September. There are a significant number of data points that must be gathered, sorted, and double checked. The leaderboard will include the winners and two runners up for each category, crime type, time period, and score index (PAI and PEI*). Regarding scores for all contestants, we will be contacting every contestant to thank them for their participation and provide them with their scores. We already have made the scoring algorithms available so that contestants can score their own submissions.

  • PASDA
    Team PASDA (Mike Porter and I) will be posting a paper in the next couple of weeks detailing our approach. For the students who did the competition, I am currently looking to fund PhD students through some money from NSF. So if you are interested in doing a PhD in CS/Math focused on crime and security modeling, let me know! George Mohler gmohler@iupui.edu

    • Reply
      Mohammad
      Can you please post the maximum PAI and PEI* for each category and timeframe? Also, our team (PTL) is mentioned only in the detailed list. Was it missing from the original list by mistake? Can you please post the top x teams in each category? Thank you!

    • Reply
      TJ
      Can you also list the total number of submissions in each of the three categories -- Student, Small business and Large business? Thanks!

        • Reply
          CJ
          Since the total number of participants is rather small, could you please publish the whole leaderboard with everybody's results for every category?

          • Reply
            NIJ
            NIJ will be posting the scores for winners and two runners-up in each category. NIJ will also be posting the shapefiles for each winning submission.

  • NIJ
    In response to questions regarding how submissions were judged, we have posted the following files containing the Visual Basic macros used to determine eligibility and scores for each submission: (1) The Runner Macro was used to do the initial simple setup and call the other two macros. (2) The Join Macro joined each submission to the actual data for each crime type and period. (3) The Calculations Macro obtained the needed data points from the joined shapefiles created above and inserted them into an Excel document. The Excel document created by the Calculations Macro was conditionally formatted to indicate if values (e.g., max grid cell size, a, and A) were in compliance with the requirements of the Challenge and to trigger manual review of the shapefiles to examine the cause of the violation.

    • Reply
      NIJ
      We are working hard to finalize the list of prizewinners. We hope to have them posted before the end of July. We apologize for the delay and appreciate your patience.

        • Reply
          Take it easy my friend!
          First, no one mentioned "Early June". Second, for a good number of participants, money is just a motive. The real goal is to compare tens or probably hundreds of crime prediction methods out there to one another and to the current methods used by the PPB. We are waiting to see the results of months of hard work and excited to know how we compare to others. Take it easy my friend!

      • Reply
        not impatient, but curious
        One thing that I would be curious to know is how these submissions are being validated. In most ML competitions the rules are coded as software at the beginning of the competition and participants have access to this "rules" code to check their submissions throughout. In this competition, the rules are quite complex and changed several times, for example, the rules regarding the shape and orientation of the grid. What steps are being taken to ensure the rules are correctly translated into code to assess submissions? If one person is validating these results manually in ArcGIS (and that is what is taking so long), then that could be somewhat concerning.

    • Reply
      D
      Is there a specific date that the NIJ has in mind for announcing the winner? Will the solution be available to view by the public?

  • Mohammad Al Boni
    As we approach the end of the forecasting window and NIJ starts evaluating submissions, I am wondering whether NIJ could publish the PAI and PEI* scores for all submissions. This would be very helpful for us to know how we compare against other contestants. Also, could NIJ check whether any of the submissions were able to predict any major incidents, such as the train stabbing last Friday? Thank you!

    • Reply
      NIJ
      Answer in two parts: (1) NIJ will post the top scores in each contestant category and for each call-for-service category. Along with the winning scores, we will post an as-yet-undetermined number of runner-up scores. (2) The intent of the Challenge was not to forecast single (even major) incidents, but rather the aggregate pattern.

  • NIJ
    The Challenge has closed. Thank you to everyone who took part. On March 6, 2017, we will release any PPB CFS data for 02/27/17-02/28/17 not yet released. On June 7th, 2017, we will release PPB CFS data for 03/01/17-05/31/17. We anticipate announcing all winners on June 30, 2017.

    • Reply
      Paul Warren
      Hello, Is it possible to get the contact info for someone inside the Portland Police Bureau? I'd like to ask if it's okay to use this data for other purposes, but my three City of Portland TrackIT queries have been ignored and I'm not sure how else to ask permission. Thanks, Paul

  • Jo K.
    Do we need to submit a filled out FCQ (Financial Management and System of Internal Controls Questionnaire) form along with the .zip file with our submission? (Found on the "Budget and Program Attachments" page.) Thanks!

  • Trevor
    A page on the DOJ Grant Management System mentions that "...all applicants are to download, complete, and submit the Financial Management and System of Internal Controls Questionnaire." Is submission of this form required for this Challenge?

    • Reply
      Mohammad Al Boni
      I believe so. Your submission will not be marked as complete until you complete all steps (highlighted by the green color). Upon completion, you should get an email confirming that they received your submission.

    • Reply
      David L
      Can someone provide more detailed instructions for filling out this form? Does the applicant organizational information have to match the information provided in GMS? Also, are non-business small teams required to fill out "audit information" and "accounting system"?

  • Mohammad Al Boni
    Hi, I am using PostGIS to generate my output shapefiles. However, the hotspot field is being generated as a "String" field. I noticed from the sample submission that the type of the hotspot field is "Short". I tried to use ArcMap to copy it to a new "Short" field. However, I am only able to create a field of type "Long". Is it acceptable to submit my shapefiles with hotspot fields being either "String" or "Long"? Thank you!

  • murray
    Can you clarify how the total forecast area is going to be handled during scoring? To explain: I assume there will be thousands of submissions; the requirements state that the total area must be equal to 147.71 square miles, +/- 0.02 square miles, with all internal cells equally sized. If two applicants submit the same prediction hotspots, with the same internal cell size, then (assuming identical placement of cells) the applicant who artificially inflates the total area by increasing the size of border cells will score slightly higher in the PAI category -- is that the intent of the judging criteria, or will the total area be normalized for the purposes of fairness? Additionally, I haven't seen it mentioned what will happen if there is a tie. In such a case, would the two (or more) applicants split the money?

    • Reply
      NIJ
      In answer to your first question, due to the sensitivity of cell creation techniques and trimming, the variance in total area was required. Submissions will be judged as they are if within the requirements.

    • Reply
      NIJ
      In answer to your second question, if two submissions score identically, we will reach out to verify independent IP was used. If so, the prize is split.

  • djsherwo
    The minimum cell height is listed as 125 feet. I'm not really sure how to add height to my polygon feature class, but I take this requirement to mean that simply changing the z value of the source will not be enough. Can you explain what is meant by this minimum height requirement? Thank you.

    • Reply
      Mohammad Al Boni
      The NIJ has answered this question in their Q&A page. The question was: "Q-40: As I understand, I am allowed to submit a grid consisting of very stretched rectangles, for example with height and width equal to 10 and 10000, respectively. It is written in the rules that "The Director of NIJ or their designee will make the final award determination. If the Director of NIJ or their designee determines that no entry is deserving of an award, no prizes will be awarded." Do submitted grids have to be reasonable sized to qualify for an award? If so, can NIJ provide guidance on the minimum size of the grids?" The NIJ's response: "A: We have added the requirement that the minimum grid height or width is 125 feet (the minimum of the smaller dimension needs to be 125 feet). Since the total cell area must equal a minimum of 62,500 ft², two example shapes with minimum size would be: A rectangular cell with the minimum cell height or width of 125 ft would have to have a corresponding width or height of at least 500 ft. An equilateral triangle cell would have sides of 416.35 feet (the height of the triangle would be 357.64). Note: We have revised an earlier requirement that prohibited the rotation of cells to allow now for the rotation of triangular cells only."
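The rectangle case in the quoted answer can be checked in a few lines of Python. This is only a minimal sketch for rectangular cells, assuming dimensions are given in feet; the function name and constants are illustrative and are not part of NIJ's own tooling.

```python
# Minimum requirements as quoted in the Q-40 answer (rectangular cells only).
MIN_SIDE_FT = 125.0       # smaller dimension must be at least 125 ft
MIN_AREA_SQFT = 62_500.0  # total cell area must be at least 62,500 sq ft


def rectangle_meets_minimum(width_ft: float, height_ft: float) -> bool:
    """Return True if a rectangular cell satisfies both quoted minimums."""
    smaller_side_ok = min(width_ft, height_ft) >= MIN_SIDE_FT
    area_ok = width_ft * height_ft >= MIN_AREA_SQFT
    return smaller_side_ok and area_ok


if __name__ == "__main__":
    print(rectangle_meets_minimum(125, 500))     # thinnest cell allowed
    print(rectangle_meets_minimum(10, 10_000))   # stretched cell from Q-40
```

Under these two rules the 10 x 10000 rectangle from the question fails on the smaller dimension even though its area is large enough, while a 125 x 500 cell passes both checks.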

    • Reply
      Mohammad Al Boni
      They just responded earlier to the same question: "The existing agreement between NIJ and the Portland Police Bureau with regard to the data only concerns its use for the purposes of the Challenge. Questions regarding other uses of the data should be directed to the Portland Police Bureau."

  • I am creating an account for a small team of 5 people. Table 2 of this URL (https://nij.gov/funding/Pages/fy16-crime-forecasting-challenge-instructions.aspx#applying) does not state what is to be entered in the "Legal Name (legal jurisdiction name)" field of the Grants Management System (GMS) registration page. Table 2 also does not specify what a small team should select in the dropdown for the field "type of applicant." The help desk for the GMS (phone number 1-888-549-9901) informed me they are only able to assist with technical problems and cannot provide information on how to appropriately fill the fields. The GMS help desk referred me to 3 people, only one of whom I have been able to discuss this with. She was not able to answer the above questions and referred me to this discussion page. If you are not able to answer the above questions, could you please point me in the right direction? I want to make sure the fields are filled out appropriately to ensure eligibility for the contest. Thank you!

    • Reply
      Mohammad Al Boni
      Hi, Thank you for your efforts and for answering our questions! I have two more questions though. After the submission is over on March 1st, 1) will all participants' submitted shapefiles be available online? 2) are we allowed to use the data for research purposes? Thanks again!

      • Reply
        NIJ
        The existing agreement between NIJ and the Portland Police Bureau with regard to the data only concerns its use for the purposes of the Challenge. Questions regarding other uses of the data should be directed to the Portland Police Bureau.

    • Reply
      Warren Koontz
      Yesterday, I sent the following message to the GMS help desk: I am trying to submit an entry for the NIJ Real-Time Crime Forecasting Challenge. I have been instructed to create a GMS user account in order to enter. However, when I click the link to create an account, I get an error message from my browser (Google Chrome). I have tried the following: - Use another browser (Explorer, Edge, and Firefox) - Turn off my firewall - Access GMS from other links on USDOJ web sites - Restart my computer - Use my wife's computer The only way I can access the site is via my iPhone, but that is not very practical for me. Otherwise, I cannot access any site in the grants.ojp.usdoj.gov family. Would appreciate help. Thanks. The GMS help desk could not help me. They gave me a NIJ number to call for help, but there was no answer and the voicemail box was full. Please help me so I can submit my work on time.

  • pasdasci
    There is the possibility of events falling along a cell boundary. What will be the process for assigning these to one of the cells? https://blogs.esri.com/esri/arcgis/2013/12/17/point-in-polygon-overlay/

    • Reply
      NIJ
      A spatial join is done to aggregate the data, and it decides which cell each record belongs to. It is also extremely unlikely that a CFS will be exactly on a boundary, as a boundary is point thin.

      • Reply
        George Mohler
        This is not true. Given that the x y coordinates are rounded to a decimal place, it is possible that we may design cells such that a point is on a border. Given the existence of exact repeat crimes, it seems reasonably possible for this to happen. Can you clarify how you will count the crime if it does indeed fall on the border? Thanks

  • NIJ
    In response to concerns from potential applicants and to remove as many barriers to entry as possible, NIJ no longer will require the inclusion of .sbn and .sbx files in the submission. All submissions still must include .dbf, .prj, .shp, and .shx files. If possible, we ask that you include the .sbn and .sbx files but submissions without them will be accepted and scored. All entries still must include a .doc, .docx, or .pdf that includes Team/Business/Student Name, Submission Name (may be the same as above), Point of Contact, Phone Number, and Email. This file must be included in the main folder. This will only be used for any necessary communication regarding the Challenge (e.g., notifying winners). See "Submission Format and Naming" for naming conventions. Contestants entering as a team must submit a "Team Roster" that lists the individual team member names and what percentage of the prize they are eligible for. Percentages must total 100 percent. Find details on NIJ.gov. Team entries that do not submit a Team Roster signed by each member of the team will be disqualified.

  • Patryk
    It is hard to create .sbn and .sbx files. Are they really required? If they really are, could you please recommend any open source software able to create these types of files?

  • Charles
    Can you give us any info to help us know how severe our missing data problem will be for 2/28? For example, the daily fraction (and number) of calls for service across categories that come in by the time the last extract is likely to be generated since all time-of-day information has been stripped from the posted data?

  • Mohammad Al Boni
    Hi, In the naming convention section, it is mentioned that burglary should be coded as "Burg". However, in the example, it is written as "BURG" ("Example: TEAMNIJ_BURG_2MO"). Also, in the sample submission for burglary, the 2MO directory has it written as "BURG" while the remaining ones use "Burg". Could you please let us know which abbreviation we should follow? Thank you!

    • Reply
      ConcernedCitizen
      Does this change mean that data files need to be updated? Does the "CATEGORY" field in the ShapeFiles no longer match the challenge definition for categories? As of what date will NIJ stop changing the competition design?

      • Reply
        NIJ
        The shapefiles have always been correct. It was the table on the website that was not updated to reflect that cold cases would only be part of the all calls-for-service category.

    • Reply
      Mohammad Al Boni
      In the Q&A page, specifically Q-53, the NIJ mentioned that "Applicants may use the cold case locations to aide in their forecasts but “Cold” calls will not be included in the scoring calculations for the individual categories." Just to clarify, by the individual categories, do you mean the crime type categories, i.e., ACFS, Burg, SC, and TOA, or the individual participant categories, e.g., student? Also, can we just exclude all CFS of categories not mentioned in Table 1, and build our models only on the ones in Table 1? Thank you!

      • Reply
        NIJ
        You can use the information from the cold records to inform any of your models; however, the cold records will only be used in the calculations for all calls-for-service regardless of what type of cold record it is.

        • Reply
          Mohammad Al Boni
          Thank you for your response! I got a bit confused about the cold records. In your response, you mentioned that "the cold records will only be used in the calculations for all calls-for-service". However, in the response to Q-53, the NIJ mentioned that “Cold” calls will not be included in the scoring calculations. Can you please clarify this? Thanks again!

          • Reply
            NIJ
            A cold call for theft of auto will not be used in the TOA calculations; only the ACFS calculations. A cold call on a burglary will not be in the burg calculations; only the ACFS calculations.

      • Reply
        NIJ
        In answer to the final part of your question: Table 1 only lists the call types for Burg, SC, and TOA; all call types/records are used for the ACFS category.

  • William Herlands
    Under the "Street Crime" category the code "ASSLTT ASSAULT –PRIORITY" is listed. Is this supposed to be "ASSLT**P** ASSAULT –PRIORITY" with a "P" instead of a "T"? In the data I have found crimes with code "ASSLTP" but not with code "ASSLTT."

    • Reply
      NIJ
      You are correct, the code for "ASSAULT -PRIORITY" should be "ASSLTP." We have updated the Challenge page on NIJ.gov. Thank you for bringing this to our attention.

  • Dan Garant
    Since not all forecast cells have the same area (thinking particularly about trimmed border cells), then following the definition in the original paper, PEI is not equal to n/n*. Using the original formulation, PEI is equal to: [ (n/N) / (a/A) ] / [ (n*/N) / (a*/A) ], where a* is the area of the post-hoc, "ground truth" hotspots. Could you clarify whether we will be judged according to the (n/n*) definition or according to the definition involving a*?

    • Reply
      Submissions will be judged by PEI* according to the (n/n*) definition as stated in the Challenge. The PEI* is based on measuring n*, where n* is equal to the maximum number of CFS that could be forecast for that amount of area. The automated program to calculate PEI* does the following. It first calculates the "density" of crimes per unit (including trimmed cells) by dividing the number of CFS by the area. It then sorts the cells in descending order by density. It then takes cells in order of density and sums the area and the CFS. It does this until the area = a, or until adding the next cell would make the area greater than a. In the latter case it calculates what fraction of that cell could be used (and the corresponding fraction of CFS to account for in that cell). This does make a minor assumption that the CFS in that final cell are evenly distributed. This is likely to make an extremely small error in the grand scale of the measure; however, this is the best approximation while allowing for trimming.
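The procedure described in this reply can be sketched in Python as follows. This is a hedged illustration, not NIJ's scoring code (which, per another post in this thread, was written as Visual Basic macros): the function names and the `(cfs_count, area)` tuple layout are assumptions, and `pai` simply implements the numerator of the formula quoted in the question, (n/N) / (a/A). Every cell is assumed to have a positive area.

```python
def pai(n, N, a, A):
    """Prediction Accuracy Index: share of CFS captured over share of area used."""
    return (n / N) / (a / A)


def max_capturable(cells, a):
    """Estimate n*: the most CFS any selection of total area `a` could capture.

    `cells` is a list of (cfs_count, area) tuples with area > 0.
    Cells are taken in descending order of density; the last cell is
    used fractionally, assuming its CFS are evenly distributed.
    """
    ranked = sorted(cells, key=lambda c: c[0] / c[1], reverse=True)
    used_area = 0.0
    n_star = 0.0
    for count, area in ranked:
        remaining = a - used_area
        if remaining <= 0:
            break
        if area <= remaining:
            # Whole cell fits within the remaining area budget.
            used_area += area
            n_star += count
        else:
            # Only a fraction of this cell fits; take the same fraction of CFS.
            n_star += count * (remaining / area)
            break
    return n_star


def pei_star(n, cells, a):
    """PEI* = n / n*, where n is the CFS count inside the forecast hotspots."""
    return n / max_capturable(cells, a)


if __name__ == "__main__":
    # Toy grid: (CFS count, area) per cell, in arbitrary consistent units.
    cells = [(10, 1.0), (4, 1.0), (3, 0.5), (0, 2.0)]
    # A forecast that flags the first two cells captures n = 14 in area a = 2.0.
    print(max_capturable(cells, 2.0))
    print(pei_star(14, cells, 2.0))
    print(pai(14, 17, 2.0, 4.5))
```

In the toy grid the best possible selection takes the two densest cells whole and half of the third, so the forecast's PEI* falls slightly below 1 even though it picked the cell with the most calls.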

  • NIJ
    CORRECTION. The "hotspot" variable should be named "hotspot," not "hotspots" as stated incorrectly below.
    CLARIFICATION TO SUBMISSION REQUIREMENTS: In "Table 2: Requirements for Entries," NIJ has clarified that the required variables “hotspot” and “area” must be named “hotspots” and “area” respectively in your submission.

    • Reply
      Dan Garant
      It appears that ArcGIS online (which is what Esri is providing to challenge contestants) can only handle shapefiles with up to 1000 elements. For solutions with many thousands of cells, this is a problem. I had originally intended on using ArcGIS to generate these files, since open-source software such as RGDAL does not support this format. Given that ".sbn" and ".sbx" files are optional for ArcGIS, would NIJ consider lifting the requirement for these files to be in the submission?

  • Amber Keller
    Hello! Can you help me understand why the only deliverables are shapefiles? What about the methodology? How are you not able to use esri's hot-spot analysis tool and call it a day? Also, we are "using" machine learning algorithms to predict crimes, not "developing" or "creating" algorithms, correct?

    • Reply
      NIJ
      As to why the entry is the shapefile and not the methodology: Through this Challenge NIJ seeks to gain a better understanding of the potential for crime forecasting in America. The Challenge offers the opportunity for a comprehensive comparative analysis between current "off-the-shelf" forecasting products and innovative forecasting methods to inform NIJ's research investments in this area. Requiring contestants to submit and disclose their intellectual property might discourage some contestants from submitting entries. NIJ may, in fact, contact selected contestants at a later date concerning their methodology.

    • Reply
      NIJ
      As to why you are unable to simply use esri's analysis tools: Contestants may not simply submit an entry that uses esri’s analysis tool because by entering the Challenge, each contestant warrants that (a) he or she is the author and/or authorized owner of the entry; (b) that the entry is wholly original with the contestant (or is an improved version of an existing solution that the contestant is legally authorized to enter in the Challenge); (c) that the submitted entry does not infringe any copyright, patent, or any other rights of any third party; and (d) that the contestant has the legal authority to assign and transfer to NIJ all necessary rights and interest (past, present, and future) under copyright and other intellectual property law, for all material included in the Challenge proposal that may be held by the contestant and/or the legal holder of those rights. Each contestant agrees to hold the Released Parties harmless for any infringement of copyright, trademark, patent, and/or other real or intellectual property right, which may be caused, directly or indirectly, in whole or in part, from contestant’s participation in the Challenge. However, if you are using a unique machine learning algorithm that you or your team or business has the intellectual property rights to, that is acceptable.

      • Reply
        Thank you so much for the clarification! Good to know there are checks in place. I plan on using hot spot analysis as a helper for grid size and using open source machine learning algorithms from python APIs

    • Reply
      NIJ
      Regarding whether contestants are "using" or "developing" algorithms: If a contestant holds the intellectual property rights to an existing algorithm, they may use that algorithm. Otherwise, the contestant would have to first develop then use its own algorithm.

    • Reply
      NIJ
      The URL for the Real-time Crime Forecasting Challenge has not changed. If you can provide more details about what you are seeing, we can work to correct or clarify any issues.

  • N Tamer
    "Requirements for Entries" include .sbn and .sbx files. I have been unable to find software capable of creating these two file types other than ESRI ArcGIS, which is proprietary. I was hoping to use open source software, such as QGIS, which uses the .qix format for attribute and spatial indexes. Since it is easy to generate the indexes, would a submission without .sbn and .sbx files really be rejected?

    • Reply
      Vlad Morozov
      Agree. Forcing use of proprietary/commercial software contradicts the idea of openness claimed by the competition organizers.

    • Reply
      NIJ
      For submission to the Challenge, data developed by a contestant’s algorithm must be formatted using data files that are used in ArcGIS. NIJ did that for purposes of compatibility with the mapping software used by the majority of the law enforcement agencies in the United States. We estimate that roughly 95 percent of large law enforcement agencies (100 or more sworn officers) and 60 percent of small agencies conduct in-house crime mapping. Of those agencies, roughly 85 percent use ArcGIS. ESRI has agreed to provide fully functional evaluation ArcGIS software licenses to Challenge contestants for the duration of the Challenge. We expect those licenses to be available on or before January 18. We do not believe that that requirement will in any way hinder the development of the forecasting algorithms themselves.

      • Reply
        Vlad Morozov
        ArcGIS can read shapefiles without .sbn and .sbx files. As Wikipedia states, "The .sbn file is not strictly necessary, since the .shp file contains all of the information necessary to successfully parse the spatial data". The requirements for .sbn and .sbx obviously create difficulties. Asking for NEW approaches implies that these approaches will be developed outside ArcGIS. Participants are then forced to load their predictions into ArcGIS just to save them in the proprietary format. There are open source indexing solutions for shapefiles. It took a few seconds on a very modest computer to find the overlap between a sample submission shapefile and the crime data shapefiles using the open source libspatialindex.

        • Reply
          NIJ
          It is correct that you do not need a .sbn or .sbx file in order for a product like Esri ArcMap to read and display a shapefile. However, these files are necessary to successfully parse the spatial data. This spatial data is necessary for scripts to be written that will automate the spatial join of the submitted shapefile with the point file of the actual calls-for-service. Additionally, these files speed up rendering time and allow for spatial indexing. These two functionalities are also necessary for a script to work that will automate the measurement of the PAI and the PEI*.

  • The Steger foundation looks good. I think I want to put my effort into the group Mark Hertsgaard is forming because I especially like his point of view on climate change found in his book, “Hot”.

  • NIJ
    In response to an earlier query, we had stated that cells "may not be stretched, rotated, or reflected." We have amended that statement to allow now for the rotation of triangular cells only.

  • Souf
    For teams, does each individual get a submission, and the team wins if any of the submissions is the best for a given prize? This would seem to dramatically favor larger teams, since they simply get more entries...

  • NIJ
    NIJ has posted clarification on the naming convention and structure of submission files. Please see https://nij.gov/funding/Pages/fy16-crime-forecasting-challenge.aspx#format

  • Jesse Bacon
    Are you sure that the sample data set is correct and adequate for machine learning purposes? I have concern about my state's crime data for the past five years. It may be incorrect, and if so, it should not be used for machine learning or clustered operations. Theoretically, this is problematic. https://www.facebook.com/photo.php?fbid=10207709636044916&set=a.1034436747178.2006283.1413230134&type=3&theater

    • Reply
      NIJ
      We cannot comment on the correctness or adequacy of the crime data provided by the prospective applicant’s state. The data set provided for this Challenge is correct and adequate for the purposes of this Challenge. The data presented is the calls for service received by the Portland Police Bureau as described in the Challenge. Contestants' submissions will be judged against the calls for service that the Portland Police Bureau actually receives during the relevant period(s) specified in the Challenge.

  • Vlad Morozov
    Why should a solution be a shapefile with cells covering the whole Portland Police area? Obviously only hot-spot cells are used in the score calculation. I have found it cumbersome to calculate the overlap with the provided Portland Police District shapefile. For example, QGIS complains about wrong geometry. Can we submit a shapefile with only hot-spot cells?

    • Reply
      NIJ
      In order to calculate the PEI*, we need the full set of grids. PEI* does a post hoc analysis of the grids to find the cells with the highest actual CFS.

    • Reply
      Vlad Morozov
      How about simplifying the submission by asking contestants to submit just a grid? You would then calculate the overlap with the police district area yourselves. That way you can be sure that the grid trimming was done right.

    • Reply
      NIJ
      The student category is open only to individuals. A student team should enter as a Small Team/Business. If your team represents an educational institution, the institution can enter as a Large Business contestant (assuming it has more than 20 employees).

  • David
    In Table 2, what does "Total forecasted area" = .25-.75 sq. miles mean, and how does it relate to the footnote which says that "the total area of all cells equals 147.71 square miles"? I take it to mean that the combined area of all the police districts is 147.71 square miles. However, we are not supposed to forecast all of that; we are only supposed to forecast .25-.75 sq. miles. If this is correct, which .25-.75 sq. miles are supposed to be forecasted? I also thought that all of the cells in the 147.71 square miles have a binary hot spot attribute, which is the forecast. If that's the case, then we are forecasting 147.71 sq. miles, not .25-.75 sq. miles as specified. If this understanding is incorrect, how should it be understood?

    • Reply
      Mohammad Al Boni
      I believe they mean the following:
      - "the total area of all cells equals 147.71 square miles": This is the total area of all cells that you should submit in your shapefile.
      - "Total forecasted area = .25-.75 sq. miles": The total area of all cells with hotspot = 1 should be between .25 and .75 sq. miles.
      In other words, you need to forecast for a total area of 147.71 square miles, but you should assign 1 only to cells whose total area sums to a value between .25 and .75 sq. miles. The remaining cells must have a value of 0 for the hotspot variable.
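
The interpretation above can be sketched as a small pre-submission check. This is only an illustration, not an NIJ tool: the function name `check_areas` and the `(area, hotspot)` input layout are hypothetical, and the limits are the figures quoted in this discussion (cells up to 360,000 sq. ft., a forecasted area of 0.25 to 0.75 sq. mi., and a total area of 147.71 sq. mi. +/- 0.02).

```python
SQ_FT_PER_SQ_MI = 5280 ** 2  # 27,878,400 sq. ft. in one sq. mi.


def check_areas(cells, max_cell_sq_ft=360_000,
                hot_min_sq_mi=0.25, hot_max_sq_mi=0.75,
                total_sq_mi=147.71, total_tol_sq_mi=0.02):
    """Return a list of problems found; an empty list means the checks passed.

    cells: list of (area_sq_ft, hotspot) pairs, where hotspot is 0 or 1.
    """
    problems = []
    if any(hotspot not in (0, 1) for _, hotspot in cells):
        problems.append("hotspot must be 0 or 1 for every cell")
    # Cells below the 62,500 sq. ft. minimum may be legitimately trimmed at
    # the boundary, so only the upper limit is checked here.
    if any(area > max_cell_sq_ft for area, _ in cells):
        problems.append("at least one cell exceeds the maximum cell area")
    total = sum(area for area, _ in cells) / SQ_FT_PER_SQ_MI
    if abs(total - total_sq_mi) > total_tol_sq_mi:
        problems.append(f"total area is {total:.2f} sq. mi.")
    hot = sum(area for area, hotspot in cells if hotspot == 1) / SQ_FT_PER_SQ_MI
    if not hot_min_sq_mi <= hot <= hot_max_sq_mi:
        problems.append(f"forecasted area is {hot:.4f} sq. mi.")
    return problems
```

The areas would have to be read from the shapefile attributes first; that step is left out here because it depends on the GIS library used.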

  • David
    How is the "Unique ID for each cell" mapped to a location on the map? From the requirements for the entry file, all (as far as I can tell) that need be in the entry file are the id, a binary variable and an area. Using this information how does the id reference a particular cell?

    • Reply
      Mohammad Al Boni
      You don't need the ID to know the location. By definition, a shapefile consists of a set of polygons defining the geometry of an area. I believe this is the reason why they did not include the polygons' geometry as a required variable. Please kindly refer to (http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) for a detailed description of shapefiles and their format.

      • Reply
        David
        Thanks. The specifications threw me off but it sounds like you are correct that the set of polygons is a given. This was brought up in other questions, but it would make life a little easier if there was a sample submission available...

  • Mohammad Al Boni
    Hi, can you please provide us with a tool to test the validity of our submissions according to Table 2? It would be great if the tool could validate not only the required variables but also the individual cell area, the total forecasted area, and the total area of all cells. Thank you!

    • Reply
      George Mohler
      Such a tool would be useful for development, but it is also important for transparency in the judging process (for example, if the judges' software has a bug, this would catch it). I would have loved to see this contest use Kaggle's platform. Is it too late?

      • Reply
        Mohammad Al Boni
        The tool does not necessarily need to evaluate a submission, i.e., compute the PAI and the PEI*. It would be used to make sure that a submission meets the requirements, so that we do not spend time developing and building models only to have the submission rejected because, for example, the total forecasted area is larger than 0.75 square miles.

  • Happy to announce that, to support the NIJ Challenge, we have repackaged the underlying data in open formats and published it on GitHub at https://github.com/mtna/opendata-us-nij-rtcfc. This includes links to a Google BigQuery public version and other options. This is an initial release. Suggestions and feedback for improvement are appreciated.

  • Is it correct that all census tract entries are in Multnomah County and therefore, to expand to a full US Census tract code, should be prefixed with "41051" (41=Oregon and 051=Multnomah) and converted to a zero-padded six-digit number? If so, the CFS dataset value "100" should be "41051000100". See, for example, http://www2.census.gov/geo/maps/dc10map/tract/st41_or/c41051_multnomah/ and http://www.census.gov/geo/maps-data/maps/2010ref/st41_tract.html

      • Reply
        NIJ
        The submissions will be judged against how well they forecast actual calls for service in Portland for the crime categories and time periods specified in the Challenge. A few minor parts of the Portland Police Bureau's (PPB's) jurisdiction fall outside of Multnomah County, but within the Multnomah County census tract 41051000100. Also, the PPB does not have jurisdiction in all of the county, or in the entirety of the census tract. Contestants are free to consider factors outside Multnomah County if they believe that those factors may increase the accuracy of their submissions. It is up to the contestant to decide how best to join any additional data they would like to use.
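
For contestants who want to join the CFS data to Census tables, the expansion proposed in the question can be sketched as below. It follows the commenter's reading, which NIJ's reply does not explicitly confirm: every tract value is assumed to belong to Multnomah County (state FIPS 41, county FIPS 051), and the function name `tract_to_geoid` is hypothetical.

```python
def tract_to_geoid(tract, state_fips="41", county_fips="051"):
    """Expand a CFS census tract value to an 11-digit tract code.

    Assumes the tract lies in the given state and county; this is the
    commenter's interpretation, not a documented rule of the data set.
    """
    digits = str(tract).strip()
    if not digits.isdigit() or len(digits) > 6:
        # Longer or non-numeric values may be coding for something else.
        raise ValueError(f"unexpected census tract value: {tract!r}")
    return state_fips + county_fips + digits.zfill(6)


print(tract_to_geoid("100"))  # prints 41051000100
```

Rejecting values longer than six digits keeps unusually large tract numbers from being silently turned into invalid codes.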

  • Steve Warford
    A most basic question (I need to get up the learning curve to evaluate my interest). What software is readily available to work with the various file types used by Portland PD for the challenge data set?

    • Reply
      NIJ
      The data is provided as a "shapefile," which consists of a collection of files using a common filename prefix, stored in the same directory. An internet search for "shapefile" will provide more details on the format and on applications that can open shapefiles.

    • Reply
      NIJ
      ESRI has agreed to provide fully functional evaluation ArcGIS software licenses to Challenge contestants for the duration of the Challenge. We expect those licenses to be available on or before January 18.

  • David Skillicorn
    I can't participate, as I'm not American, but I've put some figures at http://research.cs.queensu.ca/home/skill/publicfigs.zip. There is a global clustering, which shows that there are some strange artifacts correlating occurrence date and census tract (including some very large census tract numbers that may be coding for something else). There is also an overlay of attributes by position in Portland. Nothing startling jumps out of either set of figures, but it shows how it's possible to get an overview of a city's policing even from relatively simple RMS data. With more detailed data, even more can be seen (happy to share how with law enforcement and intelligence folks who may be interested).

  • Boris
    What a wonderful way to potentially lock up more people for free prison labor corporations. Portland is such a hotbed of criminal activity. I don't even need the shapefiles to be sure of what they submitted. Way to go, Office of Justice. Target, Acquire. Lockup. Yes, another wonderful government targeting program. President Trump will personally hand out the award money. I can't wait.

  • John Hall
    The PEI has the following in its definition: "Where n* equals the maximum obtainable n for the amount of area forecasted, a." So what grid/schema are the scorers using to determine the maximum for a forecasted area? The calls themselves are points and have no "area". Are you going to use irregular polygons within the competition rules (62,500 sq. ft. – 360,000 sq. ft.) to determine this? If you use a grid, you are not likely to determine the maximum n obtainable given the total area of anybody's forecast.
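
NIJ states elsewhere in this discussion that PEI* does a post hoc analysis of the submitted grid to find the cells with the highest actual CFS. One way such a search for n* could work on a contestant's own grid is sketched below. This is an assumption about the procedure, not NIJ's published method, and the function name `max_obtainable_n` is hypothetical.

```python
def max_obtainable_n(cells, forecast_area):
    """Greedy estimate of n*, the most CFS capturable within forecast_area.

    cells: list of (area, actual_count) pairs for every cell in the grid.
    Cells are taken in order of actual count, highest first, and a cell is
    skipped if it no longer fits in the remaining area.
    """
    remaining = forecast_area
    n_star = 0
    for area, count in sorted(cells, key=lambda c: c[1], reverse=True):
        if area <= remaining:
            remaining -= area
            n_star += count
    return n_star


grid = [(1, 5), (1, 3), (1, 9), (1, 0)]  # four equal-area cells
print(max_obtainable_n(grid, forecast_area=2))  # prints 14
```

The greedy choice is exact when all cells have the same area. With trimmed or unequal cells it is only an approximation, because picking the best subset under an area budget is a knapsack problem. Either way the result depends on the grid that was submitted, which is the concern raised in the question.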

  • John Hall
    The airport is not included in the shapefile for Portland Police District, yet it receives many calls for service. It also wasn't figured in (as far as I can tell) for your total study area. So, are we to assume that this analysis should be limited to ONLY area contained w/in a Portland Police District?

    • Reply
      NIJ
      Analysis should be limited to those areas included in the PPB shapefile provided. Some provided CFS may fall outside of that region; however, cell schemas should only cover the area of the PPB jurisdiction.

    • Reply
      NIJ
      Answers: 1. The only distortion to the data is moving the CFS a few feet from the building footprint to the street segment directly in front of the address. If other distortions are noticed, ensure that all point files and shapefiles are in the correct projected coordinate system. 2. Census tract refers to the federal US Census tract in which the CFS occurred.

    • Reply
      NIJ
      The Challenge is open to individuals and teams of individuals who are residents of the 50 United States, the District of Columbia, Puerto Rico, the U.S. Virgin Islands, Guam, and American Samoa and who are at least 13 years old at the time of entry. The Challenge is also open to corporations or other legal entities (e.g., partnerships or nonprofit organizations) that are domiciled in the 50 United States, the District of Columbia, Puerto Rico, the U.S. Virgin Islands, Guam, and American Samoa.

    • Reply
      NIJ
      The dates for predictions in each category are: One week: March 1-7; Two weeks: March 1-14; One month: March 1-31; Two months: March 1-April 30; Three months: March 1-May 31.

    • Reply
      NIJ
      The PAI was originally proposed to the field by Chainey, Tompson, and Uhlig (2008). They sought to define a measure for testing forecasting accuracy. They said it measures "the hit rate against the areas where crimes are predicted to occur with respect to the size of the study area;" the actual equation is given with the Challenge. The criminology field (specifically those working on forecasting and prediction) has principally relied on this measure since its inception. The PEI* (and PEI) were proposed by Hunt (2016) as a complementary measure to the PAI. He sought to define a measure for testing forecasting efficiency. It is meant to measure how well a forecast does compared to how well it could have done (post hoc). It was introduced only recently; however, it is the only known plausible measure at this time to complement the PAI. The equation for how it is measured is provided in the Challenge.
      References:
      Chainey, S., Tompson, L., & Uhlig, S. (2008). The Utility of Hotspot Mapping for Predicting Spatial Patterns of Crime. Security Journal, 21, 4-28.
      Hunt, J. (2016). Do Crime Hot Spots Move? Exploring the Effects of the Modifiable Areal Unit Problem and Modifiable Temporal Unit Problem on Crime Hot Spot Stability. Archived with ProQuest Dissertations & Theses.
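
The two measures described above can be written out as a short sketch. The PAI formula, (n/N)/(a/A), is the one given by Chainey, Tompson, and Uhlig (2008). The PEI* form used here, the PAI divided by the best obtainable PAI for the same forecasted area, is our reading of the Challenge definition and should be checked against the equations on NIJ.gov. The function and parameter names are illustrative only.

```python
def pai(n, N, a, A):
    """Prediction Accuracy Index: hit rate divided by area fraction.

    n: CFS that fall inside the forecasted hot spot cells
    N: all CFS in the study area
    a: forecasted (hot spot) area
    A: total study area, in the same units as a
    """
    return (n / N) / (a / A)


def pei_star(n, n_star, N, a, A):
    """Assumed form of PEI*: achieved PAI over the best obtainable PAI.

    n_star is the maximum obtainable n for the forecasted area a, so the
    ratio reduces algebraically to n / n_star.
    """
    return pai(n, N, a, A) / pai(n_star, N, a, A)


# Made-up numbers: 50 of 1,000 CFS captured in 0.5 of 147.71 sq. mi.,
# where the best possible choice of cells would have captured 80.
print(round(pai(50, 1000, 0.5, 147.71), 3))           # prints 14.771
print(round(pei_star(50, 80, 1000, 0.5, 147.71), 3))  # prints 0.625
```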

  • FH
    I've tried to sign up, but when I get to the Grants Management System Home and search under the National Institute of Justice, I only see 4 funding opportunities:
    - Office of Research and Evaluation Continuations
    - Non-solicited Applications
    - Office of Science and Technology Continuations
    - Office of Investigative and Forensic Sciences Continuations
    Any idea where I've gone wrong in the process?

    • Reply
      NIJ
      You are not seeing the Challenge listed in the Grants Management System because we are not yet accepting applications. While you may download data and begin work on your submission now, you will not be able to submit until February 22, 2017, 12:00 a.m. ET. The submission period closes February 28, 2017, 11:59 p.m. ET. We do encourage you to register with the OJP Grants Management System in advance if you do not already have an account.

  • bt
    It seems to me that weather is an important factor in criminal activity. When it's hot (but not too hot), crime goes up. Other factors probably matter too, such as rainy days, when you have increased break-ins because the rain covers the noise or something like that. If I understand the judging guidelines, you will have to predict the level of criminal activity out 6 weeks without the benefit of knowing the weather for a given day in the future? It seems to me that you could increase the accuracy of the results by allowing the weather forecast for the next 24 hours as an input into the next day's forecast.

    • Reply
      NIJ
      From the Challenge:
      "Contestants should be aware that other entities may make other data available through free or fee-based services (e.g., cloud and data sharing sites) that may or may not also be useful in developing their algorithms. Contestants are permitted, but not required, to use any other data sets or services."
      "...other data sets..." could include weather data.

    • Reply
      NIJ
      From NIJ: The term hot spot is an arbitrary term. The simplest definition of hot spots is, “the cells that have the highest number of calls for service compared to the other cells.” There is no single threshold for being a hot spot. The threshold will change depending on the definition of time and place. Additionally, the number of cells forecasted to be hot spots will vary depending on the areal size of the cell and on the restriction placed on the range of the total forecasted area. The “product” to be submitted for the competition is a shapefile, using cells, covering the entire Portland Police Bureau jurisdiction. Within that shapefile, contestants should include a binary variable “hotspot” that indicates whether the cell is forecasted to be a hot spot. Contestants need to follow the requirements outlined in Table 2. Specifically, the area of each cell must be between 62,500 – 360,000 sq.ft. (except those cells that need to be trimmed due to boundaries) and the total forecasted area must be between 0.25 – 0.75 sq.mi.

      • Reply
        Milosz
        NIJ, you write that "the total forecasted area be between 0.25 – 0.75 sq.mi.", yet the instructions state that "the total area of all cells equals 147.71 square miles (+/-0.02 square miles)". 147.71 square miles is the total area of the Portland Police Districts shapefile. Isn't the Portland Police Districts the total forecasted area? And if not, what defines the total forecasted area? Is it the contestants' choice of any 0.25-0.75 sq. mile area, comprised of cells between 62,500-360,000 sq. feet, within the larger 147.71 square miles of the Portland Police Districts? Thanks!

        • Reply
          Echizzle
          +1, I would like to know the answer to the question that Milosz asked. It would also be super helpful if sample submission files were provided by the NIJ, this might help to clear up some of the confusion regarding the entry requirements.

      • Reply
        Robert
        If we are only submitting hot spots, but the evaluation criteria require number of crimes, shouldn't we actually be predicting crimes per cell? This could possibly be in addition to specifying whether a cell is a hot spot according to some threshold, but I'm not clear on why hot spots are included in the submission criteria but crimes per cell are not. What are you really requesting we submit?

          • Reply
            NIJ
            Crime forecasts can be submitted for each crime category for periods of one week, two weeks, one month, two months, and three months. The dates for predictions in each category are: One week: March 1-7; Two weeks: March 1-14; One month: March 1-31; Two months: March 1-April 30; Three months: March 1-May 31.

        • Reply
          NIJ
          Contestants are forecasting where calls for service are likely to cluster (form hot spots). In order to test the effectiveness and efficiency of these forecasts, NIJ will count the CFS that fall within the forecasted cells by overlaying the forecast on the actual CFS.

          • Reply
            Robert
            I understand the dates we are predicting for. I just want to clarify so I don't spend time modeling the wrong aggregation of the data: for example, if I am submitting for the One Week: March 1-7 category, should the submission simply include the number of CFS for that entire week without differentiating between the days? Or, should I submit a prediction for the CFS in each cell for March 1st, a prediction for the CFS in each cell for March 2nd, a prediction for the CFS in each cell for March 3rd, etc.? i.e. am I predicting for 7 individual days, or for the week as a whole?
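
The overlay that NIJ describes in this thread can be illustrated with a deliberately simple sketch. It assumes square, axis-aligned cells with coordinates in feet, ignores trimming at the jurisdiction boundary, and uses a naive baseline (rank cells by past CFS counts) in place of a real forecasting model. All function names are hypothetical.

```python
from collections import Counter


def cell_of(x, y, side):
    """Return the (column, row) index of the square cell containing a point."""
    return (int(x // side), int(y // side))


def forecast_hotspots(history, side, k):
    """Flag the k cells with the most historical CFS as hot spots."""
    counts = Counter(cell_of(x, y, side) for x, y in history)
    return {cell for cell, _ in counts.most_common(k)}


def hits(hotspots, actual, side):
    """Count actual CFS points that fall inside forecasted hot spot cells."""
    return sum(1 for x, y in actual if cell_of(x, y, side) in hotspots)


history = [(10, 10), (20, 20), (30, 40), (700, 10), (1300, 1300)]
actual = [(5, 5), (650, 5), (100, 599)]
hot = forecast_hotspots(history, side=600, k=1)
print(hits(hot, actual, side=600))  # prints 2
```

With 600 ft cells (360,000 sq. ft. each, the upper limit quoted above), between 20 and 58 hot spot cells keep the total forecasted area within 0.25 to 0.75 sq. mi. The sketch treats the forecast window as a single period and does not address whether per-day predictions are expected.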


Solutions
No solutions have been posted for this challenge yet.
Rules

See the Real-Time Crime Forecasting Challenge on NIJ.gov for rules.

Submit Solution
Submissions for this competition are being accepted on a third-party site. Please visit the external site for instructions on submitting: http://nij.gov/funding/Pages/fy16-crime-forecasting-challenge.aspx